Semantic Alignment of Malicious Question Based on Contrastive Semantic Networks and Data Augmentation (Abstract Reprint)

Xinyan Wang; Jinshuo Liu; Juan Deng; Meng Wang; Qian Deng; Youcheng Yan; Lina Wang; Yunsong Ma; Jeff Z. Pan

doi:10.1609/aaai.v40i47.41418

Authors

Xinyan Wang School of Cyber Science and Engineering, Wuhan University
Jinshuo Liu School of Cyber Science and Engineering, Wuhan University
Juan Deng School of Cyber Science and Engineering, Wuhan University
Meng Wang School of Cyber Science and Engineering, Wuhan University
Qian Deng School of Cyber Science and Engineering, Wuhan University
Youcheng Yan School of Cyber Science and Engineering, Wuhan University
Lina Wang School of Cyber Science and Engineering, Wuhan University
Yunsong Ma School of Computer Science, University of Sydney
Jeff Z. Pan The University of Edinburgh, Edinburgh

Abstract

The identification and filtration of malicious texts in social media environments represent a significant technical challenge aimed at protecting users from online violence and disinformation. This complexity stems from the diversity and innovativeness of social media texts, which include unique expressions and special sentence structures. In particular, malicious texts in interrogative form are hard to align with traditional corpora because existing methods fail to exploit the text’s deep global semantic representations. This issue is compounded by the scant research on Chinese texts, which limits recognition accuracy. To mitigate these challenges, we introduce a framework based on a Global Contrastive Semantic Network (GCSN), designed to enhance malicious text recognition efficiency and accuracy by deeply learning global semantic knowledge. It comprises an encoder for global semantic information modelling and a graph-matching network for semantic similarity evaluation between question pairs, enabling the accurate identification and filtering of malicious texts with complex structures. Furthermore, we introduce a semantic consistency-based data augmentation method (COMBINE), using real-world data to generate balanced positive and negative samples, enriching the dataset and enhancing the model’s ability to distinguish semantic consistency through contrastive learning. Experimental validation on two Chinese datasets demonstrates our model’s exceptional performance, affirming its application value in social media malicious text recognition. Our code is available at https://github.com/Wxy13131313131/GCSN-COMBINE
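
The abstract does not state the loss function used, so the following is only a minimal illustrative sketch of how contrastive learning over positive and negative question pairs can work in general. It is not taken from the GCSN-COMBINE code. The function names, the cosine-distance measure, the margin value, and the toy embedding vectors are all assumptions made for illustration; a real system would obtain the vectors from a trained encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_contrastive_loss(u, v, label, margin=0.5):
    """Generic margin-based contrastive loss on a pair of embeddings.

    label == 1: the pair is semantically consistent (positive sample),
                so the loss grows with the distance between the vectors.
    label == 0: the pair is inconsistent (negative sample), so the loss
                is non-zero only while the distance is below the margin.
    The margin of 0.5 is an assumed, illustrative value.
    """
    distance = 1.0 - cosine_similarity(u, v)
    if label == 1:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Toy embeddings standing in for encoder outputs (hypothetical values).
question_a = [1.0, 0.0]
question_b = [1.0, 0.0]   # same direction as question_a
question_c = [0.0, 1.0]   # orthogonal to question_a

loss_positive = pairwise_contrastive_loss(question_a, question_b, label=1)
loss_negative = pairwise_contrastive_loss(question_a, question_c, label=0)
loss_confused = pairwise_contrastive_loss(question_a, question_b, label=0)
```

In this sketch, a well-separated pair contributes no loss, while a negative pair whose embeddings coincide is penalised. Balanced positive and negative samples, such as those a data augmentation step might produce, keep either term from dominating training.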
