Proceedings of the AAAI Conference on Artificial Intelligence
https://ojs.aaai.org/index.php/AAAI/issue/feed
Publications Manager (publications@aaai.org), Open Journal Systems. Feed updated 2024-03-25.

The proceedings of the AAAI Conference on Artificial Intelligence (AAAI) provide an archival record of the annual AAAI Conference on Artificial Intelligence, AAAI's primary conference. The meeting provides a forum that promotes theoretical and applied AI research as well as intellectual interchange among researchers and practitioners. The technical papers in the proceedings are selected through a rigorous, blind peer-review process.

A Multi-Modal Contrastive Diffusion Model for Therapeutic Peptide Generation
https://ojs.aaai.org/index.php/AAAI/article/view/27749
Yongkang Wang (wyky481@webmail.hzau.edu.cn), Xuan Liu (lx666@webmail.hzau.edu.cn), Feng Huang (fhuang233@webmail.hzau.edu.cn), Zhankun Xiong (xiongzk@webmail.hzau.edu.cn), Wen Zhang (zhangwen@mail.hzau.edu.cn)

Therapeutic peptides are a unique class of pharmaceutical agents crucial for treating human diseases. Recently, deep generative models have shown remarkable potential for generating therapeutic peptides, but they use sequence or structure information alone, which limits generation performance. In this study, we propose a Multi-Modal Contrastive Diffusion model (MMCD) that fuses the sequence and structure modalities in a diffusion framework to co-generate novel peptide sequences and structures. Specifically, MMCD constructs sequence-modal and structure-modal diffusion models and devises a multi-modal contrastive learning strategy, with inter-contrastive and intra-contrastive objectives at each diffusion timestep, to capture the consistency between the two modalities and boost model performance.
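The contrastive strategy is only described at a high level in the abstract; as a rough illustration, an InfoNCE-style agreement objective between two modalities can be sketched in numpy. This is a minimal sketch with hypothetical toy embeddings, not the authors' code: the function name, batch, and temperature are all assumptions.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style objective: pull each anchor toward its matched
    positive (the diagonal) and away from other samples in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(8, 16))     # toy sequence-modal embeddings
struct_emb = rng.normal(size=(8, 16))  # toy structure-modal embeddings

# Inter-contrastive: maximize cross-modal agreement for the same peptide.
inter_loss = info_nce(seq_emb, struct_emb)
# An intra-contrastive term would instead pair therapeutic peptides as
# positives so non-therapeutic ones in the batch are pushed away.
```

Perfectly aligned modalities drive this loss toward zero, which is the sense in which minimizing it "maximizes the agreement of their embeddings."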
The inter-contrastive objective aligns the sequences and structures of peptides by maximizing the agreement of their embeddings, while the intra-contrastive objective differentiates therapeutic from non-therapeutic peptides by maximizing the disagreement of their sequence/structure embeddings. Extensive experiments demonstrate that MMCD outperforms other state-of-the-art deep generative methods in generating therapeutic peptides across various metrics, including antimicrobial/anticancer score, diversity, and peptide docking.

All articles in this issue were published 2024-03-25 under Copyright (c) 2024 Association for the Advancement of Artificial Intelligence.

Towards Automated RISC-V Microarchitecture Design with Reinforcement Learning
https://ojs.aaai.org/index.php/AAAI/article/view/27750
Chen Bai (cbai@cse.cuhk.edu.hk), Jianwang Zhai (zhaijw@bupt.edu.cn), Yuzhe Ma (yuzhema@ust.hk), Bei Yu (byu@cse.cuhk.edu.hk), Martin D. F. Wong (mdfwong@cuhk.edu.hk)

Microarchitecture determines the implementation of a microprocessor. Designing a microarchitecture that achieves a better performance, power, and area (PPA) trade-off has become increasingly difficult. Previous data-driven methodologies rest on inappropriate assumptions and lack tight coupling with expert knowledge. This paper proposes a novel reinforcement learning (RL) solution that addresses these limitations. By integrating a microarchitecture scaling graph, PPA preference space embedding, and a proposed lightweight environment into RL, experiments using commercial electronic design automation (EDA) tools show that our method achieves an average PPA trade-off improvement of 16.03% over previous state-of-the-art approaches, with 4.07× higher efficiency.
The resulting solutions outperform human implementations by up to 2.03× in the PPA trade-off.

Generating Novel Leads for Drug Discovery Using LLMs with Logical Feedback
https://ojs.aaai.org/index.php/AAAI/article/view/27751
Shreyas Bhat Brahmavar (shreyasbhat2001@gmail.com), Ashwin Srinivasan (ashwin@goa.bits-pilani.ac.in), Tirtharaj Dash (td522@cam.ac.uk), Sowmya Ramaswamy Krishnan (sowmya.rk1@tcs.com), Lovekesh Vig (lovekesh.vig@tcs.com), Arijit Roy (roy.arijit3@tcs.com), Raviprasad Aduri (aduri@goa.bits-pilani.ac.in)

Large Language Models (LLMs) can be used as repositories of biological and chemical information to generate pharmacological lead compounds. However, getting LLMs to focus on specific drug targets typically requires experimentation with progressively more refined prompts. Results thus depend not just on what is known about the target, but also on what is known about prompt engineering. In this paper, we separate the prompt into domain constraints, which can be written in a standard logical form, and a simple text-based query. We investigate whether LLMs can be guided not by refining prompts manually, but by automatically refining the logical component while keeping the query unchanged. We describe an iterative procedure, LMLF ("Language Model with Logical Feedback"), in which the constraints are progressively refined using a logical notion of generalisation. On each iteration, newly generated instances are verified against the constraint, providing "logical feedback" for the next iteration's refinement of the constraints. We evaluate LMLF using two well-known targets (inhibition of Janus Kinase 2, and of Dopamine Receptor D2) and two different LLMs (GPT-3 and PaLM). We show that LMLF, starting from the same logical constraints and query text, can guide both LLMs to generate potential leads.
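The verify-and-refine loop of LMLF can be caricatured in a few lines of Python. This is a toy sketch: the generator, the numeric "molecule" scores, and the exclusion-based `refine` step are hypothetical stand-ins for the paper's LLM sampling and logical generalisation.

```python
import random

def lmlf_loop(generate, constraint, refine, rounds=3, n=10):
    """Toy LMLF-style loop: sample candidates, verify them against the
    logical constraint, and feed the violations back to refine the
    constraint while the text query (here, the generator) stays fixed."""
    for _ in range(rounds):
        candidates = [generate() for _ in range(n)]
        violations = [c for c in candidates if not constraint(c)]
        if not violations:                 # all candidates verified
            break
        constraint = refine(constraint, violations)
    return constraint, [c for c in candidates if constraint(c)]

random.seed(0)
generate = lambda: random.randint(100, 900)   # hypothetical candidate score
c0 = lambda w: w <= 600                       # initial domain constraint

def refine(constraint, violations):
    # Exclude the observed violating values outright: a crude stand-in
    # for the logical generalisation step of the real procedure.
    seen, old = set(violations), constraint
    return lambda w: old(w) and w not in seen

final_c, kept = lmlf_loop(generate, c0, refine)
```

Because each refinement only conjoins new conditions onto the old constraint, every surviving candidate still satisfies the original domain constraint, mirroring how LMLF tightens, rather than replaces, the logical component.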
We find that: (a) the binding affinities of LMLF-generated molecules are skewed toward higher values than those from existing baselines; (b) LMLF generates molecules skewed toward higher binding affinities than the same procedure without logical feedback; and (c) assessment by a computational chemist suggests that LMLF-generated compounds may be novel inhibitors. These findings suggest that LLMs with logical feedback may provide a mechanism for generating new leads without requiring the domain specialist to acquire sophisticated prompt-engineering skills.

SeGA: Preference-Aware Self-Contrastive Learning with Prompts for Anomalous User Detection on Twitter
https://ojs.aaai.org/index.php/AAAI/article/view/27752
Ying-Ying Chang (cindy88409.cs10@nycu.edu.tw), Wei-Yao Wang (sf1638.cs05@nctu.edu.tw), Wen-Chih Peng (wcpengcs@nycu.edu.tw)

In the dynamic and rapidly evolving world of social media, detecting anomalous users has become crucial for addressing malicious activities such as misinformation and cyberbullying. As anomalous users increasingly mimic normal users to evade detection, existing methods that focus only on bot detection fail to capture the subtle distinctions between users. To address these challenges, we propose SeGA, preference-aware self-contrastive learning for anomalous user detection, which leverages heterogeneous entities and their relations in the Twittersphere to detect anomalous users with different malicious strategies. SeGA uses large language models to summarize user preferences from their posts. In addition, integrating user preferences with prompts as pseudo-labels for preference-aware self-contrastive learning enables the model to learn multifaceted aspects of user behavior.
Extensive experiments on the proposed TwBNT benchmark demonstrate that SeGA significantly outperforms state-of-the-art methods (+3.5% ∼ 27.6%) and empirically validate the effectiveness of the model design and pre-training strategies. Our code and data are publicly available at https://github.com/ying0409/SeGA.

Neural Embeddings for kNN Search in Biological Sequence
https://ojs.aaai.org/index.php/AAAI/article/view/27753
Zhihao Chang (changzhihao@zju.edu.cn), Linzhu Yu (linzhu@zju.edu.cn), Yanchao Xu (xuyanchao@zju.edu.cn), Wentao Hu (wthu@zju.edu.cn)

Nearest neighbor search over biological sequences plays a fundamental role in bioinformatics. To alleviate the quadratic complexity of conventional distance computation, neural distance embeddings, which project sequences into a geometric space, have been recognized as a promising paradigm. To maintain the distance order between sequences, these models all deploy a triplet loss and use intuitive heuristics to select a subset of triplets for training from a vast selection space. However, we observe that such training often enables models to distinguish only a fraction of distance orders, leaving others unrecognized. Moreover, naively selecting more triplets for training under the state-of-the-art network not only adds cost but also hampers model performance. In this paper, we introduce Bio-kNN, a kNN search framework for biological sequences that includes a systematic triplet selection method and a multi-head network, enhancing the discernment of all distance orders without increasing training expense. We first propose a clustering-based approach that partitions all triplets into several clusters with similar properties and then selects triplets from these clusters using a novel strategy.
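The triplet loss these embedding models share, together with a crude grouped selection step, can be sketched in numpy. Here `bucketed_triplets` is a hypothetical stand-in for Bio-kNN's clustering-based selection (it buckets triplets by difficulty and samples evenly), not the published method.

```python
import numpy as np

def triplet_margin_loss(anchor, pos, neg, margin=1.0):
    """Standard triplet loss for preserving distance order: each anchor
    should be closer to its positive than to its negative by a margin."""
    d_pos = np.linalg.norm(anchor - pos, axis=1)
    d_neg = np.linalg.norm(anchor - neg, axis=1)
    return np.mean(np.maximum(d_pos - d_neg + margin, 0.0))

rng = np.random.default_rng(1)
emb = rng.normal(size=(30, 8))          # toy sequence embeddings

def bucketed_triplets(emb, n_buckets=3, per_bucket=2):
    """Group candidate triplets by the gap between the two distances and
    sample from every bucket, so easy and hard triplets are both seen."""
    idx = rng.choice(len(emb), size=(60, 3), replace=True)
    gaps = (np.linalg.norm(emb[idx[:, 0]] - emb[idx[:, 1]], axis=1)
            - np.linalg.norm(emb[idx[:, 0]] - emb[idx[:, 2]], axis=1))
    buckets = np.array_split(np.argsort(gaps), n_buckets)
    pick = np.concatenate([b[:per_bucket] for b in buckets])
    return idx[pick]

tri = bucketed_triplets(emb)
loss = triplet_margin_loss(emb[tri[:, 0]], emb[tri[:, 1]], emb[tri[:, 2]])
```

Sampling across difficulty buckets, rather than taking only the hardest or easiest triplets, is one way to expose the model to the full range of distance orders the abstract says naive selection misses.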
Meanwhile, since training different types of triplets simultaneously in the same network does not achieve the expected performance, we propose a multi-head network. It employs a convolutional neural network (CNN) to extract local features shared by all clusters and then learns a separate multi-layer perceptron (MLP) head for each cluster. In addition, we treat the CNN as a special head, integrating crucial local features that previous models neglect into our similarity recognition. Extensive experiments show that Bio-kNN significantly outperforms state-of-the-art methods on two large-scale datasets without increasing the training cost.

i-Rebalance: Personalized Vehicle Repositioning for Supply Demand Balance
https://ojs.aaai.org/index.php/AAAI/article/view/27754
Haoyang Chen (213200606@seu.edu.cn), Peiyan Sun (213200380@seu.edu.cn), Qiyuan Song (213200349@seu.edu.cn), Wanyuan Wang (wywang@seu.edu.cn), Weiwei Wu (weiweiwu@seu.edu.cn), Wencan Zhang (wencanz@u.nus.edu), Guanyu Gao (gygao@njust.edu.cn), Yan Lyu (lvyanly@seu.edu.cn)

Ride-hailing platforms face the challenge of balancing demand and supply. Existing vehicle repositioning techniques often treat drivers as homogeneous agents and relocate them deterministically, assuming they comply with the repositioning. In this paper, we consider a more realistic and driver-centric scenario in which drivers have unique cruising preferences and can decide on their own whether to follow a recommendation. We propose i-Rebalance, a personalized vehicle repositioning technique based on deep reinforcement learning (DRL). i-Rebalance estimates drivers' decisions on accepting repositioning recommendations through an on-field user study involving 99 real drivers.
To optimize supply-demand balance and preference satisfaction simultaneously, i-Rebalance uses a sequential repositioning strategy with dual DRL agents: a Grid Agent that determines the repositioning order of idle vehicles, and a Vehicle Agent that provides personalized recommendations to each vehicle in that order. This sequential learning strategy enables more effective policy training within a smaller action space than traditional joint-action methods. Evaluation on real-world trajectory data shows that i-Rebalance improves driver acceptance rate by 38.07% and total driver income by 9.97%.

GIN-SD: Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion
https://ojs.aaai.org/index.php/AAAI/article/view/27755
Le Cheng (chengle@mail.nwpu.edu.cn), Peican Zhu (ericcan@nwpu.edu.cn), Keke Tang (tangbohutbh@gmail.com), Chao Gao (cgao@nwpu.edu.cn), Zhen Wang (w-zhen@nwpu.edu.cn)

Source detection in graphs has demonstrated robust efficacy in rumor source identification. Although recent solutions improve performance by leveraging deep neural networks, they often require complete user data. In this paper, we address a more challenging task, rumor source detection with incomplete user data, and propose a novel framework, Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion (GIN-SD), to tackle this challenge. Our approach uses a positional embedding module to distinguish incomplete nodes and a self-attention mechanism to focus on nodes with greater information transmission capacity. To mitigate the prediction bias caused by the large disparity between the numbers of source and non-source nodes, we also introduce a class-balancing mechanism.
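A generic class-balancing mechanism of the kind just mentioned can be illustrated with inverse-frequency weights in a binary cross-entropy. This is a sketch of the standard technique only; GIN-SD's actual mechanism may differ.

```python
import numpy as np

def balanced_bce(probs, labels, eps=1e-9):
    """Class-balanced binary cross-entropy: weight each class inversely
    to its frequency, so a handful of source nodes is not drowned out
    by thousands of non-source nodes."""
    labels = labels.astype(float)
    pos_w = len(labels) / (2.0 * max(labels.sum(), 1.0))
    neg_w = len(labels) / (2.0 * max((1 - labels).sum(), 1.0))
    w = np.where(labels == 1, pos_w, neg_w)
    ce = -(labels * np.log(probs + eps)
           + (1 - labels) * np.log(1 - probs + eps))
    return float(np.mean(w * ce))

labels = np.array([1] + [0] * 99)     # one source among 100 nodes
uniform = np.full(100, 0.5)           # an uninformative predictor
loss = balanced_bce(uniform, labels)
```

With these weights the two classes contribute equally to the loss, so a predictor cannot score well by simply labeling every node "non-source."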
Extensive experiments validate the effectiveness of GIN-SD and its superiority over state-of-the-art methods.

Deep Quantum Error Correction
https://ojs.aaai.org/index.php/AAAI/article/view/27756
Yoni Choukroun (choukroun.yoni@gmail.com), Lior Wolf (wolf@cs.tau.ac.il)

Quantum error correction codes (QECC) are a key component for realizing the potential of quantum computing. Like its classical counterpart (ECC), QECC reduces error rates by distributing quantum logical information across redundant physical qubits so that errors can be detected and corrected. In this work, we efficiently train novel end-to-end deep quantum error decoders. We resolve quantum measurement collapse by augmenting syndrome decoding to predict an initial estimate of the system noise, which is then refined iteratively through a deep neural network. The logical error rates, calculated over finite fields, are directly optimized via a differentiable objective, enabling efficient decoding under the constraints imposed by the code. Finally, our architecture is extended to support faulty syndrome measurement through efficient decoding of repeated syndrome sampling. The proposed method demonstrates the power of neural decoders for QECC, achieving state-of-the-art accuracy and outperforming, for small-distance topological codes, existing end-to-end neural and classical decoders, which are often computationally prohibitive.

Propagation Tree Is Not Deep: Adaptive Graph Contrastive Learning Approach for Rumor Detection
https://ojs.aaai.org/index.php/AAAI/article/view/27757
Chaoqun Cui (13698603020@163.com), Caiyan Jia (cyjia@bjtu.edu.cn)

Rumor detection on social media has become increasingly important.
Most existing graph-based models presume that rumor propagation trees (RPTs) have deep structures and learn sequential stance features along branches. However, through statistical analysis of real-world datasets, we find that RPTs exhibit wide structures, with most nodes being shallow one-level replies. To focus learning on the intensive substructures, we propose the Rumor Adaptive Graph Contrastive Learning (RAGCL) method, with adaptive view augmentation guided by node centralities. We summarize three principles for RPT augmentation: 1) exempt root nodes, 2) retain deep reply nodes, and 3) preserve lower-level nodes in deep sections. We apply node dropping, attribute masking, and edge dropping with probabilities derived from centrality-based importance scores to generate views, and a graph contrastive objective then learns robust rumor representations. Extensive experiments on four benchmark datasets demonstrate that RAGCL outperforms state-of-the-art methods. Our work reveals the wide-structure nature of RPTs and contributes an effective graph contrastive learning approach tailored to rumor detection through principled adaptive augmentation. The proposed principles and augmentation techniques can potentially benefit other applications involving tree-structured graphs.

Prompt to Transfer: Sim-to-Real Transfer for Traffic Signal Control with Prompt Learning
https://ojs.aaai.org/index.php/AAAI/article/view/27758
Longchao Da (ld49@njit.edu), Minquan Gao (minchiuan.gao@gmail.com), Hao Mei (hmei7@asu.edu), Hua Wei (hua.wei@asu.edu)

Numerous solutions have been proposed for Traffic Signal Control (TSC) to provide efficient transportation and alleviate traffic congestion. Recently, Reinforcement Learning (RL) methods have attained promising results through trial and error in simulators, bringing confidence in solving cities' congestion problems.
However, performance gaps remain when simulator-trained policies are deployed in the real world, mainly because of differences in system dynamics between the training simulators and real-world environments. In this work, we leverage the knowledge of Large Language Models (LLMs) to understand and profile the system dynamics through a prompt-based grounded action transformation that bridges the performance gap. Specifically, we exploit the pre-trained LLM's inference ability to understand how traffic dynamics change with weather conditions, traffic states, and road types. Aware of these changes, the policy's actions are grounded in realistic dynamics, helping the agent learn a more realistic policy. Experiments on four different scenarios show the effectiveness of the proposed PromptGAT in mitigating the sim-to-real performance gap of reinforcement learning.

Multitarget Device-Free Localization via Cross-Domain Wi-Fi RSS Training Data and Attentional Prior Fusion
https://ojs.aaai.org/index.php/AAAI/article/view/27759
Na Fan (nfanaa@connect.ust.hk), Zeyue Tian (ztianad@connect.ust.hk), Amartansh Dubey (adubey@connect.ust.hk), Samruddhi Deshmukh (ssdeshmukh@connect.ust.hk), Ross Murch (eermurch@ust.hk), Qifeng Chen (cqf@ust.hk)

Device-free localization (DFL) using easily obtained Wi-Fi received signal strength (RSS) has wide real-world applications because it does not require people to carry trackable devices. However, accurate multitarget DFL remains challenging due to the unknown number of targets, multipath interference (MPI), especially between nearby targets, and limited real-world data.
In this study, we propose a transformer-based learning method with Wi-Fi RSS as input, together with an attentional prior fusion module, to simultaneously locate an unknown number of people at random positions. To overcome the challenges of multitarget data collection, we contribute a large-scale cross-domain real-simulation-augmentation training dataset with one or two real-world nearby non-person objects at limited positions and up to five simulated and augmented randomly distributed targets. Experimental results demonstrate our method's improved accuracy, generalization ability, and robustness with fewer Wi-Fi nodes than previous methods.

Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables
https://ojs.aaai.org/index.php/AAAI/article/view/27760
Haisong Gong (gonghaisong2021@ia.ac.cn), Weizhi Xu (weizhi.xu@cripac.ia.ac.cn), Shu Wu (shu.wu@nlpr.ia.ac.cn), Qiang Liu (qiang.liu@nlpr.ia.ac.cn), Liang Wang (wangliang@nlpr.ia.ac.cn)

Fact checking aims to predict claim veracity by reasoning over multiple evidence pieces, and usually involves evidence retrieval and veracity reasoning. In this paper, we focus on the latter: reasoning over unstructured text and structured table information. Previous works have primarily relied on fine-tuning pretrained language models or training homogeneous-graph-based models. Despite their effectiveness, we argue that they fail to exploit the rich semantic information underlying evidence with different structures. To address this, we propose HeterFC, a word-level heterogeneous-graph-based model for fact checking over unstructured and structured information. Our approach leverages a heterogeneous evidence graph with words as nodes and carefully designed edges representing different evidence properties.
We propagate information via a relational graph neural network, facilitating interactions between claims and evidence. An attention-based method integrates the information, combined with a language model for generating predictions, and a multitask loss function accounts for potential inaccuracies in evidence retrieval. Comprehensive experiments on the large fact-checking dataset FEVEROUS demonstrate the effectiveness of HeterFC. Code will be released at https://github.com/Deno-V/HeterFC.

Text-Guided Molecule Generation with Diffusion Language Model
https://ojs.aaai.org/index.php/AAAI/article/view/27761
Haisong Gong (gonghaisong2021@ia.ac.cn), Qiang Liu (qiang.liu@nlpr.ia.ac.cn), Shu Wu (shu.wu@nlpr.ia.ac.cn), Liang Wang (wangliang@nlpr.ia.ac.cn)

Text-guided molecule generation is the task of generating molecules that match specific textual descriptions. Most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods. TGM-DLM updates token embeddings within the SMILES string collectively and iteratively, using a two-phase diffusion generation process. The first phase optimizes embeddings from random noise, guided by the text description, while the second phase corrects invalid SMILES strings into valid molecular representations. We demonstrate that TGM-DLM outperforms MolT5-Base, an autoregressive model, without additional data resources. Our findings underscore the effectiveness of TGM-DLM in generating coherent and precise molecules with specific properties, opening new avenues in drug discovery and related scientific domains.
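The two-phase idea, text-guided denoising followed by validity correction, can be caricatured on toy 2-D "token embeddings". Everything here, from the one-line denoiser to the three-entry vocabulary, is a hypothetical stand-in for illustration, not TGM-DLM itself.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # toy token vocabulary
text_cond = np.array([[1.0, 1.0]] * 4)   # toy "text description" target

x = rng.normal(size=(4, 2))              # phase one starts from random noise
for _ in range(10):
    # Hypothetical denoiser: nudge every token embedding toward the
    # text-conditioned target, standing in for the learned network.
    x = x + 0.3 * (text_cond - x)

# Phase two: snap each embedding to the nearest valid token embedding,
# a crude stand-in for correcting invalid SMILES into valid ones.
idx = ((x[:, None, :] - vocab[None]) ** 2).sum(-1).argmin(axis=1)
tokens = vocab[idx]
```

The key structural point survives the caricature: all token positions are updated collectively at each step, rather than left to right as in an autoregressive decoder.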
Code will be released at https://github.com/Deno-V/tgm-dlm.

Adversarial Robust Safeguard for Evading Deep Facial Manipulation
https://ojs.aaai.org/index.php/AAAI/article/view/27762
Jiazhi Guan (guanjz20@mails.tsinghua.edu.cn), Yi Zhao (yizhao.tsinghua@gmail.com), Zhuoer Xu (xuzhuoer.xze@antgroup.com), Changhua Meng (changhua.mch@antgroup.com), Ke Xu (xuke@tsinghua.edu.cn), Youjian Zhao (zhaoyoujian@tsinghua.edu.cn)

The non-consensual exploitation of facial manipulation has emerged as a pressing societal concern. Alongside the identification of such fake content, recent research has advocated countering manipulation proactively, specifically by adding adversarial noise that impedes manipulation in advance. Nevertheless, with robustness insufficiently considered, we show that current methods fail to provide protection after simple perturbations, e.g., blurring. In addition, traditional optimization-based methods scale poorly, as their time-intensive iterative pipelines struggle to accommodate substantial growth in data volume. To solve these challenges, we propose a learning-based model, Adversarial Robust Safeguard (ARS), that generates the desired protection noise in a single forward pass while exhibiting heightened resistance to common perturbations. Our method involves a two-way protection design: a basic protection component responsible for generating effective noise features, coupled with robust protection for further enhancement. In robust protection, we first fuse image features with spatially duplicated noise embeddings, thereby accounting for inherent information redundancy.
Subsequently, a combination of a differentiable perturbation module and an adversarial network simulates potential information degradation during training. In evaluation, we conduct experiments on four manipulation methods and compare comprehensively against recent works. Our method achieves good visual quality with pronounced robustness against varied perturbations at different levels.

FlightBERT++: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework
https://ojs.aaai.org/index.php/AAAI/article/view/27763
Dongyue Guo (dongyueguo@stu.scu.edu.cn), Zheng Zhang (zhaeng@stu.scu.edu.cn), Zhen Yan (tankzhen@163.com), Jianwei Zhang (zhangjianwei@scu.edu.cn), Yi Lin (yilin@scu.edu.cn)

Flight Trajectory Prediction (FTP) is an essential task in Air Traffic Control (ATC) that can help air traffic controllers manage airspace more safely and efficiently. Existing approaches generally perform multi-horizon FTP in an autoregressive manner and thereby suffer from error accumulation and low efficiency. In this paper, we propose a novel framework, FlightBERT++, to i) forecast multi-horizon flight trajectories directly in a non-autoregressive way, and ii) address the limitations of the binary encoding (BE) representation in FlightBERT. Specifically, FlightBERT++ uses a generalized encoder-decoder architecture in which the encoder learns temporal-spatial patterns from historical observations and the decoder predicts flight status for future horizons. Compared with conventional architectures, an innovative horizon-aware contexts generator is specifically designed to incorporate prior horizon information, enabling non-autoregressive multi-horizon prediction.
Moreover, a differential prompted decoder enhances differential prediction by leveraging the stationarity of the differential sequence. Experimental results on a real-world dataset demonstrate that FlightBERT++ outperforms competitive baselines in both FTP performance and computational efficiency.

LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection
https://ojs.aaai.org/index.php/AAAI/article/view/27764
Hongcheng Guo (hongchengguo@buaa.edu.cn), Jian Yang (jiaya@buaa.edu.cn), Jiaheng Liu (liujiaheng@buaa.edu.cn), Jiaqi Bai (bjq@buaa.edu.cn), Boyang Wang (wangboyang@buaa.edu.cn), Zhoujun Li (lizj@buaa.edu.cn), Tieqiao Zheng (steven.zheng@cloudwise.com), Bo Zhang (bowen.zhang@cloudwise.com), Junran Peng (jrpeng4ever@126.com), Qi Tian (tian.qi1@huawei.com)

Log anomaly detection is a key component of artificial intelligence for IT operations (AIOps). Given log data from varied domains, retraining the whole network for each unknown domain is inefficient in real industrial scenarios, yet previous deep models merely extract the semantics of log sequences within a single domain, leading to poor generalization on multi-domain logs. To alleviate this issue, we propose LogFormer, a unified Transformer-based framework for log anomaly detection that improves generalization across domains through a two-stage process: pre-training followed by adapter-based tuning. Specifically, our model is first pre-trained on the source domain to acquire shared semantic knowledge of log data; this knowledge is then transferred to the target domain via shared parameters. In addition, a Log-Attention module is proposed to supplement the information ignored by log parsing. The proposed method is evaluated on three public datasets and one real-world dataset.
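The pre-train-then-tune recipe can be sketched generically: freeze a shared backbone and fit only a small head on target-domain data. This is a toy numpy stand-in; LogFormer's actual adapters sit inside Transformer layers rather than on top of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" backbone: a fixed feature extractor whose weights stay
# frozen at tuning time (toy stand-in for the shared parameters).
W_backbone = rng.normal(size=(8, 4))
backbone = lambda x: np.tanh(x @ W_backbone)

x = rng.normal(size=(64, 8))                     # target-domain features
v = np.array([[1.0], [-2.0], [0.5], [0.0]])      # hidden labeling rule (toy)
y = (backbone(x) @ v > 0).astype(float)          # anomalous / normal labels

W_adapter = np.zeros((4, 1))                     # small trainable head only
for _ in range(200):
    h = backbone(x)                              # backbone is never updated
    p = 1.0 / (1.0 + np.exp(-h @ W_adapter))
    W_adapter -= 0.5 * h.T @ (p - y) / len(x)    # logistic-regression step

acc = np.mean(((backbone(x) @ W_adapter) > 0) == (y > 0.5))
```

Only the 4-parameter head is trained here, which is the point of adapter-style tuning: per-domain cost stays small because the shared backbone is reused as-is.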
Experimental results on multiple benchmarks demonstrate the effectiveness of LogFormer with fewer trainable parameters and lower training costs.

ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing
https://ojs.aaai.org/index.php/AAAI/article/view/27765
Zhi Jin (20214227052@stu.suda.edu.cn), Sheng Xu (shengxu@link.cuhk.edu.hk), Xiang Zhang (xzhang23@ualberta.ca), Tianze Ling (ltz20@mails.tsinghua.edu.cn), Nanqing Dong (dongnanqing@pjlab.org.cn), Wanli Ouyang (wanli.ouyang@sydney.edu.au), Zhiqiang Gao (gao_zhi_qiang@126.com), Cheng Chang (changchengbio@163.com), Siqi Sun (intersun2@gmail.com)

De novo peptide sequencing from mass spectrometry (MS) data is a critical task in proteomics research. Traditional de novo algorithms have hit an accuracy bottleneck due to the inherent complexity of proteomics data. While deep learning-based methods have shown progress, they reduce the problem to a translation task, potentially overlooking critical nuances between spectra and peptides. We present ContraNovo, a pioneering algorithm that leverages contrastive learning to extract the relationship between spectra and peptides and incorporates mass information into peptide decoding, addressing these intricacies more efficiently.
Through rigorous evaluations on two benchmark datasets, ContraNovo consistently outperforms contemporary state-of-the-art solutions, underscoring its potential to enhance de novo peptide sequencing.

Inducing Point Operator Transformer: A Flexible and Scalable Architecture for Solving PDEs
https://ojs.aaai.org/index.php/AAAI/article/view/27766
Seungjun Lee (seungjun.lee@alsemy.com), TaeiL Oh (taeil.oh@alsemy.com)

Solving partial differential equations (PDEs) by learning solution operators has emerged as an attractive alternative to traditional numerical methods. However, such architectures face two main challenges: flexibility in handling irregular and arbitrary input and output formats, and scalability to large discretizations. Most existing architectures are limited by their required structure or are infeasible to scale to large inputs and outputs. To address these issues, we introduce an attention-based model called the Inducing Point Operator Transformer (IPOT). Inspired by inducing-point methods, IPOT is designed to handle any input function and output query while capturing global interactions in a computationally efficient way. By detaching the input/output discretizations from the processor with a smaller latent bottleneck, IPOT can process arbitrary discretizations and scales linearly with the input/output size. Our experimental results show that IPOT achieves strong performance with manageable computational complexity on an extensive range of PDE benchmarks and real-world weather forecasting scenarios, compared to state-of-the-art methods.
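The inducing-point bottleneck can be sketched with three plain attention calls: encode the inputs into a few latents, process the latents, then decode at arbitrary query points. This is a minimal numpy sketch of the general pattern the abstract describes (all sizes and the single-head, weight-free attention are simplifying assumptions, not IPOT's implementation).

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention (no learned projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
n_in, n_out, n_latent, d = 500, 300, 16, 8   # latent bottleneck << inputs

inputs = rng.normal(size=(n_in, d))      # arbitrary input discretization
queries = rng.normal(size=(n_out, d))    # arbitrary output query points
latents = rng.normal(size=(n_latent, d)) # inducing points

z = attention(latents, inputs, inputs)   # encode: cost O(n_in * n_latent)
z = attention(z, z, z)                   # process: only the few latents
out = attention(queries, z, z)           # decode: cost O(n_out * n_latent)
```

Because the inputs and outputs only ever cross-attend to the small latent array, the cost grows linearly with the discretization size instead of quadratically, which is the scalability claim in the abstract.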
Our code is publicly available at https://github.com/7tl7qns7ch/IPOT.

MASTER: Market-Guided Stock Transformer for Stock Price Forecasting
https://ojs.aaai.org/index.php/AAAI/article/view/27767
Tong Li (2017lt@sjtu.edu.cn), Zhaoyang Liu (jingmu.lzy@alibaba-inc.com), Yanyan Shen (shenyy@sjtu.edu.cn), Xue Wang (wxie91@gmail.com), Haokun Chen (chenhaokun1994@gmail.com), Sen Huang (huangsen47@gmail.com)

Stock price forecasting has remained extremely challenging for decades due to the high volatility of the stock market. Recent efforts have been devoted to modeling complex stock correlations for joint stock price forecasting. Existing works share a common neural architecture that learns temporal patterns from individual stock series and then mixes the temporal representations to establish stock correlations. However, they consider only time-aligned stock correlations stemming from all input stock features, which suffer from two limitations. First, stock correlations often occur momentarily and in a cross-time manner. Second, feature effectiveness varies with market conditions, affecting both stock sequential patterns and their correlations. To address these limitations, this paper introduces MASTER, a MArket-guided Stock TransformER, which models momentary and cross-time stock correlations and leverages market information for automatic feature selection. MASTER tackles complex stock correlation by alternately performing intra-stock and inter-stock information aggregation.
Experiments show the superiority of MASTER compared with previous works and visualize the captured realistic stock correlation to provide valuable insights.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27768Learning from Polar Representation: An Extreme-Adaptive Model for Long-Term Time Series Forecasting2024-03-24T00:06:44-07:00Yanhong Liyli20@scu.eduJack Xujxu@valleywater.orgDavid Anastasiudanastasiu@scu.eduIn the hydrology field, time series forecasting is crucial for efficient water resource management, improving flood and drought control and increasing the safety and quality of life for the general population. However, predicting long-term streamflow is a complex task due to the presence of extreme events. It requires the capture of long-range dependencies and the modeling of rare but important extreme values. Existing approaches often struggle to tackle these dual challenges simultaneously. In this paper, we specifically delve into these issues and propose Distance-weighted Auto-regularized Neural network (DAN), a novel extreme-adaptive model for long-range forecasting of streamflow enhanced by polar representation learning. DAN utilizes a distance-weighted multi-loss mechanism and stackable blocks to dynamically refine indicator sequences from exogenous data, while also being able to handle univariate time series by employing Gaussian Mixture probability modeling to improve robustness to severe events. We also introduce Kruskal-Wallis sampling and gate control vectors to handle imbalanced extreme data. 
On four real-life hydrologic streamflow datasets, we demonstrate that DAN significantly outperforms both state-of-the-art hydrologic time series prediction methods and general methods designed for long-term time series prediction.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27769The Causal Impact of Credit Lines on Spending Distributions2024-03-24T00:06:47-07:00Yijun Liyijunli5-c@my.cityu.edu.hkCheuk Hang Leungchleung87@cityu.edu.hkXiangqian Sunxqsun4-c@my.cityu.edu.hkChaoqun Wangcqwang5-c@my.cityu.edu.hkYiyan Huangyiyhuang3-c@my.cityu.edu.hkXing Yanxingyan@ruc.edu.cnQi Wuqiwu55@cityu.edu.hkDongdong Wangwangdongdong9@jd.comZhixiang Huanghuangzhixiang@jd.comConsumer credit services offered by electronic commerce platforms provide customers with convenient loan access during shopping and have the potential to stimulate sales. To understand the causal impact of credit lines on spending, previous studies have employed causal estimators (e.g., direct regression (DR), inverse propensity weighting (IPW), and double machine learning (DML)) to estimate the treatment effect. However, these estimators do not treat the spending of each individual as a distribution that can capture the range and pattern of amounts spent across different orders. By disregarding the outcome as a distribution, valuable insights embedded within the outcome distribution might be overlooked. This paper thus develops distribution-valued estimators that extend the existing real-valued DR, IPW, and DML estimators within Rubin’s causal framework. We establish their consistency and apply them to a real dataset from a large electronic commerce platform. 
Our findings reveal that credit lines generally have a positive impact on spending across all quantiles, but consumers would allocate more to luxuries (higher quantiles) than necessities (lower quantiles) as credit lines increase.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27770Improving PTM Site Prediction by Coupling of Multi-Granularity Structure and Multi-Scale Sequence Representation2024-03-24T00:06:49-07:00Zhengyi Lilzy_gnn@webmail.hzau.edu.cnMenglu Limengluli@webmail.hzau.edu.cnLida Zhu190272318@qq.comWen Zhangzhangwen@mail.hzau.edu.cnProtein post-translational modification (PTM) site prediction is a fundamental task in bioinformatics. Several computational methods have been developed to predict PTM sites. However, existing methods ignore the structure information and merely utilize protein sequences. Furthermore, designing a more fine-grained structure representation learning method is urgently needed, as PTM is a biological event that occurs at the atom granularity. In this paper, we propose a PTM site prediction method by Coupling of Multi-Granularity structure and Multi-Scale sequence representation, PTM-CMGMS for brevity. Specifically, multi-granularity structure-aware representation learning is designed to learn neighborhood structure representations at the amino acid, atom, and whole-protein granularity from AlphaFold-predicted structures, followed by contrastive learning to optimize the structure representations. Additionally, multi-scale sequence representation learning is used to extract context sequence information, and the motif generated by aligning all context sequences of PTM sites assists the prediction. Extensive experiments on three datasets show that PTM-CMGMS outperforms the state-of-the-art methods. 
Source code can be found at https://github.com/LZY-HZAU/PTM-CMGMS.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27771Joint Learning Neuronal Skeleton and Brain Circuit Topology with Permutation Invariant Encoders for Neuron Classification2024-03-24T00:06:51-07:00Minghui Liaominghui@whu.edu.cnGuojia Wanguojiawan@whu.edu.cnBo Dudubo@whu.edu.cnDetermining the types of neurons within a nervous system plays a significant role in the analysis of brain connectomics and the investigation of neurological diseases. However, identifying neuron types from anatomical, physiological, or molecular characteristics is relatively inefficient and costly. With the advancements in electron microscopy imaging and analysis techniques for brain tissue, we are able to obtain whole-brain connectomes consisting of high-resolution neuronal morphology and connectivity information. However, few models are built based on such data for automated neuron classification. In this paper, we propose NeuNet, a framework that combines morphological information of neurons obtained from the skeleton and topological information between neurons obtained from the neural circuit. Specifically, NeuNet consists of three components, namely Skeleton Encoder, Connectome Encoder, and Readout Layer. Skeleton Encoder integrates the local information of neurons in a bottom-up manner, applying a one-dimensional convolution to the neural skeleton's point data; Connectome Encoder uses a graph neural network to capture the topological information of the neural circuit; finally, Readout Layer fuses these two sources of information and outputs classification results. We reprocess and release two new datasets for the neuron classification task from volume electron microscopy (VEM) images of the human brain cortex and the Drosophila brain. 
Experiments on these two datasets demonstrate the effectiveness of our model, with accuracies of 0.9169 and 0.9363, respectively. Code and data are available at: https://github.com/WHUminghui/NeuNet.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27772Root Cause Analysis in Microservice Using Neural Granger Causal Discovery2024-03-24T00:06:53-07:00Cheng-Ming Linzmlin1998.cs10@nycu.edu.twChing Changblacksnail789521@gmail.comWei-Yao Wangsf1638.cs05@nctu.edu.twKuang-Da Wanggdwang.cs10@nycu.edu.twWen-Chih Pengwcpengcs@nycu.edu.twIn recent years, microservices have gained widespread adoption in IT operations due to their scalability, maintainability, and flexibility. However, when facing system malfunctions, it becomes challenging for site reliability engineers (SREs) to pinpoint the root cause due to the complex relationships among microservices. Previous research employed structure learning methods (e.g., the PC-algorithm) to establish causal relationships and derive root causes from causal graphs. Nevertheless, they ignored the temporal order of time series data and failed to leverage the rich information inherent in the temporal relationships. For instance, a sudden spike in CPU utilization can lead to an increase in latency for other microservices. However, in this scenario, the anomaly in CPU utilization occurs before the latency increases, rather than simultaneously. As a result, the PC-algorithm fails to capture such characteristics. To address these challenges, we propose RUN, a novel approach for root cause analysis using neural Granger causal discovery with contrastive learning. RUN enhances the backbone encoder by integrating contextual information from time series and leverages a time series forecasting model to conduct neural Granger causal discovery. 
In addition, RUN incorporates PageRank with a personalization vector to efficiently recommend the top-k root causes. Extensive experiments conducted on synthetic and real-world microservice-based datasets demonstrate that RUN noticeably outperforms state-of-the-art root cause analysis methods. Moreover, we provide an analysis scenario for the sock-shop case to showcase the practicality and efficacy of RUN in microservice-based applications. Our code is publicly available at https://github.com/zmlin1998/RUN.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27773Model-Driven Deep Neural Network for Enhanced AoA Estimation Using 5G gNB2024-03-24T00:06:55-07:00Shengheng Lius.liu@seu.edu.cnXingkang Lixingkangli@seu.edu.cnZihuan Maomzh@seu.edu.cnPeng Liuherolp@gmail.comYongming Huanghuangym@seu.edu.cnHigh-accuracy positioning has become a fundamental enabler for intelligent connected devices. Nevertheless, present wireless networks still rely on model-driven approaches to achieve positioning functionality, which are susceptible to performance degradation in practical scenarios, primarily due to hardware impairments. Integrating artificial intelligence into the positioning framework presents a promising solution to revolutionize the accuracy and robustness of location-based services. In this study, we address this challenge by reformulating the problem of angle-of-arrival (AoA) estimation into image reconstruction of the spatial spectrum. To this end, we design a model-driven deep neural network (MoD-DNN), which can automatically calibrate the angular-dependent phase error. The proposed MoD-DNN approach employs an iterative optimization scheme between a convolutional neural network and a sparse conjugate gradient algorithm. 
Simulation and experimental results are presented to demonstrate the effectiveness of the proposed method in enhancing spectrum calibration and AoA estimation.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27774MID-FiLD: MIDI Dataset for Fine-Level Dynamics2024-03-24T00:06:56-07:00Jesung Ryujesung@pozalabs.comSeungyeon Rhyuseungyeon@pozalabs.comHong-Gyu Yoonhonggyu@pozalabs.comEunchong Kimeunchong@pozalabs.comJu Young Yangjuyoung.yang@duke.eduTaehyun Kimtaehyun@pozalabs.comOne of the challenges in generating human-like music is articulating musical expressions such as dynamics, phrasing, and timbre, which are difficult for computational models to mimic. Previous efforts to tackle this problem have been insufficient due to a fundamental lack of data containing information about musical expressions. In this paper, we introduce MID-FiLD, a MIDI dataset for learning fine-level dynamics control. Notable properties of MID-FiLD are as follows: (1) All 4,422 MIDI samples are constructed by professional music writers with a strong understanding of composition and musical expression. (2) Each MIDI sample contains four types of musical metadata and control change #1 (CC#1) values. We verify that our metadata is a key factor in MID-FiLD, exerting a substantial influence over the produced CC#1 values. 
In addition, we demonstrate the applicability of MID-FiLD to deep learning models by suggesting a token-based encoding methodology and reveal the potential for generating controllable, human-like musical expressions.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27775PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations2024-03-24T00:06:58-07:00Rui Sherui.she@ntu.edu.sgSijie Wangwang1679@e.ntu.edu.sgQiyu Kangkang0080@e.ntu.edu.sgKai Zhaokai.zhao@ntu.edu.sgYang Songyang.song@connect.polyu.hkWee Peng Taywptay@ntu.edu.sgTianyu Gengtianyu.geng@ntu.edu.sgXingchao Jianxingchao001@e.ntu.edu.sgPoint cloud registration is a crucial technique in 3D computer vision with a wide range of applications. However, this task can be challenging, particularly in large fields of view with dynamic objects, environmental noise, or other perturbations. To address this challenge, we propose a model called PosDiffNet. Our approach performs hierarchical registration based on window-level, patch-level, and point-level correspondence. We leverage a graph neural partial differential equation (PDE) based on Beltrami flow to obtain high-dimensional features and position embeddings for point clouds. We incorporate position embeddings into a Transformer module based on a neural ordinary differential equation (ODE) to efficiently represent patches within points. We employ the multi-level correspondence derived from the high feature similarity scores to facilitate alignment between point clouds. Subsequently, we use registration methods such as SVD-based algorithms to predict the transformation using corresponding point pairs. We evaluate PosDiffNet on several 3D point cloud datasets, verifying that it achieves state-of-the-art (SOTA) performance for point cloud registration in large fields of view with perturbations. 
The implementation code for the experiments is available at https://github.com/AI-IT-AVs/PosDiffNet.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27776StegaStyleGAN: Towards Generic and Practical Generative Image Steganography2024-03-24T00:06:59-07:00Wenkang Suswk1004@163.comJiangqun Niissjqni@mail.sysu.edu.cnYiyan Sunsunyy27@mail2.sysu.edu.cnThe recent advances in generative image steganography have drawn increasing attention due to their potential for provable security and bulk embedding capacity. However, existing generative steganographic schemes are usually tailored for specific tasks and can hardly be applied in settings with practical constraints. To address this issue, this paper proposes a generic generative image steganography scheme called Steganography StyleGAN (StegaStyleGAN) that meets the practical objectives of security, capacity, and robustness within the same framework. In StegaStyleGAN, a novel Distribution-Preserving Secret Data Modulator (DP-SDM) is used to achieve provably secure generative image steganography by preserving the data distribution of the model inputs. Additionally, a generic and efficient Secret Data Extractor (SDE) is designed for accurate secret data extraction. By choosing whether to incorporate the Image Attack Simulator (IAS) during the training process, one can obtain two models with different parameters but the same structure (both generator and extractor) for lossless and lossy channel covert communication, namely StegaStyleGAN-Ls and StegaStyleGAN-Ly. Furthermore, by combining with GAN inversion, conditional generative steganography can be achieved as well. 
Experimental results demonstrate that, whether for lossless or lossy communication channels, the proposed StegaStyleGAN can significantly outperform the corresponding state-of-the-art schemes.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27777Dual-Channel Learning Framework for Drug-Drug Interaction Prediction via Relation-Aware Heterogeneous Graph Transformer2024-03-24T00:07:01-07:00Xiaorui Susuxiaorui19@mails.ucas.ac.cnPengwei Huhpw@ms.xjb.ac.cnZhu-Hong Youzhuhongyou@nwpu.edu.cnPhilip S. Yupsyu@uic.eduLun Huhulun@ms.xjb.ac.cnIdentifying novel drug-drug interactions (DDIs) is a crucial task in pharmacology, as the interference between pharmacological substances can pose serious medical risks. In recent years, several network-based techniques have emerged for predicting DDIs. However, they primarily focus on local structures within DDI-related networks, often overlooking the significance of indirect connections between pairwise drug nodes from a global perspective. Additionally, effectively handling the heterogeneous information present in both biomedical knowledge graphs and drug molecular graphs remains a challenge for improving DDI prediction performance. To address these limitations, we propose a Transformer-based relatIon-aware Graph rEpresentation leaRning framework (TIGER) for DDI prediction. TIGER leverages the Transformer architecture to effectively exploit the structure of the heterogeneous graph, allowing it to directly learn long-range dependencies and high-order structures. Furthermore, TIGER incorporates a relation-aware self-attention mechanism, capturing the diverse range of semantic relations that exist between pairs of nodes in the heterogeneous graph. 
In addition to these advancements, TIGER enhances predictive accuracy by modeling the DDI prediction task with a dual-channel network, where the drug molecular graph and the biomedical knowledge graph are fed into respective channels. By incorporating embeddings obtained at both the graph and node levels, TIGER benefits from the structural properties of drugs as well as the rich contextual information provided by the biomedical knowledge graph. Extensive experiments conducted on three real-world datasets demonstrate the effectiveness of TIGER in DDI prediction. Furthermore, case studies highlight its ability to provide a deeper understanding of the underlying mechanisms of DDIs.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27778Molecular Optimization Model with Patentability Constraint2024-03-24T00:07:02-07:00Sally Turutovturutovsally@campus.technion.ac.ilKira Radinskykirar@cs.technion.ac.ilIn drug development, molecular optimization is a crucial challenge that involves generating novel molecules given a lead molecule as input. The task requires maintaining molecular similarity to the original molecule while simultaneously optimizing multiple chemical attributes. To aid in this process, numerous generative models have been proposed. However, in practical applications, it is crucial for these models not only to generate novel molecules with the above constraints but also to generate molecules that significantly differ from any existing patented compounds. In this work, we present a multi-optimization molecular framework to address this challenge. Our framework trains a model to prioritize both enhanced properties and substantial dissimilarity from patented compounds. By jointly learning continuous representations of optimized and patentable molecules, we ensure that the generated molecules are significantly distant from any patented compounds while improving chemical properties. 
Through empirical evaluation, we demonstrate the superior performance of our approach compared to state-of-the-art molecular optimization methods in both chemical property optimization and patentability.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27779Generalizable Sleep Staging via Multi-Level Domain Alignment2024-03-24T00:07:04-07:00Jiquan Wangwangjiquan@zju.edu.cnSha Zhaoszhao@zju.edu.cnHaiteng Jiangh.jiang@zju.edu.cnShijian Lishijianli@zju.edu.cnTao Lilitaozjusc@zju.edu.cnGang Pangpan@zju.edu.cnAutomatic sleep staging is essential for sleep assessment and disorder diagnosis. Most existing methods depend on one specific dataset, with training and testing data drawn from the same source, and are thus limited in their ability to generalize to unseen datasets. In this paper, we introduce domain generalization into automatic sleep staging and propose the task of generalizable sleep staging, which aims to improve the model's generalization ability to unseen datasets. Inspired by existing domain generalization methods, we adopt the feature alignment idea and propose a framework called SleepDG to solve it. Considering that both local salient features and sequential features are important for sleep staging, we propose a Multi-level Feature Alignment combining epoch-level and sequence-level feature alignment to learn domain-invariant feature representations. Specifically, we design an Epoch-level Feature Alignment to align the feature distribution of each single sleep epoch among different domains, and a Sequence-level Feature Alignment to minimize the discrepancy of sequential features among different domains. 
SleepDG is validated on five public datasets, achieving state-of-the-art performance.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27780Inspecting Prediction Confidence for Detecting Black-Box Backdoor Attacks2024-03-24T00:07:05-07:00Tong Wangmg20330065@smail.nju.edu.cnYuan Yaoy.yao@nju.edu.cnFeng Xuxf@nju.edu.cnMiao Xumiao.xu@uq.edu.auShengwei Anan93@purdue.eduTing Wangting@psu.eduBackdoor attacks have been shown to be a serious security threat against deep learning models, and various defenses have been proposed to detect whether a model is backdoored or not. However, as indicated by a recent black-box attack, existing defenses can be easily bypassed by implanting the backdoor in the frequency domain. To this end, we propose a new defense, DTInspector, against black-box backdoor attacks, based on a new observation related to the prediction confidence of learning models. That is, to achieve a high attack success rate with a small amount of poisoned data, backdoor attacks usually render a model that exhibits statistically higher prediction confidences on the poisoned samples. We provide both theoretical and empirical evidence for the generality of this observation. DTInspector then carefully examines the prediction confidences of data samples, and decides the existence of a backdoor by exploiting the shortcut nature of backdoor triggers. Extensive evaluations on six backdoor attacks, four datasets, and three advanced attack types demonstrate the effectiveness of the proposed defense.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27781Conformal Crystal Graph Transformer with Robust Encoding of Periodic Invariance2024-03-24T00:07:07-07:00Yingheng Wangyw2349@cornell.eduShufeng Kongsk2299@cornell.eduJohn M. Gregoiregregoire@caltech.eduCarla P. 
Gomesgomes@cs.cornell.eduMachine learning techniques, especially in the realm of materials design, hold immense promise in predicting the properties of crystal materials and aiding in the discovery of novel crystals with desirable traits. However, crystals possess unique geometric constraints—namely, E(3) invariance for the primitive cell and periodic invariance—which need to be accurately reflected in crystal representations. Though past research has explored various construction techniques to preserve periodic invariance in crystal representations, their robustness remains inadequate. Furthermore, effectively capturing angular information within 3D crystal structures continues to pose a significant challenge for graph-based approaches. This study introduces novel solutions to these challenges. We first present a graph construction method that robustly encodes periodic invariance and a strategy to capture angular information in neural networks without compromising efficiency. We further introduce CrystalFormer, a pioneering graph transformer architecture that emphasizes angle preservation and enhances long-range information. Through comprehensive evaluation, we verify our model's superior performance in 5 crystal prediction tasks, reaffirming the efficiency of our proposed methods.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27782SuperJunction: Learning-Based Junction Detection for Retinal Image Registration2024-03-24T00:07:08-07:00Yu Wangwangy2@i2r.a-star.edu.sgXiaoye Wangwangxiaoye951@gmail.comZaiwang Gugu_zaiwang@i2r.a-star.edu.sgWeide Liuweide001@e.ntu.edu.sgWee Siong Ngwsng@i2r.a-star.edu.sgWeimin Huangwmhuang@i2r.a-star.edu.sgJun Chengcheng_jun@i2r.a-star.edu.sgKeypoint-based approaches have shown promise for retinal image registration, superimposing two or more images from different views based on keypoint detection and description. 
However, existing approaches suffer from ineffective keypoint detector and descriptor training. Meanwhile, the non-linear mapping from 3D retinal structure to 2D images is often neglected. In this paper, we propose a novel learning-based junction detection approach for retinal image registration, which enhances both the keypoint detector and descriptor training. To improve keypoint detection, it uses multi-task vessel detection to regularize the model training, which helps to learn more representative features and reduce the risk of over-fitting. To achieve effective training for keypoint description, a new constrained negative sampling approach is proposed to compute the descriptor loss. Moreover, we also consider the non-linearity between retinal images from different views during matching. Experimental results on the FIRE dataset show that our method achieves a mean area under the curve of 0.850, which is 12.6% higher than the 0.755 achieved by the state-of-the-art method. All code is available at https://github.com/samjcheng/SuperJunction.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27783Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations2024-03-24T00:07:10-07:00Zilin Wangwangzl21@mails.tsinghua.edu.cnHaolin Zhuangzhuanghl21@mails.tsinghua.edu.cnLu Lililu21@mails.tsinghua.edu.cnYinmin Zhangyinmin.zhang@icloud.comJunjie Zhongjunjiezhong@ruri.waseda.jpJun Cheny-chen21@mails.tsinghua.edu.cnYu Yangyy20@mails.tsinghua.edu.cnBoshi Tangtbs22@mails.tsinghua.edu.cnZhiyong Wuzywu@se.cuhk.edu.hkThis paper presents an Exploratory 3D Dance generation framework, E3D2, designed to address the exploration capability deficiency in existing music-conditioned 3D dance generation models. 
Current models often generate monotonous and simplistic dance sequences that misalign with human preferences because they lack exploration capabilities. The E3D2 framework involves a reward model trained from automatically-ranked dance demonstrations, which then guides the reinforcement learning process. This approach encourages the agent to explore and generate high-quality and diverse dance movement sequences. The soundness of the reward model is both theoretically and experimentally validated. Empirical experiments demonstrate the effectiveness of E3D2 on the AIST++ dataset.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27784PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction2024-03-24T00:07:11-07:00Lirong Wuwulirong@westlake.edu.cnYufei Huanghuangyufei@westlake.edu.cnCheng Tantancheng@westlake.edu.cnZhangyang Gaogaozhangyang@westlake.edu.cnBozhen Huhubozhen@westlake.edu.cnHaitao Linlinhaitao@westlake.edu.cnZicheng Liuliuzicheng@westlake.edu.cnStan Z. Listan.zq.li@westlake.edu.cnCompound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery. Existing deep learning-based methods utilize only the single modality of protein sequences or structures and lack the co-modeling of the joint distribution of the two modalities, which may lead to significant performance drops in complex real-world scenarios due to various factors, e.g., modality missing and domain shifting. More importantly, these methods only model protein sequences and structures at a single fixed scale, neglecting more fine-grained multi-scale information, such as that embedded in key protein fragments. 
In this paper, we propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction (PSC-CPI), which captures the dependencies between protein sequences and structures through both intra-modality and cross-modality contrasting. We further apply length-variable protein augmentation to allow contrasting to be performed at different scales, from the amino acid level to the sequence level. Finally, in order to more fairly evaluate the model generalizability, we split the test data into four settings based on whether compounds and proteins have been observed during the training stage. Extensive experiments have shown that PSC-CPI generalizes well in all four settings, particularly in the more challenging "Unseen-Both" setting, where neither compounds nor proteins have been observed during training. Furthermore, even when a modality is missing, i.e., during inference with only single-modality protein data, PSC-CPI still exhibits comparable or even better performance than previous approaches.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27785Uncertainty Quantification for Forward and Inverse Problems of PDEs via Latent Global Evolution2024-03-24T00:07:13-07:00Tailin Wutailin@cs.stanford.eduWillie Neiswangerwillie.neiswanger+cmt@gmail.comHongtao Zhengzhenghongtao@westlake.edu.cnStefano Ermonermon@cs.stanford.eduJure Leskovecjure@cs.stanford.eduDeep learning-based surrogate models have demonstrated remarkable advantages over classical solvers in terms of speed, often achieving speedups of 10 to 1000 times over traditional partial differential equation (PDE) solvers. However, a significant challenge hindering their widespread adoption in both scientific and industrial domains is the lack of understanding about their prediction uncertainties, particularly in scenarios that involve critical decision making. 
To address this limitation, we propose a method that integrates efficient and precise uncertainty quantification into a deep learning-based surrogate model. Our method, termed Latent Evolution of PDEs with Uncertainty Quantification (LE-PDE-UQ), endows deep learning-based surrogate models with robust and efficient uncertainty quantification capabilities for both forward and inverse problems. LE-PDE-UQ leverages latent vectors within a latent space to evolve both the system's state and its corresponding uncertainty estimation. The latent vectors are decoded to provide predictions for the system's state as well as estimates of its uncertainty. In extensive experiments, we demonstrate the accurate uncertainty quantification performance of our approach, surpassing that of strong baselines including deep ensembles, Bayesian neural network layers, and dropout. Our method excels at propagating uncertainty over extended auto-regressive rollouts, making it suitable for scenarios involving long-term predictions. Our code is available at: https://github.com/AI4Science-WestlakeU/le-pde-uq.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27786Multilevel Attention Network with Semi-supervised Domain Adaptation for Drug-Target Prediction2024-03-24T00:07:14-07:00Zhousan Xiewaduhek@sjtu.edu.cnShikui Tutushikui@sjtu.edu.cnLei Xuleixu@sjtu.edu.cnPrediction of drug-target interactions (DTIs) is a crucial step in drug discovery, and deep learning methods have shown great promise on various DTI datasets. However, existing approaches still face several challenges, including limited labeled data, hidden bias issues, and a lack of generalization ability to out-of-domain data. These challenges hinder the model's capacity to learn truly informative interaction features, leading to shortcut learning and inferior predictive performance on novel drug-target pairs. 
To address these issues, we propose MlanDTI, a semi-supervised domain adaptive multilevel attention network (Mlan) for DTI prediction. We utilize two pre-trained BERT models to acquire bidirectional representations enriched with information from unlabeled data. Then, we introduce a multilevel attention mechanism, enabling the model to learn domain-invariant DTIs at different hierarchical levels. Moreover, we present a simple yet effective semi-supervised pseudo-labeling method to further enhance our model's predictive ability in cross-domain scenarios. Experiments on four datasets show that MlanDTI achieves state-of-the-art performance under intra-domain settings and outperforms all other approaches under cross-domain settings. The source code is available at https://github.com/CMACH508/MlanDTI.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27787Geometric-Facilitated Denoising Diffusion Model for 3D Molecule Generation2024-03-24T00:07:16-07:00Can Xuleoxc1571@163.comHaosen Wanghaosenwang@seu.edu.cnWeigang Wangwangweigang@zjgsu.edu.cnPengfei Zhengzpf2021@zhejianglab.comHongyang Chendr.h.chen@ieee.orgDenoising diffusion models have shown great potential in multiple research areas. Existing diffusion-based generative methods on de novo 3D molecule generation face two major challenges. Since the majority of heavy atoms in molecules can connect to multiple atoms through single bonds, solely using pair-wise distances to model molecule geometries is insufficient. The first challenge, therefore, is to propose an effective neural network as the denoising kernel that is capable of capturing complex multi-body interatomic relationships and learning high-quality features. Due to the discrete nature of graphs, mainstream diffusion-based methods for molecules heavily rely on predefined rules and generate edges in an indirect manner.
The second challenge involves adapting molecule generation to the diffusion framework and accurately predicting the existence of bonds. In our research, we view the iterative updating of molecule conformations in the diffusion process as consistent with molecular dynamics and introduce a novel molecule generation method named Geometric-Facilitated Molecular Diffusion (GFMDiff). For the first challenge, we introduce a Dual-track Transformer Network (DTN) to fully excavate global spatial relationships and learn high-quality representations, which contribute to accurate predictions of features and geometries. As for the second challenge, we design a Geometric-Facilitated Loss (GFLoss) that intervenes in the formation of bonds during training, instead of directly embedding edges into the latent space. Comprehensive experiments on current benchmarks demonstrate the superiority of GFMDiff.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27788GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking2024-03-24T00:07:18-07:00Shu Yinyinshu@mail.nwpu.edu.cnPeican Zhuericcan@nwpu.edu.cnLianwei Wuwlw@nwpu.edu.cnChao Gaocgao@nwpu.edu.cnZhen Wangw-zhen@nwpu.edu.cnWith the rise of social media, the spread of fake news has become a significant concern, potentially misleading public perceptions and impacting social stability. Deep learning methods like CNNs, RNNs, and Transformer-based models like BERT have enhanced fake news detection; however, they primarily focus on content and do not consider the social context during news propagation. Graph-based techniques have incorporated the social context but are limited by the need for large labeled datasets. To address these challenges, this paper introduces GAMC, an unsupervised fake news detection technique using a Graph Autoencoder with Masking and Contrastive learning.
By leveraging both the context and content of news propagation as self-supervised signals, our method reduces the dependency on labeled datasets. Specifically, GAMC begins by applying data augmentation to the original news propagation graphs. These augmented graphs are then encoded using a graph encoder and reconstructed via a graph decoder. Finally, a composite loss function encompassing both a reconstruction error and a contrastive loss is designed. The first term ensures the model can effectively capture the latent features by minimizing the discrepancy between reconstructed and original graph representations; the second aligns the representations of augmented graphs that originate from the same source. Experiments on real-world datasets validate the effectiveness of our method.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27789Unsupervised Gene-Cell Collective Representation Learning with Optimal Transport2024-03-24T00:07:22-07:00Jixiang Yujixiang.yu@my.cityu.edu.hkNanjun Chennanjuchen2-c@my.cityu.edu.hkMing Gaogm@dufe.edu.cnXiangtao Lilixt314@jlu.edu.cnKa-Chun Wongkc.w@cityu.edu.hkCell type identification plays a vital role in single-cell RNA sequencing (scRNA-seq) data analysis. Although many deep embedded methods to cluster scRNA-seq data have been proposed, they still fail to elucidate the intrinsic properties of cells and genes. Here, we present a novel end-to-end deep graph clustering model for single-cell transcriptomics data based on unsupervised Gene-Cell Collective representation learning and Optimal Transport (scGCOT), which integrates both cell and gene correlations. Specifically, scGCOT learns the latent embedding of cells and genes simultaneously and reconstructs the cell graph, the gene graph, and the gene expression count matrix.
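GAMC's composite objective, a reconstruction term plus a contrastive term that aligns two augmentations of the same propagation graph, can be sketched on flat embedding vectors. This is a minimal illustration, not the authors' code; the weighting `lam` is a hypothetical hyperparameter:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def composite_loss(recon, original, view_a, view_b, lam=0.5):
    """Reconstruction error between decoded and original graph
    representations, plus a contrastive term that pulls together the
    embeddings of two augmented views of the same source graph."""
    mse = sum((r - o) ** 2 for r, o in zip(recon, original)) / len(original)
    align = 1.0 - cosine(view_a, view_b)  # zero when the two views agree
    return mse + lam * align
```

The loss vanishes only when the decoder reproduces the original representation and the two augmented views coincide, which is exactly the pair of conditions the abstract describes.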
A zero-inflated negative binomial (ZINB) model is estimated via the reconstructed count matrix to capture the essential properties of scRNA-seq data. By leveraging the optimal transport-based joint representation alignment, scGCOT learns the clustering process and the latent representations through a mutually supervised self-optimization strategy. Extensive experiments with 14 competing methods on 15 real scRNA-seq datasets demonstrate the competitive edge of scGCOT.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27790MCSSME: Multi-Task Contrastive Learning for Semi-supervised Singing Melody Extraction from Polyphonic Music2024-03-24T00:07:25-07:00Shuai Yushuai_yu@dhu.edu.cnSinging melody extraction is an important task in the field of music information retrieval (MIR). The development of data-driven models for this task has achieved great success. However, the existing models have two major limitations: first, most existing singing melody extraction models formulate this task as a pixel-level prediction task, and the lack of labeled data has limited further improvements; second, the generalization of existing models is prone to being disturbed by differences in music genre. To address the issues mentioned above, in this paper, we propose a multi-task contrastive learning framework for semi-supervised singing melody extraction, termed MCSSME. Specifically, to deal with the data scarcity limitation, we propose a self-consistency regularization (SCR) method to train the model on unlabeled data. Transformations are applied to the raw signal of polyphonic music, which drives the network to improve its representation capability by recognizing the transformations. We further propose a novel multi-task learning (MTL) approach to jointly learn singing melody extraction and classification of transformed data.
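The ZINB distribution that scGCOT fits to the reconstructed count matrix mixes a point mass at zero (dropout) with a negative binomial over counts. A minimal log-probability sketch, assuming the common mean/dispersion parameterisation (mean `mu`, dispersion `theta`, dropout rate `pi`; the paper may parameterise differently):

```python
import math

def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial:
    with probability pi the count is a structural zero (dropout), otherwise
    it follows NB(mu, theta)."""
    # Negative binomial log-pmf in the mean/dispersion parameterisation.
    nb = (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
          + theta * math.log(theta / (theta + mu))
          + x * math.log(mu / (theta + mu)))
    if x == 0:
        # A zero can come from dropout or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb
```

In a deep model, `mu`, `theta`, and `pi` would be per-gene decoder outputs, and the negative of this log-likelihood summed over the count matrix would serve as the reconstruction loss.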
To deal with the generalization limitation, we also propose a contrastive embedding learning method, which strengthens intra-class compactness and inter-class separability. To further improve generalization across music genres, we propose a domain classification method that learns task-dependent features by mapping data from different music genres to a shared subspace. MCSSME is evaluated on a set of well-known public melody extraction datasets with promising performance. The experimental results demonstrate the effectiveness of the MCSSME framework for singing melody extraction from polyphonic music in scenarios with very limited labeled data.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27791RetroOOD: Understanding Out-of-Distribution Generalization in Retrosynthesis Prediction2024-03-24T00:07:26-07:00Yemin Yuyeminyu.nb@gmail.comLuotian Yuan3180105619@zju.edu.cnYing Weijudyweiying@gmail.comHanyu Gaohanyugao@ust.hkFei Wuwufei@zju.edu.cnZhihua Wangzhihua.wang@zju.edu.cnXinhai Yeyexinhai@zju.edu.cnMachine learning-assisted retrosynthesis prediction models have been gaining widespread adoption, though their performance oftentimes degrades significantly when deployed in real-world applications embracing out-of-distribution (OOD) molecules or reactions. Despite steady progress on standard benchmarks, our understanding of existing retrosynthesis prediction models under the premise of distribution shifts remains stagnant. To this end, we first formally identify two types of distribution shifts in retrosynthesis prediction and construct two groups of benchmark datasets. Next, through comprehensive experiments, we systematically compare state-of-the-art retrosynthesis prediction models on the two groups of benchmarks, revealing the limitations of previous in-distribution evaluation and re-examining the advantages of each model.
Motivated by the above empirical insights, we further propose two model-agnostic techniques that can improve the OOD generalization of arbitrary off-the-shelf retrosynthesis prediction algorithms. Our preliminary experiments show their high potential, with an average performance improvement of 4.6%, and the established benchmarks serve as a foothold for further retrosynthesis prediction research towards OOD generalization.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27792Designing Biological Sequences without Prior Knowledge Using Evolutionary Reinforcement Learning2024-03-24T00:07:28-07:00Xi Zengxizeng@mail.nwpu.edu.cnXiaotian Haoxiaotianhao@tju.edu.cnHongyao Tangbluecontra@tju.edu.cnZhentao Tangtangzhentao1@huawei.comShaoqing Jiaojiaosq22@mail.nwpu.edu.cnDazhi Ludzlu@mail.nwpu.edu.cnJiajie Pengjiajiepeng@nwpu.edu.cnDesigning novel biological sequences with desired properties is a significant challenge in biological science because of the extremely large search space. The traditional design process usually involves multiple rounds of costly wet lab evaluations. To reduce the need for expensive wet lab experiments, machine learning methods are used to aid in designing biological sequences. However, the limited availability of biological sequences with known properties hinders the training of machine learning models, significantly restricting their applicability and performance. To fill this gap, we present ERLBioSeq, an Evolutionary Reinforcement Learning algorithm for BIOlogical SEQuence design. ERLBioSeq leverages the capability of reinforcement learning to learn without prior knowledge and the potential of evolutionary algorithms to enhance the exploration of reinforcement learning in the large search space of biological sequences.
Additionally, to enhance the efficiency of biological sequence design, we develop a predictor for sequence screening in the design process, which incorporates both local and global sequence information. We evaluate the proposed method on the three main types of biological sequence design tasks: the design of DNA, RNA, and proteins. The results demonstrate that the proposed method achieves significant improvements over existing state-of-the-art methods.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27793Adversarial Socialbots Modeling Based on Structural Information Principles2024-03-24T00:07:30-07:00Xianghua Zengxiaozeng721@gmail.comHao Pengpenghao@buaa.edu.cnAngsheng Liangsheng@buaa.edu.cnThe importance of effective detection is underscored by the fact that socialbots imitate human behavior to propagate misinformation, leading to an ongoing competition between socialbots and detectors. Despite the rapid advancement of reactive detectors, the exploration of adversarial socialbot modeling remains incomplete, significantly hindering the development of proactive detectors. To address this issue, we propose a mathematical Structural Information principles-based Adversarial Socialbots Modeling framework, namely SIASM, to enable more accurate and effective modeling of adversarial behaviors. First, a heterogeneous graph is presented to integrate various users and rich activities in the original social network and measure its dynamic uncertainty as structural entropy. By minimizing the high-dimensional structural entropy, a hierarchical community structure of the social network is generated and referred to as the optimal encoding tree. Second, a novel method is designed to quantify influence by utilizing the assigned structural entropy, which helps reduce the computational cost of SIASM by filtering out uninfluential users.
In addition, a new conditional structural entropy is defined between the socialbot and other users to guide the follower selection for network influence maximization. Extensive and comparative experiments on both homogeneous and heterogeneous social networks demonstrate that, compared with state-of-the-art baselines, the proposed SIASM framework yields substantial performance improvements in terms of network influence (up to 16.32%) and sustainable stealthiness (up to 16.29%) when evaluated against a robust detector with 90% accuracy.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27794NondBREM: Nondeterministic Offline Reinforcement Learning for Large-Scale Order Dispatching2024-03-24T00:07:32-07:00Hongbo Zhangzhanghongbo@mail.ustc.edu.cnGuang Wangguang@cs.fsu.eduXu Wangwx309@mail.ustc.edu.cnZhengyang Zhouzzy0929@ustc.edu.cnChen Zhangzhangchenzc@mail.ustc.edu.cnZheng Dongdong@wayne.eduYang Wangangyan@ustc.edu.cnOne of the most important tasks in ride-hailing is order dispatching, i.e., assigning unserved orders to available drivers. Order dispatching has recently achieved significant improvements due to advances in reinforcement learning, which has been shown to effectively address sequential decision-making problems like order dispatching. However, most existing reinforcement learning methods require agents to learn the optimal policy by interacting with environments online, which is challenging or impractical for real-world deployment due to high costs or safety concerns. For example, due to the spatiotemporally unbalanced supply and demand, online reinforcement learning-based order dispatching may significantly impact the revenue of the ride-hailing platform and passenger experience during the policy learning period.
Hence, in this work, we develop an offline deep reinforcement learning framework called NondBREM for large-scale order dispatching, which learns a policy from only accumulated logged data to avoid costly and unsafe interactions with the environment. In NondBREM, a Nondeterministic Batch-Constrained Q-learning (NondBCQ) module is developed to reduce extrapolation error, and a Random Ensemble Mixture (REM) module that integrates multiple value networks with multi-head networks is utilized to improve model generalization and robustness. Extensive experiments on large-scale real-world ride-hailing datasets show the superiority of our design.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27795Scale Optimization Using Evolutionary Reinforcement Learning for Object Detection on Drone Imagery2024-03-24T00:07:34-07:00Jialu Zhangsgxjz1@nottingham.edu.cnXiaoying Yangscxxy1@nottingham.edu.cnWentao Hescxwh1@nottingham.edu.cnJianfeng Renjianfeng.ren@nottingham.edu.cnQian Zhangqian.zhang@nottingham.edu.cnYitian Zhaoyitian.zhao@nimte.ac.cnRuibin Bairuibin.bai@nottingham.edu.cnXiangjian Hexiangjian.he@gmail.comJiang Liuliuj@sustech.edu.cnObject detection in aerial imagery presents a significant challenge due to large scale variations among objects. This paper proposes an evolutionary reinforcement learning agent, integrated within a coarse-to-fine object detection framework, to optimize the scale for more effective detection of objects in such images. Specifically, a set of patches potentially containing objects is first generated. A set of rewards measuring the localization accuracy, the accuracy of predicted labels, and the scale consistency among nearby patches is designed in the agent to guide the scale optimization. The proposed scale-consistency reward ensures similar scales for neighboring objects of the same category.
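The Random Ensemble Mixture (REM) idea referenced in the NondBREM abstract combines the outputs of several Q-value heads with a random convex combination re-drawn at each step. A minimal sketch of that combination step (the function name and head values below are illustrative, not the authors' implementation):

```python
import random

def rem_q(q_heads, rng):
    """Random Ensemble Mixture: mix K Q-value estimates with random
    weights drawn from the probability simplex, so each training step
    effectively trains a different randomly-weighted ensemble."""
    alphas = [rng.random() for _ in q_heads]
    total = sum(alphas)
    alphas = [a / total for a in alphas]  # normalise onto the simplex
    return sum(a * q for a, q in zip(alphas, q_heads))
```

Because the weights are a convex combination, the mixed estimate always lies between the smallest and largest head value, which is part of what stabilises the ensemble.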
Furthermore, a spatial-semantic attention mechanism is designed to exploit the spatial semantic relations between patches. The agent employs the proximal policy optimization strategy in conjunction with the evolutionary strategy, effectively utilizing both the current patch status and historical experience embedded in the agent. The proposed model is compared with state-of-the-art methods on two benchmark datasets for object detection on drone imagery. It significantly outperforms all the compared methods. Code is available at https://github.com/UNNC-CV/EvOD/.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27796Adversarial Attacks on Federated-Learned Adaptive Bitrate Algorithms2024-03-24T00:07:37-07:00Rui-Xiao Zhangzhangrx17@mails.tsinghua.edu.cnTianchi Huanghtc19@mails.tsinghua.edu.cnLearning-based adaptive bitrate (ABR) algorithms have revolutionized video streaming solutions. With the growing demand for data privacy and the rapid development of mobile devices, federated learning (FL) has emerged as a popular training method for neural ABR algorithms in both academia and industry. However, we have discovered that FL-based ABR models are vulnerable to model-poisoning attacks as local updates remain unseen during global aggregation. In response, we propose MAFL (Malicious ABR model based on Federated Learning) to prove that backdooring the learning-based ABR model via FL is practical. Instead of attacking the global policy, MAFL only targets a single ``target client''. Moreover, the unique challenges brought by deep reinforcement learning (DRL) make the attack even more challenging. To address these challenges, MAFL is designed with a two-stage attacking mechanism. 
Using two representative attack cases with real-world traces, we show that MAFL significantly degrades the model performance on the target client (i.e., increasing rebuffering penalty by 2x and 5x) with a minimal negative impact on benign clients.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27797Generalize for Future: Slow and Fast Trajectory Learning for CTR Prediction2024-03-24T00:07:38-07:00Jian Zhulbjbx@zju.edu.cnCongcong Liucliubh@connect.ust.hkXue Jiangjiangxue@jd.comChangping Pengpengchangping@jd.comZhangang Linlinzhangang@jd.comJingping Shaoshaojingping@jd.comDeep neural networks (DNNs) have achieved significant advancements in click-through rate (CTR) prediction by demonstrating strong generalization on training data. However, in real-world scenarios, the assumption of independent and identically distributed (i.i.d.) conditions, which is fundamental to this problem, is often violated due to temporal distribution shifts. This violation can lead to suboptimal model performance when optimizing empirical risk without access to future data, resulting in overfitting on the training data and convergence to a single sharp minimum. To address this challenge, we propose a novel model updating framework called Slow and Fast Trajectory Learning (SFTL) network. SFTL aims to mitigate the discrepancy between past and future domains while quickly adapting to recent changes in small temporal drifts. 
This mechanism entails two interactions among three complementary learners: (i) the Working Learner, which updates model parameters using modern optimizers (e.g., Adam, Adagrad) and serves as the primary learner in the recommendation system, (ii) the Slow Learner, which is updated in each temporal domain by directly assigning the model weights of the working learner, and (iii) the Fast Learner, which is updated in each iteration by assigning exponentially moving average weights of the working learner. Additionally, we propose a novel rank-based trajectory loss to facilitate interaction between the working learner and trajectory learner, aiming to adapt to temporal drift and enhance performance in the current domain compared to the past. We provide theoretical understanding and conduct extensive experiments on real-world CTR prediction datasets to validate the effectiveness and efficiency of SFTL in terms of both convergence speed and model performance. The results demonstrate the superiority of SFTL over existing approaches.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27798Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models2024-03-24T00:07:41-07:00Yuqi Zhuzhuyuqi1997@126.comJia Lilijia@stu.pku.edu.cnGe Lilige@pku.edu.cnYunFei Zhaozhaoyunfei@pku.edu.cnJia Lilijiaa@pku.edu.cnZhi Jinzhijin@pku.edu.cnHong Meimeih@pku.edu.cnRecently, Large Language Models (LLMs) have shown impressive abilities in code generation. However, existing LLMs' decoding strategies are designed for Natural Language (NL) generation, overlooking the differences between NL and programming languages (PL). Due to this oversight, a better decoding strategy for code generation remains an open question. In this paper, we conduct the first systematic study to explore a decoding strategy specialized in code generation. 
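The slow and fast learner updates that SFTL describes, a per-domain copy of the working learner's weights and a per-iteration exponential moving average of them, reduce to two short routines. This is a sketch of those two update rules on flat weight lists, with a hypothetical `decay` value (the paper's schedule may differ):

```python
def ema_update(fast, working, decay=0.99):
    """Fast learner: exponential moving average of the working learner's
    weights, refreshed at every iteration."""
    return [decay * f + (1.0 - decay) * w for f, w in zip(fast, working)]

def slow_update(working):
    """Slow learner: a direct copy of the working learner's weights,
    taken once per temporal domain."""
    return list(working)
```

The EMA tracks recent drift smoothly while the periodic copy anchors the trajectory to the start of each temporal domain, which is the slow/fast split the abstract draws.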
With an analysis of loss distributions of code tokens, we find that code tokens can be divided into two categories: challenging tokens that are difficult to predict and confident tokens that can be easily inferred. Among them, the challenging tokens mainly appear at the beginning of a code block. Inspired by the above findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling, which dynamically adjusts the temperature coefficient when decoding different tokens. We apply a larger temperature when sampling for challenging tokens, allowing LLMs to explore diverse choices. We employ a smaller temperature for confident tokens, avoiding the influence of tail randomness noise. We apply AdapT sampling to LLMs of different sizes and conduct evaluations on two popular datasets. Results show that AdapT sampling significantly outperforms state-of-the-art decoding strategies.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27799Operationalizing Essential Characteristics of Creativity in a Computational System for Music Composition2024-03-24T00:07:45-07:00Paul M. Bodilybodipaul@isu.eduDan Venturaventura@cs.byu.eduWe address the problem of building and evaluating a computational system whose primary objective is creativity. We illustrate seven characteristics for computational creativity in the context of a system that autonomously composes Western lyrical music. We conduct an external evaluation of the system in which respondents rated the system with regard to each characteristic as well as with regard to overall creativity.
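The core mechanism of AdapT sampling, a larger temperature for challenging tokens and a smaller one for confident tokens, can be illustrated with a simple entropy-based switch. The threshold rule and the specific temperature values below are illustrative assumptions, not the paper's exact schedule:

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax with a max-shift for stability."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def adaptive_temperature(h, low=0.2, high=1.0, threshold=1.0):
    """Pick a larger temperature for 'challenging' (high-entropy) token
    positions and a smaller one for confident positions."""
    return high if h > threshold else low
```

In a decoding loop one would compute the model's token distribution, measure its entropy, and re-sample with the temperature `adaptive_temperature` returns: high entropy keeps exploration, low entropy sharpens the distribution and suppresses tail noise.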
Average scores for overall creativity exceeded the ratings for any single characteristic, suggesting that creativity may be an emergent property and that unique research opportunities exist for building CC systems whose design attempts to comprehend all known characteristics of creativity.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27800Neural Reasoning about Agents’ Goals, Preferences, and Actions2024-03-24T00:07:46-07:00Matteo Bortolettomatteo.bortoletto@vis.uni-stuttgart.deLei Shilei.shi@vis.uni-stuttgart.deAndreas Bullingandreas.bulling@vis.uni-stuttgart.deWe propose the Intuitive Reasoning Network (IRENE) - a novel neural model for intuitive psychological reasoning about agents' goals, preferences, and actions that can generalise previous experiences to new situations. IRENE combines a graph neural network for learning agent and world state representations with a transformer to encode the task context. When evaluated on the challenging Baby Intuitions Benchmark, IRENE achieves new state-of-the-art performance on three out of its five tasks - with up to 48.9% improvement. In contrast to existing methods, IRENE is able to bind preferences to specific agents, to better distinguish between rational and irrational agents, and to better understand the role of blocking obstacles. We also investigate, for the first time, the influence of the training tasks on test performance. 
Our analyses demonstrate the effectiveness of IRENE in combining prior knowledge gained during training for unseen evaluation tasks.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27801An Empirical Study of CLIP for Text-Based Person Search2024-03-24T00:07:48-07:00Min Caocaomin0719@126.comYang Baiybaibyougert@stu.suda.edu.cnZiyin Zeng20225227091@stu.suda.edu.cnMang Yemangye16@gmail.comMin Zhangminzhang@suda.edu.cnText-based Person Search (TBPS) aims to retrieve person images using natural language descriptions. Recently, Contrastive Language Image Pretraining (CLIP), a universal large cross-modal vision-language pre-training model, has performed remarkably on various cross-modal downstream tasks due to its powerful cross-modal semantic learning capacity. TBPS, as a fine-grained cross-modal retrieval task, is also seeing a rise in CLIP-based research. In order to explore the potential of the visual-language pre-training model for downstream TBPS tasks, this paper makes the first attempt to conduct a comprehensive empirical study of CLIP for TBPS and thus contribute a straightforward, incremental, yet strong TBPS-CLIP baseline to the TBPS community. We revisit critical design considerations under CLIP, including data augmentation and the loss function. The model, with the aforementioned designs and practical training tricks, can attain satisfactory performance without any sophisticated modules. Also, we conduct probing experiments on TBPS-CLIP in terms of model generalization and model compression, demonstrating the effectiveness of TBPS-CLIP from various aspects.
This work is expected to provide empirical insights and highlight future CLIP-based TBPS research.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27802Social Physics Informed Diffusion Model for Crowd Simulation2024-03-24T00:07:50-07:00Hongyi Chenchenhy23@mails.tsinghua.edu.cnJingtao Dingdingjt15@tsinghua.org.cnYong Liliyong07@tsinghua.edu.cnYue Wangwangyue@mail.tsinghua.edu.cnXiao-Ping Zhangxzhang@ryerson.caCrowd simulation holds crucial applications in various domains, such as urban planning, architectural design, and traffic arrangement. In recent years, physics-informed machine learning methods have achieved state-of-the-art performance in crowd simulation but fail to model the heterogeneity and multi-modality of human movement comprehensively. In this paper, we propose a social physics-informed diffusion model named SPDiff to mitigate the above gap. SPDiff takes both the interactive and historical information of crowds in the current timeframe to reverse the diffusion process, thereby generating the distribution of pedestrian movement in the subsequent timeframe. Inspired by the well-known social physics model, i.e., Social Force, regarding crowd dynamics, we design a crowd interaction encoder to guide the denoising process and further enhance this module with the equivariant properties of crowd interactions. To mitigate error accumulation in long-term simulations, we propose a multi-frame rollout training algorithm for diffusion modeling. Experiments conducted on two real-world datasets demonstrate the superior performance of SPDiff in terms of both macroscopic and microscopic evaluation metrics. 
Code and appendix are available at https://github.com/tsinghua-fib-lab/SPDiff.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27803Trend-Aware Supervision: On Learning Invariance for Semi-supervised Facial Action Unit Intensity Estimation2024-03-24T00:07:51-07:00Yingjie Chenchenyingjie@pku.edu.cnJiarui Zhangzjr954@pku.edu.cnTao Wangwangtao@pku.edu.cnYun Liangericlyun@pku.edu.cnWith the increasing need for facial behavior analysis, semi-supervised AU intensity estimation using only keyframe annotations has emerged as a practical and effective solution to relieve the burden of annotation. However, the lack of annotations makes the spurious correlation problem caused by AU co-occurrences and subject variation much more prominent, leading to non-robust intensity estimation that is entangled among AUs and biased among subjects. We observe that trend information inherent in keyframe annotations could act as extra supervision and raising the awareness of AU-specific facial appearance changing trends during training is the key to learning invariant AU-specific features. To this end, we propose Trend-AwareSupervision (TAS), which pursues three kinds of trend awareness, including intra-trend ranking awareness, intra-trend speed awareness, and inter-trend subject awareness. TAS alleviates the spurious correlation problem by raising trend awareness during training to learn AU-specific features that represent the corresponding facial appearance changes, to achieve intensity estimation invariance. Experiments conducted on two commonly used AU benchmark datasets, BP4D and DISFA, show the effectiveness of each kind of awareness. 
Under trend-aware supervision, performance can be improved without extra computational or storage costs during inference.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27804Enhancing the Robustness of Spiking Neural Networks with Stochastic Gating Mechanisms2024-03-24T00:07:54-07:00Jianhao Dingdjh01998@stu.pku.edu.cnZhaofei Yuyuzf12@pku.edu.cnTiejun Huangtjhuang@pku.edu.cnJian K. Liuj.liu.22@bham.ac.ukSpiking neural networks (SNNs) exploit neural spikes to provide solutions for low-power intelligent applications on neuromorphic hardware. Although SNNs have high computational efficiency due to spiking communication, they still lack resistance to adversarial attacks and noise perturbations. In the brain, neuronal responses generally possess stochasticity induced by ion channels and synapses, while the role of stochasticity in computing tasks is poorly understood. Inspired by this, we design a stochastic gating spiking neural model for layer-by-layer spike communication, introducing stochasticity to SNNs. Through theoretical analysis, our gating model can be viewed as a regularizer that prevents error amplification under attacks. Meanwhile, our work can explain the robustness of Poisson coding. Experimental results show that our method can be used alone or with existing robust enhancement algorithms to improve SNN robustness and reduce SNN energy consumption. We hope our work will shed new light on the role of stochasticity in the computation of SNNs.
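The stochastic gating of layer-by-layer spike communication described in the SNN abstract can be caricatured as an independent Bernoulli "channel" per spike. This toy sketch (function name and parameterisation are hypothetical) shows the mechanism, which acts much like a regularizer on the forward pass:

```python
import random

def stochastic_gate(spikes, p_open, rng):
    """Gate a binary spike train: each spike passes the stochastic
    'channel' independently with probability p_open, otherwise it is
    dropped (emitted as 0)."""
    return [s if rng.random() < p_open else 0 for s in spikes]
```

With `p_open = 1.0` the layer is deterministic; lowering it injects the kind of ion-channel-like stochasticity the abstract argues dampens error amplification under attack.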
Our code is available at https://github.com/DingJianhao/StoG-meets-SNN/.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27805Imitation of Life: A Search Engine for Biologically Inspired Design2024-03-24T00:07:56-07:00Hen Emunahen.emuna@mail.huji.ac.ilNadav Borensteinnadav.borenstein@di.ku.dkXin Qianxinq@umd.eduHyeonsu Kanghyeonsuk@cs.cmu.eduJoel Chanjoelchan@umd.eduAniket Kitturnkittur@cs.cmu.eduDafna Shahafdshahaf@cs.huji.ac.ilBiologically Inspired Design (BID), or Biomimicry, is a problem-solving methodology that applies analogies from nature to solve engineering challenges. For example, Speedo engineers designed swimsuits based on shark skin. Finding relevant biological solutions for real-world problems poses significant challenges, both due to the limited biological knowledge engineers and designers typically possess and to the limited BID resources. Existing BID datasets are hand-curated and small, and scaling them up requires costly human annotations. In this paper, we introduce BARcode (Biological Analogy Retriever), a search engine for automatically mining bio-inspirations from the web at scale. Using advances in natural language understanding and data programming, BARcode identifies potential inspirations for engineering challenges. Our experiments demonstrate that BARcode can retrieve inspirations that are valuable to engineers and designers tackling real-world problems, as well as recover famous historical BID examples. 
We release data and code; we view BARcode as a step towards addressing the challenges that have historically hindered the practical application of BID to engineering innovation.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27806An Efficient Knowledge Transfer Strategy for Spiking Neural Networks from Static to Event Domain2024-03-24T00:07:59-07:00Xiang Hehexiang2021@ia.ac.cnDongcheng Zhaozhaodongcheng2016@ia.ac.cnYang Liliyang2019@ia.ac.cnGuobin Shenshenguobin2021@ia.ac.cnQingqun Kongqingqun.kong@ia.ac.cnYi Zengyi.zeng@ia.ac.cnSpiking neural networks (SNNs) are rich in spatio-temporal dynamics and are suitable for processing event-based neuromorphic data. However, event-based datasets are usually less annotated than static datasets. This small data scale makes SNNs prone to overfitting and limits their performance. In order to improve the generalization ability of SNNs on event-based datasets, we use static images to assist SNN training on event data. In this paper, we first discuss the domain mismatch problem encountered when directly transferring networks trained on static datasets to event data. We argue that the inconsistency of feature distributions becomes a major factor hindering the effective transfer of knowledge from static images to event data. To address this problem, we propose solutions in terms of two aspects: feature distribution and training strategy. Firstly, we propose a knowledge transfer loss, which consists of domain alignment loss and spatio-temporal regularization. The domain alignment loss learns domain-invariant spatial features by reducing the marginal distribution distance between the static image and the event data. Spatio-temporal regularization provides dynamically learnable coefficients for domain alignment loss by using the output features of the event data at each time step as a regularization term. 
In addition, we propose a sliding training strategy, which probabilistically and gradually replaces static image inputs with event data, resulting in smoother and more stable training of the network. We validate our method on neuromorphic datasets, including N-Caltech101, CEP-DVS, and N-Omniglot. The experimental results show that our proposed method achieves better performance on all datasets compared to the current state-of-the-art methods. Code is available at https://github.com/Brain-Cog-Lab/Transfer-for-DVS.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27807Responding to the Call: Exploring Automatic Music Composition Using a Knowledge-Enhanced Model2024-03-24T00:08:02-07:00Zhejing Hu19045203r@connect.polyu.hkYan Liuyan.liu@polyu.edu.hkGong Chengong-cg.chen@polyu.edu.hkXiao Maxiao1.ma@polyu.edu.hkShenghua Zhongcsshzhong@szu.edu.cnQianwen Luoqianwluo@gmail.comCall-and-response is a musical technique that enriches the creativity of music, crafting coherent musical ideas that mirror the back-and-forth nature of human dialogue with distinct musical characteristics. Although this technique is integral to numerous musical compositions, it remains largely uncharted in automatic music composition. To enhance the creativity of machine-composed music, we first introduce the Call-Response Dataset (CRD), containing 19,155 annotated musical pairs, and craft comprehensive objective evaluation metrics for musical assessment. Then, we design a knowledge-enhanced learning-based method to bridge the gap between human and machine creativity. Specifically, we train the composition module using the call-response pairs, supplementing it with musical knowledge in terms of rhythm, melody, and harmony. 
Our experimental results underscore that our proposed model adeptly produces a wide variety of creative responses for various musical calls.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27808Neural Amortized Inference for Nested Multi-Agent Reasoning2024-03-24T00:08:03-07:00Kunal Jhakunal.a.jha.24@dartmouth.eduTuan Anh Letuananhl@google.comChuanyang Jincj2133@nyu.eduYen-Ling Kuoylkuo@mit.eduJoshua B. Tenenbaumjbt@mit.eduTianmin Shutshu@mit.eduMulti-agent interactions, such as communication, teaching, and bluffing, often rely on higher-order social inference, i.e., understanding how others infer oneself. Such intricate reasoning can be effectively modeled through nested multi-agent reasoning. Nonetheless, the computational complexity escalates exponentially with each level of reasoning, posing a significant challenge. However, humans effortlessly perform complex social inferences as part of their daily lives. To bridge the gap between human-like inference capabilities and computational limitations, we propose a novel approach: leveraging neural networks to amortize high-order social inference, thereby expediting nested multi-agent reasoning. We evaluate our method in two challenging multi-agent interaction domains. 
The experimental results demonstrate that our method is computationally efficient while exhibiting minimal degradation in accuracy.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27809Hidden Follower Detection: How Is the Gaze-Spacing Pattern Embodied in Frequency Domain?2024-03-24T00:08:05-07:00Shu Lishuli@stu.xidian.edu.cnRuimin Hurmhu@xidian.edu.cnSuhui Lisuhuili@stu.xidian.edu.cnLiang Liaoliang.liao@ntu.edu.sgSpatiotemporal social behavior analysis is a technique that studies the social behavior patterns of objects and estimates their risks based on their trajectories. In public social scenarios such as train stations, hidden following behavior has become one of the most challenging issues due to its probability of evolving into violent events, which exceeds 25%. In recent years, research on hidden following detection (HFD) has focused on differences in time series between hidden followers and normal pedestrians under two temporal characteristics: gaze and spatial distance. However, the time-domain representation of time series is irreversible and usually causes the loss of critical information. In this paper, we study in depth the expressive efficiency of time- and frequency-domain features of time series. By exploring the mechanism for recovering the source time series from its features, we establish a fidelity estimation method for feature expression and a selection model for frequency-domain features based on the signal-to-distortion ratio (SDR). Experimental results demonstrate that the feature fidelity of time series and HFD performance are positively correlated, and that both the fidelity of frequency-domain features and their HFD performance are significantly better than those of time-domain features. On both real and simulated datasets, the accuracy of the proposed method is increased by 3%, and the gaze-only module is improved by 10%. 
Related research has explored new methods for optimal feature selection based on fidelity, new patterns for efficient feature expression of hidden following behavior, and the mechanism of multimodal collaborative identification.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27810Music Style Transfer with Time-Varying Inversion of Diffusion Models2024-03-24T00:08:12-07:00Sifei Lilisifei2022@ia.ac.cnYuxin Zhangzhangyuxin2020@ia.ac.cnFan Tangtfan.108@gmail.comChongyang Machongyangm@gmail.comWeiming Dongweiming.dong@ia.ac.cnChangsheng Xucsxu@nlpr.ia.ac.cnWith the development of diffusion models, text-guided image style transfer has demonstrated great controllable and high-quality results. However, the utilization of text for diverse music style transfer poses significant challenges, primarily due to the limited availability of matched audio-text datasets. Music, being an abstract and complex art form, exhibits variations and intricacies even within the same genre, thereby making accurate textual descriptions challenging. This paper presents a music style transfer approach that effectively captures musical attributes using minimal data. We introduce a novel time-varying textual inversion module to precisely capture mel-spectrogram features at different levels. During inference, we utilize a bias-reduced stylization technique to get stable results. Experimental results demonstrate that our method can transfer the style of specific instruments, as well as incorporate natural sounds to compose melodies. 
Samples and code are available at https://lsfhuihuiff.github.io/MusicTI/.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27811A Brain-Inspired Way of Reducing the Network Complexity via Concept-Regularized Coding for Emotion Recognition2024-03-24T00:08:14-07:00Han Luhlu20@fudan.edu.cnXiahai Zhuangzxh@fudan.edu.cnQiang Luoqluo@fudan.edu.cnThe human brain can effortlessly and reliably perceive emotions, whereas existing facial emotion recognition (FER) methods suffer from drawbacks such as complex model structures, high storage requirements, and poor interpretability. Inspired by the role of emotion concepts in visual perception coding within the human brain, we propose a dual-pathway framework emulating the neural computation of emotion recognition. Specifically, these two pathways are designed to model the representation of emotion concepts in the brain and the visual perception process, respectively. For the former, we adopt a disentangled approach to extract emotion concepts from complex facial geometric attributes; for the latter, we employ an emotional confidence evaluation strategy to determine which concept is optimal for regularizing the perceptual coding. 
The proposed concept-regularized coding strategy endows the framework with flexibility and interpretability, as well as good performance on several benchmark FER datasets.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27812Multi-Energy Guided Image Translation with Stochastic Differential Equations for Near-Infrared Facial Expression Recognition2024-03-24T00:08:16-07:00Bingjun Luoluobingjun@gmail.comZewen Wangwang-zw19@mails.tsinghua.edu.cnJinpeng Wangwjp21@mails.tsinghua.edu.cnJunjie Zhuzhujj18@mails.tsinghua.edu.cnXibin Zhaozxb@mails.tsinghua.edu.cnYue Gaogaoyue@mails.tsinghua.edu.cnIllumination variation has been a long-term challenge in real-world facial expression recognition (FER). Under uncontrolled or non-visible light conditions, near-infrared (NIR) imaging can provide a simple and alternative solution to obtain high-quality images and supplement the geometric and texture details that are missing in the visible (VIS) domain. Due to the lack of large-scale NIR facial expression datasets, directly extending VIS FER methods to the NIR spectrum may be ineffective. Additionally, previous heterogeneous image synthesis methods are restricted by low controllability without prior task knowledge. To tackle these issues, we present the first approach, called NIR-FER Stochastic Differential Equations (NFER-SDE), that transfers facial expression appearance between heterogeneous modalities to address the overfitting problem on small-scale NIR data. NFER-SDE can take the whole VIS source image as input and, together with domain-specific knowledge, guide the preservation of modality-invariant information in the high-frequency content of the image. 
Extensive experiments and ablation studies show that NFER-SDE significantly improves the performance of NIR FER and achieves state-of-the-art results on the only two available NIR FER datasets, Oulu-CASIA and Large-HFE.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27813Successive POI Recommendation via Brain-Inspired Spatiotemporal Aware Representation2024-03-24T00:08:18-07:00Gehua Magene_magh@icloud.comHe Wanghewang@tju.edu.cnJingyuan Zhaojing-yuan.zhao@capgemini.comRui Yanryan@zjut.edu.cnHuajin Tanghtang@zju.edu.cnExisting approaches usually perform spatiotemporal representation in the spatial and temporal dimensions, respectively, which isolates the spatial and temporal natures of the target and leads to sub-optimal embeddings. Neuroscience research has shown that the mammalian brain entorhinal-hippocampal system provides efficient graph representations for general knowledge. Moreover, entorhinal grid cells present concise spatial representations, while hippocampal place cells represent perception conjunctions effectively. Thus, the entorhinal-hippocampal system provides a novel angle for spatiotemporal representation, which inspires us to propose the SpatioTemporal aware Embedding framework (STE) and apply it to POIs (STEP). STEP considers two types of POI-specific representations: sequential representation and spatiotemporal conjunctive representation, learned using sparse unlabeled data based on the proposed graph-building policies. Notably, STEP jointly represents the spatiotemporal natures of POIs using both observations and contextual information from integrated spatiotemporal dimensions by constructing a spatiotemporal context graph. Furthermore, we introduce a successive POI recommendation method using STEP, which achieves state-of-the-art performance on two benchmarks. 
In addition, we demonstrate the excellent performance of the STE representation approach in other spatiotemporal representation-centered tasks through a case study of the traffic flow prediction problem. Therefore, this work provides a novel solution to spatiotemporal representation and paves a new way for spatiotemporal modeling-related tasks.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27814BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind2024-03-24T00:08:20-07:00Yuanyuan Mao1937927717@qq.comXin Linxlin@cs.ecnu.edu.cnQin Niniqin@shnu.edu.cnLiang Helhe@cs.ecnu.edu.cnAs a foundational component of cognitive intelligence, theory of mind (ToM) can make AI more closely resemble human thought processes, thereby enhancing their interaction and collaboration with human. In particular, it can significantly improve a model's comprehension of videos in complex scenes. However, current video question answer (VideoQA) datasets focus on studying causal reasoning within events, few of them genuinely incorporating human ToM. Consequently, there is a lack of development in ToM reasoning tasks within the area of VideoQA. This paper presents BDIQA, the first benchmark to explore the cognitive reasoning capabilities of VideoQA models in the context of ToM. BDIQA is inspired by the cognitive development of children's ToM and addresses the current deficiencies in machine ToM within datasets and tasks. Specifically, it offers tasks at two difficulty levels, assessing Belief, Desire and Intention (BDI) reasoning in both simple and complex scenarios. We conduct evaluations on several mainstream methods of VideoQA and diagnose their capabilities with zero-shot, few-shot and supervised learning. We find that the performance of pre-trained models on cognitive reasoning tasks remains unsatisfactory. 
To counter this challenge, we undertake thorough analysis and experimentation, ultimately presenting two guidelines, derived from ablation analysis, to enhance cognitive reasoning.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27815Unveiling the Significance of Toddler-Inspired Reward Transition in Goal-Oriented Reinforcement Learning2024-03-24T00:08:22-07:00Junseok Parkjspark227@snu.ac.krYoonsung Kimyskim227@snu.ac.krHee bin Yoohbyoo@bi.snu.ac.krMin Whoo Leemwlee@bi.snu.ac.krKibeom Kimkbkim@bi.snu.ac.krWon-Seok Choiwchoi@bi.snu.ac.krMinsu Leemslee@bi.snu.ac.krByoung-Tak Zhangbtzhang@bi.snu.ac.krToddlers evolve from free exploration with sparse feedback to exploiting prior experiences for goal-directed learning with denser rewards. Drawing inspiration from this Toddler-Inspired Reward Transition, we set out to explore the implications of varying reward transitions when incorporated into Reinforcement Learning (RL) tasks. Central to our inquiry is the transition from sparse to potential-based dense rewards, which preserve optimal strategies regardless of reward changes. Through various experiments, including those in egocentric navigation and robotic arm manipulation tasks, we found that proper reward transitions significantly influence sample efficiency and success rates. Of particular note is the efficacy of the toddler-inspired Sparse-to-Dense (S2D) transition. 
Beyond these performance metrics, using the Cross-Density Visualizer technique, we observed that transitions, especially the S2D one, smooth the policy loss landscape, promoting wide minima that enhance generalization in RL models.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27816Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks2024-03-24T00:08:24-07:00Xuerui Qiusherry.qiu@std.uestc.edu.cnRui-Jie Zhuridger@std.uestc.edu.cnYuhong Choumoyufang2019@gmail.comZhaorui Wangzhaorui_wang@std.uestc.edu.cnLiang-Jian Dengliangjian.deng@uestc.edu.cnGuoqi Liguoqi.li@ia.ac.cnSpiking neural networks (SNNs) are emerging as an energy-efficient alternative to traditional artificial neural networks (ANNs) due to their unique spike-based event-driven nature. Coding is crucial in SNNs as it converts external input stimuli into spatio-temporal feature sequences. However, most existing deep SNNs rely on direct coding, which generates weak spike representations and lacks the temporal dynamics inherent in human vision. Hence, we introduce Gated Attention Coding (GAC), a plug-and-play module that leverages a multi-dimensional gated attention unit to efficiently encode inputs into powerful representations before feeding them into the SNN architecture. GAC functions as a preprocessing layer that does not disrupt the spike-driven nature of the SNN, making it amenable to efficient neuromorphic hardware implementation with minimal modifications. Through an observer-model theoretical analysis, we demonstrate that GAC's attention mechanism improves temporal dynamics and coding efficiency. Experiments on the CIFAR10/100 and ImageNet datasets demonstrate that GAC achieves state-of-the-art accuracy with remarkable efficiency. 
Notably, we improve top-1 accuracy by 3.10% on CIFAR100 with only 6 time steps and by 1.07% on ImageNet, while reducing energy usage to 66.9% of that of previous works. To the best of our knowledge, this is the first exploration of an attention-based dynamic coding scheme in deep SNNs, with exceptional effectiveness and efficiency on large-scale datasets. Code is available at https://github.com/bollossom/GAC.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27817Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning2024-03-24T00:08:26-07:00Jiangrong Shenjrshen@zju.edu.cnWenyao Ni22221209@zju.edu.cnQi Xuxuqi@dlut.edu.cnHuajin Tanghtang@zju.edu.cnThe next generation of machine intelligence requires the capability of continual learning: acquiring new knowledge without forgetting the old while conserving limited computing resources. Spiking neural networks (SNNs), compared to artificial neural networks (ANNs), have more characteristics that align with biological neurons, which may serve as a potential gating function for knowledge maintenance in neural networks. Inspired by the selective sparse activation principle of context gating in biological systems, we present a novel SNN model with selective activation to achieve continual learning. The trace-based K-Winner-Take-All (K-WTA) and variable threshold components are designed to induce sparse selective activation in the spatial and temporal dimensions of spiking neurons, encouraging distinct subpopulations of neurons to activate for specific tasks. As a result, continual learning can be maintained by routing different tasks via different populations of neurons in the network. Experiments are conducted on the MNIST and CIFAR10 datasets under the class-incremental setting. 
The results show that the proposed SNN model achieves performance comparable to, and even surpassing, other regularization-based methods deployed under traditional ANNs.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27818Boosting Neural Cognitive Diagnosis with Student’s Affective State Modeling2024-03-24T00:08:28-07:00Shanshan Wangwang.shanshan@ahu.edu.cnZhen Zengq21201141@stu.ahu.edu.cnXun Yangxyang21@ustc.edu.cnKe Xuxuke@ahu.edu.cnXingyi Zhangxyzhanghust@gmail.comCognitive Diagnosis Modeling aims to infer students' proficiency levels on knowledge concepts from their response logs. Existing methods typically model students’ response processes as the interaction between students and exercises or concepts, based on hand-crafted or deeply-learned interaction functions. Despite their promising achievements, they fail to consider the relationship between students' cognitive states and affective states in learning, e.g., feelings of frustration, boredom, or confusion with the learning content, which is insufficient for comprehensive cognitive diagnosis in intelligent education. To fill this research gap, we propose a novel Affect-aware Cognitive Diagnosis (ACD) model which can effectively diagnose students' knowledge proficiency levels by taking affective factors into consideration. Specifically, we first design a student affect perception module under the assumption that the affective state is jointly influenced by the student's affect trait and the difficulty of the exercise. The inferred affective distribution is then used to estimate the student's subjective factors, i.e., guessing and slipping. 
Finally, we integrate the estimated guessing and slipping parameters with the basic neural cognitive diagnosis framework based on the DINA model, which facilitates the modeling of complex exercising interactions in a more accurate and interpretable fashion. Besides, we also extend our affect perception module in an unsupervised learning setting based on contrastive learning, thus significantly improving the compatibility of our ACD. To the best of our knowledge, we are the first to unify the cognition modeling and affect modeling into the same framework for student cognitive diagnosis. Extensive experiments on real-world datasets clearly demonstrate the effectiveness of our ACD. Our code is available at https://github.com/zeng-zhen/ACD.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27819DMMR: Cross-Subject Domain Generalization for EEG-Based Emotion Recognition via Denoising Mixed Mutual Reconstruction2024-03-24T00:08:31-07:00Yiming Wangyimingwang@stu.xjtu.edu.cnBin Zhangbzhang82@xjtu.edu.cnYujiao Tang3121358009@stu.xjtu.edu.cnElectroencephalography (EEG) has proven to be effective in emotion analysis. However, current methods struggle with individual variations, complicating the generalization of models trained on data from source subjects to unseen target subjects. To tackle this issue, we propose the Denoising Mixed Mutual Reconstruction (DMMR) model, employing a two-stage pre-training followed by fine-tuning approach. During the pre-training phase, DMMR leverages self-supervised learning through a multi-decoder autoencoder, which encodes and reconstructs features of one subject, aiming to generate features resembling those from other subjects within the same category, thereby encouraging the encoder to learn subject-invariant features. 
We introduce a hidden-layer mixed data augmentation approach to mitigate the limitations posed by the scarcity of source data, thereby extending the method to a two-stage process. To bolster stability against noise, we incorporate a noise injection method, named “Time Steps Shuffling”, into the input data. During the fine-tuning phase, an emotion classifier is integrated to extract emotion-related features. Experimental accuracy on the SEED and SEED-IV datasets reached 88.27% (±5.62) and 72.70% (±8.01), respectively, demonstrating state-of-the-art and comparable performance, thereby showcasing the superiority of DMMR. The proposed data augmentation and noise injection methods were observed to complementarily enhance accuracy and stability, thus alleviating the aforementioned issues.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27820Transient Glimpses: Unveiling Occluded Backgrounds through the Spike Camera2024-03-24T00:08:33-07:00Jiyuan Zhangjyzhang@stu.pku.edu.cnShiyan Chen2001212818@stu.pku.edu.cnYajing Zhengyj.zheng@pku.edu.cnZhaofei Yuyuzf12@pku.edu.cnTiejun Huangtjhuang@pku.edu.cnThe de-occlusion problem, involving extracting clear background images by removing foreground occlusions, holds significant practical importance but poses considerable challenges. Most current research predominantly focuses on generating discrete images from calibrated camera arrays, but this approach often struggles with dense occlusions and fast motions due to limited perspectives and motion blur. To overcome these limitations, an effective solution requires the integration of multi-view visual information. The spike camera, as an innovative neuromorphic sensor, shows promise with its ultra-high temporal resolution and dynamic range. In this study, we propose a novel approach that utilizes a single spike camera for continuous multi-view imaging to address occlusion removal. 
By rapidly moving the spike camera, we capture a dense stream of spikes from occluded scenes. Our model, SpkOccNet, processes these spikes by integrating multi-view spatial-temporal information via long-short-window feature extractor (LSW) and employs a novel cross-view mutual attention-based module (CVA) for effective fusion and refinement. Additionally, to facilitate research in occlusion removal, we introduce the S-OCC dataset, which consists of real-world spike-based data. Experimental results demonstrate the efficiency and generalization capabilities of our model in effectively removing dense occlusions across diverse scenes. Public project page: https://github.com/Leozhangjiyuan/SpikeDeOcclusion.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27821Open-Set Facial Expression Recognition2024-03-24T00:08:35-07:00Yuhang Zhangzyhzyh@bupt.edu.cnYue Yaoyue.yao@anu.edu.auXuannan Liuliuxuannan@bupt.edu.cnLixiong Qinlxqin@bupt.edu.cnWenjing Wangwwj311@bupt.edu.cnWeihong Dengwhdeng@bupt.edu.cnFacial expression recognition (FER) models are typically trained on datasets with a fixed number of seven basic classes. However, recent research works (Cowen et al. 2021; Bryant et al. 2022; Kollias 2023) point out that there are far more expressions than the basic ones. Thus, when these models are deployed in the real world, they may encounter unknown classes, such as compound expressions that cannot be classified into existing basic classes. To address this issue, we propose the open-set FER task for the first time. Though there are many existing open-set recognition methods, we argue that they do not work well for open-set FER because FER data are all human faces with very small inter-class distances, which makes the open-set samples very similar to close-set samples. 
In this paper, we are the first to transform the disadvantage of small inter-class distance into an advantage by proposing a new way for open-set FER. Specifically, we find that small inter-class distance allows for sparsely distributed pseudo labels of open-set samples, which can be viewed as symmetric noisy labels. Based on this novel observation, we convert the open-set FER to a noisy label detection problem. We further propose a novel method that incorporates attention map consistency and cycle training to detect the open-set samples. Extensive experiments on various FER datasets demonstrate that our method clearly outperforms state-of-the-art open-set recognition methods by large margins. Code is available at https://github.com/zyh-uaiaaaa.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27822Bootstrapping Cognitive Agents with a Large Language Model2024-03-24T00:08:36-07:00Feiyu Zhufeiyuz@andrew.cmu.eduReid Simmonsrsimmons@andrew.cmu.eduLarge language models contain noisy general knowledge of the world, yet are hard to train or fine-tune. In contrast cognitive architectures have excellent interpretability and are flexible to update but require a lot of manual work to instantiate. In this work, we combine the best of both worlds: bootstrapping a cognitive-based model with the noisy knowledge encoded in large language models. Through an embodied agent doing kitchen tasks, we show that our proposed framework yields better efficiency compared to an agent entirely based on large language models. 
Our experiments also indicate that the cognitive agent bootstrapped using this framework can generalize to novel environments and be scaled to complex tasks.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligencehttps://ojs.aaai.org/index.php/AAAI/article/view/27823Data Augmented Graph Neural Networks for Personality Detection2024-03-24T00:08:38-07:00Yangfu Zhuzhuyangfu@bupt.edu.cnYue Xia1216918224@qq.comMeiling Limeilinglee@bupt.edu.cnTingting Zhangzhangtingting@bupt.edu.cnBin Wuwubin@bupt.edu.cnPersonality detection is a fundamental task for user psychology research. One of the biggest challenges in personality detection lies in the quantitative limitation of labeled data collected by completing the personality questionnaire, which is very time-consuming and labor-intensive. Most of the existing works are mainly devoted to learning the rich representations of posts based on labeled data. However, they still suffer from the inherent weakness of the amount limitation of labels, which potentially restricts the capability of the model to deal with unseen data. In this paper, we construct a heterogeneous personality graph for each labeled and unlabeled user and develop a novel psycholinguistic augmented graph neural network to detect personality in a semi-supervised manner, namely Semi-PerGCN. Specifically, our model first explores a supervised Personality Graph Neural Network (PGNN) to refine labeled user representation on the heterogeneous graph. For the remaining massive unlabeled users, we utilize the empirical psychological knowledge of the Linguistic Inquiry and Word Count (LIWC) lexicon for multi-view graph augmentation and perform unsupervised graph consistent constraints on the parameters shared PGNN. During the learning process of finite labeled users, noise-invariant learning on a large scale of unlabeled users is combined to enhance the generalization ability. 
Extensive experiments on three real-world datasets, Youtube, PAN2015, and MyPersonality demonstrate the effectiveness of our Semi-PerGCN in personality detection, especially in scenarios with limited labeled users.2024-03-25T00:00:00-07:00Copyright (c) 2024 Association for the Advancement of Artificial Intelligence