Proceedings of the AAAI Conference on Artificial Intelligence
Search Results
Found 25136 items.
Debate Helps Weak-to-Strong Generalization
Hao Lang, Fei Huang, Yongbin Li
27410-27418
2025-04-11
JailPO: A Novel Black-Box Jailbreak Framework via Preference Optimization Against Aligned LLMs
Hongyi Li, Jiawei Ye, Jie Wu, Tianjie Yan, Chu Wang, Zhixin Li
27419-27427
2025-04-11
Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update
Qing Li, Jiahui Geng, Derui Zhu, Zongxiong Chen, Kun Song, Lei Ma, Fakhri Karray
27428-27436
2025-04-11
Strong Empowered and Aligned Weak Mastered Annotation for Weak-to-Strong Generalization
Yongqi Li, Xin Miao, Mayi Xu, Tieyun Qian
27437-27445
2025-04-11
Retention Score: Quantifying Jailbreak Risks for Vision Language Models
Zaitang LI, Pin-Yu Chen, Tsung-Yi Ho
27446-27454
2025-04-11
Exploring Intrinsic Alignments Within Text Corpus
Zi Liang, Pinghui Wang, Ruofei Zhang, Haibo Hu, Shuo Zhang, Qingqing Ye, Nuo Xu, Yaxin Xiao, Chen Zhang, Lizhen Cui
27455-27463
2025-04-11
Is Your Autonomous Vehicle Safe? Understanding the Threat of Electromagnetic Signal Injection Attacks on Traffic Scene Perception
Wenhao Liao, Sineng Yan, Youqian Zhang, Xinwei Zhai, Yuanyuan Wang, Eugene Fu
27464-27472
2025-04-11
Single Character Perturbations Break LLM Alignment
Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh
27473-27481
2025-04-11
Data with High and Consistent Preference Difference Are Better for Reward Model
Qi Lin, Hengtong Lu, Caixia Yuan, Xiaojie Wang, Huixing Jiang, Wei Chen
27482-27490
2025-04-11
Bias Unveiled: Investigating Social Bias in LLM-Generated Code
Lin Ling, Fazle Rabbi, Song Wang, Jinqiu Yang
27491-27499
2025-04-11
Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction
Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang
27500-27508
2025-04-11
Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling
Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan, Kaiqi Huang
27509-27517
2025-04-11
MAPLE: A Framework for Active Preference Learning Guided by Large Language Models
Saaduddin Mahmud, Mason Nakamura, Shlomo Zilberstein
27518-27528
2025-04-11
SYNAPSE: SYmbolic Neural-Aided Preference Synthesis Engine
Sadanand Modak, Noah Tobias Patton, Isil Dillig, Joydeep Biswas
27529-27537
2025-04-11
Neural Continuous-Time Supermartingale Certificates
Grigory Neustroev, Mirco Giacobbe, Anna Lukina
27538-27546
2025-04-11
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Jonathan Nöther, Adish Singla, Goran Radanovic
27547-27555
2025-04-11
Is Poisoning a Real Threat to DPO? Maybe More So Than You Think
Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang
27556-27564
2025-04-11
Do Transformer Interpretability Methods Transfer to RNNs?
Gonçalo Paulo, Thomas Marshall, Nora Belrose
27565-27572
2025-04-11
Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems
Pierre Peigné, Mikolaj Kniejski, Filip Sondej, Matthieu David, Jason Hoelscher-Obermaier, Christian Schroeder de Witt, Esben Kran
27573-27581
2025-04-11
Increased Compute Efficiency and the Diffusion of AI Capabilities
Konstantin F. Pilz, Lennart Heim, Nicholas Brown
27582-27590
2025-04-11
Neurons to Words: A Novel Method for Automated Neural Network Interpretability and Alignment
Lukas-Santo Puglisi, Fabio Valdés, Jakob Johannes Metzger
27591-27598
2025-04-11
SEAL: Systematic Error Analysis for Value ALignment
Manon Revel, Matteo Cargnelutti, Tyna Eloundou, Greg Leppert
27599-27607
2025-04-11
ME: Modelling Ethical Values for Value Alignment
Eryn Rigley, Adriane Chapman, Christine Evers, Will McNeill
27608-27616
2025-04-11
SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy
27617-27627
2025-04-11
Reinforcement Learning Platform for Adversarial Black-box Attacks with Custom Distortion Filters
Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Ricardo Luna Gutiérrez, Antonio Guillen, Desik Rengarajan
27628-27635
2025-04-11
16451 - 16475 of 25136 items