Proceedings of the AAAI Conference on Artificial Intelligence

Search

Search Results

Found 25136 items.
  • Debate Helps Weak-to-Strong Generalization

    Hao Lang, Fei Huang, Yongbin Li
    27410-27418
    2025-04-11
  • JailPO: A Novel Black-Box Jailbreak Framework via Preference Optimization Against Aligned LLMs

    Hongyi Li, Jiawei Ye, Jie Wu, Tianjie Yan, Chu Wang, Zhixin Li
    27419-27427
    2025-04-11
  • Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update

    Qing Li, Jiahui Geng, Derui Zhu, Zongxiong Chen, Kun Song, Lei Ma, Fakhri Karray
    27428-27436
    2025-04-11
  • Strong Empowered and Aligned Weak Mastered Annotation for Weak-to-Strong Generalization

    Yongqi Li, Xin Miao, Mayi Xu, Tieyun Qian
    27437-27445
    2025-04-11
  • Retention Score: Quantifying Jailbreak Risks for Vision Language Models

    Zaitang Li, Pin-Yu Chen, Tsung-Yi Ho
    27446-27454
    2025-04-11
  • Exploring Intrinsic Alignments Within Text Corpus

    Zi Liang, Pinghui Wang, Ruofei Zhang, Haibo Hu, Shuo Zhang, Qingqing Ye, Nuo Xu, Yaxin Xiao, Chen Zhang, Lizhen Cui
    27455-27463
    2025-04-11
  • Is Your Autonomous Vehicle Safe? Understanding the Threat of Electromagnetic Signal Injection Attacks on Traffic Scene Perception

    Wenhao Liao, Sineng Yan, Youqian Zhang, Xinwei Zhai, Yuanyuan Wang, Eugene Fu
    27464-27472
    2025-04-11
  • Single Character Perturbations Break LLM Alignment

    Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh
    27473-27481
    2025-04-11
  • Data with High and Consistent Preference Difference Are Better for Reward Model

    Qi Lin, Hengtong Lu, Caixia Yuan, Xiaojie Wang, Huixing Jiang, Wei Chen
    27482-27490
    2025-04-11
  • Bias Unveiled: Investigating Social Bias in LLM-Generated Code

    Lin Ling, Fazle Rabbi, Song Wang, Jinqiu Yang
    27491-27499
    2025-04-11
  • Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction

    Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang
    27500-27508
    2025-04-11
  • Sequential Preference Optimization: Multi-Dimensional Preference Alignment with Implicit Reward Modeling

    Xingzhou Lou, Junge Zhang, Jian Xie, Lifeng Liu, Dong Yan, Kaiqi Huang
    27509-27517
    2025-04-11
  • MAPLE: A Framework for Active Preference Learning Guided by Large Language Models

    Saaduddin Mahmud, Mason Nakamura, Shlomo Zilberstein
    27518-27528
    2025-04-11
  • SYNAPSE: SYmbolic Neural-Aided Preference Synthesis Engine

    Sadanand Modak, Noah Tobias Patton, Isil Dillig, Joydeep Biswas
    27529-27537
    2025-04-11
  • Neural Continuous-Time Supermartingale Certificates

    Grigory Neustroev, Mirco Giacobbe, Anna Lukina
    27538-27546
    2025-04-11
  • Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints

    Jonathan Nöther, Adish Singla, Goran Radanovic
    27547-27555
    2025-04-11
  • Is Poisoning a Real Threat to DPO? Maybe More So Than You Think

    Pankayaraj Pathmanathan, Souradip Chakraborty, Xiangyu Liu, Yongyuan Liang, Furong Huang
    27556-27564
    2025-04-11
  • Do Transformer Interpretability Methods Transfer to RNNs?

    Gonçalo Paulo, Thomas Marshall, Nora Belrose
    27565-27572
    2025-04-11
  • Multi-Agent Security Tax: Trading Off Security and Collaboration Capabilities in Multi-Agent Systems

    Pierre Peigné, Mikolaj Kniejski, Filip Sondej, Matthieu David, Jason Hoelscher-Obermaier, Christian Schroeder de Witt, Esben Kran
    27573-27581
    2025-04-11
  • Increased Compute Efficiency and the Diffusion of AI Capabilities

    Konstantin F. Pilz, Lennart Heim, Nicholas Brown
    27582-27590
    2025-04-11
  • Neurons to Words: A Novel Method for Automated Neural Network Interpretability and Alignment

    Lukas-Santo Puglisi, Fabio Valdés, Jakob Johannes Metzger
    27591-27598
    2025-04-11
  • SEAL: Systematic Error Analysis for Value ALignment

    Manon Revel, Matteo Cargnelutti, Tyna Eloundou, Greg Leppert
    27599-27607
    2025-04-11
  • ME: Modelling Ethical Values for Value Alignment

    Eryn Rigley, Adriane Chapman, Christine Evers, Will McNeill
    27608-27616
    2025-04-11
  • SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

    Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy
    27617-27627
    2025-04-11
  • Reinforcement Learning Platform for Adversarial Black-box Attacks with Custom Distortion Filters

    Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Ricardo Luna Gutiérrez, Antonio Guillen, Desik Rengarajan
    27628-27635
    2025-04-11
16451 - 16475 of 25136 items

Copyright © 2024, Association for the Advancement of Artificial Intelligence