Proceedings of the AAAI Conference on Artificial Intelligence

Search Results

Found 25136 items.
  • The Alignment Game: A Theory of Long-Horizon Alignment Through Recursive Curation

    Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab
    37379-37386
    2026-03-14
  • SMiLE: Provably Enforcing Global Relational Properties in Neural Networks

    Matteo Francobaldi, Michele Lombardi, Andrea Lodi
    37387-37395
    2026-03-14
  • EssayBench: Evaluating Large Language Models in Multi-Genre Chinese Essay Writing

    Fan Gao, Dongyuan Li, Ding Xia, Fei Mi, Yasheng Wang, Lifeng Shang, Baojun Wang
    37396-37406
    2026-03-14
  • Beyond Transcription: Mechanistic Interpretability in ASR

    Neta Glazer, Yael Segal-Feldman, Hilit Segev, Aviv Shamsian, Asaf Buchnick, Gill Hetz, Ethan Fetaya, Joseph Keshet, Aviv Navon
    37407-37416
    2026-03-14
  • AlignTree: Efficient Defense Against LLM Jailbreak Attacks

    Gil Goren, Shahar Katz, Lior Wolf
    37417-37425
    2026-03-14
  • Identifying Features Associated with Bias Against 93 Stigmatized Groups in Language Models and Guardrail Model Safety Mitigation

    Anna-Maria Gueorguieva, Aylin Caliskan
    37426-37434
    2026-03-14
  • Resolving Predictive Multiplicity for the Rashomon Set

    Parian Haghighat, Hadis Anahideh, Cynthia Rudin
    37435-37442
    2026-03-14
  • Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation

    Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee
    37443-37451
    2026-03-14
  • Silenced Biases: The Dark Side LLMs Learned to Refuse

    Rom Himelstein, Amit LeVi, Brit Youngmann, Yaniv Nemcovsky, Avi Mendelson
    37452-37461
    2026-03-14
  • TAPO: Dynamic Teacher and Perturbed Answer Injection for Policy Optimization

    Maowei Jiang, Zihang Wang, Qi Wang, Peter Búš, Moquan Cheng, Yifan Wang, Quangao Liu, Ruiqi Li, Pengyu Zeng, Ruikai Liu, Alan Liang, Yansong Xu, Yusong Hu, Chaoran Zhang, Zhiyong Dong
    37462-37471
    2026-03-14
  • Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks

    Haotian Jin, Yang Li, Haihui Fan, Lin Shen, Xiangfang Li, Bo Li
    37472-37480
    2026-03-14
  • Requirements for Aligned, Dynamic Resolution of Conflicts in Operational Constraints

    Steven J. Jones, Robert E. Wray, John E. Laird
    37481-37490
    2026-03-14
  • Benchmarking XAI Explanations with Human-Aligned Evaluations

    Rémi Kazmierczak, Steve Azzolin, Eloïse Berthier, Anna Hedström, Patricia Delhomme, David Filliat, Nicolas Bousquet, Goran Frehse, Massimiliano Mancini, Baptiste Caramiaux, Andrea Passerini, Gianni Franchi
    37491-37500
    2026-03-14
  • Moral Change or Noise? On Problems of Aligning AI with Temporally Unstable Human Feedback

    Vijay Keswani, Cyrus Cousins, Breanna Nguyen, Vincent Conitzer, Hoda Heidari, Jana Schaich Borg, Walter Sinnott-Armstrong
    37501-37509
    2026-03-14
  • Transparent Networks for Multivariate Time Series

    Minkyu Kim, Suan Lee, Jinho Kim
    37510-37518
    2026-03-14
  • Align to Structure: Aligning Large Language Models with Structural Information

    Zae Myung Kim, Anand Ramachandran, Farideh Tavazoee, Joo-Kyung Kim, Oleg Rokhlenko, Dongyeop Kang
    37519-37528
    2026-03-14
  • Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images

    Aditya Kumar, Tom Blanchard, Adam Dziedzic, Franziska Boenisch
    37529-37537
    2026-03-14
  • Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

    Shigeki Kusaka, Keita Saito, Mikoto Kudo, Takumi Tanabe, Akifumi Wachi, Youhei Akimoto
    37538-37546
    2026-03-14
  • Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment

    Jea Kwon, Luiz Felipe Vecchietti, Sungwon Park, Meeyoung Cha
    37547-37555
    2026-03-14
  • Selective Weak-to-Strong Generalization

    Hao Lang, Fei Huang, Yongbin Li
    37556-37564
    2026-03-14
  • MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control

    Juyong Lee, Dongyoon Hahm, June Suk Choi, W. Bradley Knox, Kimin Lee
    37565-37573
    2026-03-14
  • STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision

    Chen Li, Han Zhang, Zhantao Yang, Fangyi Chen, Zihan Wang, Anudeepsekhar Bolimera, Marios Savvides
    37574-37582
    2026-03-14
  • ARGH-Mark: Anchor-Synchronized Watermarking with Hamming Correction for Robust and Quality-Preserving LLM Attribution

    He Li, Xiaojun Chen, Jingcheng He, Zhendong Zhao, Shuguang Yuan, Xin Zhao, Yunfei Yang
    37583-37590
    2026-03-14
  • StyleBreak: Revealing Alignment Vulnerabilities in Large Audio-Language Models via Style-Aware Audio Jailbreak

    Hongyi Li, Chengxuan Zhou, Chu Wang, Sicheng Liang, Yanting Chen, Qinlin Xie, Jiawei Ye, Jie Wu
    37591-37599
    2026-03-14
  • How Bias Binds: Measuring Hidden Associations for Bias Control in Text-to-Image Compositions

    Jeng-Lin Li, Ming-Ching Chang, Wei-Chao Chen
    37600-37608
    2026-03-14
21076 - 21100 of 25136 items

Copyright © 2024, Association for the Advancement of Artificial Intelligence
