Scaling Effects on Latent Representation Edits in GPT Models (Student Abstract)

Austin L. Davis; Gita Sukthankar

doi:10.1609/aaai.v39i28.35245

Authors

Austin L. Davis, University of Central Florida
Gita Sukthankar, University of Central Florida

Abstract

Probing classifiers are a technique for understanding and modifying the operation of neural networks: a smaller classifier is trained on the model's internal representation to solve a related probing task. Much like a neural electrode array, probing classifiers can help researchers both read and edit the internal representation of a neural network. This paper evaluates the use of probing classifiers to modify the internal hidden state of a chess-playing transformer. We demonstrate that the scale of the intervention vector should decay as a negative exponential of the input length so that model outputs remain semantically valid after the residual stream activations are edited.
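
To make the scaling idea concrete, the following is a minimal sketch of an intervention whose strength decays exponentially with input length. It is not the authors' implementation: the function names, the schedule alpha(n) = alpha0 * exp(-decay * n), the constants `alpha0` and `decay`, and the random stand-ins for a residual stream vector and a probe direction are all assumptions made for illustration.

```python
import numpy as np


def intervention_scale(seq_len, alpha0=1.0, decay=0.05):
    """Hypothetical schedule: alpha(n) = alpha0 * exp(-decay * n).

    alpha0 and decay are illustrative constants, not values from the paper.
    """
    return alpha0 * np.exp(-decay * seq_len)


def edit_residual(hidden, probe_direction, seq_len, alpha0=1.0, decay=0.05):
    """Add a length-scaled step along a probe direction to one hidden vector."""
    # Normalize so the edit magnitude is set only by the schedule.
    unit = probe_direction / np.linalg.norm(probe_direction)
    return hidden + intervention_scale(seq_len, alpha0, decay) * unit


# Random stand-ins for a residual stream activation and a probe weight vector.
rng = np.random.default_rng(0)
hidden = rng.normal(size=8)
direction = rng.normal(size=8)

short_edit = edit_residual(hidden, direction, seq_len=10)
long_edit = edit_residual(hidden, direction, seq_len=80)

# The edit applied after 80 input tokens is much smaller than after 10.
print(np.linalg.norm(short_edit - hidden), np.linalg.norm(long_edit - hidden))
```

In this sketch the edit magnitude depends only on the input length, so longer inputs receive a gentler push along the probe direction. In practice the decay rate would have to be fitted to the model being edited.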

Published

2025-04-11

How to Cite

Davis, A. L., & Sukthankar, G. (2025). Scaling Effects on Latent Representation Edits in GPT Models (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29343–29344. https://doi.org/10.1609/aaai.v39i28.35245

Issue

Vol. 39 No. 28: IAAI-25, EAAI-25, AAAI-25 Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Student Abstract and Poster Program
