SALSA: Semantically-Aware Latent Space Autoencoder
DOI:
https://doi.org/10.1609/aaai.v38i12.29221
Keywords:
ML: Representation Learning, ML: Unsupervised & Self-Supervised Learning, ML: Deep Generative Models & Autoencoders, ML: Applications
Abstract
In deep learning for drug discovery, molecular representations are often based on sequences, known as SMILES, which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecular representations that are semantically meaningful, where semantics are specified by the structural (graph-to-graph) similarities between molecules. We demonstrate by example that SMILES-based autoencoders may map structurally similar molecules to distant codes, resulting in an incoherent latent space that does not necessarily respect the semantic similarities between molecules. To address this shortcoming we propose Semantically-Aware Latent Space Autoencoder (SALSA) for molecular representations: a SMILES-based transformer autoencoder modified with a contrastive task aimed at learning graph-to-graph similarities between molecules. To accomplish this, we develop a novel dataset comprised of sets of structurally similar molecules and opt for a supervised contrastive loss that is able to incorporate full sets of positive samples. We evaluate semantic awareness of SALSA representations by comparing to its ablated counterparts, and show empirically that SALSA learns representations that maintain 1) structural awareness, 2) physicochemical awareness, 3) biological awareness, and 4) semantic continuity.
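The abstract mentions a supervised contrastive loss that can use full sets of positive samples. As a rough illustration only, the sketch below implements the commonly used supervised contrastive formulation in plain Python, where every other embedding sharing an anchor's set identifier counts as a positive. The function names, the `set_ids` argument, the cosine-similarity scoring, and the temperature value are assumptions made for this example; the abstract does not specify the exact loss, hyperparameters, or how it is combined with the autoencoder's reconstruction objective.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, set_ids, temperature=0.1):
    """Hypothetical sketch of a supervised contrastive loss.

    embeddings: list of latent codes (lists of floats), one per molecule.
    set_ids:    one identifier per embedding; molecules with the same
                identifier are treated as structurally similar (positives).
    temperature: assumed scaling factor, not taken from the paper.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # All other members of the anchor's set act as positives.
        positives = [p for p in range(n) if p != i and set_ids[p] == set_ids[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarities to every other sample in the batch.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp over the denominator terms.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average the negative log-probability over the full positive set.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In this sketch, a batch whose set identifiers agree with the geometry of the embeddings yields a lower loss than the same batch with mismatched identifiers, which is the pressure that would pull structurally similar molecules toward nearby codes.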
Published
2024-03-24
How to Cite
Kirchoff, K. E., Maxfield, T., Tropsha, A., & Gomez, S. M. (2024). SALSA: Semantically-Aware Latent Space Autoencoder. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13211-13219. https://doi.org/10.1609/aaai.v38i12.29221
Section
AAAI Technical Track on Machine Learning III