Interpreting Multimodal Machine Learning Models Trained for Emotion Recognition to Address Robustness and Privacy Concerns
Many mobile applications and virtual conversational agents now aim to recognize and adapt to emotions. These predicted emotions are used in variety of downstream applications: (a) generating more human like dialogues, (b) predicting mental health issues, and (c) hate speech detection and intervention. To enable this, data are transmitted from users' devices and stored on central servers. These data are then processed further, either annotated or used as inputs for training a model for a specific task. Yet, these data contain sensitive information that could be used by mobile applications without user's consent or, maliciously, by an eavesdropping adversary. My work focuses on two major issues that are faced while training emotion recognition algorithms: (a) privacy of the generated representations and, (b) explaining and ensuring that the predictions are robust to various situations. Tackling these issues would lead to emotion based algorithms that are deployable and helpful at a larger scale, thus enabling more human like experience when interacting with AI.