Synthesis of Programs from Multimodal Datasets


  • Shantanu Thakoor, Stanford University
  • Simoni Shah, Indian Institute of Technology, Bombay
  • Ganesh Ramakrishnan, Indian Institute of Technology, Bombay
  • Amitabha Sanyal, Indian Institute of Technology, Bombay


Keywords: APP: Other Applications, MLA: Applications of Supervised Learning, HSO: Optimization, NLPML: Natural Language Processing (General/Other), HSO: Search (General/Other)


We describe MultiSynth, a framework for synthesizing domain-specific programs from a multimodal dataset of examples. Given a domain-specific language (DSL), a dataset is multimodal if no single program in the DSL generalizes over all of its examples. Moreover, even when the examples are generalized by a set of programs, the domains of these programs may not be disjoint, which makes synthesis ambiguous. MultiSynth incorporates the synthesis of programs with minimum generality while addressing the need for accurate prediction. We show how both can be achieved through (i) transformation-driven partitioning of the dataset, (ii) least general generalization, which yields a generalized specification of the input and the output, and (iii) learning to rank, which estimates feature weights so that an input is mapped to the most appropriate mode in case of ambiguity. We demonstrate the framework in two domains. In the first, we extend an existing approach for synthesizing programs for XML tree transformations to ambiguous multimodal datasets. In the second, MultiSynth is used to preorder words for machine translation by learning permutations of productions in the parse trees of the source-side sentences. Our evaluations reflect the effectiveness of our approach.
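
To make step (ii) concrete, the following is a minimal sketch of least general generalization in the classical sense of first-order anti-unification. It is not the MultiSynth implementation. The term encoding (atoms as strings, compound terms as tuples whose first element is the functor), the function name `lgg`, and the variable names `X0`, `X1`, ... are all assumptions made for illustration.

```python
from itertools import count


def lgg(s, t, table=None, fresh=None):
    """Anti-unify two terms and return their least general generalization.

    Assumed encoding (hypothetical, for illustration only):
      - an atom is a string, e.g. "apple"
      - a compound term is a tuple (functor, arg1, ..., argN)
    Generated variables are named "X0", "X1", ...; the sketch assumes no
    atom in the input already uses such a name.
    """
    if table is None:
        table = {}       # maps a pair of differing subterms to its variable
    if fresh is None:
        fresh = count()  # supplies indices for new variable names

    # Identical subterms need no generalization.
    if s == t:
        return s

    # Same functor and arity: keep the functor, generalize argument-wise.
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and len(s) == len(t) and s[0] == t[0]):
        args = tuple(lgg(a, b, table, fresh) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args

    # Otherwise introduce a variable, reusing it for a repeated pair so the
    # result stays as specific as possible.
    if (s, t) not in table:
        table[(s, t)] = f"X{next(fresh)}"
    return table[(s, t)]


# Two tree-shaped examples that differ only in the text of the name node.
left = ("item", ("name", "apple"), ("price", "3"))
right = ("item", ("name", "pear"), ("price", "3"))
pattern = lgg(left, right)
```

Here `pattern` keeps everything the two examples share and replaces only the differing leaf with a variable. Reusing one variable for a repeated pair of differing subterms is what makes the result least general rather than merely general.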




How to Cite

Thakoor, S., Shah, S., Ramakrishnan, G., & Sanyal, A. (2018). Synthesis of Programs from Multimodal Datasets. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). Retrieved from