DFEE: Interactive DataFlow Execution and Evaluation Kit

Han He; Song Feng; Daniele Bonadiman; Yi Zhang; Saab Mansour

doi:10.1609/aaai.v37i13.27073

Authors

Han He, Emory University
Song Feng, AWS AI Labs
Daniele Bonadiman, AWS AI Labs
Yi Zhang, AWS AI Labs
Saab Mansour, AWS AI Labs

Keywords:

DataFlow, Semantic Parsing, Program Synthesis, Dialog2API, Temporal Reasoning, Event Scheduling, Execution Accuracy

Abstract

DataFlow is emerging as a new paradigm for building task-oriented chatbots because of its expressive semantic representations of dialogue tasks. Despite the availability of a large dataset, SMCalFlow, and a simplified syntax, developing and evaluating DataFlow-based chatbots remains challenging because of system complexity and the lack of downstream toolchains. In this demonstration, we present DFEE, an interactive DataFlow Execution and Evaluation toolkit that supports execution, visualization, and benchmarking of semantic parsers given dialogue input and a backend database. We demonstrate the system on a complex dialogue task: event scheduling that involves temporal reasoning. DFEE also supports diagnosing parsing results through a friendly interface that lets developers examine dynamic DataFlow and the corresponding execution results. To illustrate how to benchmark state-of-the-art models, we propose a novel benchmark that covers more sophisticated event scheduling scenarios and a new metric for evaluating task success. The code for DFEE has been released at https://github.com/amazonscience/dataflow-evaluation-toolkit.
