Do LLMs Really Struggle at NL-FOL Translation? Revealing Their Strengths via a Novel Benchmarking Strategy

Andrea Brunello; Luca Geatti; Michele Mignani; Angelo Montanari; Nicola Saccomanno

doi:10.1609/aaai.v40i36.40258

Authors

Andrea Brunello, University of Udine, Italy
Luca Geatti, University of Udine, Italy
Michele Mignani, University of Udine, Italy
Angelo Montanari, University of Udine, Italy
Nicola Saccomanno, University of Udine, Italy

DOI:

https://doi.org/10.1609/aaai.v40i36.40258

Abstract

Due to its expressiveness and unambiguous nature, First-Order Logic (FOL) is a powerful formalism for representing concepts expressed in natural language (NL). This is useful, e.g., for specifying and verifying desired system properties. While translating FOL into human-readable English is relatively straightforward, the inverse problem, converting NL to FOL (NL-FOL translation), has remained a longstanding challenge, for both humans and machines. Although the emergence of Large Language Models (LLMs) promised a breakthrough, recent literature provides contrasting results on their ability to perform NL-FOL translation. In this work, we provide a threefold contribution. First, we critically examine existing datasets and protocols for evaluating NL-FOL translation performance, revealing key limitations that may cause a misrepresentation of LLMs' actual capabilities. Second, to overcome these shortcomings, we propose a novel evaluation protocol explicitly designed to distinguish genuine semantic-level logical understanding from superficial pattern recognition, memorization, and dataset contamination. Third, using this new approach, we show that state-of-the-art, dialogue-oriented LLMs demonstrate strong NL-FOL translation skills and a genuine grasp of sentence-level logic, whereas embedding-centric models perform markedly worse.
