Neuro-symbolic program generation and execution for hybrid reasoning

Date
2025-05
Authors
Hu, Yaojie
Major Professor
Advisor
Tian, Jin
Quinn, Christopher
Jannesari, Ali
Gao, Hongyang
Martin, Ryan
Committee Member
Abstract
Neuro-symbolic learning aims to combine neural networks and symbolic reasoning into a hybrid AI. It can offer many desiderata of human-like intelligence, including explainability, efficiency, compositionality, and robustness, that are sorely lacking in today's monolithic deep neural networks. A well-designed interface between deep learning and symbolic reasoning provides a structural learning prior that can lead to performance improvements and state-of-the-art experimental results. In the bigger picture, the integration of deep learning and symbolic reasoning constitutes an algorithm that unifies empiricism and rationalism, two branches of epistemology that explain the human acquisition of knowledge. This makes neuro-symbolic reasoning a fundamentally important problem that may one day lead to AGI. This dissertation explores various problems and methods of neuro-symbolic learning with computer programs as the symbolic form, focusing on program generation and execution as its two main topics. First, Neuron Dependency Graphs (NDGs) discover symbolic rules that commonly exist in trained neural networks and represent them as directed graphs, where each node corresponds to the Boolean activation value of a neuron and each edge models an approximate logical implication from one node to another. In addition to providing symbolic explanations of the neural network’s internal structure, an NDG can represent a Structural Causal Model (SCM) that is a causal abstraction of the corresponding neural network and "unfolds" the same way under interventions. Then, NSEdit designs a domain-specific language (DSL) as the interface for Transformers to edit code. The DSL interface allows localization, insertion, and deletion, and a neuro-symbolic bi-modal decoder learns to perform bug localization and repair jointly, predicting mixed data types, including editing actions, locations, and words. At the time of publication, NSEdit achieved state-of-the-art program repair performance. Next, Neural Interpretation (NI) presents a neural model for procedural code execution, where each function is represented by a neural network and every variable is represented by a vector. NI resembles how humans abstractly understand a computer executing a program from top to bottom, without knowing how the program will actually run step by step. In experiments, we show that the neuro-symbolic interpreter can be trained end-to-end with gradient descent. The method can be trained to "execute" library functions without test inputs, because the variables are represented as vectors and do not require actual values or entry points. Following Neural Interpretation, the Neuro-symbolic Interpreter for Arithmetic Composition (NIAC) demonstrates the compositional generalization ability of NI when performing arithmetic calculation. NIAC learns a structure-preserving mapping between neural execution and arithmetic calculation. Unlike large language models (LLMs), which lack compositional generalization with respect to productivity (length) and systematicity (format), NIAC guarantees perfect compositional generalization and uses constant memory during inference for inputs of potentially infinite length. Additionally, TableLabeler uses LLMs for automated tabular dataset construction, with quality validation of the LLM-annotated labels. The efficiency of the LLM annotation process enables TableLabeler to introduce the largest executable SQL dataset in the literature. The dataset construction method can also synthesize programs for new database languages with very little prior training data. Finally, QualityFlow proposes an agentic workflow method for program synthesis, consisting of software engineering roles including program generator, test designer, and self-debugger, all of which are controlled by a centralized quality checker. QualityFlow achieved state-of-the-art performance on various program synthesis benchmarks.
Academic or Administrative Unit
Computer Science
Type
article