Neuro-symbolic program generation and execution for hybrid reasoning

dc.contributor.advisor Tian, Jin
dc.contributor.advisor Quinn, Christopher
dc.contributor.advisor Jannesari, Ali
dc.contributor.advisor Gao, Hongyang
dc.contributor.advisor Martin, Ryan
dc.contributor.author Hu, Yaojie
dc.contributor.department Computer Science en_US
dc.date.accessioned 2025-06-25T22:43:26Z
dc.date.available 2025-06-25T22:43:26Z
dc.date.issued 2025-05
dc.date.updated 2025-06-25T22:43:28Z
dc.description.abstract Neuro-symbolic learning aims to combine neural networks and symbolic reasoning into hybrid AI systems. Such systems can offer many desiderata of human-like intelligence, including explainability, efficiency, compositionality, and robustness, which are sorely lacking in today's monolithic deep neural networks. A well-designed interface between deep learning and symbolic reasoning provides a structural learning prior that can improve performance and lead to state-of-the-art results in experiments. More broadly, integrating deep learning with symbolic reasoning yields an algorithm that unifies empiricism and rationalism, two branches of philosophical epistemology that explain how humans acquire knowledge. This makes neuro-symbolic reasoning a fundamentally important problem that may one day lead to AGI. This dissertation explores problems and methods of neuro-symbolic learning that use computer programs as the symbolic form, with program generation and program execution as its two main topics. First, Neuron Dependency Graphs (NDGs) discover symbolic rules that commonly exist in trained neural networks and represent them as directed graphs, in which each node corresponds to the boolean activation value of a neuron and each edge models an approximate logical implication from one node to another. Besides providing symbolic explanations of a neural network's internal structure, an NDG can represent a Structural Causal Model (SCM) that is a causal abstraction of the corresponding neural network, one that "unfolds" the same way under interventions. Then, NSEdit designs a domain-specific language (DSL) as the interface through which Transformers edit code. The DSL supports localization, insertion, and deletion, and a neuro-symbolic bi-modal decoder learns to perform bug localization and repair jointly, predicting mixed data types, including editing actions, locations, and words.
When published, NSEdit achieved state-of-the-art program repair performance. Next, Neural Interpretation (NI) presents a neural model for procedural code execution in which each function is represented by a neural network and every variable by a vector. NI resembles the way humans abstractly understand how a computer would execute a program from top to bottom without knowing how it will actually run step by step. In experiments, we show that the neuro-symbolic interpreter can be trained end-to-end with gradient descent. Because variables are represented as vectors and do not require actual values or entry points, the method can be trained to "execute" library functions without test inputs. Following Neural Interpretation, the Neuro-symbolic Interpreter for Arithmetic Composition (NIAC) demonstrates the compositional generalization ability of NI in arithmetic calculation. NIAC learns a structure-preserving mapping between neural execution and arithmetic calculation. Unlike large language models (LLMs), which lack compositional generalization with respect to productivity (length) and systematicity (format), NIAC guarantees perfect compositional generalization and uses constant memory during inference for inputs of potentially infinite length. Additionally, TableLabeler uses LLMs to construct tabular datasets automatically and validates the quality of the LLM-annotated labels. The efficiency of this annotation process enables TableLabeler to introduce the largest executable SQL dataset in the literature. The dataset construction method can also synthesize programs for new database languages with very little prior training data. Finally, QualityFlow proposes an agentic workflow for program synthesis consisting of software engineering roles, including a program generator, a test designer, and a self-debugger, all of which are controlled by a centralized quality checker.
QualityFlow achieved state-of-the-art performance on various program synthesis benchmarks.
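The Neuron Dependency Graph idea in the abstract (nodes are boolean neuron activations, edges are approximate implications) can be illustrated with a minimal sketch. It assumes that a neuron counts as "active" when its activation is positive, and that an edge i -> j is kept when neuron j fires in at least a fraction threshold of the samples in which neuron i fires. The function name build_ndg and the parameters threshold and min_support are hypothetical and not taken from the dissertation, whose actual rule-mining criteria may differ.

```python
import numpy as np


def build_ndg(activations, threshold=0.95, min_support=10):
    """Return directed edges (i, j): 'neuron i active' approximately implies 'neuron j active'.

    Hypothetical sketch, not the dissertation's algorithm.
    activations: array-like of shape (n_samples, n_neurons); values > 0 count as active.
    threshold:   minimum conditional frequency P(j active | i active) for an edge (assumed rule).
    min_support: minimum number of samples in which neuron i fires before its edges are kept.
    """
    active = np.asarray(activations) > 0               # boolean activation value of each neuron
    fires = active.sum(axis=0)                         # fires[i] = samples in which neuron i is active
    as_int = active.astype(np.int64)
    both = as_int.T @ as_int                           # both[i, j] = samples where i and j are both active
    edges = []
    n_neurons = active.shape[1]
    for i in range(n_neurons):
        if fires[i] < max(min_support, 1):
            continue                                   # too few observations (also avoids division by zero)
        for j in range(n_neurons):
            if i != j and both[i, j] / fires[i] >= threshold:
                edges.append((i, j))
    return edges


acts = [[1, 1, 0], [1, 1, 1], [0, 1, 0], [1, 1, 0]]
print(build_ndg(acts, threshold=1.0, min_support=1))  # [(0, 1), (2, 0), (2, 1)]
```

With threshold=1.0 the edges are exact implications on this toy data: whenever neuron 0 or neuron 2 fires, neuron 1 fires too, and neuron 2 fires only when neuron 0 does. Lowering the threshold admits approximate implications, e.g. 1 -> 0 holds in 3 of 4 cases.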
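The kind of location-based editing interface attributed to NSEdit's DSL (localization, insertion, deletion) can be pictured with a simplified, hypothetical sketch. The Insert and Delete actions and the apply_edits function are illustrative assumptions, not NSEdit's actual grammar. Every location refers to the original buggy token sequence, so a decoder could predict each edit independently and all edits can still be applied together without index shifting.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class Insert:
    """Hypothetical action: insert `tokens` before original token index `pos`."""
    pos: int                 # 0..len(tokens); len(tokens) means append at the end
    tokens: Tuple[str, ...]


@dataclass(frozen=True)
class Delete:
    """Hypothetical action: delete original tokens in the half-open range [start, end)."""
    start: int
    end: int


Edit = Union[Insert, Delete]


def apply_edits(tokens: Sequence[str], edits: List[Edit]) -> List[str]:
    """Apply edits whose locations all refer to the ORIGINAL token sequence."""
    n = len(tokens)
    inserts: List[List[str]] = [[] for _ in range(n + 1)]  # tokens to emit before index i
    deleted = [False] * n
    for edit in edits:
        if isinstance(edit, Insert):
            if not 0 <= edit.pos <= n:
                raise ValueError(f"insert position out of range: {edit.pos}")
            inserts[edit.pos].extend(edit.tokens)
        elif isinstance(edit, Delete):
            if not 0 <= edit.start <= edit.end <= n:
                raise ValueError(f"invalid delete span: [{edit.start}, {edit.end})")
            for i in range(edit.start, edit.end):
                deleted[i] = True
        else:
            raise TypeError(f"unknown edit action: {edit!r}")
    out: List[str] = []
    for i, tok in enumerate(tokens):
        out.extend(inserts[i])     # insertions land before the token at this location
        if not deleted[i]:
            out.append(tok)
    out.extend(inserts[n])         # insertions at the end of the sequence
    return out


buggy = ["if", "x", "=", "0", ":"]
fixed = apply_edits(buggy, [Delete(2, 3), Insert(2, ("==",))])
print(fixed)  # ['if', 'x', '==', '0', ':']
```

Here the two edits together turn an assignment into a comparison. Because locations are resolved against the original sequence, the result does not depend on the order in which the edits are listed.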
dc.format.mimetype PDF
dc.identifier.orcid 0000-0003-0288-6717
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/VrO5dGMw
dc.language.iso en
dc.language.rfc3066 en
dc.subject.disciplines Artificial intelligence en_US
dc.subject.keywords explainable AI en_US
dc.subject.keywords large language models en_US
dc.subject.keywords machine learning en_US
dc.subject.keywords neuro-symbolic reasoning en_US
dc.subject.keywords program execution en_US
dc.subject.keywords program synthesis en_US
dc.title Neuro-symbolic program generation and execution for hybrid reasoning
dc.type article en_US
dc.type.genre dissertation en_US
dspace.entity.type Publication
thesis.degree.discipline Artificial intelligence en_US
thesis.degree.grantor Iowa State University en_US
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy en_US
File
Original bundle
Name: Hu_iastate_0097E_22216.pdf
Size: 13.6 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed upon to submission