Neuro-symbolic program generation and execution for hybrid reasoning

Hu, Yaojie

dc.contributor.advisor	Tian, Jin
dc.contributor.advisor	Quinn, Christopher
dc.contributor.advisor	Jannesari, Ali
dc.contributor.advisor	Gao, Hongyang
dc.contributor.advisor	Martin, Ryan
dc.contributor.author	Hu, Yaojie
dc.contributor.department	Computer Science	en_US
dc.date.accessioned	2025-06-25T22:43:26Z
dc.date.available	2025-06-25T22:43:26Z
dc.date.issued	2025-05
dc.date.updated	2025-06-25T22:43:28Z
dc.description.abstract	Neuro-symbolic learning aims to combine neural networks and symbolic reasoning into a hybrid AI. It can offer many desiderata of human-like intelligence, including explainability, efficiency, compositionality, and robustness, which are sorely lacking in today's monolithic deep neural networks. A well-designed interface between deep learning and symbolic reasoning provides a structural learning prior that can lead to performance improvements and, experimentally, to state-of-the-art results. More broadly, integrating deep learning with symbolic reasoning yields an algorithm that unifies empiricism and rationalism, two branches of epistemology that explain how humans acquire knowledge; this makes neuro-symbolic reasoning a fundamentally important problem that may one day lead to AGI. This dissertation explores various problems and methods of neuro-symbolic learning that use computer programs as the symbolic form, with program generation and execution as its two main topics. First, Neuron Dependency Graphs (NDGs) discover symbolic rules that commonly exist in trained neural networks and represent them as directed graphs, where each node corresponds to the boolean activation value of a neuron and each edge models an approximate logical implication from one node to another. In addition to providing symbolic explanations of a neural network’s internal structure, an NDG can represent a Structural Causal Model (SCM) that is a causal abstraction of the corresponding network, that is, a model that "unfolds" the same way as the network under interventions. Then, NSEdit designs a domain-specific language (DSL) as the interface through which Transformers edit code. The DSL supports localization, insertion, and deletion, and a neuro-symbolic bimodal decoder learns to perform bug localization and repair jointly, predicting mixed data types that include editing actions, locations, and words.
At the time of publication, NSEdit achieved state-of-the-art program repair performance. Next, Neural Interpretation (NI) presents a neural model for procedural code execution, in which each function is represented by a neural network and every variable by a vector. NI mirrors how humans abstractly understand the way a computer executes a program from top to bottom without knowing exactly how it runs step by step. In experiments, we show that the neuro-symbolic interpreter can be trained end-to-end with gradient descent. Because variables are represented as vectors, the method does not require actual values or entry points and can therefore be trained to "execute" library functions without test inputs. Building on Neural Interpretation, the Neuro-symbolic Interpreter for Arithmetic Composition (NIAC) demonstrates the compositional generalization ability of NI on arithmetic calculation. NIAC learns a structure-preserving mapping between neural execution and arithmetic calculation. Unlike LLMs, which lack compositional generalization with respect to productivity (length) and systematicity (format), NIAC guarantees perfect compositional generalization and uses constant memory during inference for inputs of potentially unbounded length. Additionally, TableLabeler uses LLMs to construct tabular datasets automatically and validates the quality of the LLM-annotated labels. The efficiency of this LLM annotation process enables TableLabeler to introduce the largest executable SQL dataset in the literature. The dataset construction method can also synthesize programs for new database languages with very little prior training data. Finally, QualityFlow proposes an agentic workflow for program synthesis composed of software engineering roles, including a program generator, a test designer, and a self-debugger, all controlled by a centralized quality checker.
QualityFlow achieved state-of-the-art performance on various program synthesis benchmarks.
dc.format.mimetype	PDF
dc.identifier.orcid	0000-0003-0288-6717
dc.identifier.uri	https://dr.lib.iastate.edu/handle/20.500.12876/VrO5dGMw
dc.language.iso	en
dc.language.rfc3066	en
dc.subject.disciplines	Artificial intelligence	en_US
dc.subject.keywords	explainable AI	en_US
dc.subject.keywords	large language models	en_US
dc.subject.keywords	machine learning	en_US
dc.subject.keywords	neuro-symbolic reasoning	en_US
dc.subject.keywords	program execution	en_US
dc.subject.keywords	program synthesis	en_US
dc.title	Neuro-symbolic program generation and execution for hybrid reasoning
dc.type	article	en_US
dc.type.genre	dissertation	en_US
dspace.entity.type	Publication
thesis.degree.discipline	Artificial intelligence	en_US
thesis.degree.grantor	Iowa State University	en_US
thesis.degree.level	dissertation
thesis.degree.name	Doctor of Philosophy	en_US
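
The Neuron Dependency Graph idea summarized in the abstract (boolean neuron activations as nodes, approximate logical implications as edges) can be illustrated with a minimal Python sketch. This is not the dissertation's algorithm: the helper name `neuron_dependency_edges`, the binarization rule (a neuron counts as active when its output is positive), the use of the conditional frequency P(j active | i active) as the implication score, and the `threshold` and `min_support` parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def neuron_dependency_edges(activations, threshold=0.95, min_support=10):
    """Hypothetical helper: approximate implication edges i -> j between neurons.

    activations: (num_samples, num_neurons) array-like of booleans, already
        binarized (assumed here: True means the neuron's output is positive).
    threshold: assumed minimum value of P(j active | i active) to keep i -> j.
    min_support: assumed minimum number of samples in which neuron i is active.
    Returns a list of (i, j, confidence) tuples, ordered by (i, j).
    """
    a = np.asarray(activations, dtype=bool).astype(np.int64)
    # co_active[i, j] counts the samples in which neurons i and j are both active.
    co_active = a.T @ a
    # The diagonal counts the samples in which each neuron is active at all.
    support = np.diag(co_active)
    edges = []
    num_neurons = a.shape[1]
    for i in range(num_neurons):
        if support[i] < min_support:
            continue  # too few observations to estimate the implication
        for j in range(num_neurons):
            if i == j:
                continue
            confidence = co_active[i, j] / support[i]
            if confidence >= threshold:
                edges.append((i, j, float(confidence)))
    return edges


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre = rng.normal(size=(1000, 2))  # stand-in pre-activations, not a real network
    h0 = pre[:, 0] > 0.5  # whenever h0 is active, h1 is active too
    h1 = pre[:, 0] > 0.0
    h2 = pre[:, 1] > 0.0  # independent of h0 and h1
    acts = np.stack([h0, h1, h2], axis=1)
    print(neuron_dependency_edges(acts))  # expected: [(0, 1, 1.0)]
```

A single threshold on conditional frequency is only one simple way to decide when an implication is "approximate enough"; the criterion used for NDGs in the dissertation may differ.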

File

Original bundle

Name: Hu_iastate_0097E_22216.pdf
Size: 13.6 MB
Format: Adobe Portable Document Format

Collections

Theses and Dissertations