Information extraction with weak supervision
dc.contributor.advisor | Li, Qi | |
dc.contributor.advisor | Cai, Ying | |
dc.contributor.advisor | Liu, Kevin | |
dc.contributor.advisor | Gao, Hongyang | |
dc.contributor.advisor | Huai, Mengdi | |
dc.contributor.author | Zhou, Kang | |
dc.contributor.department | Department of Computer Science | |
dc.date.accessioned | 2025-02-11T17:23:01Z | |
dc.date.available | 2025-02-11T17:23:01Z | |
dc.date.issued | 2024-12 | |
dc.date.updated | 2025-02-11T17:23:02Z | |
dc.description.abstract | This dissertation explores the development and application of weak supervision techniques to address key challenges in three fundamental information extraction (IE) tasks: Named Entity Recognition (NER), Relation Extraction (RE), and Entity Linking (EL). Traditional supervised learning methods in these domains often require extensive human annotations, which are costly and time-consuming, limiting their scalability and applicability in real-world scenarios. To overcome these limitations, this research introduces innovative weakly supervised methodologies for each of these tasks, aiming to reduce reliance on manual labeling while maintaining high performance. The first part of the dissertation presents a novel framework, Confidence-Based Multi-Class Positive and Unlabeled (Conf-MPU) learning, designed to enhance the performance of distantly supervised NER. By incorporating confidence scores into a multi-class PU learning approach, Conf-MPU effectively handles incomplete labeling and varying false negative rates inherent in distantly supervised data. Experimental results on benchmark datasets demonstrate that Conf-MPU significantly outperforms existing state-of-the-art methods, advancing the field of distantly supervised NER. The second part focuses on improving Relation Extraction through the integration of indirect supervision. A novel approach, DSRE-NLI, is introduced, which leverages a Natural Language Inference (NLI) engine and a Semi-Automatic Relation Verbalization (SARV) mechanism to diagnose and mitigate label noise in distantly supervised RE tasks. This method enhances the semantic diversity of relation templates with minimal human input, resulting in a significant performance boost over traditional distantly supervised methods on real and simulated datasets. 
The third part of the dissertation addresses challenges in Zero-Shot Entity Linking (ZSEL) with a new re-ranking approach, GenDecider, which incorporates “None of the Candidates” (NoC) judgments into the re-ranking process. By formulating the task as a generative process using the Llama model, GenDecider effectively detects scenarios where the correct entity is not among the retrieved candidates. This approach significantly improves the accuracy and reliability of ZSEL systems, as evidenced by its performance on the benchmark ZESHEL dataset. Collectively, the contributions of this dissertation lie in advancing weak supervision techniques across three critical IE tasks, reducing the dependency on extensive manual annotations, and improving the robustness and scalability of information extraction systems. The findings have broad implications for the development of practical, scalable IE solutions in data-rich environments. Future research directions include refining noise-handling mechanisms, optimizing computational efficiency, and expanding the proposed methods to multilingual and low-resource settings. | |
dc.format.mimetype | | |
dc.identifier.uri | https://dr.lib.iastate.edu/handle/20.500.12876/Qr9mg7Jr | |
dc.language.iso | en | |
dc.language.rfc3066 | en | |
dc.subject.disciplines | Computer science | en_US |
dc.subject.keywords | Information Extraction | en_US |
dc.subject.keywords | Weak Supervision | en_US |
dc.title | Information extraction with weak supervision | |
dc.type | dissertation | en_US |
dc.type.genre | dissertation | en_US |
dspace.entity.type | Publication | |
relation.isOrgUnitOfPublication | f7be4eb9-d1d0-4081-859b-b15cee251456 | |
thesis.degree.discipline | Computer science | en_US |
thesis.degree.grantor | Iowa State University | en_US |
thesis.degree.level | dissertation | |
thesis.degree.name | Doctor of Philosophy | en_US |
File
Original bundle
- Name: Zhou_iastate_0097E_21832.pdf
- Size: 2.6 MB
- Format: Adobe Portable Document Format