PathBinder: mining MEDLINE for protein-protein interactions

Date
2003-01-01
Authors
Qi, Wenxin
Abstract

Exploring protein-protein interactions and the regulation of signal transduction pathways in biomedical texts has become a daily routine for many scientists. The MEDLINE citation database is the largest English-language biomedical bibliographic database. We report the design and development of an automatic text mining system, PathBinder, for extracting, manipulating, and managing protein-protein interactions from MEDLINE abstracts, to facilitate the extraction of pathway information into a database. PathBinder is a broad-scale data mining tool. It processes large volumes of text in the biomedical domain and produces output that contains the desired information in highly concentrated form. The extracted information is then presented to the user with a visualization tool. Developing and integrating PathBinder's document-processing functions will allow it to serve as the content builder for a pathway database. We partially solved the problems of inefficiency, ambiguity, and low coverage that exist in many text mining systems. Our processing unit is the sentence. Abstracts from the MEDLINE database are parsed into individual sentences, which in turn are searched for the presence of 1) one pair of protein names, 2) one protein name and one interaction-related verb, or 3) one protein name and another word describing the desired context. The qualifying sentences are stored in a database for querying. Each query is automatically conflated with the aliases of the gene, and links to related information are built. The user can add human knowledge to create the pathway database and use a graphical display that connects protein names through clickable nodes and edges. Each node of the graph represents the occurrence of a protein name in a sentence from the literature. Each edge represents the co-occurrence of two protein names in one sentence from the literature. The nodes and edges can be hypertext-linked to the sentence databases.
The system dynamically generates and presents all related sentences in a user-friendly interface. Protein names are highlighted in each sentence for quick browsing. A hypertext link to the original abstract in the online MEDLINE database is provided.
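
The sentence-level filtering and the co-occurrence graph described in the abstract can be illustrated with a minimal sketch. This is not PathBinder's implementation: the word lists, the naive sentence splitter, and the function names (split_sentences, classify, build_graph) are all hypothetical and chosen only for illustration. It also assumes that "one pair of protein names" means at least two distinct names in the sentence.

```python
import re
from itertools import combinations

# Hypothetical lexicons for illustration only; not PathBinder's actual lists.
PROTEIN_NAMES = {"RAF1", "MEK1", "ERK2"}
INTERACTION_VERBS = {"binds", "activates", "phosphorylates", "inhibits"}
CONTEXT_WORDS = {"pathway", "signaling"}


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract)
    return [p.strip() for p in parts if p.strip()]


def classify(sentence):
    """Return (rule, proteins): the first matching rule (1, 2, 3) or None."""
    words = re.findall(r"[A-Za-z0-9]+", sentence)
    proteins = sorted({w for w in words if w in PROTEIN_NAMES})
    lowered = {w.lower() for w in words}
    if len(proteins) >= 2:
        rule = 1  # a pair of protein names (assumed: two or more distinct names)
    elif proteins and lowered & INTERACTION_VERBS:
        rule = 2  # one protein name plus an interaction-related verb
    elif proteins and lowered & CONTEXT_WORDS:
        rule = 3  # one protein name plus a context word
    else:
        rule = None
    return rule, proteins


def build_graph(abstract):
    """Collect nodes (protein names) and edges (pairs sharing a sentence)."""
    nodes = set()
    edges = {}  # (protein_a, protein_b) -> list of supporting sentences
    for sentence in split_sentences(abstract):
        rule, proteins = classify(sentence)
        if rule is None:
            continue  # sentence does not qualify under any rule
        nodes.update(proteins)
        for pair in combinations(proteins, 2):
            edges.setdefault(pair, []).append(sentence)
    return nodes, edges


EXAMPLE = (
    "RAF1 phosphorylates MEK1 in vitro. "
    "ERK2 is part of the signaling pathway. "
    "The cells were grown overnight. "
    "MEK1 activates downstream targets."
)
nodes, edges = build_graph(EXAMPLE)
```

In this sketch each edge keeps the sentences that support it, which loosely mirrors the idea of linking nodes and edges back to a sentence database. A real system would also need alias handling and a more robust way of recognizing protein names than exact dictionary lookup.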

Type
thesis
Copyright
2003-01-01