PathBinder: mining MEDLINE for protein-protein interactions

dc.contributor.author Qi, Wenxin
dc.contributor.department Department of Electrical and Computer Engineering
dc.date 2020-07-17T07:22:33.000
dc.date.accessioned 2021-02-26T08:32:44Z
dc.date.available 2021-02-26T08:32:44Z
dc.date.copyright Wed Jan 01 00:00:00 UTC 2003
dc.date.issued 2003-01-01
dc.description.abstract <p>Exploring protein-protein interactions and the regulation of signal transduction pathways in biomedical texts has become a daily routine for many scientists. The MEDLINE citation database is the largest English-language biomedical bibliographic database. We report the design and development of an automatic text mining system, PathBinder, for extracting, manipulating, and managing protein-protein interactions from MEDLINE abstracts, in order to facilitate the extraction of pathway information into a database. PathBinder is a broad-scale data mining tool. It processes large volumes of text in the biomedical domain and produces output that contains the desired information in highly concentrated form. The extracted information is then presented to the user with a visualization tool. Developing and integrating PathBinder's document-processing functionalities will allow it to serve as the content builder of a pathways database. We partially solved the problems of inefficiency, ambiguity, and low coverage that exist in many text mining systems. Our processing unit is the sentence. Abstracts from the MEDLINE database are parsed into individual sentences, which in turn are searched for the presence of 1) one pair of protein names, 2) one protein name and one interaction-related verb, or 3) one protein name and another word describing the desired context. The qualifying sentences are stored in a database for querying. Each query is automatically conflated with the aliases of the queried gene, and links to related information are built. The user can add human knowledge to create the pathway database, and can apply a graphical display that connects protein names through clickable nodes and edges. Each node of the graph represents the occurrence of a protein name in a sentence within the literature. Each edge of the graph represents the co-occurrence of two protein names in one sentence within the literature. The nodes and edges can be hypertext-linked to the sentence databases.
The system dynamically generates and presents all related sentences in a user-friendly interface. Protein names are highlighted in each sentence for quick browsing. A hypertext link to the original abstract in the online MEDLINE database is provided.</p>
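The sentence-level filtering and the node/edge construction described in the abstract can be illustrated with a minimal Python sketch. This is not PathBinder's implementation: the toy vocabularies (`PROTEINS`, `INTERACTION_VERBS`, `CONTEXT_WORDS`), the function names, the naive regex sentence splitter, and the reading of "one pair of protein names" as "at least two distinct names" are all assumptions made for illustration.

```python
import re
from itertools import combinations

# Hypothetical toy vocabularies; the thesis's actual dictionaries are not given here.
PROTEINS = {"RAF1", "MEK1", "ERK2"}
INTERACTION_VERBS = {"binds", "phosphorylates", "activates", "inhibits"}
CONTEXT_WORDS = {"pathway", "signaling"}


def split_sentences(abstract):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def classify(sentence):
    """Return (category, proteins) using the three criteria from the abstract.

    category is None when the sentence does not qualify.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    proteins = sorted({t for t in tokens if t in PROTEINS})
    lowered = {t.lower() for t in tokens}
    if len(proteins) >= 2:                        # 1) a pair of protein names
        return "protein-protein", proteins
    if proteins and lowered & INTERACTION_VERBS:  # 2) protein + interaction verb
        return "protein-verb", proteins
    if proteins and lowered & CONTEXT_WORDS:      # 3) protein + context word
        return "protein-context", proteins
    return None, proteins


def cooccurrence_edges(sentences):
    """Map each protein pair to the sentences in which both names occur."""
    edges = {}
    for sentence in sentences:
        _, proteins = classify(sentence)
        for a, b in combinations(proteins, 2):
            edges.setdefault((a, b), []).append(sentence)
    return edges


if __name__ == "__main__":
    text = ("RAF1 phosphorylates MEK1. ERK2 is part of the pathway. "
            "The weather was nice. MEK1 activates downstream targets.")
    sentences = split_sentences(text)
    for s in sentences:
        print(classify(s)[0], "|", s)
    print(cooccurrence_edges(sentences))
```

In this sketch each key of the returned dictionary plays the role of a graph edge, and the list stored under it is the set of supporting sentences that a clickable edge could link to. A production system would need a sentence splitter that handles abbreviations and a protein-name dictionary with alias resolution.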
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/rtd/19545/
dc.identifier.articleid 20544
dc.identifier.contextkey 18549577
dc.identifier.doi https://doi.org/10.31274/rtd-20200716-112
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath rtd/19545
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/96912
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/rtd/19545/Qi_ISU_2003_Q23.pdf|||Fri Jan 14 21:57:39 UTC 2022
dc.subject.keywords Electrical and computer engineering
dc.subject.keywords Computer engineering
dc.title PathBinder: mining MEDLINE for protein-protein interactions
dc.type thesis en_US
dc.type.genre thesis en_US
dspace.entity.type Publication
relation.isOrgUnitOfPublication a75a044c-d11e-44cd-af4f-dab1d83339ff
thesis.degree.discipline Computer Engineering
thesis.degree.level thesis
thesis.degree.name Master of Science