PathBinder: a sentence repository of biochemical interactions extracted from MEDLINE

dc.contributor.author Ding, Jing
dc.contributor.department Department of Electrical and Computer Engineering
dc.date 2020-08-05T18:36:45.000
dc.date.accessioned 2021-02-26T08:41:45Z
dc.date.available 2021-02-26T08:41:45Z
dc.date.copyright Wed Jan 01 00:00:00 UTC 2003
dc.date.issued 2003-01-01
dc.description.abstract MEDLINE is a fast-growing online scientific literature database covering life science, medicine, health care, and related fields. It offers attractive opportunities for automatic information extraction, for tasks such as extracting networks of protein interactions, and for helping researchers who need to sift efficiently through the literature to find work relating to small sets of biochemicals of interest. PathBinder is a software system that extracts sentences containing potential biochemical interactions from the baseline MEDLINE database annual distribution. An interaction between two biochemicals is assumed if they co-occur in a single sentence. Sentences are parsed from MEDLINE abstracts and scanned against a dictionary containing more than 80,000 entries (>40,000 biochemicals and their aliases) for at least two different biochemicals. The dictionary was constructed automatically by extracting names and synonyms of protein and non-protein biochemicals from four databases. The extracted sentences are organized in a repository, about 11 GB in size, that is easily retrievable through a 2-level index system based on two biochemical names. The performance of PathBinder in terms of information extraction metrics (e.g. precision and recall) was evaluated using a sample MEDLINE file. Sentence parsing has a precision of 99.6% and a recall of 99.5%. Biochemical labeling has a precision of 80.5% and a recall of 57.3%.
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/rtd/19945/
dc.identifier.articleid 20944
dc.identifier.contextkey 18780054
dc.identifier.doi https://doi.org/10.31274/rtd-20200803-167
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath rtd/19945
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/97312
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/rtd/19945/Ding_ISU_2003_D56.pdf|||Fri Jan 14 22:01:19 UTC 2022
dc.subject.keywords Electrical and computer engineering
dc.subject.keywords Computer engineering
dc.title PathBinder: a sentence repository of biochemical interactions extracted from MEDLINE
dc.type thesis en_US
dc.type.genre thesis en_US
dspace.entity.type Publication
relation.isOrgUnitOfPublication a75a044c-d11e-44cd-af4f-dab1d83339ff
thesis.degree.discipline Computer Engineering
thesis.degree.level thesis
thesis.degree.name Master of Science
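
The co-occurrence approach described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the toy alias dictionary, the naive sentence splitter, and all function names (`split_sentences`, `label_biochemicals`, `build_index`, `precision_recall`) are assumptions made here for illustration. The sketch only shows the general idea of labeling sentences against a dictionary, keeping those with at least two different biochemicals, and filing them under a two-level index keyed by the two names.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical toy dictionary: alias (lowercase) -> canonical biochemical name.
# The dictionary described in the abstract has more than 80,000 entries.
ALIASES = {
    "gibberellin": "gibberellin",
    "ga": "gibberellin",
    "abscisic acid": "abscisic acid",
    "aba": "abscisic acid",
    "ethylene": "ethylene",
}


def split_sentences(abstract):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", abstract)
    return [p.strip() for p in parts if p.strip()]


def label_biochemicals(sentence):
    """Return the canonical names whose aliases occur as whole words."""
    text = sentence.lower()
    found = set()
    for alias, canonical in ALIASES.items():
        if re.search(r"\b" + re.escape(alias) + r"\b", text):
            found.add(canonical)
    return found


def build_index(abstracts):
    """Two-level index: first name -> second name -> list of sentences."""
    index = defaultdict(lambda: defaultdict(list))
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            names = sorted(label_biochemicals(sentence))
            # Only sentences with at least two different biochemicals yield pairs.
            for a, b in combinations(names, 2):
                index[a][b].append(sentence)
                index[b][a].append(sentence)
    return index


def precision_recall(predicted, relevant):
    """Precision and recall of a predicted set against a reference set."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    demo = ["GA antagonizes ABA during germination. Ethylene was not measured."]
    idx = build_index(demo)
    print(idx["gibberellin"]["abscisic acid"])
```

Storing each sentence under both orderings of the pair makes lookup independent of which name the user supplies first, at the cost of duplicating entries; whether PathBinder does this is not stated in this record.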
File
Name: Ding_ISU_2003_D56.pdf
Size: 789.62 KB
Format: Adobe Portable Document Format