Integer linear programming formulation for the unified duplication-loss-coalescence model

dc.contributor.advisor Eulenstein, Oliver
dc.contributor.advisor Wang, Lizhi
dc.contributor.advisor Li, Qi
dc.contributor.author Ansarifar, Javad
dc.contributor.department Department of Computer Science
dc.date.accessioned 2022-11-09T02:36:47Z
dc.date.available 2022-11-09T02:36:47Z
dc.date.issued 2021-12
dc.date.updated 2022-11-09T02:36:47Z
dc.description.abstract The classical Duplication-Loss-Coalescence parsimony model (DLC-model) is a powerful tool when studying the complex evolutionary scenarios of simultaneous duplication-loss and deep coalescence events in evolutionary histories of gene families. However, inferring such scenarios is an intrinsically difficult problem and, therefore, prohibitive for larger gene families typically occurring in practice. The parsimony duplication-loss-coalescence model (DLCParsimony model) appeared as an effective way to study commonly appearing gene families, whose evolutionary histories had duplication-loss events appearing alongside the incomplete lineage sorting events. This model allows practitioners to accurately infer ortholog/paralog relationships among extant genes and to study the relationship between the gene family evolution (gene tree) and the host species evolution (species tree). Unfortunately, the existing enumeration algorithm, DLCPar, which finds the optimum reconciliation between a gene tree and a species tree under the DLCParsimony model, is very limited in practice due to its advanced complexity. To overcome this stringent limitation, we first describe a non-trivial and flexible Integer Linear Programming (ILP) formulation for inferring DLC evolutionary scenarios. A comparative scalability study demonstrates that our ILP formulation commonly outperforms the standard enumeration algorithm DLCPar for inferring scenarios under the DLC-model. To make the DLC-model more practical, we then introduce two sensibly constrained versions of the model and describe two respectively modified versions of our ILP formulation reflecting these constraints. Using a simulation study, we showcase that our constrained ILP formulation computes evolutionary scenarios that are substantially larger than the scenarios computable under our original ILP formulation and DLCPar. Further, scenarios computed under our constrained DLC-model are overall remarkably accurate when compared to corresponding scenarios under the original DLC-model.
dc.format.mimetype PDF
dc.identifier.doi https://doi.org/10.31274/td-20240329-433
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/kv7kjORv
dc.language.iso en
dc.language.rfc3066 en
dc.subject.disciplines Computer science en_US
dc.subject.disciplines Bioinformatics en_US
dc.subject.disciplines Operations research en_US
dc.subject.keywords Bioinformatics en_US
dc.subject.keywords Duplication-Loss-Coalescence parsimony model en_US
dc.subject.keywords Operation research en_US
dc.subject.keywords Optimization en_US
dc.title Integer linear programming formulation for the unified duplication-loss-coalescence model
dc.type thesis en_US
dc.type.genre thesis en_US
dspace.entity.type Publication
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
thesis.degree.discipline Computer science en_US
thesis.degree.discipline Bioinformatics en_US
thesis.degree.discipline Operations research en_US
thesis.degree.grantor Iowa State University en_US
thesis.degree.level thesis $
thesis.degree.name Master of Science en_US
File
Original bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
Ansarifar_iastate_0097M_19932.pdf
Size:
920.24 KB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
0 B
Format:
Item-specific license agreed upon to submission
Description: