Interpretability of configurable software in the biosciences

dc.contributor.advisor Myra B Cohen
dc.contributor.author McDevitt, Mikaela
dc.contributor.department Department of Computer Science
dc.date 2020-09-23T19:13:00.000
dc.date.accessioned 2021-02-25T21:35:25Z
dc.date.available 2021-02-25T21:35:25Z
dc.date.copyright Sat Aug 01 00:00:00 UTC 2020
dc.date.embargo 2021-02-28
dc.date.issued 2020-01-01
dc.description.abstract <p>Users of bioinformatics software tools range from bench scientists with little computational experience to sophisticated developers. As the number and types of tools available to this diverse set of users grow, the tools are also increasing in flexibility. This customization makes them highly configurable: the end user is provided with many customization (configuration) options. At the same time, biologists and chemists are engineering living organisms by programming their DNA in a process that mimics software development. As they share their designs and promote reuse, their programs are also emerging as highly configurable.</p> <p>As these bioscience systems become mainstream tools for the biology and bioinformatics communities, their dependability, reliability, and reproducibility become critical. Scientists are making decisions and drawing conclusions based on the software they use, and the constructs designed by synthetic biologists are being built into living organisms and used in the real world. Yet there is little guidance for users of bioinformatics tools or for those building new synthetic organisms.</p> <p>For an end user equipped with minimal information, it is hard to predict the effect of changing a particular configuration option, yet the choice of configuration can lead to a large amount of variation in functionality and performance. Even if the configuration options make sense to an expert user, understanding all options and their interactions is difficult, and may be computationally infeasible, because the number of combinations grows exponentially. Similarly, synthetic biologists must choose how to combine small DNA segments. However, there can be millions of ways to combine these pieces, and determining the architecture can require significant domain knowledge.</p> <p>In this dissertation we address these challenges of interpreting the effects of configurability in two areas in the biosciences: (1) bioinformatics software, and (2) synthetic biology. We highlight the challenges of configurability in these areas and provide approaches to help users navigate their configuration spaces, leading to more interpretable configurable software in the biosciences.</p> <p>First, we demonstrate that there is variability in both the functional and performance outcomes of highly configurable bioinformatics tools, and find previously undetected faults. We discuss the implications of this variability and provide suggestions for developers. Second, we develop a user-oriented framework to identify the effect of changing configuration options in software, and communicate these effects to the end user in a simple format. We demonstrate our framework in a large study and compare it to a state-of-the-art method for performance-influence modeling in software.</p> <p>Last, we define a mapping of software product line engineering to the domain of synthetic biology, resulting in organic software product lines. We demonstrate the potential reuse and existence of both commonality and variability in an open-source synthetic biology repository. We build feature models for four commonly engineered biological functions and demonstrate how product line engineering can benefit synthetic biologists.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/18184/
dc.identifier.articleid 9191
dc.identifier.contextkey 19236756
dc.identifier.doi https://doi.org/10.31274/etd-20200902-103
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/18184
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/94336
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/18184/McDevitt_iastate_0097E_18905.pdf|||Fri Jan 14 21:38:03 UTC 2022
dc.subject.keywords Bioinformatics
dc.subject.keywords Configurability
dc.subject.keywords Interpretability
dc.subject.keywords Software engineering
dc.title Interpretability of configurable software in the biosciences
dc.type dissertation en_US
dc.type.genre dissertation en_US
dspace.entity.type Publication
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
thesis.degree.discipline Computer Science
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy
File
Name: McDevitt_iastate_0097E_18905.pdf
Size: 3.54 MB
Format: Adobe Portable Document Format