Bringing ultra-large-scale software repository mining to the masses with Boa

dc.contributor.advisor Hridesh Rajan
dc.contributor.author Dyer, Robert
dc.contributor.department Computer Science 2018-08-11T09:12:49.000 2020-06-30T02:50:31Z 2020-06-30T02:50:31Z Tue Jan 01 00:00:00 UTC 2013 2015-07-30 2013-01-01
dc.description.abstract Mining software repositories provides developers and researchers a chance to learn from previous development activities and apply that knowledge to the future. Ultra-large-scale open source repositories (e.g., SourceForge with 350,000+ projects, GitHub with 250,000+ projects, and Google Code with 250,000+ projects) provide an extremely large corpus on which to perform such mining tasks. This large corpus gives researchers the opportunity to test new mining techniques and empirically validate new approaches on real-world data. However, the barrier to entry is often extremely high. Researchers interested in mining must know a large number of techniques, languages, tools, etc., each of which is often complex. Additionally, performing mining at the scale proposed above adds further complexity and is often difficult to achieve.

The Boa language and infrastructure were developed to solve these problems. We provide users a domain-specific language tailored for software repository mining and allow them to submit queries via our web-based interface. These queries are then automatically parallelized and executed on a cluster, analyzing a dataset containing almost 700,000 projects, history information from millions of revisions, millions of Java source files, and billions of AST nodes. The language also provides an easy-to-comprehend visitor syntax to ease writing source code mining queries. The underlying infrastructure contains several optimizations, including query optimizations to make single queries faster as well as a fusion optimization to group queries from multiple users into a single query. The latter optimization is important as Boa is intended to be a shared, community resource. Finally, we show the potential benefit of Boa to the community by reproducing a previously published case study and performing a new case study on the adoption of Java language features.
dc.format.mimetype application/pdf
dc.identifier archive/
dc.identifier.articleid 4560
dc.identifier.contextkey 5050393
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/13553
dc.language.iso en
dc.source.bitstream archive/|||Fri Jan 14 19:55:24 UTC 2022
dc.subject.disciplines Computer Sciences
dc.title Bringing ultra-large-scale software repository mining to the masses with Boa
dc.type article
dc.type.genre dissertation
dspace.entity.type Publication
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456 dissertation Doctor of Philosophy