Boa: Ultra-Large-Scale Software Repository and Source Code Mining

dc.contributor.author Dyer, Robert
dc.contributor.author Rajan, Hridesh
dc.contributor.author Nguyen, Hoan
dc.contributor.author Nguyen, Tien
dc.contributor.department Computer Science
dc.date 2018-02-16T17:25:26.000
dc.date.accessioned 2020-06-30T01:57:05Z
dc.date.available 2020-06-30T01:57:05Z
dc.date.issued 2015-07-09
dc.description.abstract In today’s software-centric world, ultra-large-scale software repositories, e.g., SourceForge, GitHub, and Google Code, are the new library of Alexandria. They contain an enormous corpus of software and related information. Scientists and engineers alike are interested in analyzing this wealth of information. However, systematically extracting and analyzing relevant data from these repositories to test hypotheses is hard, and is best left to mining software repositories (MSR) experts. Mining source code in particular yields significant insights into software development artifacts and processes, yet mining source code at a large scale remains a difficult task. Previous approaches had to limit the scope of the projects studied, restrict the mining task to a coarser granularity, or give up studying the history of the code. In this paper we address mining source code: a) at a very large scale; b) at a fine-grained level of detail; and c) with full history information. To address these challenges, we present domain-specific language features for source code mining in our language and infrastructure called Boa. The goal of Boa is to make it easier to test MSR-related hypotheses. Our evaluation demonstrates that Boa substantially reduces programming effort, thus lowering the barrier to entry. We also show drastic improvements in scalability.
dc.identifier archive/lib.dr.iastate.edu/cs_techreports/374/
dc.identifier.articleid 1373
dc.identifier.contextkey 7309269
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath cs_techreports/374
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/20209
dc.source.bitstream archive/lib.dr.iastate.edu/cs_techreports/374/main.pdf|||Fri Jan 14 23:50:34 UTC 2022
dc.subject.disciplines Programming Languages and Compilers
dc.subject.disciplines Software Engineering
dc.subject.keywords mining
dc.subject.keywords software
dc.subject.keywords repository
dc.subject.keywords scalable
dc.subject.keywords ease of use
dc.subject.keywords lower barrier to entry
dc.subject.keywords Boa
dc.subject.keywords domain-specific languages
dc.title Boa: Ultra-Large-Scale Software Repository and Source Code Mining
dc.type article
dc.type.genre article
dspace.entity.type Publication
relation.isAuthorOfPublication 4e3f4631-9a99-4a4d-ab81-491621e94031
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
File
Name: main.pdf
Size: 939.03 KB
Format: Adobe Portable Document Format