Identify algorithms from code

dc.contributor.advisor Wei . Le Leslie, Stroh
dc.contributor.department Computer Science 2020-02-12T22:57:28.000 2020-06-30T03:20:30Z 2020-06-30T03:20:30Z Sun Dec 01 00:00:00 UTC 2019 2021-12-03 2019-01-01
dc.description.abstract <p>Choosing an algorithm to use can depend on a variety of factors such as runtime, space, and problem requirements. Many algorithms already have tested implementations in open source code. Reusing or interchanging algorithms can help save development time and improve the performance of applications.</p> <p>Existing code search techniques often rely heavily on natural language components of the code. Simple techniques, such as Grep, are sensitive to the naming choices and conventions in code. Grep in particular does not precisely locate implementations: it outputs single matching lines, does not rank its results, and is subject to considerable noise.</p> <p>We develop a technique to search for algorithms in code using existing pseudo code as a query. We leverage the structural, mathematical and natural language components of pseudo code to find its corresponding implementation in code. This approach defines a simple language to represent pseudo code with atoms that include different features of the algorithm. We then use these features to search code using a bounding box and extract the code snippet that contains the functionality of the pseudo code.</p> <p>We collected 19 different repositories in both C and Java and searched for 27 different algorithms. Using our technique we found over 60 algorithm implementations in roughly 1.8 million lines of code. We also compare our tool against a search implementation built on Apache Solr, a popular enterprise search platform, and show that our approach finds more algorithms at high rank.</p>
dc.format.mimetype application/pdf
dc.identifier archive/
dc.identifier.articleid 8735
dc.identifier.contextkey 16524976
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/17728
dc.language.iso en
dc.source.bitstream archive/|||Fri Jan 14 21:28:11 UTC 2022
dc.subject.disciplines Computer Sciences
dc.subject.keywords Algorithms
dc.subject.keywords Code Search
dc.title Identify algorithms from code
dc.type article
dc.type.genre thesis
dspace.entity.type Publication
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456 Computer Science thesis Master of Science