A hybrid approach for selecting and optimizing graph traversal strategy for analyzing big code

dc.contributor.advisor Hridesh Rajan
dc.contributor.author Ramu, Ramanathan
dc.contributor.department Computer Science
dc.date 2018-08-11T11:40:21.000
dc.date.accessioned 2020-06-30T03:09:25Z
dc.date.available 2020-06-30T03:09:25Z
dc.date.copyright Sun Jan 01 00:00:00 UTC 2017
dc.date.embargo 2001-01-01
dc.date.issued 2017-01-01
dc.description.abstract <p>Our newfound ability to analyze source code in massive software repositories such as GitHub has led to an uptick in data-driven solutions to software engineering problems. Source code analysis is often realized as traversals over source code artifacts represented as graphs. Since the number of artifacts that are analyzed is huge, in millions, the efficiency of the source code analysis technique is very important. The performance of source code analysis techniques heavily depends on the order of nodes visited during the traversals: the traversal strategy. For instance, selecting the best traversal strategy and optimizing it for a software engineering task, that infers the temporal specification between pairs of API method calls, could reduce the running time on a large codebase from 64% to 96%. While, there exists several choices for traversal strategy, like depth-first, post-order, reverse post-order, etc., there exists no technique to choose the most time-efficient strategy for traversals. In this paper, we show that a single traversal strategy does not fit all source code analysis scenarios. Somewhat more surprisingly, we demonstrate that given the source code expressing the analysis task (in a declarative form) one can compute static characteristics of the task, which together with the runtime characteristics of the input, can help predict the most time-efficient traversal strategy for that (analysis task, input) pair. We also demonstrate that these strategies can be realized in a manner that is effective in accelerating ultra-large-scale source code analysis. Our evaluation shows that our technique successfully selected the most time-efficient traversal strategy for 99.99%-100% of the time and using the selected traversal strategy and optimizing it, the running times of a representative collection of source code analysis in our evaluation were considerably reduced by 1%-28% (13 minutes to 72 minutes in absolute time) when compared against the best performing traversal strategy. The case studies show that hybrid traversal reduces 80--175 minutes in running times for two software engineering tasks. The overhead imposed by collecting additional information for our approach is less than 0.2% of the total running time for a large dataset that contains 287K</p> <p>Control Flow Graphs (CFGs) and less than 0.01% for an ultra-large dataset that contains 162M CFGs.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/16202/
dc.identifier.articleid 7209
dc.identifier.contextkey 11457166
dc.identifier.doi https://doi.org/10.31274/etd-180810-5831
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/16202
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/30385
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/16202/Ramu_iastate_0097M_16328.pdf|||Fri Jan 14 20:56:41 UTC 2022
dc.subject.disciplines Computer Sciences
dc.title A hybrid approach for selecting and optimizing graph traversal strategy for analyzing big code
dc.type article
dc.type.genre thesis
dspace.entity.type Publication
relation.isOrgUnitOfPublication f7be4eb9-d1d0-4081-859b-b15cee251456
thesis.degree.discipline Computer Science
thesis.degree.level thesis
thesis.degree.name Master of Science
Original bundle
Now showing 1 - 1 of 1
No Thumbnail Available
5.6 MB
Adobe Portable Document Format