Identify algorithms from code

Date
2019-01-01
Authors
Leslie, Stroh
Major Professor
Advisor
Wei Le
Organizational Unit
Computer Science

Computer Science (the theory, representation, processing, communication and use of information) is fundamentally transforming every aspect of human endeavor. The Department of Computer Science at Iowa State University advances computational and information sciences through: 1. educational and research programs within and beyond the university; 2. active engagement to help define national and international research and educational agendas; and 3. sustained commitment to graduating leaders for academia, industry and government.

History
The Computer Science Department was officially established in 1969, with Robert Stewart serving as the founding Department Chair. The faculty was composed of joint appointments with Mathematics, Statistics, and Electrical Engineering. Also in 1969, the building that now houses the department, then simply called the Computer Science building, was completed; it was later named Atanasoff Hall. From the 1980s to the present, the department has expanded and developed its teaching and research agendas to cover many areas of computing.

Dates of Existence
1969-present

Abstract

Choosing an algorithm to use can depend on a variety of factors such as runtime, space, and problem requirements. Many algorithms already have tested implementations in open source code. Reusing or interchanging algorithms can help save development time and improve the performance of applications.

Existing code search techniques often rely heavily on natural language components of the code. Simple techniques, such as grep, are sensitive to the naming choices and conventions in code. Grep in particular does not precisely locate implementations: it outputs single matching lines, does not rank its results, and is subject to considerable noise.
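
The naming sensitivity described above can be illustrated with a small sketch. It is not taken from the thesis: the two C snippets, the file names, and the helper grep_like are hypothetical, and the helper only imitates grep's line-by-line, unranked matching. Both snippets implement binary search, but only one carries a telling name.

```python
import re

# Hypothetical corpus: two C implementations of binary search.
# Only a.c uses a name that reveals the algorithm.
SOURCES = {
    "a.c": """int binary_search(int *a, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key) return mid;
        if (a[mid] < key) lo = mid + 1; else hi = mid - 1;
    }
    return -1;
}""",
    "b.c": """int locate(int *v, int len, int x) {
    int l = 0, r = len - 1;
    while (l <= r) {
        int m = l + (r - l) / 2;
        if (v[m] == x) return m;
        if (v[m] < x) l = m + 1; else r = m - 1;
    }
    return -1;
}""",
}


def grep_like(pattern, sources):
    """Return (file, line_number, line) for each matching line, in file order and unranked."""
    regex = re.compile(pattern)
    hits = []
    for name, text in sources.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, number, line.strip()))
    return hits


# Searching by name reports a.c only; the equivalent code in b.c is missed.
name_hits = grep_like(r"binary", SOURCES)

# Searching by a generic keyword reports isolated lines, not whole implementations.
loop_hits = grep_like(r"while", SOURCES)
```

Each hit is a single line with no surrounding context and no score, so a reader still has to open every file to decide whether it contains the algorithm.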

We develop a technique to search for algorithms in code using existing pseudo code as a query. We leverage the structural, mathematical, and natural language components of pseudo code to find its corresponding implementation in code. This approach defines a simple language to represent pseudo code with atoms that capture different features of the algorithm. We then use these features to search code using a bounding box and extract the code snippet that contains the functionality of the pseudo code.
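
The abstract does not spell out the atom language or how the bounding box is computed, so the following is only one possible reading, offered as a minimal sketch. It assumes that each feature can be approximated by a regular expression over source lines and that the bounding box is the smallest window of lines containing at least one match for every feature. The names FEATURES, feature_lines, and bounding_box are hypothetical and do not come from the thesis.

```python
import re


def feature_lines(lines, features):
    """Map each feature name to the line indices where its pattern matches."""
    return {
        name: [i for i, line in enumerate(lines) if re.search(pattern, line)]
        for name, pattern in features.items()
    }


def bounding_box(lines, features):
    """Return (start, end) of the smallest line window covering every feature, or None."""
    matches = feature_lines(lines, features)
    if any(not indices for indices in matches.values()):
        return None  # at least one feature never occurs
    # One event per (line, feature) match, ordered by line index.
    events = sorted((i, name) for name, indices in matches.items() for i in indices)
    best = None
    counts = {}
    left = 0
    for right_line, right_name in events:
        counts[right_name] = counts.get(right_name, 0) + 1
        # Shrink from the left while the window still covers all features.
        while len(counts) == len(features):
            left_line, left_name = events[left]
            if best is None or right_line - left_line < best[1] - best[0]:
                best = (left_line, right_line)
            counts[left_name] -= 1
            if counts[left_name] == 0:
                del counts[left_name]
            left += 1
    return best


# Hypothetical C source containing an unrelated function and a binary search.
CODE = """int unrelated(int x) {
    return x * 2;
}
int locate(int *v, int len, int x) {
    int l = 0, r = len - 1;
    while (l <= r) {
        int m = l + (r - l) / 2;
        if (v[m] == x) return m;
        if (v[m] < x) l = m + 1; else r = m - 1;
    }
    return -1;
}""".splitlines()

# Hypothetical features for binary search: one structural, two mathematical.
FEATURES = {
    "loop": r"\b(while|for)\b",
    "halving": r"/\s*2\b",
    "comparison": r"==",
}

box = bounding_box(CODE, FEATURES)
snippet = "\n".join(CODE[box[0]:box[1] + 1]) if box else ""
```

Because the window is located from features rather than identifiers, the function named locate is found even though nothing in its name mentions binary search. A real tool would presumably rank candidate windows and widen them to whole statements or functions; this sketch does neither.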

We collected 19 different repositories in both C and Java and searched for 27 different algorithms. Using our technique, we found over 60 algorithm implementations in roughly 1.8 million lines of code. We also compare our tool against a search implementation built on Apache Solr, a popular enterprise search platform, and show that our approach can find more algorithms with high rank.

Copyright
2019-12-01