Cross-language binary-to-source code matching approach using graph neural networks

Date
2023-08
Authors
Chen, Hanze
Major Professor
Jannesari, Ali
Committee Members
Huai, Mengdi
Basu, Samik
Department
Computer Science
Abstract
Matching binary code to source code, and vice versa, has applications in many fields, including computer security, software engineering, and reverse engineering. Although existing methods attempt to match source code with binary code to accelerate the reverse engineering process, most focus on a single programming language. In practice, however, programs are developed in different languages depending on their requirements, so cross-language binary-to-source code matching has recently gained attention. Existing approaches nonetheless struggle to produce precise predictions because of the inherent difficulty of matching binary and source code across programming languages. This paper presents GraphBinMatch, a machine-learning approach for detecting code similarity and clones across programming languages. Its graph neural network-based method addresses the challenges of cross-language binary-to-source code matching and substantially improves on existing methodologies. GraphBinMatch effectively learns similarities between binary and source code and accommodates a broad range of programming languages. Evaluated across multiple tasks, it significantly outperforms current methods, with up to a 21% increase in F1 score and a 39% increase in recall over state-of-the-art research. We further demonstrate GraphBinMatch's versatility by introducing an OpenMP bug detection application. While GraphBinMatch is promising, we also candidly discuss its limitations and potential for future work, offering promising directions for future research.
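The abstract's core idea, embedding a binary-code graph and a source-code graph with a graph neural network and scoring their similarity, can be illustrated with a minimal untrained sketch. The graphs, features, message-passing scheme, and pooling below are hypothetical placeholders for illustration only; they are not GraphBinMatch's actual trained architecture:

```python
import numpy as np

def gnn_embed(adj, feats, rounds=2):
    """Toy message passing: each round, every node averages its own and its
    neighbors' features; the result is mean-pooled into one graph vector.
    (Illustrative stand-in for a trained GNN encoder.)"""
    a = adj + np.eye(len(adj))             # add self-loops
    a = a / a.sum(axis=1, keepdims=True)   # row-normalize the adjacency
    h = feats
    for _ in range(rounds):
        h = a @ h                          # aggregate neighborhood features
    return h.mean(axis=0)                  # mean-pool nodes -> graph embedding

def similarity(g1, g2):
    """Cosine similarity between two graph embeddings in [-1, 1]."""
    e1, e2 = gnn_embed(*g1), gnn_embed(*g2)
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))

# Two tiny stand-in "code graphs": (adjacency matrix, node feature matrix).
triangle = (np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float),
            np.array([[1., 0.], [0., 1.], [1., 1.]]))
path     = (np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float),
            np.array([[1., 0.], [0., 1.], [1., 1.]]))

print(similarity(triangle, triangle))  # identical graphs score 1.0
print(similarity(triangle, path))      # structurally different graphs score lower
```

In the actual system the encoder's weights would be learned so that matching binary/source pairs score high and non-matching pairs score low; this sketch only shows the embed-then-compare shape of that comparison.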