Benchmarking the Impact of Contextual Information on OpenMP Code Generation by Large Language Models
Date
2025-05
Authors
Azhar, Muhammad Adistya
Major Professor
Mitra, Simanta
Advisor
Committee Member
Mitra
Abstract
The rapid development of tools based on artificial intelligence has been driven by the widespread adoption of large language models (LLMs). These models have opened up new opportunities for end-to-end automation of tasks that were previously manual and labor-intensive. For example, LLMs have been applied in various industries to power chatbots that answer customer questions about products. A key technique in many of these applications is retrieval-augmented generation (RAG). Existing benchmarks for evaluating LLM performance in RAG settings have typically focused on documents from Wikipedia and news websites. In this study, we extend the concept of RAG to retrieval-augmented code generation (RACG), aimed at helping LLMs generate domain-specific OpenMP source code. Code generation in this context is particularly challenging because it requires precise handling of function signatures, method calls, and other domain-specific constructs. We use contextual information from Stack Overflow posts and GitHub repositories to aid in this task. To assess code generation performance, we evaluated 15 LLMs of varying sizes and computed the CodeBLEU and CodeBERT metrics. Our findings show that (1) most LLMs can generate OpenMP code effectively without context, (2) GitHub code snippets provide more useful context for LLMs than Stack Overflow posts, (3) context helps smaller and non-code-focused models improve their ability to generate OpenMP source code, and (4) input prompts that exceed the model's context window hurt LLM performance.
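As an illustration of the kind of domain-specific OpenMP construct such a benchmark targets, the following is a minimal sketch in C (a hypothetical example, not drawn from the benchmark tasks themselves) that sums a series with a parallel reduction:

#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* The reduction(+:sum) clause gives each thread a private copy of sum
       and combines the copies once the parallel loop completes. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1);

    printf("harmonic sum = %f\n", sum);
    return 0;
}

Correctly choosing clauses such as reduction, private, and schedule is exactly the kind of domain-specific detail the evaluated LLMs must get right.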
Academic or Administrative Unit
Computer Science
Type
Text
Rights Statement
Attribution 3.0 United States
Copyright
2025