Institution: Slovak University of Technology
Technologies used: C#.NET
Inputs: source code files
Outputs: ordered list of relevant functions
Addressed problem
When programmers write new code, they often look for function definitions and for existing, working fragments with the same or similar functionality, so that they can reuse as much of that code as possible. The short fragments that search engines typically return for user queries do not give programmers enough information to determine how to reuse them. Understanding code and determining how to use it is a manual and time-consuming process. In general, programmers want to find starting points, such as relevant functions. They also want to understand easily how those functions are used and to see the sequence of function invocations, so that they can understand how concepts are implemented.
Description
Our main goal is to enable programmers to find functions relevant to their query terms, together with the usages of those functions. In our approach, the identification of popular fragments is inspired by the PageRank algorithm: the “popularity” of a function is determined by how many functions call it. We designed a model based on the vector space model, with which we can establish relevance among facts whose content contains terms that match the programmer’s query.
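The popularity idea can be illustrated with a minimal sketch of PageRank computed over a call graph. This is not the project's implementation (which is written in C#.NET); the call-graph representation, the damping factor of 0.85, the fixed number of iterations, and the example function names are all assumptions made for illustration.

```python
def pagerank(call_graph, damping=0.85, iterations=50):
    """Rank functions by popularity.

    call_graph maps each caller to the list of functions it calls
    (a hypothetical input format, not taken from the project).
    """
    # Collect every function that appears as a caller or a callee.
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes.update(callees)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {f: 1.0 / n for f in nodes}
    for _ in range(iterations):
        new_rank = {f: (1.0 - damping) / n for f in nodes}
        for caller in nodes:
            callees = call_graph.get(caller, [])
            if callees:
                # A caller passes its rank on to the functions it calls.
                share = damping * rank[caller] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
            else:
                # A function that calls nothing spreads its rank evenly.
                share = damping * rank[caller] / n
                for f in nodes:
                    new_rank[f] += share
        rank = new_rank
    return rank


# Hypothetical call graph: tokenize is called by two functions.
calls = {
    "main": ["parse", "render"],
    "parse": ["tokenize"],
    "render": ["tokenize"],
}
ranks = pagerank(calls)
most_popular = max(ranks, key=ranks.get)
```

In this example `tokenize` receives the highest score because it has the most callers, while `main`, which nothing calls, receives the lowest.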
The search phase (Figure 1) enables programmers to find functions relevant to their query terms and subsequently to trace the usages of those functions. Searching consists of three main steps. First, the top relevant documents are retrieved based on the similarity sim(dj, q) between each document dj (a source code file) and the programmer’s query q. Second, each document dj is divided into subdocuments, each of which contains the definition of exactly one function Fn from the “parent” document dj. For each subdocument djk, the similarity sim(djk, q) to the query q is calculated. Finally, an ordered list of relevant functions is obtained: for each function Fn (Fn ∈ dj), a final score sc(Fn, q) is calculated as the sum of the similarities and the PageRank score pr(Fn).
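The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: the sketch uses raw term frequencies with cosine similarity as the vector space model, reads "the sum of the similarities and a PageRank score" as the unweighted sum sim(dj, q) + sim(djk, q) + pr(Fn), and takes already-split function bodies as input. The `search` function, its `top_k` parameter, and the sample files are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(files, query, pr, top_k=10):
    """files: {file_name: {function_name: source_text}}
    pr: {function_name: PageRank score}
    Returns (file_name, function_name, score) tuples, best first.
    """
    q = Counter(tokenize(query))

    # Step 1: rank whole files (documents dj) by sim(dj, q), keep the top ones.
    doc_sims = {}
    for name, functions in files.items():
        doc_vec = Counter(tokenize(" ".join(functions.values())))
        doc_sims[name] = cosine(doc_vec, q)
    top_docs = sorted(doc_sims, key=doc_sims.get, reverse=True)[:top_k]

    # Step 2: treat each function definition as a subdocument djk.
    # Step 3: sc(Fn, q) = sim(dj, q) + sim(djk, q) + pr(Fn), assumed unweighted.
    results = []
    for name in top_docs:
        for fn, body in files[name].items():
            sub_sim = cosine(Counter(tokenize(body)), q)
            score = doc_sims[name] + sub_sim + pr.get(fn, 0.0)
            results.append((name, fn, score))
    return sorted(results, key=lambda item: item[2], reverse=True)


# Hypothetical input: two files with already-extracted function bodies.
files = {
    "io.cs": {
        "ReadFile": "string ReadFile(string path) { return File.ReadAllText(path); }",
        "WriteFile": "void WriteFile(string path, string text) { File.WriteAllText(path, text); }",
    },
    "math.cs": {
        "Add": "int Add(int a, int b) { return a + b; }",
    },
}
pr = {"ReadFile": 0.5, "WriteFile": 0.2, "Add": 0.3}
results = search(files, "read file", pr)
```

With this sample data, `ReadFile` is ranked first and `Add`, whose file and body share no terms with the query, is ranked last with only its PageRank score. A real system would likely need identifier splitting (for example, `ReadFile` into `read` and `file`) and term weighting, both of which are omitted here.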