Institution: | Slovak University of Technology |
Technologies used: | C#.NET |
Inputs: | Information tags or concepts |
Outputs: | Unlabelled relationships between information tags or concepts |
Addressed problem
The creation of information tags is the first step towards effective search and retrieval of software artifacts, such as code snippets. The information tags also comprise terms representing various software engineering concepts, and the relationships between these terms can then be exploited to further improve searching. However, in the absence of a domain model, the information tags remain isolated entities, related to each other only through co-occurrence and shared terms. A layer of additional connections between the information tags, representing other relationships (e.g., synonymy, taxonomic relations or paradigmatic relatedness), would therefore help to improve search. The acquisition of these additional relationships is the aim of our method.
Description
The second method we propose aims to interconnect the domain terms found in the information tags. We base it on a game-based approach originally designed for collecting term relationships in the general domain: a web search query formulation game in which players try to reduce the number of results returned by a search engine to a minimum by using specially formatted queries (e.g., “star -movie -wars -death”). Such queries force players to reveal their perception of term relationships. The game relies on the principle of negative search: the original set of web search results is stripped of every result that contains one of the specified negative terms. A term relationship network is then constructed by mining the game’s query logs.
At the start of the game, the player is given a task in the form of a positive query term that yields a certain number of search results. The player’s task is to reduce this number by adding suitable negative terms to the initial query term; the lower the final number of results, the better the player’s rank. To achieve the best results, players must enter negative terms that frequently co-occur with the task term on the Web. This principle is also the key to acquiring the term relationship network, since players interpret the co-occurrence of terms as a semantic relationship between them, and vice versa. We tailor the game to the domain of software artifacts, which act as documents, and to their semantics, the information tags, which act as search terms.
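The following is a minimal sketch of the game mechanics and log mining described above, assuming a toy corpus in which each software artifact is represented only by its set of information tags. The names `ARTIFACTS`, `QUERY_LOG`, `negative_search`, `score` and `mine_relationships` are hypothetical, and so is the mining rule: a (task term, negative term) pair is counted only if adding that negative term actually removed results. It is written in Python for brevity rather than C#.NET and is not the method's actual implementation.

```python
from __future__ import annotations

from collections import Counter
from typing import FrozenSet, List, Sequence, Set, Tuple

# Hypothetical toy corpus: each software artifact (the "document") is
# represented only by its set of information tags (the "search terms").
ARTIFACTS: List[Set[str]] = [
    {"list", "sort", "python"},
    {"list", "linked", "pointer", "c"},
    {"list", "sort", "java", "comparator"},
    {"list", "python", "comprehension"},
    {"sort", "quicksort", "c"},
]

# Hypothetical game query log: (task term, negative terms in entry order).
QUERY_LOG: List[Tuple[str, List[str]]] = [
    ("list", ["python", "java", "linked"]),
    ("list", ["python", "haskell"]),
    ("sort", ["c", "python"]),
]


def negative_search(artifacts: Sequence[Set[str]], positive: str,
                    negatives: Sequence[str]) -> List[Set[str]]:
    """Artifacts tagged with `positive` that carry none of the `negatives`."""
    banned = set(negatives)
    return [a for a in artifacts if positive in a and not (a & banned)]


def score(artifacts: Sequence[Set[str]], task: str,
          negatives: Sequence[str]) -> int:
    """Number of results left after the query; lower means a better rank."""
    return len(negative_search(artifacts, task, negatives))


def mine_relationships(artifacts: Sequence[Set[str]],
                       log: Sequence[Tuple[str, List[str]]]
                       ) -> Counter[FrozenSet[str]]:
    """Build an unlabelled, weighted term network from the query log.

    Assumption: a (task, negative) pair counts only if adding that negative
    term removed at least one result at the moment it was entered.
    """
    edges: Counter[FrozenSet[str]] = Counter()
    for task, negatives in log:
        entered: List[str] = []
        before = score(artifacts, task, entered)
        for term in negatives:
            entered.append(term)
            after = score(artifacts, task, entered)
            if after < before:
                # Unordered pair: the output relationships are unlabelled.
                edges[frozenset((task, term))] += 1
            before = after
    return edges


if __name__ == "__main__":
    for task, negatives in QUERY_LOG:
        print(task, negatives, "->", score(ARTIFACTS, task, negatives))
    for pair, weight in mine_relationships(ARTIFACTS, QUERY_LOG).most_common():
        print(sorted(pair), weight)
```

With this toy log, “python” ends up with the strongest link to “list” because two rounds used it effectively, while “haskell” gets no edge because it removed no results. Keying edges by unordered pairs mirrors the unlabelled relationships listed as the method's output; a real implementation might weight edges differently, for example by how many results each negative term removed.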