Institution: Slovak University of Technology
Technologies used: C#.NET
Inputs: objects with information tags of uncertain quality
Outputs: confirmed information tags
Addressed problem
Various methods (either automated or manual) create information tags over software artifacts, such as code snippets. These metadata carry a wide range of information about the objects they describe, such as code smells, test coverage, design patterns, or change risks. However, the information tags are not always correctly assigned. First, as the underlying objects change, the validity of their metadata deteriorates over time. Second, the metadata may be invalid from the start, due to a mistake by the individual who created them or an error of an automated method. Our method aims to improve the correctness of the metadata layer over a software artifact corpus. It does so by using the crowd to identify invalidly assigned information tags.
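For illustration, an information tag can be modeled roughly as follows. This is a minimal C# sketch (C#.NET being the listed technology); the type and property names are our own assumptions, not the project's actual data model.

```csharp
// Hypothetical model of an information tag: a piece of metadata
// (e.g. a code smell or a test-coverage note) attached to a software
// artifact, whose correctness is initially uncertain.
public enum TagState { Uncertain, Confirmed, Rejected }

public class InformationTag
{
    public string ArtifactId { get; set; }  // the tagged object, e.g. a code snippet
    public string TagType { get; set; }     // e.g. "code smell", "design pattern"
    public string Value { get; set; }       // e.g. "Long Method", "Singleton"
    public double Support { get; set; }     // starts at 0; updated by heuristics
    public TagState State { get; set; } = TagState.Uncertain;
}
```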
Description
The core of our approach is a multiple-choice question answered by a worker. From the worker's behavior, the method infers information about the true validity of the object–metadata relationship. In general, any type of object–metadata pair can be tested by this method. In our adaptation, the "question part" of the question is the software artifact (in general, any object). The "answer choices" are objects created ad hoc, each comprising possible metadata assignable to the "question part" object. One of the choices (denoted the "correct choice") is composed of the metadata actually assigned to the object in the "question part"; the others are created (pseudo)randomly out of metadata assigned elsewhere in the corpus (see the sketch below). The worker's task is to identify the "correct" answer, i.e., to pick the originally assigned metadata among the "made-up" choices. Each answer yields information about the true validity of the metadata assignment: (1) if the worker answers incorrectly, it is a sign that the metadata in the "correct" choice may not actually be correct; (2) if the worker answers correctly, it is a sign that the metadata in the "correct" choice is likely truly correct.

We created several heuristics for estimating the correctness of metadata assigned to software artifacts (from the point of view of their general usability). All of these heuristics manipulate a so-called support value – an expression of the probability that an information tag is correctly or incorrectly assigned. This value is initially set to zero and is iteratively modified by heuristics triggered by players' actions. When the support value reaches a positive or negative threshold, the tag is excluded from the process (and the game) as either confirmed or rejected, respectively.
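To make the question construction concrete, the following sketch assembles one multiple-choice question from a tag corpus: the correct choice holds the metadata actually assigned to the artifact, while distractor choices are drawn pseudo-randomly from tags assigned elsewhere in the corpus. It reuses the hypothetical InformationTag type above; Question, BuildQuestion, and the distractor count are illustrative assumptions, not the method's actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Question
{
    public string ArtifactId { get; set; }           // the "question part"
    public List<List<InformationTag>> Choices { get; set; }
    public int CorrectIndex { get; set; }            // position of the originally assigned metadata
}

public static class QuestionBuilder
{
    static readonly Random Rng = new Random();

    // Builds one question for the given artifact out of the tag corpus.
    public static Question BuildQuestion(string artifactId,
                                         IList<InformationTag> corpus,
                                         int distractorCount = 3)
    {
        // The "correct" choice: metadata actually assigned to this artifact.
        var correct = corpus.Where(t => t.ArtifactId == artifactId).ToList();

        // Distractors: choices composed pseudo-randomly of metadata
        // assigned to other artifacts in the corpus.
        var others = corpus.Where(t => t.ArtifactId != artifactId).ToList();
        var choices = new List<List<InformationTag>> { correct };
        for (int i = 0; i < distractorCount; i++)
        {
            choices.Add(others.OrderBy(_ => Rng.Next())
                              .Take(correct.Count)
                              .ToList());
        }

        // Shuffle so the correct choice does not always appear first.
        var shuffled = choices.OrderBy(_ => Rng.Next()).ToList();
        return new Question
        {
            ArtifactId = artifactId,
            Choices = shuffled,
            CorrectIndex = shuffled.IndexOf(correct)
        };
    }
}
```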
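The support-value bookkeeping could then look roughly like this. The reward, penalty, and thresholds are placeholder values, since the description does not specify the actual heuristics' weights; the confirm/reject exclusion at the thresholds follows the text above.

```csharp
public static class SupportHeuristics
{
    // Illustrative constants; the actual weights and thresholds
    // of the method are not specified in this description.
    const double CorrectReward = 1.0;
    const double IncorrectPenalty = 1.0;
    const double ConfirmThreshold = 5.0;
    const double RejectThreshold = -5.0;

    // Called after a player answers a question whose "correct" choice
    // contains the given tag.
    public static void OnAnswered(InformationTag tag, bool answeredCorrectly)
    {
        if (tag.State != TagState.Uncertain) return;  // already excluded from the game

        // A correct answer supports the tag's validity; an incorrect one
        // suggests the tag may be invalidly assigned.
        tag.Support += answeredCorrectly ? CorrectReward : -IncorrectPenalty;

        if (tag.Support >= ConfirmThreshold)
            tag.State = TagState.Confirmed;   // tag leaves the game as confirmed
        else if (tag.Support <= RejectThreshold)
            tag.State = TagState.Rejected;    // tag leaves the game as rejected
    }
}
```

The actual method uses several such heuristics, each triggered by different player actions, rather than the single uniform update shown here.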