Bug deduplication, i.e., recognizing bug reports that refer to the same problem, is a challenging task in the software-engineering life cycle. Researchers have proposed several methods, most of which rely on information-retrieval techniques. Motivated by the intuition that domain knowledge can provide relevant context that improves effectiveness, our work seeks to improve information-retrieval-based deduplication by augmenting it with software-engineering knowledge. In our previous work, we proposed the software-literature-context method, which uses software-engineering literature as a source of contextual information for detecting duplicates: if two bug reports relate to similar subjects, they are more likely to be duplicates. Because the method is largely automated, it has the potential to substantially reduce the manual effort that conventional techniques require, at the cost of a minor loss in accuracy.
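The intuition that reports about similar subjects are more likely to be duplicates can be illustrated with a minimal sketch. It assumes each context is a small keyword set. In the described method, contexts come from software-engineering literature, but the `CONTEXTS` dictionary below is hypothetical and purely illustrative. Each report is scored against every context, and two reports are compared by the cosine similarity of their context vectors. The scoring scheme and the helper names (`context_vector`, `context_similarity`) are assumptions. In a full pipeline, such scores would typically serve as features for a duplicate classifier alongside information-retrieval features. That classifier is omitted here.

```python
import math
import re

# Hypothetical contexts: topic -> keyword set. In the described method these
# would be derived from software-engineering literature; these are illustrative.
CONTEXTS = {
    "performance": {"slow", "latency", "memory", "cpu", "hang"},
    "security": {"password", "login", "permission", "token", "vulnerability"},
    "ui": {"button", "menu", "window", "dialog", "click"},
}


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def context_vector(report):
    """Score a report against each context as the fraction of that context's keywords it contains."""
    tokens = tokenize(report)
    return [len(tokens & words) / len(words) for words in CONTEXTS.values()]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def context_similarity(report_a, report_b):
    """Similarity of two reports in context space (higher = more similar subjects)."""
    return cosine(context_vector(report_a), context_vector(report_b))


if __name__ == "__main__":
    a = "App is slow and CPU usage spikes when I click the menu"
    b = "High CPU and memory, app hangs"
    c = "Login fails with wrong password error"
    print(context_similarity(a, b))  # shares the "performance" context
    print(context_similarity(a, c))  # no shared context
```

Here reports `a` and `b` both touch the performance context, so they score higher than `a` and `c`, which share no context. Keyword overlap is the simplest possible scoring choice. Weighted schemes such as TF-IDF or BM25 against the literature text would be natural refinements.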
In this study, we extend that work by demonstrating that domain-specific features, unlike the project-specific features used previously, can be applied across projects while still maintaining performance. We also introduce a hierarchy-of-context that captures software-engineering knowledge within the contextual space to yield further performance gains. Finally, we highlight the importance of domain-specific contextual features through cross-domain contexts: adding context improved accuracy, with Kappa scores improving by 3.8% to 10.8%, depending on the project.
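The reported gains are measured with Kappa, assumed here to be Cohen's kappa. This statistic corrects raw agreement between predicted and true labels for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the chance agreement computed from each side's label frequencies. The following minimal sketch computes it. The labels in the example are hypothetical and are not data from this study.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # p_o: fraction of positions where the two sequences agree.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # p_e: probability of agreeing by chance, given each side's label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case: both sides use one identical label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical duplicate/non-duplicate labels for ten report pairs.
    truth = ["dup"] * 4 + ["non"] * 6
    preds = ["dup", "dup", "dup", "non", "dup"] + ["non"] * 5
    print(cohens_kappa(truth, preds))
```

In this example, 8 of 10 labels agree (p_o = 0.8) and chance agreement is p_e = (4·4 + 6·6)/100 = 0.52, giving kappa = 0.28/0.48 ≈ 0.583. Kappa is preferred over raw accuracy when duplicates are rare, because a classifier that always predicts "non-duplicate" can score high accuracy yet has a kappa of zero.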