matiNov2014

Speaker: Mati Shomrat

Title: public PhD lecture: Detecting Refactored Clones

Abstract:

Software systems tend to contain sections of code that are very similar, called code clones. Such code duplication may occur for a variety of reasons. Once duplicated, each copy takes on a life of its own, as different transformations may be applied to it. Some of these transformations are behavior-preserving (that is, refactorings), while others may modify the behavior of a copy.
Cloning therefore creates the risk that changes will not be propagated to all copies, and that errors will arise as the clones diverge.

Developers often copy and paste code to quickly implement functionality that has been implemented before. With the proliferation of open-source repositories, this kind of reuse is easier than ever. Sometimes, however, the code is copied illegally. This may be intentional plagiarism or, given the complexity of software licenses, an innocent mistake by a developer who believes that a certain piece of code can be legally copied. Software-development companies need to protect themselves against both kinds of violations.

The availability of automated refactoring support in modern development environments further complicates the task of clone detection, as these tools make it very easy for developers (and plagiarists) to make significant and wide-ranging syntactic changes to code without changing its functionality.

We present Cider, a general tool for the identification of refactored clones. Cider is a semantic clone detector based on a graph representation of programs. The graph abstraction allows Cider to detect semantically similar code fragments while abstracting away from the concrete syntax, thus avoiding the syntactic effects of refactoring. Refactorings may change not only the intraprocedural structure of code but also its interprocedural organization. Some refactorings, such as Extract Method and Introduce Factory, introduce new methods, while others, such as Inline Method, remove method calls and may remove the called methods altogether. Cider can cope with such interprocedural refactorings, and is unique in doing so.
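To illustrate what an interprocedural refactored clone can look like, here is a small hypothetical example (not taken from Cider or its evaluation; all function names are invented). The second version is what an Extract Method refactoring of the first might produce: the discount logic has moved into a new helper. A detector that compares the two function bodies syntactically sees different code, yet the behavior is the same. Read in reverse, the same pair illustrates Inline Method.

```python
def total_original(prices, percent):
    """Sum the prices and apply a percentage discount above 100, all inline."""
    total = 0
    for p in prices:
        total += p
    if total > 100:
        total = total - total * percent // 100
    return total


def apply_discount(total, percent):
    """Hypothetical helper introduced by an Extract Method refactoring."""
    if total > 100:
        return total - total * percent // 100
    return total


def total_refactored(prices, percent):
    """Same behavior as total_original; the discount logic now lives in apply_discount."""
    total = 0
    for p in prices:
        total += p
    return apply_discount(total, percent)


if __name__ == "__main__":
    for prices in ([50, 60], [20, 30], []):
        # Identical results despite the different method structure.
        print(total_original(prices, 10), total_refactored(prices, 10))
    # prints "99 99", then "50 50", then "0 0"
```

Matching the two clones requires looking across the call to `apply_discount`, which is the kind of interprocedural change described above.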

Cider was evaluated on several open-source projects. The results suggest that interprocedural clones are ubiquitous, indicating that the problem is pervasive.