Detecting code smells with SPNs

  • M el Bouazzaoui

Student thesis: Master's Thesis

Abstract

Software systems are becoming more complex, which increases the risk of introducing technical debt [27], i.e., sub-optimal implementation decisions that provide short-term benefits but decrease software quality. The presence of technical debt usually also indicates the presence of code smells within the same system [15]. Code smells, introduced in [15], indicate that important software design and implementation principles were violated in the source code of a software application during its life cycle. They increase the complexity of the software and thereby make the application harder to comprehend and maintain.
Manually checking source code to identify code smells is a time-consuming and complex process. This is mainly due to a lack of knowledge and the fact that the detection of potential code smells is prone to subjective interpretation by developers. Hence, research has been performed on detecting code smells automatically to support developers. This led to a variety of static analysis tools that implement heuristic-based approaches that are simple and easy to use. However, these tools again introduce a lot of uncertainty into the process of identifying code smells, mainly because the list of potential code smells they produce is itself prone to subjective interpretation by developers. Manual inspection is therefore still necessary when using these tools, so the detection process as a whole remains time-consuming. This in turn limits the adoption of code smell detection in practice.
To overcome the limitations of these tools, researchers have proposed various code smell detection mechanisms based on machine learning techniques. In this study, a deep learning approach using sum-product networks (SPNs) is proposed to detect code smells. The SPNs learn from both code metrics and word embeddings extracted from the source code. Several experiments were carried out to detect three code smells: Long Method, Feature Envy, and Large Class. The results were compared to the deep learning approach to code smell detection covered in [26], which uses neural networks. The approach in [26] outperformed the state-of-the-art static analysis tools currently available. The SPN models detected the Long Method code smell at least as well as the deep learning approach in [26]. For the Feature Envy and Large Class code smells, the SPN models underperformed the deep learning approach. Therefore, more research is needed into the potential of SPNs to detect code smells. This further research should also use a more extensive dataset than the one used in this study.
Furthermore, as of this writing, no code smell detection tools are available that use SPNs. Several experiments, conducted as part of a case study, show that SPNs do have potential for practical use. Therefore, to evaluate their practical applicability during a development session, a tool should be developed that employs SPNs to detect code smells.
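
To make the idea of an SPN-based detector more concrete, the following is a minimal sketch, not the thesis implementation. It assumes only two hypothetical code metrics per method (lines of code and cyclomatic complexity), hand-picked Gaussian parameters, and one small hand-built SPN per class. The thesis itself learns from code metrics and word embeddings, and none of the names or numbers below come from it.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gaussian:
    """Leaf node: univariate Gaussian density over one feature."""
    index: int   # position of the feature in the input vector
    mean: float
    std: float

    def log_prob(self, x: Sequence[float]) -> float:
        z = (x[self.index] - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std * math.sqrt(2 * math.pi))


@dataclass
class Product:
    """Product node: children cover disjoint features, so log-densities add."""
    children: List[object]

    def log_prob(self, x: Sequence[float]) -> float:
        return sum(child.log_prob(x) for child in self.children)


@dataclass
class Sum:
    """Sum node: weighted mixture of children over the same features."""
    weights: List[float]
    children: List[object]

    def log_prob(self, x: Sequence[float]) -> float:
        terms = [math.log(w) + c.log_prob(x)
                 for w, c in zip(self.weights, self.children)]
        m = max(terms)  # log-sum-exp for numerical stability
        return m + math.log(sum(math.exp(t - m) for t in terms))


def posterior_smelly(x, spn_smelly, spn_clean, prior_smelly=0.5):
    """Bayes rule over two class-conditional SPNs: P(smelly | x)."""
    log_s = math.log(prior_smelly) + spn_smelly.log_prob(x)
    log_c = math.log(1.0 - prior_smelly) + spn_clean.log_prob(x)
    m = max(log_s, log_c)
    e_s, e_c = math.exp(log_s - m), math.exp(log_c - m)
    return e_s / (e_s + e_c)


# Hypothetical parameters; feature 0 = lines of code, feature 1 = complexity.
SPN_LONG_METHOD = Sum(
    weights=[0.6, 0.4],
    children=[
        Product([Gaussian(0, 100.0, 30.0), Gaussian(1, 20.0, 6.0)]),
        Product([Gaussian(0, 60.0, 20.0), Gaussian(1, 12.0, 5.0)]),
    ],
)
SPN_CLEAN = Product([Gaussian(0, 15.0, 10.0), Gaussian(1, 3.0, 2.0)])

if __name__ == "__main__":
    for method in ([120.0, 25.0], [10.0, 2.0]):
        p = posterior_smelly(method, SPN_LONG_METHOD, SPN_CLEAN)
        print(method, "P(Long Method) =", round(p, 4))
```

In a trained model, the structure and parameters would be learned from labelled data rather than set by hand. Because the output is a probability rather than a hard threshold decision, a tool built on such a model could rank candidate smells by confidence.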
Date of Award: 18 Nov 2022
Original language: English
Supervisor: Arjen Hommersom (Examiner) & Harrie Passier (Co-assessor)

Master's Degree

  • Master Software Engineering
