Archives

Attribution-NonCommercial-ShareAlike 4.0 International

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

Software Evolution for Machine Learning

Refactorings and Technical Debt in Machine Learning Systems

Introduction

Machine Learning (ML) systems, including Deep Learning (DL) systems, i.e., those with ML capabilities, are pervasive in today’s data-driven society. Such systems are complex; they are composed of ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to ML. Unfortunately, there is a gap in knowledge about how ML systems actually evolve and are maintained. In this project, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 million lines of code (MLOC), along with 327 manually examined code patches. The results indicate that (i) developers refactor these systems for a variety of reasons, both specific and tangential to ML; (ii) some refactorings correspond to established technical debt categories, while others do not; and (iii) code duplication is a major cross-cutting theme that particularly involved ML configuration and model code, which was also the most refactored code. We also introduce 14 new ML-specific refactorings and 7 new technical debt categories, and propose several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.

Publications

Below is a list of publications. My name and my research students’ names are boldfaced, undergraduate students are italicized, and female students are underlined:

Yiming Tang, Raffi Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, and Anita Raja. An empirical study of refactorings and technical debt in Machine Learning systems. In International Conference on Software Engineering, ICSE ’21, pages 238–250. IEEE/ACM, IEEE, May 2021. (138/615; 22% acceptance rate). [ bib | DOI | data | slides | http ]

Study Data

Our dataset is hosted on Zenodo.

Presentations

Automated Refactoring of Legacy Java Software to Enumerated Types

Introduction

Modern versions of Java introduce several features that offer significant improvements over older Java technology. In this project, we consider the enum construct, which provides language support for enumerated types. Before this construct was available, programmers needed to employ various patterns (e.g., the weak enum pattern) to compensate for the absence of enumerated types in Java. Unfortunately, these compensation patterns lack several highly desirable properties of the enum construct, most notably type safety. We present a novel, fully automated approach for transforming legacy Java code to use the enum construct. This semantics-preserving approach increases type safety, produces code that is easier to comprehend, removes unnecessary complexity, and eliminates brittleness problems due to separate compilation. At the core of the proposed approach is an interprocedural type inference algorithm that tracks the flow of enumerated values. The algorithm was implemented as an open-source, publicly available Eclipse plug-in and evaluated experimentally on 17 large Java benchmarks. Our results indicate that the analysis cost is practical and that the algorithm can successfully refactor a substantial number of fields to enumerated types. This work is a significant step towards providing automated tool support for migrating legacy Java software to modern Java technologies.
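To make the contrast above concrete, the following is a minimal sketch of the weak enum pattern next to an equivalent enum-based version. It is a hand-written illustration, not output of the tool or an example from the papers; the names TrafficSignalLegacy, TrafficSignal, SignalColor, and EnumRefactoringDemo are hypothetical.

```java
// Hypothetical illustration only; all class and constant names are assumptions.

// Before: the weak enum pattern. Enumerated values are plain int constants,
// so the compiler accepts any int where a color is expected.
class TrafficSignalLegacy {
    static final int RED = 0;
    static final int YELLOW = 1;
    static final int GREEN = 2;

    private int color = RED;

    void setColor(int color) {
        // No compile-time check: an out-of-range value is stored as-is.
        this.color = color;
    }

    int getColor() {
        return color;
    }
}

// After: the same values expressed with the enum construct.
enum SignalColor { RED, YELLOW, GREEN }

class TrafficSignal {
    private SignalColor color = SignalColor.RED;

    void setColor(SignalColor color) {
        // Only a SignalColor (or null) can be passed here.
        this.color = color;
    }

    SignalColor getColor() {
        return color;
    }
}

class EnumRefactoringDemo {
    public static void main(String[] args) {
        TrafficSignalLegacy legacy = new TrafficSignalLegacy();
        legacy.setColor(42); // compiles and runs, although 42 is not a color
        System.out.println(legacy.getColor());

        TrafficSignal signal = new TrafficSignal();
        signal.setColor(SignalColor.GREEN);
        // signal.setColor(42); // would be rejected by the compiler
        System.out.println(signal.getColor());
    }
}
```

In the legacy version, the int constants are compile-time constants, so their values are copied into client classes at compile time; changing a value without recompiling the clients is one way the brittleness mentioned above can arise. The enum version avoids this because clients refer to the enum constants by name.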

Researchers

Name                  Affiliation                                  Email
Raffi Khatchadourian  Hunter College, City University of New York  raffi.khatchadourian@hunter.cuny.edu
Jason Sawin           University of St. Thomas                     jason.sawin@stthomas.edu
Atanas Rountev        Ohio State University                        rountev@cse.ohio-state.edu
Benjamin Muskalla     Tasktop Technologies                         benjamin.muskalla@tasktop.com

Publications

My name and my research students’ names appear in boldface. Undergraduate students appear in italics.

Raffi Khatchadourian. Automated refactoring of legacy Java software to enumerated types. Automated Software Engineering, 24(4):757–787, December 2017. [ bib | DOI | http ]

Raffi Khatchadourian and Benjamin Muskalla. Enumeration refactoring: A tool for automatically converting Java constants to enumerated types. In International Conference on Automated Software Engineering, ASE ’10, pages 181–182, New York, NY, USA, September 2010. IEEE/ACM. (18/45; 40% acceptance rate). [ bib | DOI | tool | slides | http ]

Raffi Khatchadourian, Jason Sawin, and Atanas Rountev. Automated refactoring of legacy Java software to enumerated types. In International Conference on Software Maintenance, ICSM 2007, pages 224–233. IEEE, October 2007. (46/214; 21% acceptance rate). [ bib | DOI | slides | http ]

Raffi Khatchadourian, Jason Sawin, and Atanas Rountev. Automated refactoring of legacy Java software to enumerated types. Technical Report OSU-CISRC-4/07-TR26, Ohio State University, April 2007. [ bib | .pdf ]

Research Prototype

Our research prototype may be found on GitHub.

Presentations