Software Evolution for Machine Learning

Refactorings and Technical Debt in Machine Learning Systems

Introduction

Machine Learning (ML) systems, including Deep Learning (DL) systems, i.e., those with ML capabilities, are pervasive in today’s data-driven society. Such systems are complex; they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when they are long-lived, but they also exhibit debt specific to ML. Unfortunately, there is a gap in knowledge about how ML systems actually evolve and are maintained. In this project, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that (i) developers refactor these systems for a variety of reasons, both specific and tangential to ML; (ii) some refactorings correspond to established technical debt categories, while others do not; and (iii) code duplication is a major cross-cutting theme that particularly involves ML configuration and model code, which was also the most refactored. We also introduce 14 new ML-specific refactorings and 7 new ML-specific technical debt categories, and propose several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.

Researchers

Name	Affiliation
Yiming Tang	City University of New York (CUNY) Graduate Center
Raffi Khatchadourian	City University of New York (CUNY) Hunter College
Mehdi Bagherzadeh	Oakland University
Rhia Singh	City University of New York (CUNY) Macaulay Honors College
Ajani Stewart	City University of New York (CUNY) Macaulay Honors College
Anita Raja	City University of New York (CUNY) Hunter College

Publications

Below is a list of publications. My name and my research students' names are boldfaced, undergraduate students are italicized, and female students are underlined:

Yiming Tang, Raffi Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, and Anita Raja. An empirical study of refactorings and technical debt in Machine Learning systems. In International Conference on Software Engineering, ICSE ’21, pages 238–250. IEEE/ACM, IEEE, May 2021. (138/615; 22% acceptance rate). [ bib | DOI | data | slides | http ]

Study Data

Our dataset is hosted on Zenodo.

Presentations

Automated Refactoring of Legacy Java Software to Enumerated Types

Introduction

Modern versions of the Java language introduce several new features that offer significant improvements over older Java technology. In this project, we consider the enum construct, which provides language support for enumerated types. Before this construct was available, programmers needed to employ various patterns (e.g., the weak enum pattern) to compensate for the absence of enumerated types in Java. Unfortunately, these compensation patterns lack several highly desirable properties of the enum construct, most notably type safety. We present a novel, fully automated approach for transforming legacy Java code to use the enumeration construct. This semantics-preserving approach increases type safety, produces code that is easier to comprehend, removes unnecessary complexity, and eliminates brittleness problems due to separate compilation. At the core of the proposed approach is an interprocedural type inference algorithm that tracks the flow of enumerated values. The algorithm was implemented as an open-source, publicly available Eclipse plug-in and evaluated experimentally on 17 large Java benchmarks. Our results indicate that the analysis cost is practical and that the algorithm can successfully refactor a substantial number of fields to enumerated types. This work is a significant step towards providing automated tool support for migrating legacy Java software to modern Java technologies.
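To make the contrast concrete, the following is a minimal, hand-written sketch of the weak enum pattern next to an equivalent enum. It is not output of the tool described here; the names WeakEnumDemo, LegacyLight, Light, and action are hypothetical and chosen only for illustration.

```java
public class WeakEnumDemo {

    // Before: the weak enum pattern. The "enumerated" values are plain ints,
    // so the compiler accepts any int where a color is expected.
    static final class LegacyLight {
        static final int RED = 0;
        static final int YELLOW = 1;
        static final int GREEN = 2;

        static String action(int color) {
            switch (color) {
                case RED: return "stop";
                case YELLOW: return "slow";
                case GREEN: return "go";
                default: return "unknown"; // reachable: nothing stops a caller passing 42
            }
        }
    }

    // After: a language-level enum. Only RED, YELLOW, and GREEN are valid
    // values, so passing an arbitrary int is a compile-time error.
    enum Light {
        RED, YELLOW, GREEN;

        String action() {
            switch (this) {
                case RED: return "stop";
                case YELLOW: return "slow";
                case GREEN: return "go";
                default: throw new AssertionError(this); // not reachable for the three constants above
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(LegacyLight.action(LegacyLight.GREEN)); // prints "go"
        System.out.println(LegacyLight.action(42));                // compiles, prints "unknown"
        System.out.println(Light.GREEN.action());                  // prints "go"
        // Light.action(42) would not compile: there is no such value of Light.
    }
}
```

The legacy version also illustrates the separate-compilation brittleness mentioned above: because the int constants are compile-time constants, their values are copied into client classes, so changing a value without recompiling the clients can silently leave them using the old one.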

Researchers

Name	Affiliation	Email
Raffi Khatchadourian	Hunter College, City University of New York	raffi.khatchadourian@hunter.cuny.edu
Jason Sawin	University of St. Thomas	jason.sawin@stthomas.edu
Atanas Rountev	Ohio State University	rountev@cse.ohio-state.edu
Benjamin Muskalla	Tasktop Technologies	benjamin.muskalla@tasktop.com

Publications

My name and my research students' names appear in boldface. Undergraduate students appear in italics.

Raffi Khatchadourian. Automated refactoring of legacy Java software to enumerated types. Automated Software Engineering, 24(4):757–787, December 2017. [ bib | DOI | http ]

Raffi Khatchadourian and Benjamin Muskalla. Enumeration refactoring: A tool for automatically converting Java constants to enumerated types. In International Conference on Automated Software Engineering, ASE ’10, pages 181–182, New York, NY, USA, September 2010. IEEE/ACM. (18/45; 40% acceptance rate). [ bib | DOI | tool | slides | http ]

Raffi Khatchadourian, Jason Sawin, and Atanas Rountev. Automated refactoring of legacy Java software to enumerated types. In International Conference on Software Maintenance, ICSM 2007, pages 224–233. IEEE, October 2007. (46/214; 21% acceptance rate). [ bib | DOI | slides | http ]

Raffi Khatchadourian, Jason Sawin, and Atanas Rountev. Automated refactoring of legacy Java software to enumerated types. Technical Report OSU-CISRC-4/07-TR26, Ohio State University, April 2007. [ bib | .pdf ]

Research Prototype

Our research prototype may be found on GitHub.
