This page is intended for students who are interested in program analysis and transformation research and who would like more information on the topic. It may be used as a primer for getting involved in this research area.
There are essentially two ways to analyze programs. In both, the input is the program text, e.g., a Java source code file (.java). An analysis that studies the program without executing it is called static analysis. Static analysis is closely related to compiler technology. A compiler, e.g., javac, is a program that receives program text as input. It does not run the program; instead, it analyzes it and produces either an executable file or a file containing an intermediate representation, e.g., a bytecode file (.class). You can think of this activity as a translation from a higher-level language into a lower-level one. A great book on compiler technology is Compilers: Principles, Techniques, and Tools.
If, on the other hand, the analysis runs the program and studies what the program does given various inputs (or test cases), it is called dynamic analysis. A dynamic analysis can study either the results of the program or a trace of the program, which may include the sequence of method calls, field accesses, etc. Dynamic analyses are also very helpful in finding information about a program and can often be more accurate and faster than static analyses. However, a dynamic analysis depends on the test suite, i.e., the inputs to the studied program, whereas static analyses are independent of such inputs. As such, there is a trade-off between the two kinds of analyses, and they are often used in conjunction.
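The trade-off between the two kinds of analyses can be sketched with a toy example. The code below is only an illustration under stated assumptions: the mini expression language and the names `AnalysisDemo`, `Expr`, `Var`, `IfPositive`, `staticReads`, and `dynamicReads` are hypothetical and do not come from any real analysis framework. The static analysis inspects the program structure and reports every variable that might be read; the dynamic analysis runs the program on one input and reports only the variables actually read in that run.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy illustration only: a hypothetical mini expression language,
// not an analysis of real Java programs.
class AnalysisDemo {

    interface Expr {
        // Dynamic view: execute the node, recording variable reads in a trace.
        int eval(Map<String, Integer> input, Set<String> trace);
        // Static view: inspect the node without executing it.
        void collectReads(Set<String> out);
    }

    // A variable read, e.g. "x".
    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        public int eval(Map<String, Integer> input, Set<String> trace) {
            trace.add(name); // record the access in the execution trace
            return input.get(name);
        }
        public void collectReads(Set<String> out) { out.add(name); }
    }

    // Models "if (cond > 0) thenBranch else elseBranch".
    static final class IfPositive implements Expr {
        final Expr cond, thenBranch, elseBranch;
        IfPositive(Expr cond, Expr thenBranch, Expr elseBranch) {
            this.cond = cond;
            this.thenBranch = thenBranch;
            this.elseBranch = elseBranch;
        }
        public int eval(Map<String, Integer> input, Set<String> trace) {
            // Only one branch is executed for a given input.
            return cond.eval(input, trace) > 0
                    ? thenBranch.eval(input, trace)
                    : elseBranch.eval(input, trace);
        }
        public void collectReads(Set<String> out) {
            // Both branches are considered because nothing is executed.
            cond.collectReads(out);
            thenBranch.collectReads(out);
            elseBranch.collectReads(out);
        }
    }

    // Static analysis: variables that may be read in some execution.
    static Set<String> staticReads(Expr program) {
        Set<String> reads = new TreeSet<>();
        program.collectReads(reads);
        return reads;
    }

    // Dynamic analysis: variables actually read when run on this input.
    static Set<String> dynamicReads(Expr program, Map<String, Integer> input) {
        Set<String> trace = new TreeSet<>();
        program.eval(input, trace);
        return trace;
    }

    // Sample program: if (x > 0) y else z
    static Expr sample() {
        return new IfPositive(new Var("x"), new Var("y"), new Var("z"));
    }

    public static void main(String[] args) {
        Map<String, Integer> input = new HashMap<>();
        input.put("x", 1);
        input.put("y", 2);
        input.put("z", 3);
        System.out.println(staticReads(sample()));         // prints [x, y, z]
        System.out.println(dynamicReads(sample(), input)); // prints [x, y]
    }
}
```

The dynamic result is exact for the given input but says nothing about `z`; a test case with a non-positive `x` would be needed to observe that read. The static result covers all inputs but cannot tell which branch a particular run takes.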
Since much of my research pertains to program transformation, which requires the source code as input, I mainly deal with static analysis. Unlike a compiler, however, a general program transformation does not always translate programs between language levels. In fact, a transformation may take a program in a high-level language and produce a program in the same language. When the meaning (or semantics) of the program remains the same but the text changes, the transformation is called a refactoring.
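As a small illustration of a transformation within the same language, the sketch below shows one method before and after a refactoring that replaces an anonymous inner class with a lambda expression. The class and method names (`RefactoringDemo`, `sortByLengthBefore`, `sortByLengthAfter`) are hypothetical and chosen only for this example; the point is that the text differs while the behavior stays the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical example: the same method before and after a refactoring.
class RefactoringDemo {

    // Before: the comparator is written as an anonymous inner class.
    static List<String> sortByLengthBefore(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    // After: the comparator is a lambda expression; the behavior is unchanged.
    static List<String> sortByLengthAfter(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana");
        // Both versions produce the same result for this input.
        System.out.println(sortByLengthBefore(words).equals(sortByLengthAfter(words))); // prints true
    }
}
```

Comparing outputs on one input, as `main` does, is only a spot check. Establishing that a refactoring preserves semantics for all inputs is where static analysis comes in.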
Refactoring is performed for different reasons, the main ones being to improve the structure of a program or to migrate the program to use new language features. Refactoring: Improving the Design of Existing Code is a great book that contains many common refactorings, i.e., ways to improve existing software. Refactoring to Patterns discusses refactorings that improve existing software by introducing design patterns, or “best practices” for software development (a great book related to patterns is Design Patterns: Elements of Reusable Object-Oriented Software).
Note that while the above resources describe program transformations, the steps they portray are manual. Automated refactorings, one of the main focuses of my research, are programs that perform refactorings automatically (or sometimes semi-automatically, occasionally needing help from the developer). Many automated refactorings take not only the program text as input but also additional input. For example, a rename refactoring requires not only the original program but also the entity to be renamed and its new name. Moreover, many automated refactorings are implemented as plug-ins for integrated development environments (IDEs), like Eclipse and NetBeans.
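The inputs of a rename refactoring can be made concrete with a deliberately simplified sketch. This is not how Eclipse or NetBeans implement renaming: the `RenameDemo` class and its methods are hypothetical, and the code matches identifiers textually with a regular expression rather than resolving bindings on a syntax tree. As a result it would also rewrite matches inside string literals and comments, and it ignores scoping. It does show the three inputs (program text, entity to rename, new name) and a typical precondition check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, text-based sketch of a rename refactoring.
// Real tools work on syntax trees and resolved bindings instead.
class RenameDemo {

    // Matches the name only as a whole word, so "count" does not match "counter".
    static Pattern identifier(String name) {
        return Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
    }

    static boolean containsIdentifier(String source, String name) {
        return identifier(name).matcher(source).find();
    }

    // Inputs: the program text, the entity to rename, and the new name.
    static String rename(String source, String oldName, String newName) {
        // Precondition: refuse if the new name is already in use, because
        // the rename could then change the meaning of the program.
        if (containsIdentifier(source, newName)) {
            throw new IllegalArgumentException("Name already in use: " + newName);
        }
        return identifier(oldName).matcher(source)
                .replaceAll(Matcher.quoteReplacement(newName));
    }

    public static void main(String[] args) {
        String source = "int count = 0; count++; int counter = count;";
        System.out.println(rename(source, "count", "total"));
        // prints: int total = 0; total++; int counter = total;
    }
}
```

The precondition check is conservative: it rejects any rename whose target name already appears anywhere in the text, even when a scope-aware tool could prove the rename safe.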
I have listed some additional resources below. Please also refer to my publications and software projects for explanations and examples. For a good example of automated refactoring, please refer to this paper and this project.