Lectures
- Introduction
- Java Overview
- Eclipse, OSGi, and the Java Model
- Abstract Syntax Trees (ASTs) and the Visitor Pattern
- Assignment 1
- Control Flow Analysis
- Attribute Grammars
- Reengineering and Refactoring
- Eclipse Refactoring
- Call Graphs
Assignments
Git & GitHub Fundamentals
This assignment aims to give you a brief introduction to Git and GitHub. We’ll be using Git and GitHub Classroom for various assignments. This is a refresher assignment that is not graded and is optional.
Intermediate Representations
Two parts. The first part is not graded.
Part 1: AST Background
- Read about ASTs in Wikipedia.
- Read the Javadoc on the Eclipse AST type.
- Read Eclipse Java Development Tools (JDT) Fundamentals.
- Complete the Eclipse Plug-in Development “Create an Eclipse plug-in” tutorial.
- Make sure you have the Eclipse PDE plugin in your Eclipse installation.
- If you do, you’ll see “Welcome to Eclipse for RCP and RAP Developers” on the welcome screen. Select tutorials.
- Under “Eclipse Plug-in Development,” choose “Create an Eclipse plug-in.” Go through that tutorial. Let me know how it goes.
- Create simple plugins in Eclipse using built-in templates.
- Please use as many templates as possible to get a good feel of the process. Let me know if you have any questions.
- You may find this article helpful.
- Read “The Language Toolkit: An API for Automated Refactorings in Eclipse-based IDEs.”
Part 2: WALA IR
Complete the WALA Quick Start assignment.
Visitors and Plug-ins
Part 1: Visitors in WALA
Redo Intermediate Representations, Part 2 using the visitor pattern. Specifically, create your visitor class by subclassing com.ibm.wala.ssa.SSAInstruction.Visitor
. Then, change cta.Main.main(String[])
to call com.ibm.wala.ssa.IR.visitNormalInstructions(Visitor)
instead of calling cta.Main.checkInstruction(SSAInstruction)
for each instruction. Commit and push your code to your original repositories. Again answer the last two questions of Intermediate Representations, Part 2.
Hint
Should these numbers change?
Part 2: Building an AST Plug-in
Complete the Building an AST Eclipse Plug-in assignment. Once it works, find a medium-sized open-source Java project to run your plugin on. You may want to explore GitHub. Import the project into Eclipse and run your plugin on it. Report on the following, which may require you to change some of the source code so that it is convenient:
- Project name.
- Project URL.
- Project description.
- The number of classes in the project.
- The number of user-defined methods in the project.
- For each class, the number of method calls.
- Statistics about the method calls:
- The total number of method calls for all methods in the project.
- The minimum number of method calls for any given method in the project.
- The maximum number of method calls for any given method in the project.
- The mean number of method calls for any given method in the project.
- The standard deviation of the above.
- The mode number of method calls for any given method in the project.
- The medium number of method calls for any given method in the project.
Sample Output
Resources
The below-listed resources may help complete this assignment.
More AST Analysis
Extend your AST plug-in from the previous assignment to do the following:
List all the fields in a method. When the user right-clicks on a user-defined method, the plug-in should produce a report showing all fields accessed in the method, what each of their types is, how many times they are used in the method, whether it’s a class or instance field, which fields are read from, which fields are written to, the total number of fields, the most frequently used field, and the least frequently used field.
Use a different project from the previous assignment.
Hint
Ensure that you build the AST with “source symbol bindings.” As discussed in class, this “overlays” the concrete (parse) tree with the AST, and the bindings include type information.
Instruction Design
Consider the following example goto shrike instruction:
goto (from iindex= 106 to iindex = 142)
Given that each instruction has an index, meaning that the goto instruction above itself has an index, why is a “from index” used in the argument above?
Control Flow Analysis
- (10 pts) Consider a control-flow graph G with an entry node ENTRY (a node without predecessors). Assume that every other node in G is reachable from ENTRY. Consider an arbitrary node n ≠ ENTRY and let DOM(n) be the set of nodes m such that (1) m ≠ n and (2) m dominates n. (Note that in the paper by Cooper et al. discussed in class, DOM(n) contains n by default, while in this problem we exclude n from DOM(n).)
- Can DOM(n) be empty? Why?
- Suppose that G is acyclic. Consider two distinct nodes p, q ∈ DOM(n). Prove that p ∈ DOM(q) or q ∈ DOM(p). (Hint: try a proof by contradiction.)
- Suppose that G contains cycles. We again want to prove p ∈ DOM(q) or q ∈ DOM(p). Does your proof from the previous bullet still work? If yes, why? If no, generalize your proof.
- Consider two distinct nodes p,q ∈ DOM(n). In a general G with cycles, is it possible that p ∈ DOM(q) and q ∈ DOM(p)? If yes, provide an example. If no, provide a proof.
- (5 pts) Prove that dominance is a transitive relation.
- (5 pts) Consider a node n ≠ ENTRY and all its predecessor nodes mi where i = 1,…,k and k > 0. Consider any node p distinct from n and distinct from all mi. Prove that p dominates n if it dominates all mi.
Credits
This assignment is based on an assignment by Atanas Rountev.
Working with Control Flow Graphs
Complete the Working with Control Flow Graphs assignment.
Dominator Trees
This assignment extends the Working with Control Flow Graphs assignment. You can use the same git repository to complete this assignment; tag where the last assignment ends before you start this assignment:
git tag working_with_control_flow_graphs
git push --tags
For this assignment, we will create dominator trees from the CFGs you created in the last assignment. Build dominator trees from your collected CFGs using WALA’s dominator API. After computing the dominator tree for the CFG, calculate a set of numbers {count_0, count_1, count_2, … }. count_0 is the number of nodes with 0 children in the dominator tree (i.e., the number of leaves). count_1 is the number of nodes with 1 child in the dominator tree. In general, count_k is the number of nodes with k children in the tree. There exists some point after which count_k = 0 (for sufficiently large k). Compute the counts only up to the most significant non-zero value. As a sanity check, make sure that the sum of all counts is equal to the number of nodes in the CFG.
Once all CFGs are processed and their sets {count_0,count_1, count_2, … } are computed, add all count_0 values for all CFGs to obtain a single number COUNT_0
for the entire program. Similarly, compute COUNT_1
, COUNT_2
, etc.
By the deadline listed in Blackboard, commit and push the source code to your repository. Then, submit in Blackboard a single text file report.txt containing:
COUNT_0: ...
COUNT_1: ...
COUNT_2: ...
...
up to the largest non-zero COUNT_k
.
Credits
This assignment is based on an assignment by Atanas Rountev.
Attribute Grammars
Write the parse tree for let x = 1 in let x = (x+1) in (x+2) end end
.
Presentations
Details
- Presentations will start on 11/23 and go through 12/7.
- Each talk will be 45 minutes long. An additional 10 minutes will be allotted for questions.
- We will have two talks each week.
- Students may lobby the instructor if there is an unlisted paper related to the student’s interest in programming languages and software engineering.
Potential Papers
- Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints
- Program Analysis as Model Checking of Abstract Interpretations (paper)
- IvySyn: Automated Vulnerability Discovery for Deep Learning Frameworks
- HARP: holistic analysis for refactoring Python-based analytics programs
- Ariadne: analysis for machine learning programs
- An Empirical Study on Performance Bugs in Deep Learning Frameworks
- Automated Testing of Software that Uses Machine Learning APIs
- Are Machine Learning Cloud APIs Used Correctly?
- CARGO: AI-Guided Dependency Analysis for Migrating Monolithic Applications to Microservices Architecture
- Reformulator: Automated Refactoring of the N+1 Problem in Database-Backed Applications
- Practically Correct, Just-in-Time Shell Script Parallelization
- On decomposing a deep neural network into modules
- Synthesis-Powered Optimization of Smart Contracts via Data Type Refactoring
- Automated transpilation of imperative to functional code using neural-guided program synthesis
- Overwatch: Learning Patterns in Code Edit Sequences
- DeepStability: A Study of Unstable Numerical Methods and Their Solutions in Deep Learning
- DeepDiagnosis: Automatically Diagnosing Faults and Recommending Actionable Fixes in Deep Learning Programs
- DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs
- Green AI: Do Deep Learning Frameworks Have Different Costs?
- Refty: Refinement Types for Valid Deep Learning Models
- Towards Training Reproducible Deep Learning Models
- Type4Py: Practical Deep Similarity Learning-Based Type Inference for Python
- Static Inference Meets Deep Learning: A Hybrid Type Inference Approach for Python
Project
Details
Final projects can be something related to your research (but also related to our course in some way, e.g., integrating machine learning with programming languages), completely separate from your research (but again also related to our course), an annotated bibliography on a new topic you’d like to explore, etc. Final projects intend to spark an idea leading to a research (conference) paper.
Final projects are due the day of our scheduled final exam. That is our final exam day (see the syllabus). This project has no late policy, as grades are due shortly afterward.
Please make an appointment to see me to discuss your project or come during office hours. If those times don’t work for you, please contact me for more times.
I must approve of projects once you decide to pursue them. Below is a list of potential projects, but you can also propose new projects.
Potential Projects
- Create a “simple” refactoring for Java in Eclipse.
- Use the “Software analysis and automated refactoring” lecture notes.
- One refactoring could be to move all local variable declaration statements in method bodies to the closest possible location to where they are first used.
- Other refactoring resources can be found on my research background page.
- Static analysis for Python in WALA.
- Repeat the WALA Quick Start (and subsequent ones, perhaps) but for Python. Use either WALA Ariadne or WALA-Python.
- Static analysis for Python in Scalpel.
- Use Scalpel to create some exciting analyses for Python programs.
- Use CAst to extend WALA to analyze a new language.
- May not be able to do all constructs for that language.
- Annotated bibliography (10-15 papers, 1-page summary each, good for finding future research ideas) on a research topic related to our course. Possible topics include:
- Refactoring.
- Semantics preservation.
- Machine Learning and Software Engineering.
- Machine Learning and Programming Languages.
- Type inference for dynamic programming languages.
- Static analysis of dynamic programming languages.
- A customized project related to your ongoing research (requires instructor approval and consultation).
Resources
- Compilers: Principles, Techniques, & Tools A computer science textbook by Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman about compiler construction for programming languages.
- Core Java SE 9 for the Impatient A book by my favorite Java author.
- Design Patterns: Elements of Reusable Object-Oriented Software Capturing a wealth of experience about the design of object-oriented software, four top-notch designers present a catalog of simple and succinct solutions to commonly occurring design problems.
- Refactoring to Patterns Introduces the theory and practice of pattern-directed refactorings: sequences of low-level refactorings that allow designers to safely move designs to, towards, or away from pattern implementations.
- Refactoring: Improving the Design of Existing Code The de facto book on software refactoring.
- Installing Linux on Windows 10 Instructions for installing a linux subsystem on a windows platform.
- Proof templates
- Research background Possibly useful links to various resources.
- An Illustrated Guide to Git on Windows
- Git documentation Documentation and reference on the popular Git versioning system.
- Git for Computer Scientists A quick introduction to Git internals.
- Git Magic Rough instructions for particular Git effects.
- Git ready Learn git one commit at a time.
- Resolving a merge conflict using the command line How to resolve merge conflicts using the command line and a text editor.
- try.github.com Git quickstart.
- Syncing a fork Sync a fork of a repository to keep it up-to-date with the upstream repository on GitHub.
- Creating an annotated bibliography How to create an annotated bibliography in LaTeX.
- How to give a great research talk Simple guidelines from Simon Peyton Jones of Microsoft Research for giving research talks.
- Annotated bibliography samples Sample annotations from annotated bibliographies, each with a different research project.
- GitHub Education Resources for GitHub as used in the classroom.
- GitHub Education Student Developer Pack The best developer tools, free for students.
- LaTeX Homework Templates Some templates for writing your homework in latex.
- Library remote access Accessing library material from home. You may need these instructions to access protected papers.
- WALA T.J. Watson Libraries for Analysis.
- Eclipse Eclipse is a powerful and extendable Integrated Development Environment (IDE) for many languages, including Java.
- org.eclipse.jdt.astview – AST View A view to visualize the AST (abstract syntax tree) of a Java file open in the editor.
- org.eclipse.jdt.jeview – JavaElement View A view to visualize the Java elements in Eclipse's Java model.
- GitKraken GitKraken is a Git GUI client for Windows, Mac, and Linux.
- Gitless An experimental version control system built on top of Git.
- Sourcetree Sourcetree simplifies how you interact with your Git repositories so you can focus on coding.
- LaTeX Homework Templates Some templates for writing your homework in latex.
- LaTeX proof package Write derivations in LaTeX.
- Blackboard Mobile Learn™ for Android Blackboard Mobile Learn™ makes it easier for you to keep up with your courses by letting you access them whenever and wherever you want.
- Blackboard Mobile Learn™ for iOS Blackboard Mobile Learn™ makes it easier for you to keep up with your courses by letting you access them whenever and wherever you want.
- UNIX/Linux Tutorial for Beginners A beginners guide to the Unix and Linux operating system. Eight simple tutorials which cover the basics of UNIX/Linux commands.
- Eclipse JDT – Abstract Syntax Tree (AST) and the Java Model Usage of the Abstract Syntax Tree (AST) and the Java model in Eclipse.
- gittutorial Explains how to import a new project into Git, make changes to it, and share changes with other developers.
- Understanding Git Conceptually A tutorial on the Git version control system.
- GitHub Classroom Our "classroom" on GitHub.
- GitHub Organization Our "organization" on GitHub.