For me personally, I quite enjoy learning math and theory (though I really suck at test taking). I find that working on a difficult problem set is extremely satisfying when you somehow contort your brain in just the right way to get that epiphany moment. However, theory is not the only thing expected of most computer scientists (unfortunately), especially in industry. Project courses, especially ones with group work, are a great way to learn how to actually use version control for real, debug code you didn't write yourself, and so many other things.

As I turn in the final project of my undergraduate career, I thought I'd give an overview of how some of my course projects turned out.

Reducing Cache Pollution at Compile Time

You can read the report here.

This is the project that I just turned in for the course 15-745 Optimizing Compilers for Modern Architectures. My partner and I knew we wanted to do a project that was somehow related to architecture. The project timeline was 6 weeks long, but we didn’t really figure out how to proceed with our original idea until about 3 weeks in.

The idea is to use non-temporal memory instructions, which bypass the cache and write directly to memory, so that data we won't touch again soon never occupies (or evicts) useful cache lines. This reduces total bus traffic during program execution, easing the pressure on a processor's memory bandwidth and decreasing total execution time.
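To make the effect concrete, here's a minimal sketch of what a streaming store looks like at the source level using x86 intrinsics. Our pass operated on compiler IR rather than intrinsics, and the function name and alignment assumption here are mine:

```cpp
#include <immintrin.h>
#include <cstddef>

// Copy src into dst with non-temporal (streaming) stores, so the
// written data bypasses the cache instead of evicting useful lines.
// Assumes dst is 16-byte aligned and n is a multiple of 4 floats.
void stream_copy(float *dst, const float *src, std::size_t n) {
  for (std::size_t i = 0; i < n; i += 4) {
    __m128 v = _mm_loadu_ps(src + i);  // normal load
    _mm_stream_ps(dst + i, v);         // non-temporal store
  }
  _mm_sfence();  // make the streaming stores visible before later loads
}
```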

The catch with non-temporal memory instructions is that if a piece of data will be reused very soon and the write bypasses the cache, the next access misses, and you lose the benefit of caching entirely. So our compiler analysis had to figure out when it was actually safe and beneficial to use a non-temporal memory instruction.
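For the curious: in LLVM, this kind of transformation can be expressed by tagging stores with !nontemporal metadata, which the x86 backend can lower to streaming stores. Here's a rough sketch of that plumbing; the reuse analysis (isReusedSoon below) is a hypothetical stand-in for the actual analysis in the report:

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/Support/Casting.h"

using namespace llvm;

// Hypothetical reuse analysis: returns true only when we believe the
// stored data will be read again soon. This is where the hard part of
// the project lives.
static bool isReusedSoon(StoreInst *SI);

// Walk a function and tag qualifying stores with !nontemporal metadata,
// which codegen can lower to streaming stores on x86.
void markNonTemporalStores(Function &F) {
  LLVMContext &Ctx = F.getContext();
  MDNode *NT = MDNode::get(
      Ctx, ConstantAsMetadata::get(
               ConstantInt::get(Type::getInt32Ty(Ctx), 1)));
  for (BasicBlock &BB : F)
    for (Instruction &I : BB)
      if (auto *SI = dyn_cast<StoreInst>(&I))
        if (!isReusedSoon(SI))
          SI->setMetadata(LLVMContext::MD_nontemporal, NT);
}
```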

CNN Interpretability

You can read the report here.

10-707 Topics in Deep Learning was a largely theoretical course. So when it came to proposing a final project, my partner and I naturally went with a more theoretical idea: expand on prior CNN interpretability research that uses L1-norm part templates to guide the convergence of convolutional kernels towards labeled features in an animal recognition dataset.

The training data was a set of animal images with their major body parts labeled. The goal was to guide the convergence of the network while forcing each kernel to select only for pixels of a certain body part. I won't explain the math here, but we discovered benefits and drawbacks of using different geometries in the part templates compared to the study this project was based upon.
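As a purely illustrative sketch (not the exact formulation from the prior work or our report), you can think of a part-template penalty of this flavor as an L1 term that punishes kernel activation outside a labeled body-part mask:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of an L1 part-template penalty on one H x W
// feature map: the kernel's response is pushed toward zero everywhere
// outside the labeled body-part region, so the kernel specializes.
// mask[y][x] is assumed to be 1 inside the labeled part, 0 elsewhere.
double part_template_penalty(const std::vector<std::vector<double>> &fmap,
                             const std::vector<std::vector<double>> &mask) {
  double penalty = 0.0;
  for (std::size_t y = 0; y < fmap.size(); ++y)
    for (std::size_t x = 0; x < fmap[y].size(); ++x)
      penalty += (1.0 - mask[y][x]) * std::abs(fmap[y][x]);  // L1 outside the part
  return penalty;  // added to the task loss with some weight lambda
}
```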

Parallel Galaxy Simulation

You can read the report here.

Galaxy simulation is a very computationally intense thing to do if you want to be 100% precise, since every body interacts with every other body, and it is also difficult to parallelize. For the course project in 15-418 Parallel Computer Architecture and Programming we implemented the famous Barnes-Hut algorithm using OpenMP. The algorithm uses a quad-tree (or an octree for 3D simulations) and a Hilbert curve to subdivide the simulation space, fill it with stellar bodies, and approximate the gravitational force from distant groups of bodies.
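The core of Barnes-Hut is the "opening criterion": a subtree that is small relative to its distance gets summarized by its center of mass instead of being recursed into. A minimal sketch of that force accumulation (the field names and softening term are my own choices, not our exact code):

```cpp
#include <cmath>

struct Body { double x, y, mass; };

// One quad-tree node: either a leaf holding a body, or an internal
// node summarizing its subtree by total mass and center of mass.
// Leaves mirror their body's position and mass in cx, cy, and mass.
struct Node {
  double cx, cy, mass;  // center of mass and total mass of subtree
  double size;          // side length of this node's region
  Node *child[4];       // null for leaves
  const Body *body;     // non-null only for leaves
};

// Barnes-Hut opening criterion: if a node is far enough away relative
// to its size (size / dist < theta), treat its whole subtree as one
// point mass instead of recursing. theta ~ 0.5 is a common choice.
void accumulate_force(const Node *n, const Body &b, double theta,
                      double &fx, double &fy) {
  if (!n || n->mass == 0.0 || n->body == &b) return;
  double dx = n->cx - b.x, dy = n->cy - b.y;
  double dist = std::sqrt(dx * dx + dy * dy) + 1e-9;  // softening
  if (n->body || n->size / dist < theta) {
    double f = n->mass * b.mass / (dist * dist * dist);  // G folded in
    fx += f * dx;
    fy += f * dy;
  } else {
    for (Node *c : n->child) accumulate_force(c, b, theta, fx, fy);
  }
}
```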

Most of the difficulty in parallel galaxy simulation is reducing how long a thread holds the tree locked while it updates body positions. So our final approach used a lock-free quad-tree that applies atomics at a much finer grain of tree depth, so no thread ever has to lock the entire tree at the root and kill the parallelization.
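The essence of the fine-grained approach is claiming an empty child slot with a compare-and-swap instead of taking a lock at the root. A stripped-down sketch of that idea (not our exact implementation; the type and function names are mine):

```cpp
#include <atomic>

// Each child pointer is an atomic that threads claim with
// compare-and-swap, so contention is confined to one small subtree
// instead of serializing the whole tree behind a root lock.
struct TreeNode {
  std::atomic<TreeNode *> child[4];
  // ... body and center-of-mass fields omitted ...
};

// Try to install `fresh` as the child in `quadrant`. Succeeds only if
// the slot is still empty; otherwise another thread got there first
// and the caller descends into the node that thread installed.
bool try_claim_child(TreeNode *parent, int quadrant, TreeNode *fresh) {
  TreeNode *expected = nullptr;
  return parent->child[quadrant].compare_exchange_strong(expected, fresh);
}
```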