Week of July 4

This week marked aligner attempt #2: the first program I wrote ran too slowly on my test input, even with the help of the Tufts high-performance computing (HPC) cluster. This gave me an opportunity to rewrite it in C++ with some performance and modularity improvements. I spent some time planning and redesigning, then got back to work. Despite the abstract nature of programming, I tend to plan best when I can work with my hands, so I started by drawing a diagram in a notebook of the code I planned to write.

The Needleman-Wunsch algorithm runs in O(mn) time, where m and n are the lengths of the two sequences being aligned; there’s no escaping that. However, the original sequences are all only 325 base pairs long, and even mutant repair products don’t get much longer. So in practice, it should be very possible to write a reasonably fast aligner for our data even with this relatively slow algorithm. The limiting factor is the number of reads to be aligned in each file: the largest construct has 2.5 million, so if a single pairwise alignment took even one second, aligning that one file could take about a month (2.5 million seconds is roughly 29 days). I’m confident that, once tested and complete, the new aligner will be much faster.
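To make the O(mn) cost and the month-long estimate concrete, here is a minimal sketch of the Needleman-Wunsch score computation. It is not the aligner described in this post: the names `Scoring`, `nw_score`, and `estimate_days` are hypothetical, the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen only for illustration, and it returns just the optimal score rather than the alignment itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scoring scheme, assumed for illustration only.
struct Scoring {
    int match = 1;
    int mismatch = -1;
    int gap = -1;
};

// Global alignment score of a and b using the Needleman-Wunsch recurrence.
// The nested loops fill (m + 1) * (n + 1) cells, which is where O(mn) comes from.
// Only two rows are kept in memory, since each cell depends on the row above
// and the cell to its left.
int nw_score(const std::string& a, const std::string& b,
             const Scoring& s = Scoring{}) {
    const std::size_t m = a.size();
    const std::size_t n = b.size();
    std::vector<int> prev(n + 1), curr(n + 1);

    // First row: aligning an empty prefix of a against b[0..j) costs j gaps.
    for (std::size_t j = 0; j <= n; ++j) {
        prev[j] = static_cast<int>(j) * s.gap;
    }

    for (std::size_t i = 1; i <= m; ++i) {
        // First column: aligning a[0..i) against an empty prefix costs i gaps.
        curr[0] = static_cast<int>(i) * s.gap;
        for (std::size_t j = 1; j <= n; ++j) {
            const int diag = prev[j - 1] +
                             (a[i - 1] == b[j - 1] ? s.match : s.mismatch);
            const int up = prev[j] + s.gap;        // gap in b
            const int left = curr[j - 1] + s.gap;  // gap in a
            curr[j] = std::max({diag, up, left});
        }
        std::swap(prev, curr);
    }
    return prev[n];
}

// Back-of-the-envelope runtime for a whole file, in days, assuming every
// pairwise alignment takes the same amount of time and runs one after another.
double estimate_days(double reads, double seconds_per_alignment) {
    assert(reads >= 0.0 && seconds_per_alignment >= 0.0);
    const double seconds_per_day = 86400.0;
    return reads * seconds_per_alignment / seconds_per_day;
}
```

For two 325-base-pair sequences, the inner loop body runs 325 × 325 = 105,625 times, which is very little work for a single alignment. Calling `estimate_days(2.5e6, 1.0)` gives just under 29 days, which matches the "about a month" figure above and shows why the per-alignment time matters far more than the algorithm's asymptotic class for this data.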

One of the things I enjoy most about computational biology is the breadth of subfields available to explore. My first experience with computational biology research was a project in collaboration with the Putnam Lab at the University of Rhode Island. In Fall 2021, I contributed to a methods project investigating how RNA-seq analysis results vary with library preparation method and alignment tool, using two coral species, Pocillopora acuta and Montipora capitata. The project has since narrowed in scope, but I’ve continued to sit in on meetings between the Cowen Group and Putnam Lab to stay up to date on coral-related projects. One such meeting was this week; it was a welcome, brief distraction from my aligner planning.

Written on July 8, 2022