Algorithms for life

Contact: Diana Hearit
Aug. 1, 2017

KALAMAZOO, Mich. - What these computer algorithms uncover about genes and proteins may one day advance drug development and individualized medical treatment. Dr. Fahad Saeed is a computational scientist whose research projects are all about bigness: big numbers, big data and big ambitions.
His latest endeavor, funded by a coveted National Science Foundation CAREER award, is no exception, as it builds on some of the biggest breakthroughs in the biological sciences.

But what role could a computational scientist play in biology?

First, about those breakthroughs.

In 2003, the Human Genome Project achieved its stunning goal of sequencing and mapping all the genes in the human body, of which there are about 20,000.
Another spectacular, more recent advance came with the ability to map an organism’s proteome, or its full complement of proteins. Humans may have upward of one million proteins.

Scientists say both achievements, separately and together, hold keys to how the human body fundamentally works: not just its natural, normal processes, but also how and why things go awry and result in disease and dysfunction. Recall that genes give instructions while proteins carry out those orders.
But the data produced by sequencing both these “omes” is massive and complex; it is not humanly possible to sift through and make sense of the multitude of interactions and pathways that control the thousands of genes and proteins.

“So, while it is very exciting that we have big data from the genome and that we have big data from the proteome, it’s also a challenge because the techniques that allow us to analyze those data sets are lagging behind,” Saeed says.

This is where biological science turns to experts with Saeed’s advanced skillset in computational science to develop computer algorithms that analyze “big data” to do such things as eliminate irrelevant information, pinpoint biological interactions and, particularly in the case of proteomics and genomics, possibly decipher previously unknown or little understood functions of proteins or genes.

“Without computational biologists, the vast amounts of raw data collected by bench scientists would remain meaningless,” says Dr. Jason Hoffert, a National Institutes of Health scientific review officer who has worked alongside Saeed on projects in the past.

“At the same time, bench scientists play a key role in interpreting the cleaned data sets in order to draw meaningful biological conclusions.”

Based in WMU’s College of Engineering and Applied Sciences, Saeed specializes in high-performance, high-speed computer algorithms designed to break down big data sets into discernible information.

He’s an assistant professor both in the department of computer science and in the department of electrical and computer engineering and directs the Parallel Computing and Data Science laboratory at the college.

He’s developing computer algorithms capable of analyzing massive amounts of genomic and proteomic data more efficiently than any previous techniques, as well as designing architecture with the capacity to store, manage and transfer this data.

To give a sense of the “bigness” of the biological data this project will grapple with, Saeed explains that currently “some of the data sets we can produce are up to 10 terabytes (10,000 gigabytes), and that is just for one experiment for one species.”

“If you combine data sets (genomic plus proteomic), they get into the petabyte level (1,000 terabytes), and the computational challenges just get exponentially larger and more complex, and that is what the grant proposes to solve.”

The magnitude is so large, it’s difficult to imagine. For perspective, one petabyte is the equivalent of 20 million four-drawer file cabinets filled with text, according to mozy.com.
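
To make the scale concrete, the storage figures quoted above can be worked through in a few lines of Python. This is only an illustrative sketch: it assumes decimal (SI) units, where each step up is a factor of 1,000, and the constant and function names are hypothetical rather than anything drawn from Saeed's tools.

```python
# Decimal (SI) storage units, matching the article's "1,000 terabytes" per petabyte.
# Assumption: decimal prefixes, not binary ones (1 TiB = 2**40 bytes).
GB = 10**9
TB = 10**12
PB = 10**15


def in_units(n_bytes: int, unit: int) -> float:
    """Express a byte count in the given unit (hypothetical helper)."""
    return n_bytes / unit


# One experiment for one species, per the article: up to 10 terabytes.
experiment_bytes = 10 * TB
experiment_gb = in_units(experiment_bytes, GB)

# How many such experiments add up to one petabyte.
experiments_per_pb = PB // experiment_bytes

# The file-cabinet comparison implies this much text per cabinet.
bytes_per_cabinet = PB // 20_000_000

print(f"{experiment_gb:,.0f} GB per experiment")          # 10,000 GB per experiment
print(f"{experiments_per_pb} experiments per petabyte")   # 100 experiments per petabyte
print(f"{bytes_per_cabinet // 10**6} MB per cabinet")     # 50 MB per cabinet
```

In other words, under these assumptions about 100 experiments of the largest size Saeed describes would fill a petabyte, and the file-cabinet comparison works out to roughly 50 megabytes of text per cabinet.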

What Saeed’s algorithmic tools help life sciences researchers tease out about genes and proteins may lead to advances in drug development and individualized medical treatment.

Ultimately, Saeed says, “We want to take this genomic and proteomic science to a place where we are able to do genomic and proteomic profiling of each person who goes to a clinic. That is what we call personal or precision medicine.

“If you are able to profile genomes and proteomes at the individual level, we are able to very specifically know what diseases you might be prone to and what are the things we can do to make sure you do not get those diseases.”

However, he concedes that nature is very complex. “It will take a lot of time to really know in a very systemwide level what is going on with our bodies. But we will reach that.”

Saeed hopes the computational tools he is designing will help make “crucial steps toward understanding the genomic, proteomic and evolutionary aspects of species in the tree of life.”