As Titus Brown begins his keynote address to the 2014 Bioinformatics Open Source Conference in Boston’s Hynes Convention Center, he takes us twenty-five years forward to his envisioned future of biology. He speaks of the datapocalypse, sweatshops staffed by bioinformatics graduate students, a great exodus of data scientists from the Bay Area after the great California earthquake, an open-source, open-data biology, and redefined foundations of biology. He goes on to explain how we will deal with some of the major drawbacks of current biological research, but also cautions us against some of the solutions we will inevitably choose. Brown’s talk sketches the rapid changes in how we will conduct and pursue biological research in the coming years.
To explain what the biological renaissance will be, it helps first to understand the nascent field of systems biology, a field that is revolutionizing medicine and the biotechnology industry as we know them. Systems biology is an approach founded on mathematical and physical principles that seeks to understand the complexity of biological systems in their entirety. To quote Leroy Hood, President and Co-Founder of the Institute for Systems Biology in Seattle,
“systems biology is the science of discovering, modeling, understanding and ultimately engineering at the molecular level the dynamic relationships between the biological molecules that define living organisms.”
Take, for example, the long-established field of medicine. Contemporary approaches to disease diagnosis, prognosis, and treatment are based on correlations between clinical observations (clinical syndromes) and pathological analysis. This 19th-century approach, known as Oslerian clinicopathological correlation, defines a human disease by the principal organ in which its symptoms and signs manifest. While this approach has served us well in describing pathophenotypes, it is limited in its ability to account for preclinical disease states and to support individualized diagnosis and therapy, particularly for less common diseases.
Joseph Loscalzo and Albert-Laszlo Barabasi outline four critical shortcomings of the conventional pathophenotyping approach that medicine has taken thus far. First, because Oslerian correlation focuses on late-appearing organ dysfunction, it disregards the original molecular and environmental determinants of disease. As a result, therapies target symptoms rather than origins. Second, because conventional disease paradigms overlook pathobiological mechanisms, they also overlook the deterministic and stochastic factors that govern how a disease evolves from susceptibility to a preclinical state and then to an overt pathophenotype. Third, conventional definitions of disease exclude molecular characterizations that could reveal subtle but important differences between individual patients and thereby allow more accurate treatment and better-chosen therapeutic targets. Finally, classical approaches neglect another dimension of disease: the entire pathobiological process that constitutes it. A disease state is rarely caused by a single abnormality; rather, it arises from interacting processes (both stochastic and deterministic) in a complex network that together yield a single phenotype.
With genome sequencing now abundant and our ability to interpret this immense genomic data only beginning to emerge, it is unsurprising that such clinicopathological correlations are not only outdated but also unhelpful for rigorously understanding the causes and cures of disease. A systems pathobiology approach is required, one that integrates genetic, genomic, biochemical, cellular, physiological, and clinical data to capture the entire disease-ome rather than a single pathophenotype.
In a future where science is defined by its understanding of complex networks and system-wide interactions, biology will be driven not by trial and error but by data. Modern sequencing technologies are already paving the way, and we are already seeing biology turn into a data science. According to the UK-based European Bioinformatics Institute (EBI), part of the European Molecular Biology Laboratory (EMBL), genomic data accounts for 2 petabytes (a figure that more than doubles every year) of its 20 petabytes of total storage. (For reference, a petabyte is 10¹⁵ bytes. In physical terms, that’s about 200,000 standard DVDs!)
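As a rough check on the DVD comparison, assume a single-layer DVD holds 4.7 GB (4.7 × 10⁹ bytes); that capacity is our assumption, not a figure from EBI. The second line is purely illustrative: it shows what the stated doubling would imply for the 2 PB of genomic data after n years, if the trend simply continued.

$$
\frac{1\ \text{PB}}{1\ \text{DVD}} = \frac{10^{15}\ \text{bytes}}{4.7 \times 10^{9}\ \text{bytes}} \approx 2.1 \times 10^{5}\ \text{DVDs}
$$

$$
S_{\text{genomic}}(n) \geq 2 \cdot 2^{n}\ \text{PB}, \qquad S_{\text{genomic}}(5) \geq 64\ \text{PB}
$$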
According to Hood, there are three distinct types of biological information. The first is the digital genome: information that can ultimately be observed and analyzed down to the very last base pair. It is the fundamental building block and fingerprint of every biological specimen: accessible, reliable information that is available for all existing organisms and will serve as the basic building block for any future organism. The second type is the three-dimensional information of proteins, and the third is the four-dimensional, time-variant information of biological systems operating across developmental and physiological time spans. This points to the main caveat of biological data: its inherent heterogeneity, not only in data types and the information they hold, but also in methods of data acquisition.
To navigate the vast and heterogeneous labyrinth of biological data, future biologists will require not only immense computational power but also mathematical and statistical tools for capturing, storing, quality-checking, analyzing, modeling, and distributing this flood of data. As Ovidiu Lipan and Wing H. Wong assert, the future of biology is Newtonian (the underlying basis of biology can be described and understood in a mathematical language) rather than Shakespearean (the interactions between the players are observed without any mathematical accompaniment).
Such a Newtonian future for the life sciences requires us to overcome several challenges. Foremost, we will need to understand how to describe a biological system. We can begin with deterministic ordinary differential equations to address questions of feedback, stability, and similar properties. However, the basic players in biological systems are individual molecules, often present in small numbers, so these systems are inherently noisy: rather than behaving deterministically, they are subject to stochastic fluctuations. Thus, to understand molecular interactions, we must first deal with biological noise, which we can study using an array of signal generators, an indispensable tool for systems biology (and therefore for biology).
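To make the contrast between a deterministic ODE and stochastic fluctuations concrete, here is a minimal sketch under stated assumptions. It uses a hypothetical birth-death model of a single molecular species, produced at rate `k` and degraded at rate `gamma` per molecule, with parameter values chosen only for illustration. The deterministic view is the ODE dx/dt = k − γx; the stochastic view uses Gillespie’s stochastic simulation algorithm, a standard technique swapped in here to illustrate noise. It is not Lipan and Wong’s signal-generator method, and the function names are our own.

```python
import math
import random


def ode_mean(k, gamma, x0, t):
    """Closed-form solution of the deterministic ODE dx/dt = k - gamma * x."""
    steady = k / gamma
    return steady + (x0 - steady) * math.exp(-gamma * t)


def gillespie(k, gamma, x0, t_end, rng):
    """Gillespie SSA for the same hypothetical birth-death process.

    Returns the integer molecule count at time t_end for one random trajectory.
    """
    t, x = 0.0, x0
    while True:
        a_birth = k            # propensity of producing one molecule
        a_death = gamma * x    # propensity of degrading one molecule
        a_total = a_birth + a_death
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            return x
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1


def summarize(samples):
    """Sample mean and unbiased sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, var


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are reproducible
    k, gamma, x0, t_end = 10.0, 1.0, 0, 10.0
    samples = [gillespie(k, gamma, x0, t_end, rng) for _ in range(2000)]
    mean, var = summarize(samples)
    print(f"ODE prediction: {ode_mean(k, gamma, x0, t_end):.2f}")
    print(f"SSA mean:       {mean:.2f}")
    print(f"SSA variance:   {var:.2f}")
```

For this simple linear model the stochastic runs average close to the ODE value of about 10 molecules, but individual runs scatter around it with a variance also near 10 (Poisson noise). That spread is exactly what a deterministic ODE cannot represent, and it becomes relatively larger as molecule counts shrink.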
These basic tools, combined with our growing capabilities in sequencing, will allow us to study biochemical pathways and, moreover, to infer complex genetic networks from relatively simple input-output experiments paired with robust in silico analysis tools. As Brown predicted in his keynote address, this will require high-performing computational biologists as well as more capable computing hardware.
According to Kevin Kelly, the founding executive editor of Wired magazine, the next century will be the century of biology. Biology is where the largest number of scientists work and where the most new results appear. It is also the field with the greatest economic value for the future, from healthcare to biotechnology, and the field with the most left to learn. In recent years, there has been a clear shift in how biological research is conducted. The future of biology, and therefore of science in general, will be a completely different landscape: less an attributive, descriptive science and more a predictive one.