Robert Grossman, PhD

Robert Grossman, PhD, professor of medicine, is an expert in data-intensive computing and its applications to biology, medicine and healthcare. In fiscal year 2014, Dr. Grossman was awarded a major contract from the National Cancer Institute (NCI) to build and operate a system called the Genomic Data Commons (GDC) to host large genomic datasets from NCI-funded projects and make them available to the cancer research community.

Dr. Grossman received his BA in mathematics from Harvard University in 1980 and his PhD in applied mathematics from Princeton University in 1985. In 2010, he joined the Department of Medicine as a professor of medicine, and in 2011 he became the chief research informatics officer for the Biological Sciences Division (BSD). He is a core faculty member and senior fellow of the Institute for Genomics and Systems Biology and of the Computation Institute, and the founder and director of the Center for Data Intensive Science.

Data-intensive computing (also known less formally as “big data”) is concerned with technology to manage and analyze large datasets. Dr. Grossman has been interested in big data since 1991-1995, when he co-directed a research collaboration that developed technology to manage and analyze the data that would have been produced by the Superconducting Super Collider. Since then, he has introduced a variety of technologies for working with big data, including the Sector system for the distributed management of large datasets, the Sphere system for distributed computing over large datasets, UDT for transporting data over wide-area networks and the Augustus system for efficiently building segmented and ensemble models over large datasets. He was also an early advocate of building open-source systems for data-intensive computing and of making data easily accessible in an open format to both humans and machines (“open data”).

In 2010, Dr. Grossman started the Open Science Data Cloud (OSDC), a cloud-based infrastructure that enables researchers to host, explore, integrate and analyze large datasets. The OSDC is managed and operated by an independent not-for-profit 501(c)(3) organization whose mission is to provide researchers with the tools and infrastructure they need to make discoveries from large datasets.

About ten years ago, Dr. Grossman became interested in bioinformatics, especially in developing technology that would allow big data to improve our understanding of biology, medicine and healthcare. At that time, biological data was largely organized into databases, and algorithms were implemented as bioinformatics tools. To analyze data, a researcher first had to set up his or her own computing environment, then download data from the relevant databases, and then install and integrate the necessary bioinformatics tools. This process worked as long as datasets were relatively small, but with the growth of data and the increasing number of bioinformatics tools, it has become increasingly difficult for all but the largest research groups.

In 2008, he began working with Kevin White, PhD, director of the Institute for Genomics and Systems Biology (IGSB), to develop an alternative approach that would work even with very large datasets. Together, they developed a system called Bionimbus, which could host very large biomedical datasets and allowed researchers to work with the data without moving it, rather than having to download it and set up their own local computing and bioinformatics environments. Bionimbus hosted projects such as modENCODE, which developed functional annotations for the genomes of model organisms such as the fly and the worm. Over the past several years, this approach has gained momentum, and a variety of other cloud-based systems for working with genomic and other biomedical datasets have since been developed.

From 2011 to 2013, Dr. Grossman worked with colleagues in the Center for Research Informatics (CRI) to develop the BSD Clinical Research Data Warehouse (CRDW), so that researchers would have easier access to clinical data.

Human genomic data requires additional security and compliance controls, so in 2013 Dr. Grossman developed the Bionimbus Protected Data Cloud (PDC), a cloud-based infrastructure for working with human genomic data and other protected health information (PHI). The Bionimbus PDC was the first cloud approved by the National Cancer Institute (NCI) to host data from projects such as The Cancer Genome Atlas (TCGA), a joint project of the NCI and the National Human Genome Research Institute (NHGRI). More recently, as noted above, he and his colleagues Drs. Nancy Cox, Kevin White and Sam Volchenboum were awarded a multimillion-dollar contract from the NCI to build and operate the Genomic Data Commons.

In recognition of his contributions, Dr. Grossman was elected a fellow of the American Association for the Advancement of Science in 2013. He also received a 2013 Federal 100 Award for his contributions to federal agencies in the areas of big data, cloud computing and cybersecurity. In 2011, he was named a University of Chicago Pritzker Scholar.