Big Data Contact List

Solicitation Title: "Big Data Interagency Funding Announcement: Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA)"

For specific information about NIH Institutes and Centers see the list of Program Contacts below.

Description:  BIGDATA seeks proposals that develop and evaluate core technologies and tools that take advantage of available collections of large data sets to accelerate progress in science, biomedical research, and engineering.

Latest: See the First Big Data Webinar (May 8, 2012)

Proposals can focus on one or more of the following three perspectives:

1. Data collection and management (DCM).  Dealing with massive amounts of often heterogeneous and complex data coming from multiple sources—such as those generated by observational systems across many scientific fields, as well as those created in transactional and longitudinal data systems across social and commercial domains—will require the development of new approaches and tools.

2. Data analytics (DA). Significant impacts will result from advances in analysis, simulation, modeling, and interpretation to facilitate discovery of phenomena, to realize causality of events, to enable prediction, and to recommend action.  Advances will allow, for example, modeling of social networks and learning communities, reliable prediction of consumer behaviors and preferences, and the surfacing of communication patterns among unknown groups at a larger, global scale; more effective correlation of events; enhanced ability to extract knowledge from large-scale experimental and observational datasets; and the extraction of useful information from incomplete data.

3. E-science collaboration environments (ESCE).  A comprehensive "big data" cyberinfrastructure is necessary to give broad communities of scientists and engineers access to diverse data and to the best and most usable inferential and visualization tools.

In addition to the three science and engineering perspectives on big data described above, all proposals must also include a description of how the project will build capacity:

Capacity-building Requirement (CB). CB activities are critical to the growth and health of this emerging area of research and education. There are three broad types of CB activities: 1) appropriate models, policies and technologies to support responsible and sustainable big data stewardship; 2) training and communication strategies, targeted to the various research communities and/or the public; and 3) sustainable, cost-effective infrastructure for data storage, access and shared services.

Finally, a project may choose to focus its science and engineering big data project in an area of national priority, but this is optional:

National Priority Domain Area Option. In addition to the research areas described above, to fully exploit the value of the investments made in large-scale data collection, BIGDATA would also like to support research in particular domain areas, especially areas of national priority, including health IT, emergency response and preparedness, clean energy, cyberlearning, material genome, national security, and advanced manufacturing. Research projects may focus on the science and engineering of big data in one or more of these domain areas while simultaneously engaging in the foundational research necessary to make general advances in "big data."

IC, Contact, Email, Phone No.

  • National Institute of General Medical Sciences (NIGMS): Karin A. Remington, Ph.D.
  • National Cancer Institute (NCI): Jerry Li, Ph.D.
  • National Institute of Biomedical Imaging and Bioengineering (NIBIB): Vinay Pai, Ph.D.
  • National Institute on Drug Abuse (NIDA): Karen Skinner, Ph.D., 301-443-1887
  • National Institute of Neurological Disorders and Stroke (NINDS): James Gnadt, Ph.D.
  • National Library of Medicine (NLM): Valerie Florance, Ph.D.
  • National Human Genome Research Institute (NHGRI): Vivien Bonazzi, Ph.D.

NIH Specific Research Goals

NIH’s mission is to seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce the burdens of illness and disability. Toward this end, NIH provides leadership and direction to programs designed to improve the health of the Nation by conducting and supporting research in the causes, diagnosis, prevention, and cure of human diseases; in the processes of human growth and development; in the biological effects of environmental contaminants; in the understanding of mental, addictive and physical disorders; and in directing programs for the collection, dissemination, and exchange of information in medicine and health, including the development and support of medical libraries and the training of medical librarians and other health information specialists.

To support these goals, NIH seeks proposals for core technologies and tools in the areas described in Part II.A that take advantage of imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and/or other data sets applicable to understanding health, and to preventing and treating the various diseases and conditions of relevance to the participating Institutes, Centers, and Offices, including, but not limited to:

  • Long-term, economically and technically self-sustaining storage solutions that enable archiving, mining, retrieving, and analyzing diverse data sets relevant to biomedical and/or behavioral research in an environment of dynamic provenance, including evolving formats, standards and ontologies;
  • Approaches that facilitate the comparison of genetic association data from across the globe to identify possible epistatic and/or gene-environment interactions;
  • Computational and informatics infrastructure and tools for the archiving and analysis of “structural and functional connectomes” of large, complex, and dynamic networks, including those relevant to neuroscience and addiction research;
  • Development of better tools and computational approaches to mine and store emergent crowd-sourced datasets (e.g., those produced by Twitter, Facebook, and other social media platforms), where de facto datasets or data points often emerge spontaneously, and can reveal insights into economic, social and technical factors affecting local, regional, national and global health, including drug use and drug toxicity;
  • Computational and informatics infrastructure and tools for conducting landscape analyses of biomedical research areas (e.g., combinations of ontology, text-mining, and search-engine tools to integrate and analyze heterogeneous data and literature to capture the state of a research topic or the current knowledge of a disease);
  • Efforts in collection, de-identification, validation, archiving, and dissemination of large volumes of imaging and associated genetic, pathological, and clinical data (e.g., cancer or neurological disorders) generated from clinical research projects and clinical trials to enable meaningful coalescing and organization of the data that are complex and diverse, and to enable collective, collaborative, and comprehensive analyses of the enhanced data collections for extraction of valuable information that would otherwise be obscured by the limited trial size or the diversity of data and disparities in data collection methodologies;
  • Development of novel data validation, normalization, and analysis approaches 1) for the identification and development of biomarkers for diagnosis and therapy monitoring (e.g., in cancer or neurological disorders), 2) for the development of clinical and imaging data modeling that is purposeful and predictive, and directed at elucidating primary biological and pathological driving factors and processes underlying disease and treatment response, and 3) for the development of effective clinical decision support criteria and tools that can be consistently and uniformly adopted;
  • Development of informatics tools and infrastructure for creation of National Registries with the intent of improving the care of patients (e.g., cancer or neurology patients) by capturing real-time, real-world, quality-assured information on treatment delivery and health outcomes through a prospective electronic registry infrastructure;
  • Development of predictive modeling techniques, based on a large volume of patient data, to provide real-time individualized and optimal diagnosis and treatment plans “at the bedside”.

Also of specific interest are novel tools and techniques for interacting with and managing very large and/or heterogeneous data sets, including, but not limited to:

  • Approaches for minimizing human intervention in the organization and management of large biomedical knowledge resources, such as automated annotators and intelligent agents to handle updates and quality control in health knowledge repositories;
  • Approaches for in silico science using published knowledge to generate or test hypotheses;
  • Approaches, technical and cultural, to share and compare data among research groups and patient advocacy groups;
  • Intelligent agents that can read a biomedical article and explain its contents to a layperson;
  • Interactive publications that incorporate access to data/knowledge resources, along with tools and approaches for adding data and reanalyzing findings;
  • Accessible data infrastructures that strongly facilitate a culture of data sharing among biomedical researchers;
  • Data infrastructures for benchmarked biomedical data that can be reused for validation and verification purposes.

NIH encourages applicants to address (where relevant) how any software and data sharing plans will be sustained after funding under this FOA ends.

Because of the varied interests and priorities of the Institutes and Centers participating in this announcement, prospective applicants are strongly encouraged to contact the scientific officer of any targeted Institute or Center prior to application.