Dr. Bruno Sobral speaks in the FDA's series “Developing the Path to Personalized Nutrition and Medicine”.
Abstract
Public genome-scale data are deposited in globally distributed resources that vary in quality and annotation standards and in the data models they use for storage and querying. Often, these public resources are focused on data acquisition from large-scale data generation efforts such as major DNA sequencing centers, protein structure determination centers, and so on. Because of the breadth of most of these repositories, they tend to focus on data acquisition and dissemination to a broad audience in a timely manner. These major repositories play a fundamental role, but they cannot be highly focused on the needs of any specific community of data consumers for the purposes of computer-assisted reasoning and research. The strength of these resources is their comprehensiveness. Their challenge is the lack of connectivity to the specific communities that are focused on data utilization (rather than generation).
Most researchers have a very strong desire for the full integration of data and analysis tools through a single interface. Data analysis, visualization, interpretation, and integration from the perspective of a given research community and its interests are best handled through specific and close interaction with that community and interoperation with major comprehensive data resources. Perhaps the best historical examples of resources that are closely knit with their communities are the model organism information resources. Infectious disease research and development provides a uniquely challenging and high-impact opportunity to develop resources that interoperate with comprehensive resources while integrating various types of data and analysis systems for the specific needs of a global community. The biological complexity of infectious disease systems, which are composed of interactions between potential pathogens, hosts (and vectors), and the environment, challenges information resources because of the breadth of organism-organism and organism-environment interactions that are needed to understand outcomes such as disease, asymptomatic carriage, and disease resistance. Beyond research, integrated data for infectious diseases could serve a variety of important applied areas of data utilization, such as clinical care, diagnostics, drug and vaccine development, and epidemiology. Thus the data users, with their varied needs and workflows, add a further layer of complexity.
In this talk, I will discuss the interoperability (syntactic) and integration (semantic) aspects of developing and deploying distributed information systems that serve the bacterial infectious disease community through a single interface, using the PAThosystems Resource Integration Center (PATRIC) as an example.