Ontology-Based Entity Registration in Large Biomedical Research Projects

Biomedical research projects are typically initiated to investigate biological and medical phenomena and their implications. For instance, they study the causes of a specific disease, such as a variant of cancer or HIV, and the therapy and recovery of affected patients. For this purpose, large amounts of data describing patients, their findings, and their treatments are captured and analyzed.

LIFE is a biomedical research project of this kind. It aims to investigate the causes of several diseases of civilization, including adiposity, diabetes, depression, and allergies, by identifying factors on the genomic and clinical level as well as by considering the environment and lifestyle of patients. Various partners are involved in the project, including several institutions of the University of Leipzig and external organizations such as local hospitals, other research institutions, and laboratories. Some of these institutions participate in the project by pursuing specific biomedical research questions, such as the correlation between adiposity and patients' nutrition patterns and tobacco and alcohol consumption. Other institutions provide biomedical investigation services, including various laboratory analyses of biomaterial such as blood or urine specimens.

All investigations along these inter-organizational workflows produce data. In large biomedical projects in particular, this data is produced at different locations and by different information and laboratory systems. As a result, many types of data are stored in various sources. These sources are typically organized heterogeneously and use source-specific syntax and semantics to store and represent their data. However, the data needs to be integrated to perform comprehensive analyses and, thus, to answer complex research questions.

To address these problems, we centrally register entities and their relationships using a multi-layered model (see Fig. 1). The model uses the LIFE Investigation Ontology (LIO) that we developed and a typed system graph. LIO uniformly defines all types of physical entities. Fig. 2 gives an overview of LIO. It extends the top-level ontology General Formal Ontology (GFO) by defining two distinct groups of entities, namely presentials and processes. Presentials are entities that can be described at a specific point in time, such as study participants (patients, probands), specific specimen types, and generated data. We distinguish between two process types: data-generating and material-generating processes. The former subsumes all processes that generate any kind of data, e.g., medical checkups and interviews, whereas the latter includes all processes in which biomaterial is primarily taken or separated. These concepts are not only organized in an is-a hierarchy but are also interrelated by the relationships generates and is-used-in, which semantically represent the project workflows on an abstract level.
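The type level can be pictured as a small graph of concepts. The following Python sketch is only an illustration, not the LIO implementation: it models an is-a hierarchy below the two groups of presentials and processes and stores the workflow relationships generates and is-used-in as typed edges between concepts. The classes Concept and Ontology and the concrete concept names BloodSpecimen, LabData, BloodWithdrawal, and LabAnalysis are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Workflow relationship names as used in LIO.
WORKFLOW_RELATIONS = ("generates", "is-used-in")


@dataclass
class Concept:
    name: str
    parent: Optional[Concept] = None  # is-a link; None for a root concept

    def is_a(self, other: Concept) -> bool:
        """True if this concept is `other` or a (transitive) subconcept of it."""
        node: Optional[Concept] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


@dataclass
class Ontology:
    concepts: dict[str, Concept] = field(default_factory=dict)
    # Type-level workflow edges: (source concept, relation, target concept).
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, name: str, parent: Optional[str] = None) -> Concept:
        concept = Concept(name, self.concepts[parent] if parent else None)
        self.concepts[name] = concept
        return concept

    def relate(self, source: str, relation: str, target: str) -> None:
        if relation not in WORKFLOW_RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.relations.add((source, relation, target))


# Illustrative fragment; concepts below the GFO-derived groups are hypothetical.
lio = Ontology()
lio.add("Presential")
lio.add("Process")
lio.add("Participant", "Presential")
lio.add("Specimen", "Presential")
lio.add("Data", "Presential")
lio.add("BloodSpecimen", "Specimen")
lio.add("LabData", "Data")
lio.add("DataGeneratingProcess", "Process")
lio.add("MaterialGeneratingProcess", "Process")
lio.add("BloodWithdrawal", "MaterialGeneratingProcess")
lio.add("LabAnalysis", "DataGeneratingProcess")

# Workflow on the abstract level: blood is taken, analyzed, and yields lab data.
lio.relate("BloodWithdrawal", "generates", "BloodSpecimen")
lio.relate("BloodSpecimen", "is-used-in", "LabAnalysis")
lio.relate("LabAnalysis", "generates", "LabData")

print(lio.concepts["LabAnalysis"].is_a(lio.concepts["Process"]))  # True
```

Keeping the is-a links and the workflow edges in separate structures reflects the distinction drawn above: classification answers what an entity is, while generates and is-used-in describe how entities flow through the project.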


Figure 1: Multi-layered model for entity registration. The LIFE Investigation Ontology (LIO) is used on the type level but can be replaced by another ontology.

Figure 2: LIFE Investigation Ontology (LIO), based on the top-level ontology GFO.

The typed system level combines the semantic ontology concepts of LIO with the information systems that manage the entity data. Hence, this level describes which information systems hold which kinds of entities. LIO's workflow-specific relationships generates and is-used-in make it possible to model and follow the project workflows on the system level, i.e., to find out which information systems participate in which workflow. Finally, on the entity level, the entities are associated with their typed systems. Fig. 3 shows an example of interlinked typed systems and associated entities.
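A typed system can be viewed as a pair of an LIO concept and the information system that holds entities of that concept. The following minimal sketch assumes that representation: it lifts each type-level workflow relationship to all typed systems of its source and target concepts and then collects the information systems reachable from a starting typed system. The system names ProbandManagement, Biobank, and LabSystem, the concept names, and both function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Type-level workflow relationships, assumed to come from LIO (illustrative names).
LIO_RELATIONS = {
    ("BloodWithdrawal", "generates", "BloodSpecimen"),
    ("BloodSpecimen", "is-used-in", "LabAnalysis"),
    ("LabAnalysis", "generates", "LabData"),
}

# Typed systems as (concept, information system) pairs; system names are hypothetical.
TYPED_SYSTEMS = {
    ("BloodWithdrawal", "ProbandManagement"),
    ("BloodSpecimen", "Biobank"),
    ("LabAnalysis", "LabSystem"),
    ("LabData", "LabSystem"),
}


def typed_system_edges(relations, typed_systems):
    """Lift each concept-level relation to every pair of typed systems holding its ends."""
    by_concept = defaultdict(set)
    for concept, system in typed_systems:
        by_concept[concept].add((concept, system))
    edges = set()
    for source, relation, target in relations:
        for a in by_concept[source]:
            for b in by_concept[target]:
                edges.add((a, relation, b))
    return edges


def systems_in_workflow(start, edges):
    """Information systems reachable from typed system `start` via generates/is-used-in."""
    successors = defaultdict(set)
    for a, _relation, b in edges:
        successors[a].add(b)
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal of the typed system graph
        for nxt in successors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return {system for _concept, system in seen}


edges = typed_system_edges(LIO_RELATIONS, TYPED_SYSTEMS)
print(sorted(systems_in_workflow(("BloodWithdrawal", "ProbandManagement"), edges)))
# ['Biobank', 'LabSystem', 'ProbandManagement']
```

Because the system-level edges are derived from the ontology rather than entered by hand, replacing or extending the type-level relations in this sketch automatically changes which systems are considered part of a workflow.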

Using LIO on the type level, the proposed model allows entities and their relationships to be semantically described and classified. In particular, the approach makes it possible to centrally track entities along the project workflows. The registered entity relationships can be used in exploratory data analyses as well as by other data integration approaches. Fig. 4 shows the online application Registry Browser, which can be used for tracking and browsing entities. This application displays entities by their registered identifiers. All other entity data remains stored in the original sources, where it can be accessed on demand.
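Central tracking only needs identifiers and links, while the entity data stays in the sources. The following sketch assumes a hypothetical EntityRegistry that stores each entity identifier with its LIO concept and information system, records entity-level links, follows them transitively, and delegates on-demand access to a per-system accessor function. The identifiers, system names, and accessor are illustrative assumptions; the sketch does not describe how the Registry Browser actually retrieves source data.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EntityRegistry:
    # Entity identifier -> (LIO concept, information system); only IDs are kept centrally.
    entities: dict[str, tuple[str, str]] = field(default_factory=dict)
    # Entity-level links along the workflow, e.g. specimen -> lab result.
    links: defaultdict[str, set[str]] = field(default_factory=lambda: defaultdict(set))

    def register(self, entity_id: str, concept: str, system: str) -> None:
        self.entities[entity_id] = (concept, system)

    def link(self, source_id: str, target_id: str) -> None:
        if source_id not in self.entities or target_id not in self.entities:
            raise KeyError("both entities must be registered first")
        self.links[source_id].add(target_id)

    def track(self, entity_id: str) -> set[str]:
        """All entities derived, directly or transitively, from `entity_id`."""
        derived: set[str] = set()
        stack = [entity_id]
        while stack:
            for nxt in self.links.get(stack.pop(), ()):
                if nxt not in derived:
                    derived.add(nxt)
                    stack.append(nxt)
        return derived

    def fetch(self, entity_id: str, accessors: dict[str, Callable[[str], dict]]) -> dict:
        """Load the full record on demand from the source system holding the entity."""
        _concept, system = self.entities[entity_id]
        return accessors[system](entity_id)


# Hypothetical identifiers and systems for illustration only.
registry = EntityRegistry()
registry.register("P-001", "Participant", "ProbandManagement")
registry.register("S-042", "BloodSpecimen", "Biobank")
registry.register("L-777", "LabData", "LabSystem")
registry.link("P-001", "S-042")  # specimen taken from the participant
registry.link("S-042", "L-777")  # lab data derived from the specimen

print(sorted(registry.track("P-001")))  # ['L-777', 'S-042']

# Hypothetical accessor standing in for a query against the lab system.
lab_source = {"L-777": {"id": "L-777", "note": "placeholder record"}}
record = registry.fetch("L-777", {"LabSystem": lambda eid: lab_source[eid]})
```

Keeping only identifiers in the registry mirrors the separation described above: the registry answers where an entity came from and where it went, while its full record is loaded from the original source only when requested.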


Figure 3: Example of typed systems and associated registered entities.

Figure 4: The online application Registry Browser for tracking registered entities along the project workflows; the arrows show navigational access to the data.

Last modified: 24.02.2011 by Toralf Kirsten