Selected Projects and Software

Data integration in large biomedical research projects (2010 - present)

This project is part of the multi-million-euro project LIFE (Leipzig Interdisciplinary Research Cluster of Genetic Factors, Clinical Phenotypes and Environment). In this large biomedical project, data management and integration are key and challenging topics. Our work includes the following:

Ontology-based registration of entities
Large biomedical projects often include workflows that run across institutional borders. In these workflows, data describing biomedical entities, such as patients and bio-materials but also the processes themselves, are typically produced, modified and analyzed at different locations and by several systems. Therefore, both tracking entities within inter-organizational workflows and integrating their data are often crucial steps. To address these problems, we centrally register entities and their relationships using a multi-layered model. The model uses an ontology and a typed system graph to semantically describe and classify entities and their relationships, and also to access entity data on demand in their original source. Moreover, this approach makes it possible to centrally track entities along the project workflows, and it can be used in explorative data analyses as well as by different data integration approaches. More ...
Applying model management approaches to the integration of biomedical data
Using the ontology-based registration approach to semantically categorize sources and thus their data, we are currently working on an approach to integrate data from evolving and heterogeneous sources. With this approach, data from different sources are centrally materialized in the so-called research database. The schema of the research database is generated from the source schemas and continuously adapted using high-level operators known from model management. The schema mappings are created by the in-house matching tool GOMMA. (More information will come soon.)

Parallelized entity matching (2009 - present)

Description will come soon

Entity matching in biomedical projects (2008 - 2009)

Description will come soon

Matching biomedical ontologies (2006 - present)

Ontologies are becoming increasingly important in both commercial and scientific application domains. Relevant objects of such domains, e.g. products or genes, can be semantically described and categorized by ontologies. Unfortunately, ontologies also introduce semantic heterogeneity, since many independently developed ontologies are now in common use. This is especially the case for organization-specific ontologies such as product catalogs, which are typically designed for a specific purpose. Hence, ontologies of different organizations may differ widely even if they address the same application domain. An ontology mapping can bridge the semantic heterogeneity of different ontologies and thus help to search or query data from different sources, e.g. to compare or recommend similar products offered in different e-shops. The goal of this project is to study approaches for determining and producing mappings between ontologies. These approaches use metadata such as concept names, concept descriptions or structural context information, but also associated instances or objects.
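
As a minimal illustration of metadata-based matching, the following Python sketch compares concept names of two small ontologies by string similarity and keeps the best match above a threshold. It is not the matching approach studied in this project: the function names, the threshold of 0.8 and the example concepts are assumptions made up for illustration, and the similarity measure is the generic one from Python's standard difflib module.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] between two concept names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_ontologies(source: dict, target: dict, threshold: float = 0.8) -> list:
    """For each source concept, keep the most similar target concept
    if its name similarity reaches the (assumed) threshold."""
    mapping = []
    for s_id, s_name in source.items():
        best_id, best_score = None, 0.0
        for t_id, t_name in target.items():
            score = name_similarity(s_name, t_name)
            if score > best_score:
                best_id, best_score = t_id, score
        if best_id is not None and best_score >= threshold:
            mapping.append((s_id, best_id, round(best_score, 2)))
    return mapping


# Hypothetical product catalogs of two e-shops (concept id -> concept name).
shop_a = {"A1": "Notebook", "A2": "Digital Camera", "A3": "Garden Tools"}
shop_b = {"B1": "Notebooks", "B2": "Digital Cameras", "B3": "Kitchen Appliances"}

mapping = match_ontologies(shop_a, shop_b)
for s_id, t_id, score in mapping:
    print(s_id, "<->", t_id, score)
```

A name-only matcher like this misses synonyms and ignores the ontology structure, which is why concept descriptions, structural context and associated instances are considered as additional evidence.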

Mapping-based data integration (2005 - 2008)

The interpretation of analyzed wet-lab data requires the integration of genetic annotation data, which are mostly publicly available in diverse and highly interlinked web data sources. In this project, we developed two integration approaches, the hybrid integration and iFuice/BioFuice, both of which take the inter-source correspondences between objects of different sources into account. In the first approach, the mapping data (sets of object correspondences) are materialized in a separate database using a generic star-like schema. This schema allows an efficient 2-way join between the objects of different sources and is used by the Sequence Retrieval System, a very popular integration system in the bioinformatics domain. The iFuice/BioFuice approach uses a semantic domain model that associates semantic categories with sets of objects as well as with mappings. This model and a set of high-level operators are used by the iFuice mediator for query and integration processing. We also applied this mapping-based approach to other domains. BioFuice is the domain-specific solution for the life science domain.
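
The idea of materializing object correspondences in a generic schema can be sketched with an in-memory SQLite database. This is a simplified sketch under our own assumptions, not the schema used in the project: the table names (object, correspondence), the column names, and the source and accession values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic object table: one row per object, regardless of its source.
CREATE TABLE object (
    obj_id    INTEGER PRIMARY KEY,
    source    TEXT NOT NULL,
    accession TEXT NOT NULL
);
-- Mapping data: each row is one correspondence between two objects.
CREATE TABLE correspondence (
    obj1_id INTEGER NOT NULL REFERENCES object(obj_id),
    obj2_id INTEGER NOT NULL REFERENCES object(obj_id)
);
""")

# Made-up objects from two hypothetical sources.
conn.executemany("INSERT INTO object VALUES (?, ?, ?)", [
    (1, "SourceA", "a-001"), (2, "SourceA", "a-002"),
    (3, "SourceB", "b-100"), (4, "SourceB", "b-200"),
])
conn.executemany("INSERT INTO correspondence VALUES (?, ?)",
                 [(1, 3), (2, 3), (2, 4)])


def corresponding(conn, src, tgt):
    """Join the objects of two sources via the stored correspondences."""
    rows = conn.execute("""
        SELECT o1.accession, o2.accession
        FROM correspondence c
        JOIN object o1 ON o1.obj_id = c.obj1_id
        JOIN object o2 ON o2.obj_id = c.obj2_id
        WHERE o1.source = ? AND o2.source = ?
        ORDER BY o1.accession, o2.accession
    """, (src, tgt))
    return rows.fetchall()


pairs = corresponding(conn, "SourceA", "SourceB")
print(pairs)
```

Because the schema is generic, adding a further source only adds rows to the two tables; the schema itself and the join query stay unchanged.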

Managing and integrating GenBank data (2003 - 2004)

The GenBank database at the NCBI is a popular web-based research utility. It comprises various annotation data, e.g. bibliographic data, sequence localizations, DNA and RNA sequences as well as protein sequences, for many genomic regions of various genomes. Each entry is identified by a unique accession id. Such annotation data can be used for further purposes, e.g. sequence alignment to determine the secondary structure or to build phylogenetic trees. The goal of this project was to create an integration approach that allows selected GenBank data to be managed locally on the researcher's desktop. As a result, the prototype GenBank Entry Manager is provided. More ...

Managing and integrating large sets of genetic data (2002 - 2007)

The goal of this project is to create an approach to integrate huge amounts of genetic data, including wet-lab data such as microarray data, and large sets of annotation data. Our approach follows the data warehouse approach and is called Genetic Warehouse (GeWare). GeWare centrally stores expression data together with mutation data resulting from Matrix-CGH arrays and a variety of annotations to support different forms of analysis. Compared to previous work, our approach is unique in several aspects. First, GeWare offers high flexibility with a multidimensional data model in which expression and mutation data are stored in several fact tables that are associated with multiple hierarchical dimensions holding descriptive annotations on genes, clones, experiments, and processing methods. Second, consistent experiment annotation is achieved by means of pre-defined annotation templates and controlled vocabularies. Finally, various analysis methods have been integrated using a flexible framework based on the exchange of treatment groups, gene/clone groups and expression/CGH matrices. The system is fully operational and has been employed in several research projects and two Germany-wide clinical trials. More ...
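
The multidimensional model described above can be illustrated with a small star schema in SQLite: one fact table holding expression values, linked to a gene dimension and an experiment dimension. This is only a sketch under our own assumptions and does not reproduce the GeWare schema. All table names, column names and values are hypothetical, and the dimensions are flat here rather than hierarchical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the describing annotations.
CREATE TABLE gene_dim (
    gene_id INTEGER PRIMARY KEY, symbol TEXT, chromosome TEXT
);
CREATE TABLE experiment_dim (
    exp_id INTEGER PRIMARY KEY, treatment_group TEXT
);
-- Fact table: one measured value per gene and experiment.
CREATE TABLE expression_fact (
    gene_id   INTEGER REFERENCES gene_dim(gene_id),
    exp_id    INTEGER REFERENCES experiment_dim(exp_id),
    intensity REAL
);
""")

# Made-up annotations and measurements.
conn.executemany("INSERT INTO gene_dim VALUES (?, ?, ?)",
                 [(1, "GENE_A", "chr1"), (2, "GENE_B", "chr2")])
conn.executemany("INSERT INTO experiment_dim VALUES (?, ?)",
                 [(10, "control"), (11, "treated"), (12, "treated")])
conn.executemany("INSERT INTO expression_fact VALUES (?, ?, ?)", [
    (1, 10, 2.0), (1, 11, 4.0), (1, 12, 6.0),
    (2, 10, 1.0), (2, 11, 1.0), (2, 12, 3.0),
])


def mean_intensity(conn, group):
    """Average intensity per gene over all experiments of one treatment group."""
    return conn.execute("""
        SELECT g.symbol, AVG(f.intensity)
        FROM expression_fact f
        JOIN gene_dim g       ON g.gene_id = f.gene_id
        JOIN experiment_dim e ON e.exp_id = f.exp_id
        WHERE e.treatment_group = ?
        GROUP BY g.symbol
        ORDER BY g.symbol
    """, (group,)).fetchall()


treated = mean_intensity(conn, "treated")
print(treated)
```

Selecting by a dimension attribute (here the treatment group) and aggregating the facts is the basic query pattern such a model supports. A further fact table, e.g. for CGH values, could reuse the same dimensions.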

Further software

RESTful interface to Lime Survey database schema (description is coming soon)
Assessment Battery (based on Lime Survey, description is coming soon)
Wet Lab Annotation Manager (based on Lime Survey, description is coming soon)
GE-Exporter for Affymetrix MicroDBs
Perl Database Access Test (written in the MIAMExpress installation process)

Last modified: 20.02.2011 by Toralf Kirsten