Selected Projects and further Software
Data integration in large biomedical research projects (2010 - present)

This project is part of the multi-million-euro project LIFE (Leipzig Interdisciplinary Research Cluster of Genetic Factors, Clinical Phenotypes and Environment). In this large biomedical project, data management and integration are key and challenging topics. Our work includes the following:

Parallelized entity matching (2009 - present)

Description will come soon

Entity matching in biomedical projects (2008 - 2009)

Description will come soon

Matching biomedical ontologies (2006 - present)

Ontologies are becoming increasingly important in both commercial and scientific application domains. Relevant objects of such domains, e.g. products or genes, can be semantically described and categorized by ontologies. Unfortunately, ontologies also introduce semantic heterogeneity, since many independently developed ontologies are now in common use. This is especially the case for organization-specific ontologies such as product catalogs, which are typically designed for a specific purpose. Hence, ontologies of different organizations may differ widely even if they address the same application domain. An ontology mapping can bridge the semantic heterogeneity of different ontologies and thus help to search or query data from different sources, e.g. to compare or recommend similar products offered in different e-shops. The goal of this project is to study approaches for determining and producing mappings between ontologies. These approaches utilize metadata such as concept names, concept descriptions or structural context information, but also use associated instances or objects.
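
As an illustration of the metadata-based idea described above, the following minimal sketch matches concepts of two ontologies purely by the similarity of their concept names. It is not the matcher developed in this project: the function names, the threshold of 0.8, the use of Python's difflib as similarity measure and the two tiny product catalogs are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_concepts(source, target, threshold=0.8):
    """Return (source_id, target_id, score) for every concept pair whose
    names are at least `threshold` similar.

    `source` and `target` map hypothetical concept ids to concept names.
    """
    mapping = []
    for s_id, s_name in source.items():
        for t_id, t_name in target.items():
            score = name_similarity(s_name, t_name)
            if score >= threshold:
                mapping.append((s_id, t_id, round(score, 2)))
    return mapping


# Two made-up product catalogs, for illustration only.
shop_a = {"a1": "Digital Cameras", "a2": "Notebooks"}
shop_b = {"b1": "Digital Camera", "b2": "Laptops"}

mapping = match_concepts(shop_a, shop_b)
print(mapping)
```

In this toy run, "Notebooks" and "Laptops" are not matched because their names share few characters. Cases like this are why name similarity alone is usually not enough and why concept descriptions, structural context or associated instances are used as additional evidence.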

Mapping-based data integration (2005 - 2008)

The interpretation of analyzed wet-lab data requires the integration of genetic annotation data, which are mostly publicly available in diverse and highly interlinked web data sources. In this project, we developed two integration approaches, hybrid integration and iFuice/BioFuice, both of which take the inter-source correspondences between objects of different sources into account. In the first approach, the mapping data (sets of object correspondences) are materialized in a separate database using a generic star-like schema. This schema allows an efficient 2-way join between the objects of different sources and is used by the Sequence Retrieval System, a very popular integration system in the bioinformatics domain. The iFuice/BioFuice approach utilizes a semantic domain model that associates semantic categories with sets of objects as well as with mappings. This model and a set of high-level operators are used by the iFuice mediator for query and integration processing. We also applied this mapping-based approach to other domains. BioFuice is the domain-specific solution for the life sciences.
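
The idea of materializing object correspondences in a separate, generic schema can be sketched as follows. This is only an assumption-laden illustration using an in-memory SQLite database. The table names (`object`, `correspondence`), the column names, the source names and the accessions are hypothetical and do not reproduce the schema actually used in the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Generic object table: every object of every source gets one row.
    CREATE TABLE object (
        object_id INTEGER PRIMARY KEY,
        source    TEXT NOT NULL,
        accession TEXT NOT NULL
    );
    -- Materialized mapping: one row per object correspondence.
    CREATE TABLE correspondence (
        object1_id INTEGER REFERENCES object(object_id),
        object2_id INTEGER REFERENCES object(object_id)
    );
""")

# Made-up objects from two made-up sources.
conn.executemany(
    "INSERT INTO object VALUES (?, ?, ?)",
    [(10, "SourceA", "A-001"), (11, "SourceA", "A-002"), (20, "SourceB", "B-100")],
)
conn.execute("INSERT INTO correspondence VALUES (?, ?)", (10, 20))


def corresponding_objects(conn, source_a, source_b):
    """Return (accession_a, accession_b) pairs linked by a correspondence."""
    rows = conn.execute(
        """
        SELECT o1.accession, o2.accession
        FROM correspondence c
        JOIN object o1 ON o1.object_id = c.object1_id
        JOIN object o2 ON o2.object_id = c.object2_id
        WHERE o1.source = ? AND o2.source = ?
        """,
        (source_a, source_b),
    )
    return rows.fetchall()


pairs = corresponding_objects(conn, "SourceA", "SourceB")
print(pairs)
```

Because the schema is generic, adding a further source or mapping only adds rows and requires no new tables.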

Managing and integrating GenBank data (2003 - 2004)

The GenBank database at the NCBI is a popular web-based research utility. It comprises various annotation data, e.g. bibliographic data, sequence localizations, DNA and RNA sequences as well as protein sequences, for many genomic regions of various genomes. Each entry is identified by a unique accession ID. Such annotation data can be used for further purposes, e.g. sequence alignment to determine the secondary structure or to build phylogenetic trees. The goal of this project was to create an integration approach that allows selected GenBank data to be managed locally on the researcher's desktop. As a result, the prototype GenBank Entry Manager is provided.

Managing and integrating large sets of genetic data (2002 - 2007)

The goal of this project is to create an approach for integrating huge amounts of genetic data, including wet-lab data such as microarray data, and large sets of annotation data. Our approach follows the data warehouse approach and is called Genetic Warehouse (GeWare). GeWare centrally stores expression data together with mutation data resulting from Matrix-CGH arrays and a variety of annotations to support different forms of analysis. Compared to previous work, our approach is unique in several respects. First, GeWare offers high flexibility through a multidimensional data model in which expression and mutation data are stored in several fact tables that are associated with multiple hierarchical dimensions holding descriptive annotations on genes, clones, experiments, and processing methods. Second, consistent experiment annotation is achieved by means of pre-defined annotation templates and controlled vocabularies. Finally, various analysis methods have been integrated using a flexible framework based on the exchange of treatment groups, gene/clone groups and expression/CGH matrices. The system is fully operational and has been employed in several research projects and two Germany-wide clinical trials.
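
To make the multidimensional model more concrete, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database and aggregates expression values per treatment group. It is a simplified illustration, not GeWare's actual schema: all table and column names, the gene symbols and the intensity values are invented for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Dimension tables hold the descriptive annotations.
    CREATE TABLE gene_dim (gene_id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE experiment_dim (
        experiment_id   INTEGER PRIMARY KEY,
        treatment_group TEXT
    );
    -- Fact table: one measured value per gene and experiment.
    CREATE TABLE expression_fact (
        gene_id       INTEGER REFERENCES gene_dim(gene_id),
        experiment_id INTEGER REFERENCES experiment_dim(experiment_id),
        intensity     REAL
    );
""")

# Made-up annotations and measurements.
db.executemany("INSERT INTO gene_dim VALUES (?, ?)", [(1, "GENE_A"), (2, "GENE_B")])
db.executemany(
    "INSERT INTO experiment_dim VALUES (?, ?)",
    [(1, "control"), (2, "control"), (3, "treated")],
)
db.executemany(
    "INSERT INTO expression_fact VALUES (?, ?, ?)",
    [(1, 1, 2.0), (1, 2, 4.0), (1, 3, 8.0), (2, 1, 1.0), (2, 2, 1.0), (2, 3, 1.0)],
)


def mean_intensity_by_group(db, symbol):
    """Average intensity of one gene per treatment group."""
    rows = db.execute(
        """
        SELECT e.treatment_group, AVG(f.intensity)
        FROM expression_fact f
        JOIN gene_dim g ON g.gene_id = f.gene_id
        JOIN experiment_dim e ON e.experiment_id = f.experiment_id
        WHERE g.symbol = ?
        GROUP BY e.treatment_group
        ORDER BY e.treatment_group
        """,
        (symbol,),
    )
    return dict(rows.fetchall())


means = mean_intensity_by_group(db, "GENE_A")
print(means)
```

Mutation data could be modeled in the same way as a second fact table that shares the dimension tables, which is one way to read the "several fact tables" mentioned above.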

Further software

Last modified: 20.02.2011 by Toralf Kirsten