Projects and Collaborations

Structured Data Learning with General Similarities (StruDL)

We will systematically investigate similarity-based machine learning with structured data such as strings, trees, and graphs. Most off-the-shelf machine learning algorithms require data to be embedded in a finite- or infinite-dimensional inner product space. However, most notions of similarity for structured data that domain experts find intuitive do not allow for such an embedding. Examples include similarities based on alignments, edit operations, or (graph) matching. Recent progress has enabled learning algorithms to use more general similarities that can be embedded in a Krein space. Preliminary work shows the potential of this approach for learning with structured data, but it has never been systematically explored. Furthermore, even these approaches offer no natural way to handle asymmetric notions of similarity, such as those based on substructure relations. This project will close these gaps by (i) designing and investigating general similarities for structured data, (ii) developing learning algorithms for general similarities, and (iii) applying combinations of the two to concrete problems in cheminformatics. Progress in the design of RNA therapeutics and polyketide pharmaceuticals, and in the prediction of mass spectra, will have a high impact on several areas of human society. Our approach promises higher predictive performance, more efficient learning, and models that are easier for domain experts to interpret.

Training Alliance for Computational Systems Chemistry (TACsy)

Many important questions and grand challenges in research, industry, and society involve large and complex networks of chemical reactions. Examples include studying metabolic networks in humans; planning and optimizing chemical synthesis in industry and research labs; modeling the fragmentation process in mass spectrometry; developing personalized medicine; probing hypotheses about the origins of life; and monitoring environmental pollution in air, water, and soil. In project TACsy, we will develop ground-breaking computational methods for analyzing such networks of chemical reactions. We will also train a new generation of excellent and innovative early stage researchers (ESRs) capable of advancing and applying these methods in research and industry. Combined, these efforts carry very strong potential for impact on the grand challenges mentioned above, on the European Commission's priority on jobs, growth, investment, and competitiveness, and on the well-being of EU citizens. The research methodology of TACsy arises from the novel application of formalisms, algorithms, and computational methods from computer science to questions in systems chemistry. The first steps demonstrating the strong capabilities of this approach have recently been made. In TACsy, the ESRs will vastly expand these methods and their formal foundations. They will create efficient algorithms and implementations for them, and they will use these implementations to study complex chemical systems in three flagship application areas. TACsy is a consortium of world-class, experienced scientists who will ensure excellent research training conditions for the ESRs in this highly interdisciplinary field. Through a carefully designed training programme and secondments at leading industry partners, the ESRs will acquire a broad career perspective and a strong set of transferable skills. Their unique blend of competences in computer science and chemistry will further increase their employability.

Secure and Intelligent Human-Centric Digital Technologies (SecInt)

The goal of SecInt is to develop the scientific foundations of secure and intelligent human-centric digital technologies. This requires interdisciplinary research that establishes synergies between different research fields (Security and Privacy, Machine Learning, and Formal Methods). Research highlights arising from the synergies across projects include the design of machine learning algorithms that are resistant to adversarial attacks, the design of machine learning algorithms for security and privacy analysis, the security analysis of personal medical devices, the design of secure and privacy-preserving contact tracing apps, and the enforcement of safety for dynamic robots.

AI for Advanced SAR Processing (AI4SAR)

The usability of Synthetic Aperture Radar (SAR) satellite data depends on the correct interpretation of the underlying scattering mechanism, a task on which current modelling approaches perform poorly or fail. Within the proposed project AI4SAR, different state-of-the-art artificial intelligence (AI) algorithms based on unsupervised, active, and knowledge-based learning will be further developed to find a data-driven solution to this considerable challenge. The AI-based separation of different scattering mechanisms then enables optimised SAR despeckle filtering, interferometric phase preservation, SAR-to-optical matching, and advanced SAR processing in general. The AI4SAR developments will be demonstrated through use cases in forest monitoring, deformation monitoring, and ground control point transfer.

ML for Analysis and Design of Bacteriophages

Antimicrobial resistance (AMR) is a growing problem in many types of disease-causing bacteria (pathogens) in animals and humans. Salmonella is an important bacterial pathogen of both and often causes gastrointestinal infections, which may sometimes progress to more serious and life-threatening disease. It can spread from infected farm animals to humans through the food chain. Intensively farmed food animals such as poultry and pigs are an important source of Salmonella. The use of antibiotics in these animals over many years has been associated with the emergence of new, antibiotic-resistant strains of this bacterium. As a result, infections in animals and humans are more difficult to treat, which may lead to more serious infections over time, particularly in vulnerable groups such as the elderly or people with weakened immunity. There is an urgent need to find more sustainable alternatives to antibiotics, and bacteriophages (phages), viruses that infect bacteria, are one candidate. This project will use laboratory experiments and machine learning to build a comprehensive understanding of how phages infect Salmonella under different conditions.

ML MOOC

The goal of the project is to develop a high-quality pool of German-language teaching units and courses covering basic computer science knowledge that can be used at all university sites. Offering the material in German allows it to be used in all bachelor's programmes and opens academic teaching to everyone who is interested. As a result, a broad spectrum of basic computer science knowledge can be offered at all sites, including to a larger number of participants. Each project partner contributes to the computer science topics in which it has particularly strong expertise and reputation. These fundamental computer science topics are intended to be suitable not only for computer science degrees and STEM subjects but for all fields of study.

ML for Biological and Chemical Data

Our project focuses on machine learning and its applications to complex real-world data processing. Many real-world data sets, such as biological, chemical, or materials science data, have an inherent structure and can be modelled as sequences, graphs, or hypergraphs. We are particularly interested in two interrelated problems: 1) learning the unknown underlying structure in data, and 2) learning efficient graph representations that lead to more accurate and interpretable models than the state of the art.

ML for Analysis and Design of Molecules and Chemical Reactions

ML in ShapeTech

Previous Projects

ML for Constructing Novel Relational Structures

Effective Well-Behaved Pattern Mining Through Sampling