Philipp Wintersberger (University of Applied Sciences Hagenberg & TU Wien)
Enabling Multitasking in Human-AI Cooperation
(25/6/2024 2pm in person)
In the future, humans will cooperate with a wide range of AI-based systems in both working (i.e., decision and recommender systems, language models, industry robots, etc.) and private (i.e., fully- or semi-automated vehicles, smart home applications, ubiquitous computing systems, etc.) environments. Cooperation with these systems involves both shared (i.e., concurrent multitasking) and traded (i.e., task switching) interaction. As it is known that frequently shifting attention can yield decreased performance, higher error rates, and stress, these systems must consider the users’ attention levels in their operation and communication to be perceived as valuable and trustworthy team partners. This work addresses the emerging problems that occur when users operate potentially safety-critical real-time AI systems while frequently switching their attention to other, non-task-related activities. The work investigated users’ behavior (in terms of performance and physiological stress markers) and subjective perceptions (assessed by subjective scales and interviews) in different situations, such as drivers switching between their smartphones and vehicles, multitasking on bicycles, or interacting with chatbots and language models. Further, this habilitation contributes to these questions on a methodological level by researching evaluation methods and research tools such as simulators. Ultimately, the identified problems require a new class of interactive systems that integrally consider users’ attention, attitudes, and trust levels while providing transparent information about their capabilities and behaviors. To reach this goal, the work proposes multiple solutions, which were prototypically implemented and empirically evaluated. Individuals’ trust levels and behaviors can be influenced by augmented reality visualizations and conversational agents that provide transparency regarding their internal working mechanisms (“Explainable Artificial Intelligence”). Additionally, attention management systems based on reinforcement learning have the potential to mitigate the negative effects of multitasking and improve human performance in multitasking scenarios. The work concludes by critically assessing these concepts while providing tools, methods, and a research agenda to integrally improve human-AI cooperation in the future.
Matthias Lanzinger (TU Wien)
Graph Neural Networks and Graph Motif Parameters: From Theory to Practice
(21/5/2024 2pm in person)
Graph Neural Networks (GNNs) have quickly become one of the most powerful tools in the deep-learning toolkit for graph tasks, like the analysis of molecule structure or traffic networks.
However, the power of GNNs is restricted by very real theoretical boundaries. Graphs with different properties can be indistinguishable to GNNs, often significantly degrading GNN performance for tasks that would benefit from knowing these properties.
In this talk, I will first give an overview of recent theoretical results that provide us with a detailed picture of the power of GNNs when it comes to properties of graphs. In the second half of the talk, we will then see how these theoretical results lead to valuable and actionable insights in practice, and possibly motivate a general rethinking of how to design GNN architectures.
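As a concrete illustration of the kind of limitation mentioned above (not part of the talk itself), the following minimal sketch runs 1-dimensional Weisfeiler-Leman colour refinement, which upper-bounds the distinguishing power of standard message-passing GNNs, on two non-isomorphic graphs, two disjoint triangles versus one 6-cycle, and shows that they receive identical colour histograms.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """1-WL colour refinement. adj: dict node -> list of neighbours."""
    colours = {v: 0 for v in adj}                       # uniform initial colouring
    for _ in range(rounds):
        colours = {v: hash((colours[v], tuple(sorted(colours[u] for u in adj[v]))))
                   for v in adj}                        # compress each signature to a new colour
    return Counter(colours.values())

# Two disjoint triangles (nodes 0-2 and 3-5) versus a single 6-cycle.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# 1-WL (and hence a standard message-passing GNN) cannot tell these graphs apart.
print(wl_histogram(two_triangles) == wl_histogram(hexagon))  # True
```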
Ruma Maity (TU Wien)
Microswimmer learning to swim via genetic algorithm
(16/4/2024 2.30pm in person)
In nature, microorganisms employ various swimming mechanisms to propel themselves through their medium. Scientists are trying to develop artificial microswimmers capable of mimicking the propulsion mechanisms of natural microswimmers to perform specific tasks, such as targeted drug delivery, which is a hot topic in nanomedicine. In this work, we computationally train a model one-dimensional microswimmer to swim and respond to a chemical gradient with the help of a genetic algorithm, specifically NEAT. Here, the shape of the swimmer (its arm lengths) is the input to the neural network, while the forces on the arms are the output. The network has to adjust the forces on the swimmer in such a way that it effectively deforms its shape to propel itself through the medium. The networks that emerge from this training are exceptionally simple, yet they drive the swimmer successfully through the medium. We aim to train swimmers in higher dimensions with different propulsion mechanisms.
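As a rough, hypothetical sketch of this kind of training loop (a fixed-topology genetic algorithm rather than NEAT, and a toy shape-space-area proxy instead of an actual hydrodynamic simulation), one could evolve a small arm-lengths-to-forces policy as follows; all parameters and the displacement rule are illustrative assumptions, not the setup used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 4                                    # hidden units of the tiny arm-lengths -> forces policy
DIM = 2 * H + H + 2 * H + 2              # number of weights/biases of a 2 -> H -> 2 network

def policy(params, arm_lengths):
    """Map the current arm lengths to forces on the two arms."""
    W1 = params[:2 * H].reshape(H, 2)
    b1 = params[2 * H:3 * H]
    W2 = params[3 * H:5 * H].reshape(2, H)
    b2 = params[5 * H:]
    return np.tanh(W2 @ np.tanh(W1 @ arm_lengths + b1) + b2)    # forces in [-1, 1]

def swim_distance(params, steps=300, dt=0.05):
    """Toy stand-in for the hydrodynamic simulation: net displacement is taken to be
    the signed area swept in (L1, L2) shape space, rewarding non-reciprocal strokes."""
    L, x = np.array([1.0, 1.0]), 0.0
    for _ in range(steps):
        dL = np.clip(L + policy(params, L) * dt, 0.5, 1.5) - L   # bounded arm deformation
        x += 0.5 * (L[0] * dL[1] - L[1] * dL[0])                 # area element ~ displacement
        L = L + dL
    return x

# Plain fixed-topology genetic algorithm (NEAT, used in the work, also evolves the topology).
pop = rng.normal(size=(40, DIM))
for generation in range(20):
    fitness = np.array([swim_distance(p) for p in pop])
    elite = pop[np.argsort(fitness)[-8:]]                        # keep the 8 best swimmers
    children = elite[rng.integers(0, 8, size=32)] + 0.1 * rng.normal(size=(32, DIM))
    pop = np.vstack([elite, children])
print("best displacement found:", max(swim_distance(p) for p in pop))
```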
Fabrizio Frasca (Technion)
Towards Expressive and Efficient Graph Neural Networks
(26/3/2024 2pm in person)
The design of Neural Networks for graph-structured data is characterised by an emerging tension between expressive power, computational complexity and the retention of fundamental inductive biases. In this talk we will retrace the most salient works developed during my PhD to discuss principled approaches striking an effective balance between these desiderata. These methods attain provable expressiveness, respect the symmetries of the objects they process, and feature a computational complexity which reflects their inherent sparsity.
We will start by discussing neural architectures on simplicial complexes, generalisations of graphs that account for group-wise interactions. We propose to “lift” graphs to simplicial complexes by considering complete subgraphs, and to process them with our proposed architecture. This approach results in a local, hierarchical message-passing procedure beyond the capabilities of the node-centric paradigm of Message Passing Neural Networks on graphs (MPNNs). We broaden the application of this paradigm by considering regular cell complexes: we will discuss lifting graphs to these more general spaces via transformations that additionally consider structures such as simple and induced cycles. Our approach exhibits strong discriminative power and finds tractable and particularly effective application in molecular modelling tasks.
Shifting gears, we will present an alternative approach which prescribes modelling graphs as bags of subgraphs obtained from predefined selection policies. We discuss the design of an architecture to process these objects in a way that respects their inherent symmetries. We show it obtains provable expressiveness even when subgraphs are selected by domain-agnostic, parsimonious policies and are processed with “weak”, efficient message-passing-based encoders. We will finally provide a deeper theoretical characterisation of this paradigm, reconciling a series of coeval works implicitly sharing this underlying ‘bags-of-subgraphs’ approach. A novel symmetry analysis allows us to derive an upper bound on the expressive power of noteworthy subgraph methods, and to conceive a design space to guide the development of novel architectures in this overall family.
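A minimal, hypothetical sketch of the bag-of-subgraphs idea (not the architecture presented in the talk): a node-deletion selection policy maps a graph to a bag of subgraphs, each subgraph is encoded by a weak stand-in encoder, and the bag is pooled with an order-invariant sum.

```python
import numpy as np

def node_deleted_subgraphs(adj):
    """adj: dict node -> set of neighbours. Yields one subgraph per deleted node."""
    for removed in adj:
        yield {v: {u for u in nbrs if u != removed}
               for v, nbrs in adj.items() if v != removed}

def degree_histogram(adj, max_degree=8):
    """Stand-in for a message-passing encoder: a degree histogram per subgraph."""
    hist = np.zeros(max_degree + 1)
    for nbrs in adj.values():
        hist[min(len(nbrs), max_degree)] += 1
    return hist

def bag_embedding(adj):
    """Sum-pool the subgraph encodings; summing is invariant to the order of subgraphs."""
    return sum(degree_histogram(g) for g in node_deleted_subgraphs(adj))

cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(bag_embedding(cycle5))
```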
Haggai Maron (Technion)
Exploiting Symmetries for Learning in Deep Weight Spaces
(19/3/2024 2pm online)
Learning to process and analyze the raw weight matrices of neural networks is an emerging research area with intriguing potential applications like editing and analyzing Implicit Neural Representations (INRs), weight pruning/quantization, and function editing. However, weight spaces have inherent permutation symmetries – permutations can be applied to the weights of an architecture, yielding new weights that represent the same function. As with other data types like graphs and point clouds, these symmetries make learning in weight spaces challenging.
This talk will overview recent advances in designing architectures that can effectively operate on weight spaces while respecting their underlying symmetries. First, we will discuss our ICML 2023 paper, which introduces novel equivariant architectures for learning on multilayer perceptron weight spaces. We first characterize all linear layers that are equivariant to these symmetries and then construct networks composed of such layers. We then turn to our ICLR 2024 work, which generalizes the approach to diverse network architectures using what we term Graph Metanetworks (GMN). This is done by representing input networks as graphs and processing them with graph neural networks. We show that the resulting metanetworks are expressive and equivariant to the weight-space symmetries of the architecture being processed. Our graph metanetworks are applicable to CNNs, attention layers, normalization layers, and more. Together, these works make promising steps toward versatile and principled architectures for weight-space learning.
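The weight-space symmetry itself is easy to verify directly; the following small numpy sketch (an assumed two-layer ReLU MLP, unrelated to the papers’ code) permutes the hidden units and checks that the permuted weights compute exactly the same function.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)   # input dim 8, hidden dim 16
W2, b2 = rng.normal(size=(4, 16)), rng.normal(size=4)    # output dim 4

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2        # two-layer ReLU MLP

perm = rng.permutation(16)                               # permute the hidden neurons
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]          # rows of layer 1, columns of layer 2

x = rng.normal(size=8)
print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2)))  # True: same function
```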
Johannes Fürnkranz (JKU Linz)
Towards Deep and Interpretable Rule Learning
(15/3/2024 2pm in person at Helmut Veith lecture hall and online)
Inductive rule learning is concerned with the learning of classification rules from data. Learned rules are inherently interpretable and easy to implement, so they are very suitable for formulating learned models in many domains. Nevertheless, current rule learning algorithms have several shortcomings. First, with respect to the current practice of equating high interpretability with low complexity, we argue that while shorter rules are important for discrimination, longer rules are often more interpretable, and that the tendency of current rule learning algorithms to strive for short and concise rules should be replaced with alternative methods that allow for longer concept descriptions. Second, we think that the main impediment of current rule learning algorithms is that, unlike the successful deep learning techniques, they are not able to learn deeply structured rule sets. Both points are currently under investigation in our group, and we will show some results.
Nataliya Sokolovska (Sorbonne University)
Interpretable models in machine learning and their application in medicine
(16/08/2023 at 2pm in person)
An important aspect of practical classifiers is interpretability. Learning compact but highly accurate models that support human decision-making is challenging. Most simple scoring systems were constructed by human experts using heuristics and are not optimal. Prediction tasks such as medical diagnostics pose further challenges: finding an optimal individual treatment and taking the budget into account, where the budget (any finite resource such as time, money, or side effects of medications) is always limited in real-life applications. I will consider principled methods to learn simple interpretable rules purely from data. I will also show possible solutions for taking the limited budget into account, and discuss perspectives for the development of personalised medicine methods.
Viacheslav Borovitskiy (ETH Zürich) Geometric Gaussian Processes (6/6/2023 in person)
Gaussian processes (GPs) are often considered to be the gold standard in settings where well-calibrated predictive uncertainty is of utmost importance, such as decision making.
It is important for applications to have a class of “general purpose” GPs. Traditionally, these are the stationary processes, e.g. RBF or Matérn GPs, at least for the usual vectorial inputs. For non-vectorial inputs, however, there is often no such class. This state of affairs hinders the use of GPs in a number of application areas ranging from robotics to drug design.
In this talk, I will consider GPs taking inputs on a manifold, on a node set of a graph, or in a discrete “space” of graphs. I will discuss a framework for defining the appropriate general purpose GPs, as well as the analytic and numerical techniques that make them tractable.
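For the graph case, one concrete way to obtain such a general-purpose GP, in the spirit of this line of work, is to define a Matérn-type kernel by spectrally filtering the graph Laplacian; the sketch below is illustrative, and its parameterisation and normalisation are assumptions rather than the reference implementation.

```python
import numpy as np

def graph_matern_kernel(L, nu=1.5, kappa=1.0, sigma2=1.0):
    """L: graph Laplacian (n x n). Returns an n x n covariance over the nodes."""
    evals, evecs = np.linalg.eigh(L)
    spectrum = (2.0 * nu / kappa**2 + evals) ** (-nu)     # Matérn-like spectral filter
    K = (evecs * spectrum) @ evecs.T
    return sigma2 * K / np.mean(np.diag(K))               # normalise the average variance

# Laplacian of a path graph on 5 nodes.
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
L = np.diag(A.sum(axis=1)) - A
print(np.round(graph_matern_kernel(L), 2))                # covariance decays with graph distance
```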
Linara Adilova (Uni Bochum) Information Plane Analysis for Dropout Neural Networks (25/4/2023 at 1pm)
The information-theoretic framework promises to explain the predictive power of neural networks. In particular, the information plane analysis, which measures mutual information (MI) between input and representation as well as representation and output, should give rich insights into the training process. This approach, however, was shown to strongly depend on the choice of estimator of the MI. The problem is amplified for deterministic networks if the MI between input and representation is infinite. Thus, the estimated values are defined by the different approaches for estimation, but do not adequately represent the training process from an information-theoretic perspective. In this work, we show that dropout with continuously distributed noise ensures that MI is finite. We demonstrate in a range of experiments that this enables a meaningful information plane analysis for a class of dropout neural networks that is widely used in practice.
Daniel Springer (IARAI) A Machine Learning Approach to the Analytic Continuation Problem (25/4/2023 at 3.30pm in person)
Machine learning (ML) models have proven to be successful in detecting patterns and structure in a variety of data. In this work we outline a machine learning approach to tackle the problem of analytic continuation. Based on the mathematical structure of the problem we are able to motivate the usage of different state-of-the-art ML models and train them to predict highly accurate spectra of the metal-insulator transition in the single band Hubbard model at half filling.
C’Est La Wien Community Event for Students of Learning Algorithms in Wien (27/2/2023, 8.30am-6pm, in person)
Alice Moallemy-Oureh and Silvia Beddar-Wiesing (both Uni Kassel) A Note on the Modeling Power of Different Graph Types (31/1/2023, 3pm)
Graphs can have different properties that lead to several graph types and may allow for a varying representation of diverse information. In order to clarify the modeling power of graphs, we introduce a partial order on the most common graph types based on an expressivity relation. The expressivity relation quantifies how many properties a graph type can encode compared to another type. Additionally, we show that all attributed graph types are equally expressive and have the same modeling power.
Arsen Sultanov (Sorbonne University) Generating stable crystal structures with denoising diffusion (22/11/2022, 3.30pm, in person)
In recent years, diffusion-based generative models have achieved SoTA performance on various tasks, including generation of images, audio, point clouds and molecular conformations. In our work, we adapt existing diffusion models to solve the problem of periodic crystal structure generation. We give a general overview of the diffusion models and the problems arising in our particular application.
Laurens Devos, Linara Adilova (Uni Bochum), and Michael Kamp (Uni Bochum) at our Post-IJCAI-ML workshop (30/7/2022, 11.30am-5pm, in person)
Nicolò Cesa-Bianchi (Università degli Studi di Milano) The power of cooperation in networks of learning agents (19/7/2022, 2pm in Boecklsaal, in person)
We study the power of cooperation in a network of agents that exchange information with each other to learn faster. In the talk, we show the extent to which cooperation allows us to prove performance bounds that are strictly better than the known bounds for non-cooperating agents. Our results are formulated within the online learning setting and hold for various types of feedback models.
Nicolò Cesa-Bianchi is professor of Computer Science at the University of Milan, Italy. His main research interests are the design and analysis of machine learning algorithms for statistical and online learning, multi-armed bandit problems, and graph analytics. He is co-author of the monographs “Prediction, Learning, and Games” and “Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems”. He served as President of the Association for Computational Learning and co-chaired the program committees of some of the most important machine learning conferences, including NeurIPS, COLT, and ALT. He is the recipient of a Google Research Award, a Xerox Foundation Award, a Criteo Faculty Award, a Google Focused Award, and an IBM Research Award. He is an ELLIS fellow and co-director of the ELLIS program on Interactive Learning and Interventional Representations. He serves on the steering committees of the Italian Laboratory on AI and Intelligent Systems, and of the Italian PhD program on AI.
Derek Lim and Joshua David Robinson (both MIT) Sign and Basis Invariant Networks for Spectral Graph Representation Learning (21/6/2022, 3pm)
Numerous machine learning models process eigenvectors, which arise in various scenarios including principal components analysis, matrix factorizations, and operators associated to graphs or manifolds. We introduce SignNet and BasisNet – new neural architectures that are invariant to two key symmetries displayed by eigenvectors: (i) sign flips, since if v is an eigenvector then so is −v; and (ii) more general basis symmetries, which occur in higher dimensional eigenspaces with infinitely many choices of basis eigenvectors. We prove that our networks are universal, i.e., they can approximate any continuous function of eigenvectors with the desired invariances. Moreover, when used with Laplacian eigenvectors, our architectures are provably expressive for graph representation learning: they can approximate any spectral graph convolution, can compute spectral invariants that go beyond message passing neural networks, and can provably simulate previously proposed graph positional encodings. Experiments show the strength of our networks for molecular graph regression, learning expressive graph representations, and learning neural fields on triangle meshes.
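A minimal sketch of the sign-invariance construction (illustrative random encoders, not the authors’ implementation): encoding each eigenvector v as phi(v) + phi(-v) makes the representation invariant to arbitrary sign flips of the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(8, 5))          # phi: a random linear-ReLU encoder for 5-node eigenvectors
W_rho = rng.normal(size=(3, 8))          # rho: combines the per-eigenvector encodings

def phi(v):
    return np.maximum(W_phi @ v, 0.0)

def sign_invariant_encoding(eigvecs):
    """eigvecs: (n, k) matrix of k eigenvectors; returns one embedding per eigenvector."""
    return np.stack([W_rho @ (phi(v) + phi(-v)) for v in eigvecs.T])

# Laplacian eigenvectors of a small cycle graph.
A = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
L = np.diag(A.sum(axis=1)) - A
_, V = np.linalg.eigh(L)

flips = np.diag(rng.choice([-1.0, 1.0], size=5))          # flip the sign of some eigenvectors
print(np.allclose(sign_invariant_encoding(V), sign_invariant_encoding(V @ flips)))  # True
```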
Pascal Welke (Uni Bonn) A Generalized Weisfeiler-Lehman Graph Kernel (15/3/2022, 2pm, in person)
The majority of popular graph kernels are based on the concept of Haussler’s $\mathcal{R}$-convolution kernel and define graph similarities in terms of mutual substructures. In recent work, we enrich these similarity measures by considering graph filtrations: Using meaningful orders on the set of edges, which allow us to construct a sequence of nested graphs, we can consider a graph at multiple granularities. For one thing, this provides access to features on different levels of resolution. Furthermore, rather than simply comparing frequencies of features in graphs, it allows them to be compared in terms of when and for how long they exist in these sequences. In this work, we propose a family of graph kernels that incorporate these existence intervals of features. While our approach can be applied to arbitrary graph features, we particularly highlight Weisfeiler-Lehman vertex labels, leading to efficient kernels. We show that using Weisfeiler-Lehman labels over certain filtrations strictly increases the expressive power over the ordinary Weisfeiler-Lehman procedure in terms of deciding graph isomorphism. In fact, this result directly yields more powerful graph kernels based on such features and has implications for graph neural networks due to their close relationship to the Weisfeiler-Lehman method. We empirically validate the expressive power of our graph kernels and show significant improvements over state-of-the-art graph kernels in terms of predictive performance on various real-world benchmark datasets.
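As a small, hypothetical illustration of the filtration idea (not the paper’s implementation), the sketch below computes Weisfeiler-Lehman label histograms on every prefix of an edge ordering, so a feature also records at which filtration step a label first appears.

```python
from collections import Counter

def wl_labels(nodes, edges, rounds=2):
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v); adj[v].append(u)
    colours = {v: 0 for v in nodes}
    for _ in range(rounds):
        colours = {v: hash((colours[v], tuple(sorted(colours[u] for u in adj[v]))))
                   for v in nodes}
    return Counter(colours.values())

def filtration_features(nodes, ordered_edges):
    """One WL label histogram per prefix of the edge order; a kernel can then compare
    for how long each label persists across the nested graph sequence."""
    return [wl_labels(nodes, ordered_edges[:k]) for k in range(1, len(ordered_edges) + 1)]

nodes = range(4)
ordered_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]      # e.g. edges sorted by some meaningful order
for step, hist in enumerate(filtration_features(nodes, ordered_edges), start=1):
    print(step, sorted(hist.values()))
```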
Laura Manduchi and Ricards Marcinkevics (both ETH Zürich) Deep Variational Approaches for Weakly Supervised Clustering With Applications to Survival Data (30/11/2021, 2pm)
Bo Kang (Ghent University) Subjectively Interesting Data Representations (9/11/2021, 2pm)
Stefano Teso (University of Trento) Debugging Machine Learning Models with Explanations (16/11/2021, 2pm)
A central tenet of explainable AI is that the bugs and biases affecting a model can be uncovered by computing and analyzing explanations for the model’s predictions. However, and crucially, techniques for explaining machine learning models do not enable experts to correct the bugs that they expose. In this talk, I will overview recent work on debugging machine learning models that approaches the problem by supplying corrective supervision on the model’s explanations. In particular, I will discuss approaches based on local attribute-based explanations, global explanations, as well as example-based explanations. Moreover, I will illustrate how these techniques can be generalized to concept-based models by mixing attribute- and concept-level supervision. I will conclude by outlining some important open issues in this flourishing research topic.
Marco Bressan (University of Milan) Exact recovery of clusters in metric spaces: margins and convexities (24/9/2021, 1pm)
Sebastian Mair (Leuphana Universität Lüneburg) Computing Efficient Data Summaries (29/6/2021, 2pm)
Katrin Ullrich (Fraunhofer IWU) Binding Affinity Prediction - Multi-View Regression in Three Different Learning Scenarios (8/6/2021, 2pm)
Pascal Welke (University of Bonn) Efficient Graph Similarity Learning (27/4/2021, 2pm)
Mario Boley (Monash) Better Short Than Greedy: Interpretable Models Through Optimal Rule Boosting (20/4/2021, 11am)
Rule ensembles are designed to provide a useful trade-off between predictive accuracy and model interpretability. However, the myopic and random search components of current rule ensemble methods can compromise this goal: they often need more rules than necessary to reach a certain accuracy level or can even outright fail to accurately model a distribution that can actually be described well with a few rules. Here, we present a novel approach aiming to fit rule ensembles of maximal predictive power for a given ensemble size (and thus model comprehensibility). In particular, we present an efficient branch-and-bound algorithm that optimally solves the per-rule objective function of the popular second-order gradient boosting framework. Our main insight is that the boosting objective can be tightly bounded in time linear in the number of covered data points. Along with an additional novel pruning technique related to rule redundancy, this leads to a computationally feasible approach for boosting optimal rules that, as we demonstrate on a wide range of common benchmark problems, consistently outperforms the predictive performance of boosting greedy rules.
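To make the per-rule objective concrete, the sketch below evaluates the standard second-order gradient-boosting gain of a candidate rule over the examples it covers (the usual XGBoost-style formula, up to constant factors; the branch-and-bound search and the pruning technique from the talk are omitted).

```python
import numpy as np

def rule_gain(g, h, covered, lam=1.0):
    """g, h: per-example first/second derivatives of the loss; covered: boolean mask.
    Returns (objective gain, optimal rule weight); both are linear in |covered|."""
    G, H = g[covered].sum(), h[covered].sum()
    return G**2 / (H + lam), -G / (H + lam)

# Toy example: squared loss, so g = prediction - y and h = 1.
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0])
pred = np.zeros_like(y)
g, h = pred - y, np.ones_like(y)

covered = np.array([True, True, True, False, False])   # examples covered by a candidate rule
print(rule_gain(g, h, covered))                        # gain and weight of this rule
```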
Antoine Ledent (TU Kaiserslautern) Orthogonal Inductive Matrix Completion (13/4/2021, 2pm)
In this talk I will go over our recent method, OMIC, an interpretable approach to inductive matrix completion based on a sum of multiple orthonormal side-information terms, together with nuclear-norm regularization. The approach allows us to inject prior knowledge about the eigenvectors of the ground-truth matrix. The model is fitted by a provably convergent algorithm that optimizes all components of the model simultaneously. I will go over the most relevant special cases, which apply when one wishes to include user/item biases, or when community side information is available. Time permitting, I will finish by presenting an optimized implementation of the algorithm in these cases, with computational complexity comparable to SoftImpute.
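As a hedged illustration of the model class (a single side-information term and a plain proximal-gradient loop, not the paper’s OMIC algorithm), inductive matrix completion with orthonormal side information can be sketched as follows; all dimensions and hyperparameters are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 30, 20, 5

# Orthonormal side information (here random; in practice e.g. user/item features or communities).
X, _ = np.linalg.qr(rng.normal(size=(n_users, d)))
Y, _ = np.linalg.qr(rng.normal(size=(n_items, d)))

M_true = X @ rng.normal(size=(d, d)) @ Y.T                 # ground truth within the model class
mask = rng.random((n_users, n_items)) < 0.3                # 30% of the entries are observed

def svt(A, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

Z = np.zeros((d, d))                                       # core matrix of the model X @ Z @ Y.T
step, lam = 1.0, 0.01
for _ in range(200):
    residual = mask * (X @ Z @ Y.T - M_true)               # error on the observed entries only
    Z = svt(Z - step * X.T @ residual @ Y, step * lam)     # gradient step + nuclear-norm prox

print("RMSE on unobserved entries:",
      np.sqrt(np.mean(((X @ Z @ Y.T - M_true)[~mask]) ** 2)))
```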
Magda Gregorova (University of Applied Sciences-Western Switzerland, Geneva) Learned transform compression with optimized entropy encoding (30/3/2021, 2pm)
We consider the problem of learned transform compression where we learn both the transform and the probability distribution over the discrete codes. We utilize a soft relaxation of the quantization operation to allow for back-propagation of gradients and employ vector (rather than scalar) quantization of the latent codes. Furthermore, we apply a similar relaxation to the code probability assignments, enabling direct optimization of the code entropy. To the best of our knowledge, this approach is completely novel. We conduct a set of proof-of-concept experiments confirming the potency of our approach.
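The two relaxations can be illustrated with a few lines of numpy (shapes, temperature, and codebook size are assumptions, and this is not the training code from the work): soft assignments to a codebook make vector quantization differentiable, and the same assignments yield differentiable code probabilities whose entropy estimates the bit cost.

```python
import numpy as np

rng = np.random.default_rng(0)
latents = rng.normal(size=(128, 8))        # a batch of encoder outputs (transform coefficients)
codebook = rng.normal(size=(16, 8))        # 16 code vectors of dimension 8
temperature = 0.5

# Soft relaxation of nearest-neighbour quantization: softmax over negative squared distances.
d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)      # (128, 16)
logits = -d2 / temperature
assign = np.exp(logits - logits.max(axis=1, keepdims=True))
assign /= assign.sum(axis=1, keepdims=True)                           # soft one-hot code assignments

soft_quantized = assign @ codebook                                    # differentiable "quantized" latents

# Differentiable code probabilities and their entropy (estimated bits per latent vector).
p = assign.mean(axis=0)
entropy_bits = -(p * np.log2(p + 1e-12)).sum()
print("reconstruction gap:", np.abs(soft_quantized - latents).mean(),
      "estimated bits per code:", entropy_bits)
```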
Gavin Smith (University of Nottingham) Model Class Reliance for Random Forests (16/3/2021, 2pm)
Variable Importance (VI) has traditionally been cast as the process of estimating each variable’s contribution to a predictive model’s overall performance. Analysis of a single model instance, however, guarantees no insight into a variable’s relevance to the underlying generative processes. Recent research has sought to address this concern via analysis of Rashomon sets: sets of alternative model instances that exhibit equivalent predictive performance to some reference model, but which take different functional forms. Measures such as Model Class Reliance (MCR), computed over Rashomon sets, have been proposed in order to ascertain how much a variable must be relied on to make robust predictions, or whether alternatives exist. If the MCR range is tight, we have no choice but to use a variable; if the range is wide, then there exist competing, perhaps fairer, models that provide alternative explanations of the phenomena being examined. Applications are wide-ranging, from enabling the construction of ‘fairer’ models in areas such as recidivism prediction to health analytics and ethical marketing. Tractable estimation of MCR for non-linear models is currently restricted to Kernel Regression under squared loss [7]. In this paper we introduce a new technique that extends the computation of Model Class Reliance (MCR) to Random Forest classifiers and regressors. The proposed approach addresses a number of open research questions and, in contrast to prior Kernel SVM MCR estimation, runs in linearithmic rather than polynomial time. Taking a fundamentally different approach to previous work, we provide a solution for this important model class, identifying situations where irrelevant covariates do not improve predictions.
Daniel Paurat (Telekom) Machine Learning @ Telekom (2/3/2021, 2pm)
Dino Oglic (King’s College London) on Parznets – Deep CNNs for Waveform-based Speech Recognition (10/11/2020, 2pm)
We investigate the potential of stochastic neural networks for learning effective waveform-based acoustic models. The waveform-based setting, inherent to fully end-to-end speech recognition systems, is motivated by several comparative studies of automatic and human speech recognition that associate standard non-adaptive feature extraction techniques with information loss which can adversely affect robustness. Stochastic neural networks, on the other hand, are a class of models capable of incorporating rich regularization mechanisms into the learning process. We consider a deep convolutional neural network that first decomposes speech into frequency sub-bands via an adaptive parametric convolutional block where filters are specified by cosine modulations of compactly supported windows. The network then employs standard non-parametric 1D convolutions to extract relevant spectro-temporal patterns while gradually compressing the structured high dimensional representation generated by the parametric block. We rely on a probabilistic parametrization of the proposed neural architecture and learn the model using stochastic variational inference. This requires evaluation of an analytically intractable integral defining the Kullback-Leibler divergence term responsible for regularization, for which we propose an effective approximation based on the Gauss-Hermite quadrature. Our empirical results demonstrate superior performance of the proposed approach over comparable waveform-based baselines and indicate its potential for robustness. Moreover, the approach outperforms a recently proposed deep convolutional neural network for learning of robust acoustic models with standard FBANK features.
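As a hedged sketch of the parametric first layer (illustrative, fixed parameters; the model in the talk learns them and treats them probabilistically), each filter is a cosine modulation of a compactly supported window, so the block acts as a band-pass filterbank applied directly to the waveform:

```python
import numpy as np

def cosine_modulated_filter(centre_freq_hz, width, sample_rate=16000):
    t = np.arange(width) / sample_rate
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(width) / (width - 1))  # Hann window (compact support)
    filt = window * np.cos(2 * np.pi * centre_freq_hz * t)
    return filt / np.linalg.norm(filt)

# A small filterbank with centre frequencies spread over the speech band.
filterbank = [cosine_modulated_filter(f, width=129) for f in np.linspace(100, 4000, 8)]

# Apply to a toy waveform (a 440 Hz tone); log sub-band energies play the role of learned features.
sample_rate, duration = 16000, 0.1
waveform = np.sin(2 * np.pi * 440 * np.arange(int(sample_rate * duration)) / sample_rate)
energies = [np.log(np.mean(np.convolve(waveform, f, mode="valid") ** 2) + 1e-8) for f in filterbank]
print(np.round(energies, 2))
```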
Linara Adilova (Fraunhofer IAIS) (27/10/2020, 2pm)
Florian Seiffarth (University of Bonn) Learning with Closure Spaces (13/10/2020)
Fabio Vitale (Inria Lille – Nord Europe and University of Lille) on Fast Clustering through Pairwise Similarity Information (29/9/2020, 2pm)
Michael Kamp (Monash University, Melbourne) on Black-Box Machine Learning (1/9/2020, 1pm)