Seminar in Artificial Intelligence - Theoretical Aspects of Machine Learning

General information

TISS: (link)
contact: Patrick Indri
meeting link: https://tuwien.zoom.us/my/patrickindri
everything important will be announced in TUWEL/TISS.

Format

This seminar simulates a machine learning conference, where the students take on the role of authors and reviewers. It consists of multiple phases.

1. Proposal phase

Attend the mandatory first meeting either in person or remotely (details on TUWEL).

Option 1: our suggestions

You select two topics/papers (i.e., two bullet points) from one of the topics below. You will work with the material mentioned in the overview and the topic-specific resources.

Option 2: your own idea + one of our suggestions

You choose your own topic to work on. This can be some existing machine learning paper/work or an own creative idea in the context of machine learning. We strongly encourage you to start from existing papers from the following venues: NeurIPS, ICML, ICLR, COLT, AISTATS, UAI, JMLR, MLJ. Importantly, your idea has to be specific and worked out well. Nevertheless, choose one of our suggestions as well.

Independent of the option you chose, understand the fundamentals of your topic and try to answer the following questions:

What is the problem?
Why is it an interesting problem?
How do you plan to approach the problem? / How have the authors of your topic approached the problem?

Select topics and write a short description of them together with the answers to the questions (~3 sentences should be sufficient) in TUWEL.

We can only accept your own proposals if you can answer the mentioned questions and have a well worked out topic.

2. Bidding and assignment phase

You will also act as reviewers and bid on the topics of your peers you want to review. Based on the biddings, we (in the role as chairs of the conference) will select one of each student’s proposals as the actual topic you will work on for the rest of this semester. You do not need to work on the other topic, anymore. Additionally, we will also assign two different topics from other students to you, which you will have to review later in the semester.

3. Working phase

Now the actual work starts. Gather deep understanding of your topic, write a first draft of your report and give a 5-minute presentation. We recommend to go beyond the given material.

4. Reviewing phase

You will again act as a reviewer for the conference by writing two reviews, one for each draft report assigned to you.

5. Writing phase

Based on the reviews from your peers (and our feedback) you will further work on your topic.

6. Submission phase

Give a final presentation and submit your report.

General resources (freely available books and lecture notes)

Understanding machine learning: from theory to algorithms. Shai Shalev-Shwartz and Shai Ben-David (pdf)
Foundations of machine learning. Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar (pdf)
Foundations of data science. Avrim Blum, John Hopcroft, and Ravindran Kannan (pdf)
Mathematics for machine learning. Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong (pdf)
Mining of massive datasets. Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman (pdf)
Reinforcement learning: an introduction. Richard Sutton and Andrew Barto (pdf)
Research Methods in Machine Learning. Tom Dietterich (pdf)

Topics (tentative)

You should have access to the literature and papers through Google scholar, DBLP, the provided links, or the TU library.

Active Learning (click to expand)

Motivation: In active learning, the learning algorithm is allowed to select the data points it wants to see labelled, for example, where it is most uncertain. The goal is to reduce the labelling effort. This is useful in applications where unlabelled data is abundant, yet labels are scarce, such as node classification in social networks, drug discovery, and autonomous driving.

Overview:

chapter 1 “Automating inquiry” of Burr Settles’ “Active learning” book, 2012.
introduction and recent research: Rob Nowak and Steve Hanneke - ICML 2019 tutorial (youtube-link)

Papers and topics:

active learning with comparison queries (Kane, D. M., Lovett, S., Moran, S., & Zhang, J. “Active classification with comparison queries.” FOCS 2017)
sample complexity in active learning (Maria-Florina Balcan, Steve Hanneke, and Jennifer Wortman Vaughan “The true sample complexity of active learning.” Machine Learning 2010)
bounded memory active learning (M. Hopkins, D. Kane, S. Lovett & M. Moshkovitz. “Bounded memory active learning through enriched queries.” COLT 2021)

Differentiable Algorithms (click to expand)

Motivation: Classical algorithms cannot be combined with neural networks as training neural networks would require the computation of a gradient which is not possible for classical algorithms. Thus, we are interested in making classical algorithms differentiable.

Papers & Projects:

Petersen, Learning with Differentiable Algorithms, Doctoral Thesis at the University of Konstanz, arXiv, (2022)

The above thesis gives an introduction to the field of differentiable algorithms and is built upon many conference publications by the author at strong ML conferences. We recommend picking either one of the chapters or one of the underlying publications from ICML, ICLR, NeurIPS or CVPR as a basis.

Equivariant Neural Networks (click to expand)

Motivation: Many datastructures have an innate structure that our neural networks should respect. For example the output of a graph neural networks should not change if we permute the vertices (permutation equivariance/invariance).

Overview:

chapter 8 “equivariant neural networks” of “Deep learning for molecules and materials” by Andrew D. White, 2021. (pdf).
introduction to equivariance: Taco Cohen and Risi Kondor - Neurips 2020 Tutorial (first half) (slideslive-link)

Papers and topics:

neural network that can learn on sets (Zaheer, et al. “Deep sets.” NeurIPS 2017)
learning equivariance from data (Zhou, et al. “Meta-learning symmetries by reparameterization.” ICLR 2021)

GNNs (click to expand)

Motivation: Graphs are a very general structure and can be applied to many areas: molecules and developing medicine, geographical maps, spread of diseases. They can be used to model physical systems and solve partial differential equations. Even images and text can be seen as a special case of graphs. Thus it makes sense to develop neural networks that can work with graphs. GNNs have strong connections to many classical computer science topics (algorithmics, logic, …) while also making use of neural networks. This means that work on GNN can be very theoretical, applied or anything in between.

Overview:

Veličković, Everything is connected: Graph neural networks, Current Opinion in Structural Biology, 2023
Sanchez-Lengeling et al., A Gentle Introduction to Graph Neural Networks, distill.pub 2021
Veličković, Intro to graph neural networks (ML Tech Talks), YouTube, 2021

Papers & Projects:

Note: For very long papers we do not expect you to read the entire appendix.

Expressiveness of GNNs
- Zhang et al., Rethinking the Expressive Power of GNNs via Graph Biconnectivity, ICLR, 2023
- Lim et al., Sign and Basis Invariant Networks for Spectral Graph Representation Learning, ICLR, 2023
- Hwang et al., An Analysis of Virtual Nodes in Graph Neural Networks for Link Prediction, LoG, 2022
Oversmoothing & Over-Squashing
- Keriven, Not too little, not too much: a theoretical analysis of graph (over)smoothing, NeurIPS, 2023
- Abboud et al., Shortest Path Networks for Graph Property Prediction, LoG, 2022
- Huang et al., You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets, LoG, 2022
Neural Algorithmic Reasoning / Algorithm Representation Learning
- Numeroso et al., Dual Algorithmic Reasoning, ICLR, 2023
Learning from Graph Data with MLPs
- Tian et al., Learning MLPs on Graphs: A Unified View of Effectiveness, Robustness, and Efficiency, ICLR, 2023

Kernel Methods (click to expand)

Motivation: Kernels generalise linear classifiers to linear functions in a (potentially infinite dimensional) feature space. They are the foundation of various popular machine learning algorithms like the kernel SVM and kernel PCA.

Overview:

chapters 1 and 2 of “Learning with kernels” by Bernhard Schölkopf and Alex Smola, 2002 (pdf)
introduction to kernels: Bernhard Schölkopf - MLSS 2013 (youtube-link)

Papers and topics:

Nyström method (Drineas and Mahoney. “On the Nyström method for approximating a Gram matrix for improved kernel-based learning.” Journal of machine learning research 2005 and Kumar, et al. “Sampling methods for the Nyström method.” Journal of machine learning research 2012)
Nyström method with kernel k-means++ samples as landmarks (Drineas and Mahoney. “On the Nyström method for approximating a Gram matrix for improved kernel-based learning.” Journal of machine learning research 2005 and Oglic and Gärtner. “Nyström method with kernel k-means++ samples as landmarks.” ICML 2017)
random features (Rahimi and Recht. “Random features for large-scale kernel machines.” NIPS 2007 and Le, et al. “Fastfood: approximate kernel expansions in loglinear time.” ICML 2013)
neural tangent kernel (Jacot, et al. “Neural tangent kernel: convergence and generalization in neural networks.” NIPS 2018)

Modern aspects of learning theory (click to expand)

Motivation: Learning theory studies computational and algorithmic aspects of machine learning algorithms to prove guarantees such as sample complexity bounds. This important to understand and devise novel learning algorithms. In recent years, many long-standing open questions in learning theory have been answered.

Overview:

Olivier Bousquet Stéphane Boucheron, and Gábor Lugosi: “Introduction to Statistical Learning Theory” 2003.
Chapters 1-6 of “Understanding machine learning”
“Extending Generalization Theory Towards Addressing Modern Challenges in ML” by Shay Moran, talk at the HUJI ML Club, 2021 (youtube-link)
(Basic material) Statistical Machine Learning by Ulrike von Luxburg (we recommend part 38-41) (youtube playlist)

Papers and topics:

partial concept classes (Alon, et al., “A theory of PAC learnability of partial concept classes”, unpublished arXiv:2107.08444)
tight bounds (Bousquet, et al., “Proper learning, Helly number, and an optimal SVM bound” COLT 2020)
universal learning (Bousquet, et al., “A theory of universal learning” STOC 2021)
sample compression schemes (Moran, et al., “Sample compression schemes for VC classes” Journal of the ACM 2016)
algorithmic stability (Bousquet, Olivier, and André Elisseeff. “Algorithmic stability and generalization performance.” NeurIPS, 2000)

Neurosymbolic AI / Logic & ML (click to expand)

Overview:

Neurosymbolic AI: The 3rd Wave, 2020 (A. Garcez, L. Lamb)
Neural-Symbolic Cognitive Reasoning, 2009 (A. Garcez, L. Lamb)

Papers and topics:

find your own topic :) (a starting point can be the survey from L. De Raedt, S. Dumancic, R. Manhaeve, G. Marra. “From Statistical Relational to Neuro-Symbolic Artificial Intelligence”, 2020)
SAT solving using deep learning
- D. Selsam, M. Lamm, B. Bünz, P. Liang, D. Dill, L. de Moura. “Learning a SAT Solver from Single-Bit Supervision”, 2019
- V. Kurin, S. Godil, S. Whiteson, B. Catanzaro. “Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning”, 2019
- J. You, H. Wu, C. Barrett, R. Ramanujan, J. Leskovec. “G2SAT: Learning to Generate SAT Formulas”, 2019

ML for SAR image processing (click to expand)

Motivation: Synthetic Aperture Radar (SAR) is an active microwave imaging system that provides high-resolution images day and night under all weather conditions. It has been widely used in many practical applications, such as environment, crop monitoring, and disaster detection. Using best-suited machine learning algorithms to derive useful information from these data is essential.

Overview:

Chapter 2 “Spaceborne Synthetic Aperture Radar: Principles, Data Access, and Basic Processing Techniques” of Franz Meyer’s “SAR Handbook” book, 2019.
“A Tutorial on Synthetic Aperture Radar” (IEEE Geoscience and remote sensing magazine 2013) by A. Moreira, et al.

Papers and topics:

SAR Classification (Miller, et al. “Graph-based Active Learning for Semi-supervised Classification of SAR Data.” 2022)
SAR Features (Zhang, et al. “Sparse Feature Clustering Network for Unsupervised SAR Image Change Detection.” IEEE Transactions on Geoscience and Remote Sensing 2022)
SAR Despckling (Gu, et al. “A Two-Component Deep Learning Network for SAR Image Denoising.” IEEE Access 2021)
SAR Despckling (Yuan, et al. “Blind SAR Image Despeckling Using Self-Supervised Dense Dilated Convolutional Neural Network.” 2019)
SAR Features (Jiang ,et al. “Unsupervised Deep Sparse Features Extraction for SAR Image Segmentation.” IEEE Transactions on Geoscience and Remote Sensing 2022)

Online Learning (click to expand)

Motivation: While standard supervised learning assumes that we have access to some static fixed dataset, often in practice the data arrives in a stream. This is the subject of online learning (not meant in the internet online sense, but rather as streaming/incremental). Here, we often drop standard sampling assumptions and instead study worst-case behaviour (regret).

Overview:

chapter 1 of “A modern introduction to online learning” by Francesco Orabona, 2020.
introduction to online learning (iterative learning / streaming settings): Nicolò Cesa-Bianchi - Mediterranean Machine Learning school 2021 (youtube-link)

Papers and topics:

weighed majority and Littlestone dimension (Littlestone and Warmuth. “The weighted majority algorithm.” Information and computation 1994 and Littlestone “Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm.” Machine Learning 1988).
online (sub-)gradient descent (chapter 2-4 of “A modern introduction to online learning”, Francesco Orabona, 2020)
bandits and expert advice (introduction and chapter 1,5,6 of “Introduction to multi-armed bandits”, Aleksandrs Slivkins, 2019)

Optimisation (and Generalisation) in Neural Networks (click to expand)

Overview:

A. Globerson: How SGD Can Succeed Despite Non-Convexity and Over-Parameterization (slides)

Papers and topics:

generalization bounds for deep neural networks (G.K. Dziugaite, D.M. Roy, “Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data”, 2017)
why SGD avoids overfitting and finds global minima (A. Brutzkus et al: “SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data”, 2017)
connection between flatness of loss curve and generalisation (H. Petzka et al. “Relative Flatness and Generalization”, 2021)
mode connectivity (Garipov, et al. “Loss surfaces, mode connectivity, and fast ensembling of dnns.” NeurIPS 2018).
deep learning and generalisation (Zhang, et al. “Understanding deep learning (still) requires rethinking generalization.” Communications of the ACM, 2021)
connectivity of the optimisation landscape (A. Shevchenko, M. Mondelli. “Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks”, 2020)
SGD stability (Hardt, Moritz, Ben Recht, and Yoram Singer. “Train faster, generalize better: Stability of stochastic gradient descent.” International conference on machine learning. PMLR, 2016)
choose one or more papers listed on page 14 in the above mentioned slides :)

Submodularity in ML (click to expand)

Motivation: Submodularity is a property of set functions similar to convexity for real-valued functions. It allows to build strong machine learning algorithms for sub-task such as sketching, coresets, data distillation, and data subset selection. Moreover, it is useful for clustering, active and semi-supervised learning.

Overview:

chapter 1-3 of “Learning with submodular functions: a convex optimization perspective” by Francis Bach, 2013.
introduction to submodularity in machine learning: Stefanie Jegelka - MLSS 2017 (youtube-link)

Papers and topics:

submodularity in data subset selection and active learning (Wei, et al. “Submodularity in data subset selection and active learning.” ICML 2015)
robust submodular observation selection (Krause, et al. “Robust submodular observation selection.” Journal of machine learning research 2008)
submodular function maximization (Krause and Golovin. “Submodular function maximization.” 2014)
graph cuts for image segmentation (Blum and Chawla. “Learning from labeled and unlabeled data using graph mincuts.” ICML 2001 and Jegelka and Bilmes. “Submodularity beyond submodular energies: coupling edges in graph cuts.” CVPR 2011)
learning submodular functions (Balcan and Harvey. “Learning submodular functions.” ACM symposium on theory of computing 2011)
batch active learning using submodular optimization (Chen and Krause. “Near-optimal batch mode active learning and adaptive submodular optimization.” ICML 2013)

Trustworthy ML (click to expand)

Motivation: Machine learning systems are ubiquitous and it is necessary to make sure they behave as intended. In particular, trustworthiness can be achieved by means of privacy-preserving, robust, and explainable algorithms.

Overview:

General: What does it mean for ML to be trustworthy? (youtube-link)
General: Trustworthy ML (Kush R. Varshney) (link)
Differential privacy: Chapter 2 of: Dwork, Cynthia, and Aaron Roth. “The algorithmic foundations of differential privacy.” Found. Trends Theor. Comput. Sci. 2014
Explainability: Došilović, Filip Karlo, Mario Brčić, and Nikica Hlupić. “Explainable artificial intelligence: A survey.” MIPRO 2018

Papers and topics:

Differential privacy:
- differential privacy and deep learning (Chen, Xiangyi, Steven Z. Wu, and Mingyi Hong. “Understanding gradient clipping in private SGD: A geometric perspective.” NeurIPS 2020)
- data reconstruction and differential privacy (Hayes, Jamie, Saeed Mahloujifar, and Borja Balle. “Bounding Training Data Reconstruction in DP-SGD.” arXiv, 2023).
- (extensions of) gaussian mechanism (Balle, Borja, and Yu-Xiang Wang. “Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising.” International Conference on Machine Learning. PMLR, 2018)
- high-dimensional mean estimation which is robust and private (Narayanan, Shyam, Vahab Mirrokni, and Hossein Esfandiari. “Tight and Robust Private Mean Estimation with Few Users.” ICML. PMLR, 2022.)
Explainability:
- nonlinear classifiers (Montavon, Grégoire, et al. “Explaining nonlinear classification decisions with deep taylor decomposition.” Pattern recognition, 2017)
Robustness:
- GNN structural perturbations (Wang, et al. “Certified robustness of graph neural networks against adversarial structural perturbation.” ACM SIGKDD, 2021)