Maximilian Thiessen's Project List

Graph-based active learning

Task: Design and implement an efficient Python framework to compare graph-based active learning algorithms.

We recommend using efficient graph libraries like graph-tool (https://graph-tool.skewed.de/).

The goal is to have a framework to compare different active learning algorithms more easily. Adding new active learning algorithms, new datasets, or new evaluation strategies should be as simple as possible. Data I/O, the active learning algorithms, the prediction algorithms, and the evaluations should be encapsulated and exchangeable (think of scikit-learn).
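One possible way to realise this encapsulation (a hypothetical sketch; the class and method names are ours, not a fixed specification) is a small set of abstract base classes in the spirit of scikit-learn's estimator API, with graphs represented here as plain adjacency dictionaries:

```python
import random
from abc import ABC, abstractmethod

class ActiveLearner(ABC):
    """Selects the next vertex (or batch of vertices) to query."""

    @abstractmethod
    def select(self, graph, labelled, budget):
        """Return up to `budget` vertex ids to query next."""

class Predictor(ABC):
    """Predicts labels for all vertices from the queried ones."""

    @abstractmethod
    def fit_predict(self, graph, labels):
        """`labels` maps queried vertex ids to their classes."""

class RandomLearner(ActiveLearner):
    """Baseline strategy: query uniformly random unlabelled vertices."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def select(self, graph, labelled, budget):
        candidates = [v for v in graph if v not in labelled]
        return self.rng.sample(candidates, min(budget, len(candidates)))
```

With such interfaces, swapping in a new query strategy, predictor, or dataset reader only requires implementing one small class, which keeps the comparisons uniform.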

Start by reimplementing existing active learning methods:

Additionally, implementations of the following methods are of great interest to us (but they are most probably out of scope for the 3 ECTS of an “algorithms and applications project” PR and better suited for a bachelor's or master's thesis)

One simple (theoretically motivated) way of comparing active learning algorithms is described in (Section 6, Experiments)

To empirically evaluate the predictive performance (e.g., the accuracy) of the active learning algorithms, you will also need prediction methods. We recommend using label propagation as described in

It is also described more generally in:

You do not have to implement these methods yourself. Feel free to use scikit-learn's label propagation and/or label spreading: https://scikit-learn.org/stable/modules/semi_supervised.html#label-propagation
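As a minimal sketch of how scikit-learn's semi-supervised API can serve as the prediction step (the toy data below is our own; scikit-learn uses the label -1 to mark unlabelled points):

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Toy data: two well-separated 1-D clusters, one labelled point each;
# the label -1 marks the unlabelled points to be inferred.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelSpreading(kernel='rbf', gamma=1.0)
model.fit(X, y)

# transduction_ holds the inferred labels for all points
pred = model.transduction_  # -> [0, 0, 0, 1, 1, 1]
```

In the graph setting, the affinities would come from the graph's edge weights rather than an RBF kernel over feature vectors, but the fit/predict workflow stays the same.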

It might also be interesting to compare it to cut-based approaches like

Regarding datasets, you could start with smaller synthetic graphs (e.g., sampled random graphs). There are also many graph datasets freely available online, e.g., SNAP (http://snap.stanford.edu/data/index.html).
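For the synthetic starting point, a stochastic block model is convenient because the planted communities give a natural ground-truth labelling for active-learning experiments. A minimal self-contained sampler (our own sketch; a graph library would offer optimised generators):

```python
import random

def sample_sbm(sizes, p_in, p_out, seed=0):
    """Sample an undirected stochastic block model graph.

    sizes: list of block sizes; p_in / p_out: edge probabilities
    within / between blocks. Returns (edge list, vertex labels).
    """
    rng = random.Random(seed)
    labels = []
    for block, n in enumerate(sizes):
        labels += [block] * n
    n_total = len(labels)
    edges = []
    for u in range(n_total):
        for v in range(u + 1, n_total):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, labels
```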

Context

Even though implementations of graph-based active learning algorithms are often available, it is tedious to evaluate and compare them on real-world graphs. Additionally, most existing active learning libraries, like modAL, ALiPy, and libact, focus on heuristic (e.g., uncertainty-based) methods, whereas we are interested in graph-based methods with theoretical guarantees. Because of that, our goal is to design an easy-to-use Python framework that enables efficient evaluations and comparisons of graph-based methods.

In the long run, it might be possible to integrate these implementations into popular machine learning libraries like scikit-learn (https://scikit-learn.org) or Vowpal Wabbit (https://vowpalwabbit.org/).

Convexity in real-world graphs

Task Compute

of real-world graphs.

Try to find real-world graphs with (almost) convex classes.

Suggested approach Download some vertex-labelled graphs from http://snap.stanford.edu/data/ and implement routines to read in the data using, for example, Python and graph-tool (https://graph-tool.skewed.de/).
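SNAP files are typically tab-separated edge lists with `#` comment headers. A minimal dependency-free reader can be sketched as follows (in practice you would hand the parsed edges to graph-tool's `add_edge_list`):

```python
def read_edge_list(path):
    """Parse a SNAP-style edge list: lines starting with '#' are
    comments; remaining lines hold whitespace-separated vertex ids."""
    edges = []
    with open(path) as f:
        for line in f:
            if line.startswith('#') or not line.strip():
                continue
            u, v = line.split()[:2]
            edges.append((int(u), int(v)))
    return edges
```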

Write functions that can compute specific properties of the graphs such as:

Do the values change when you ignore the orientation (i.e., directed/undirected) and edge weights (weighted/unweighted) of the graphs?

Can you find particular datasets and application areas (social networks, biological data, etc.) where the classes are (or at least tend to be) often convex?
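A BFS-based convexity check can be sketched as follows (a minimal version for unweighted, undirected graphs stored as adjacency dicts; it uses the standard notion that a vertex set S is geodesically convex iff every shortest path between two vertices of S stays inside S):

```python
from collections import deque

def bfs_distances(adj, src):
    """Unweighted single-source shortest-path distances via BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_geodesically_convex(adj, S):
    """Check whether every shortest path between vertices of S
    stays inside S (geodesic convexity)."""
    S = set(S)
    dist = {u: bfs_distances(adj, u) for u in adj}
    inf = float('inf')
    for u in S:
        for v in S:
            if v not in dist[u]:
                continue  # different components: no u-v geodesic
            for w in adj:
                if w in S:
                    continue
                # w lies on some shortest u-v path -> not convex
                if dist[u].get(w, inf) + dist[w].get(v, inf) == dist[u][v]:
                    return False
    return True
```

This brute-force variant is cubic in the number of vertices, so for the larger SNAP graphs you would want the optimised shortest-path routines of a library like graph-tool instead.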

Print (and visualise) the computed statistics in a consistent way for all the graphs and classes that you inspected.

Remark:

Suggested reading Primary resource (book defining all needed concepts):

For k-convexity:

Recent papers employing convexity aspects of graphs in machine learning:

Context While classical notions of convexity in Euclidean space are widely studied and used by the machine learning community, related notions in discrete spaces (e.g., graphs) have been mostly overlooked. Recently, we (and other groups) have started to use the notion of convexity in graphs to achieve guarantees for machine learning problems on graphs.

Currently, these results are mainly of theoretical interest, but the next step is to evaluate and test the developed methods on real-world graphs. For that, we have to find reasonable datasets and application areas that fit our convexity-based assumptions. Your goal is to identify graphs where the labelled subgraphs are (close to) convex.