I am a DPhil student with
Prof Yarin Gal
in the
OATML
group at the University of Oxford and a student in the AIMS program.
Originally from Romania, I grew up in Southern Germany. After studying Computer Science and Mathematics at the Technical University of Munich, I spent a couple of years in Zurich as a software engineer at Google (YouTube). I then worked as a performance research engineer at DeepMind for a year before starting my DPhil in September 2018.
I am interested in Bayesian Deep Learning, Information Theory (and its application within Information Bottlenecks and Active Learning), and Uncertainty Quantification. I also like to think about AI Ethics and AI Safety.
We show that a single softmax neural net with minimal changes can beat the uncertainty predictions of Deep Ensembles and other, more complex single-forward-pass uncertainty approaches. Standard softmax neural nets suffer from feature collapse and extrapolate arbitrarily for OoD points. As a result, the softmax entropy of an OoD point can be high, low, or anything in between, and thus cannot capture epistemic uncertainty reliably. We prove that this failure lies at the core of why Deep Ensemble uncertainty works well. Instead of using the softmax entropy, we show that, with appropriate inductive biases, softmax neural nets trained with maximum likelihood reliably capture epistemic uncertainty through their feature-space density. This density is obtained using simple Gaussian Discriminant Analysis, but it cannot represent aleatoric uncertainty reliably. We show that it is necessary to combine the feature-space density with the softmax entropy to disentangle the two kinds of uncertainty. We evaluate the quality of the epistemic uncertainty on active learning and OoD detection, achieving a state-of-the-art 98 AUROC on CIFAR-10 vs SVHN without fine-tuning on OoD data.
In Uncertainty & Robustness in Deep Learning at Int. Conf. on Machine Learning (ICML Workshop)
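As a rough illustration of the feature-space density idea, here is a minimal NumPy sketch (function names such as `fit_gda` are hypothetical, not from the paper) of fitting class-conditional Gaussians to feature vectors and scoring a point by its log marginal density. The actual method additionally depends on appropriate inductive biases in the feature extractor, which this sketch omits:

```python
import numpy as np

def fit_gda(features, labels, n_classes, jitter=1e-6):
    """Fit a class-conditional Gaussian (GDA) to feature vectors.

    features: (N, D) array of penultimate-layer features; labels: (N,) ints.
    Returns per-class (mean, covariance, class prior)."""
    params = []
    for c in range(n_classes):
        fc = features[labels == c]
        mu = fc.mean(axis=0)
        # Jitter keeps the covariance well conditioned.
        cov = np.cov(fc, rowvar=False) + jitter * np.eye(features.shape[1])
        params.append((mu, cov, len(fc) / len(features)))
    return params

def log_density(params, x):
    """Log marginal feature density log q(x) = log sum_c pi_c N(x; mu_c, Sigma_c)."""
    log_probs = []
    for mu, cov, pi in params:
        d = x - mu
        _, logdet = np.linalg.slogdet(cov)
        quad = d @ np.linalg.solve(cov, d)
        log_probs.append(np.log(pi) - 0.5 * (len(mu) * np.log(2 * np.pi) + logdet + quad))
    # Log-sum-exp over classes for numerical stability.
    m = max(log_probs)
    return m + np.log(sum(np.exp(lp - m) for lp in log_probs))
```

Points far from all class clusters in feature space receive a low `log_density` and can be flagged as OoD, while the softmax entropy still captures aleatoric uncertainty near the decision boundary.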
The Information Bottleneck (IB) principle characterizes learning and generalization in deep neural networks in terms of the change in two information-theoretic quantities and leads to a regularized objective function for training neural networks. These quantities are difficult to compute directly for deep neural networks. We show that it is possible to backpropagate through a simple entropy estimator to obtain an IB training method that works for modern neural network architectures. We evaluate our approach empirically on CIFAR-10, showing that IB objectives can yield competitive performance with a conceptually simple approach while also performing well against adversarial attacks out of the box.
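To make "backpropagating through an entropy estimator" concrete, here is an illustrative leave-one-out Gaussian-KDE entropy estimate in NumPy. This is a generic sketch, not necessarily the estimator used in the paper; since it is a smooth function of the latent representations, an autodiff framework could differentiate through it to regularize an IB objective:

```python
import numpy as np

def kde_entropy(z, bandwidth=1.0):
    """Leave-one-out Gaussian-KDE estimate of the differential entropy H(Z).

    z: (N, D) array of latent representations; returns a scalar estimate
    H(Z) ~= -1/N sum_i log( 1/(N-1) sum_{j != i} K_h(z_i - z_j) )."""
    n, d = z.shape
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    log_k = -sq_dists / (2 * bandwidth ** 2) - 0.5 * d * np.log(2 * np.pi * bandwidth ** 2)
    # Exclude each point's own kernel (leave-one-out).
    np.fill_diagonal(log_k, -np.inf)
    # Log-sum-exp over neighbors, then average.
    m = log_k.max(axis=1, keepdims=True)
    log_p = m.squeeze(1) + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n - 1)
    return -log_p.mean()
```

More spread-out representations yield a higher entropy estimate, so an IB-style objective can penalize (or reward) this term alongside the task loss.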
The Information Bottleneck principle offers both a mechanism to explain how deep neural networks train and generalize and a regularized objective with which to train models. However, multiple competing objectives have been proposed in the literature, and the information-theoretic quantities used in these objectives are difficult to compute for large deep neural networks, which in turn limits their use as training objectives. In this work, we review these quantities and compare and unify previously proposed objectives, which allows us to develop surrogate objectives that are more friendly to optimization without relying on cumbersome tools such as density estimation. We find that these surrogate objectives allow us to apply the Information Bottleneck to modern neural network architectures. We demonstrate our insights on MNIST, CIFAR-10 and Imagenette with modern DNN architectures (ResNets).
We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and the model parameters, which we use as an acquisition function to select multiple informative points jointly for deep Bayesian active learning. BatchBALD is a greedy, linear-time (1 - 1/e)-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the latter acquires similar, redundant points, sometimes performing worse than acquiring data at random. Finally, we show that, by using BatchBALD to account for dependencies within an acquisition batch, we achieve new state-of-the-art performance on standard benchmarks, with substantial data-efficiency improvements in batch acquisition.
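The greedy selection can be sketched as follows, assuming per-point predictive distributions under K posterior samples. This illustrative version enumerates class configurations exactly to compute the joint entropy, so it is only tractable for small batches and few classes; the paper uses Monte-Carlo approximations and caching instead:

```python
import numpy as np
from itertools import product

def batchbald_greedy(probs, batch_size):
    """Greedy BatchBALD sketch.

    probs: (K, N, C) array of class probabilities for N pool points under
    K posterior samples (assumed strictly positive). Greedily builds a batch
    maximizing I(y_batch; w) = H(y_batch) - E_w[H(y_batch | w)]."""
    K, N, C = probs.shape
    # Per-point conditional entropy E_w[H(y_n | w)].
    cond_ent = -(probs * np.log(probs)).sum(-1).mean(0)
    batch = []
    for _ in range(batch_size):
        best_n, best_score = None, -np.inf
        for n in range(N):
            if n in batch:
                continue
            idx = batch + [n]
            # Joint predictive entropy H(y_idx), enumerating all C^|idx| configs.
            joint_ent = 0.0
            for ys in product(range(C), repeat=len(idx)):
                # p(y_idx) = E_w[prod_i p(y_i | w)]
                p = np.prod([probs[:, i, y] for i, y in zip(idx, ys)], axis=0).mean()
                if p > 0:
                    joint_ent -= p * np.log(p)
            score = joint_ent - cond_ent[idx].sum()
            if score > best_score:
                best_n, best_score = n, score
        batch.append(best_n)
    return batch
```

Because the joint entropy is shared across candidates, a duplicate of an already-selected point adds little to the score, which is exactly the redundancy that naive top-k BALD fails to penalize.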
Seven workshop papers at ICML 2021 (five of which are first-author submissions):
Two papers and posters at the
Uncertainty & Robustness in Deep Learning
workshop:
On Pitfalls in OoD Detection: Entropy Considered Harmful
Andreas Kirsch, Jishnu Mukhoti, Joost van Amersfoort, Philip H.S. Torr and Yarin Gal
or as
Poster PDF version as download
.
Deterministic Neural Networks with Inductive Biases Capture Epistemic and Aleatoric Uncertainty
Jishnu Mukhoti, Andreas Kirsch, Joost van Amersfoort, Philip H.S. Torr and Yarin Gal
or as
Poster PDF version as download
.
Four papers (posters, one of them a spotlight) at the
SubSetML: Subset Selection in Machine Learning: From Theory to Practice
workshop:
Active Learning under Pool Set Distribution Shift and Noisy Data
Andreas Kirsch, Tom Rainforth, Yarin Gal
SPOTLIGHT
(also accepted at the
Human in the Loop Learning (HILL) workshop
)
or as
Poster PDF version as download
.
A Simple Baseline for Batch Active Learning with Stochastic Acquisition Functions
Andreas Kirsch, Sebastian Farquhar, Yarin Gal
(also accepted at the
Human in the Loop Learning (HILL) workshop
)
or as
Poster PDF version as download
.
A Practical & Unified Notation for Information-Theoretic Quantities in ML
Andreas Kirsch, Yarin Gal
or as
Poster PDF version as download
.
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learned
Sören Mindermann, Muhammed Razzak, Winnie Xu, Andreas Kirsch, Mrinank Sharma, Adrien Morisot, Aidan N. Gomez, Sebastian Farquhar, Jan Brauner, Yarin Gal
or as
Poster PNG version as download
.
One paper (poster) at the
Neglected Assumptions In Causal Inference
workshop:
Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data
Andrew Jesson, Panagiotis Tigas, Joost van Amersfoort, Andreas Kirsch, Uri Shalit, Yarin Gal
or as
Poster PNG version as download
.
Deterministic Neural Networks with Appropriate Inductive Biases Capture Epistemic and Aleatoric Uncertainty
has been uploaded to arXiv as a pre-print. Joint work with Jishnu Mukhoti, together with Joost van Amersfoort, Philip H.S. Torr, and Yarin Gal. We show that a single softmax neural net with minimal changes can beat the uncertainty predictions of Deep Ensembles and other, more complex single-forward-pass uncertainty approaches.
Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning
was also presented as a poster at the
“NeurIPS Europe meetup on Bayesian Deep Learning”
.
You can find the poster below (click to open):
or as
PDF version to download
.
Two workshop papers have been accepted to
Uncertainty & Robustness in Deep Learning Workshop at ICML 2020
:
Scalable Training with Information Bottleneck Objectives
, and
Learning CIFAR-10 with a Simple Entropy Estimator Using Information Bottleneck Objectives
both together with Clare Lyle and Yarin Gal. The former is based on
Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning
, while the latter applies the UIB framework: we can use it to train models that perform well on CIFAR-10 without using a cross-entropy loss at all.
Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning
, together with Clare Lyle and Yarin Gal, has been uploaded to arXiv as a pre-print. It examines and unifies different Information Bottleneck objectives and shows that we can introduce simple yet effective surrogate objectives without complex derivations.