[18 April, 13:00] Let's Talk ML

Radek Bartyzal - Adversarial Network Compression (slides)

Knowledge distillation is a method of training a smaller student model with the help of a previously trained, larger teacher model, so that the student reaches better classification accuracy than it would with normal training.
This paper presents a new approach to knowledge distillation that leverages recent advances in Generative Adversarial Networks.
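For context, here is a minimal sketch of the adversarial idea in general terms (the networks, sizes, and losses are illustrative, not the paper's exact setup): a discriminator is trained to tell teacher and student feature vectors apart, while the student is trained to fool it, pulling its representations towards the teacher's.

```python
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.Linear(512, 64))  # pretrained, frozen
student = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 64))  # smaller net
discriminator = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

bce = nn.BCEWithLogitsLoss()
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

x = torch.randn(32, 784)                      # dummy batch of inputs
with torch.no_grad():
    t_feat = teacher(x)                       # teacher features, no gradient

# 1) discriminator step: teacher features -> 1, student features -> 0
s_feat = student(x)
d_loss = bce(discriminator(t_feat), torch.ones(32, 1)) + \
         bce(discriminator(s_feat.detach()), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2) student step: try to make the discriminator label its features as "teacher"
g_loss = bce(discriminator(s_feat), torch.ones(32, 1))
opt_s.zero_grad(); g_loss.backward(); opt_s.step()
```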

Ondra Podsztavek - World Models (slides)

The paper explores building generative neural network models of popular reinforcement learning environments. A world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. Using features extracted from the world model as inputs, an agent can learn a very compact and simple policy that solves the required task. The agent can even be trained entirely inside its own hallucinated dream generated by the world model, and the policy can then be transferred back into the actual environment.
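As a rough orientation, the model splits into a vision component, a memory component, and a tiny controller. The skeleton below is a simplified sketch of that decomposition (the "VAE" is reduced to a plain encoder and all sizes are illustrative, not the paper's hyper-parameters):

```python
import torch
import torch.nn as nn

class VisionEncoder(nn.Module):        # "V": compresses a frame into a small latent z
    def __init__(self, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 256), nn.ReLU(),
                                 nn.Linear(256, z_dim))
    def forward(self, frame):
        return self.enc(frame)

class MemoryRNN(nn.Module):            # "M": models the environment dynamics over (z, action)
    def __init__(self, z_dim=32, a_dim=3, h_dim=256):
        super().__init__()
        self.rnn = nn.LSTM(z_dim + a_dim, h_dim, batch_first=True)
    def forward(self, z_seq, a_seq, state=None):
        return self.rnn(torch.cat([z_seq, a_seq], dim=-1), state)

class Controller(nn.Module):           # "C": tiny linear policy on top of [z, h]
    def __init__(self, z_dim=32, h_dim=256, a_dim=3):
        super().__init__()
        self.fc = nn.Linear(z_dim + h_dim, a_dim)
    def forward(self, z, h):
        return torch.tanh(self.fc(torch.cat([z, h], dim=-1)))

v, m, c = VisionEncoder(), MemoryRNN(), Controller()
z = v(torch.randn(1, 3, 64, 64))                    # encode one frame
h, _ = m(z.unsqueeze(1), torch.zeros(1, 1, 3))      # one step of the memory RNN
action = c(z, h[:, -1])
print(action.shape)                                 # torch.Size([1, 3])
```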

[4 April, 13:00] Let's Talk ML

Ondra Bíža - Learning to grasp objects with convolutional networks (slides)

Precise grasping of objects is essential in many applications of robotics, such as assisting patients with motor impairments. I will compare two approaches to learning how to grasp: Google's large-scale venture and a much smaller project carried out at Northeastern University, which nevertheless achieved competitive results.

Václav Ostrožlík - Differentiable Neural Computer (slides)

The Differentiable Neural Computer is a model built around a neural network controller with external memory, which it learns to use to store and navigate complex data on its own. I'll go through its architectural details, compare it with the Neural Turing Machine, and show some interesting ways the model can be used.
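The core reading mechanism shared by the Neural Turing Machine and the DNC is content-based addressing: the controller emits a key, memory slots are weighted by similarity to that key, and the read vector is the weighted sum. Below is a minimal sketch of just that step (simplified: no temporal links or usage-based allocation):

```python
import torch
import torch.nn.functional as F

def content_read(memory, key, beta):
    """memory: (N, W) matrix of N slots, key: (W,) query, beta: sharpness scalar."""
    similarity = F.cosine_similarity(memory, key.unsqueeze(0), dim=-1)  # (N,)
    weights = F.softmax(beta * similarity, dim=-1)                      # attention over slots
    return weights @ memory                                             # (W,) read vector

memory = torch.randn(128, 64)        # 128 slots of width 64
key = torch.randn(64)
print(content_read(memory, key, beta=5.0).shape)   # torch.Size([64])
```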

[21 March, 13:00] Let's Talk ML

Matus Zilinec - Machine text comprehension with BiDAF (slides)

I will talk about the Bi-Directional Attention Flow (BiDAF) network for answering natural-language questions about an arbitrary paragraph. BiDAF is a multi-stage process that represents the context at different levels of granularity and uses an attention mechanism to obtain a query-aware context representation without early summarization. The model achieves state-of-the-art results on the Stanford Question Answering Dataset.
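To make "query-aware context representation without early summarization" concrete, here is a sketch of the context-to-query attention step in BiDAF-style models (the similarity function here is a plain dot product; the paper uses a trainable one):

```python
import torch
import torch.nn.functional as F

T, J, d = 100, 12, 128                 # context length, query length, hidden size
H = torch.randn(T, d)                  # context word representations
U = torch.randn(J, d)                  # query word representations

S = H @ U.t()                          # (T, J) similarity of every context/query pair
a = F.softmax(S, dim=1)                # for each context word, attention over query words
U_tilde = a @ U                        # (T, d): every context position keeps its own
                                       # attended query vector -- no early summarization
print(U_tilde.shape)
```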

Radek Bartyzal - Objects that sound (slides)

A simple network architecture trained only on video achieves impressive results in localizing the objects that produce a given sound within a given frame. The paper builds on earlier work called 'Look, Listen and Learn' and adds support for cross-modal retrieval, meaning that it can return an image for a given sound and vice versa. I will present the new architecture and explain its advantages over the previous one.
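In general terms, cross-modal retrieval of this kind works by embedding both modalities into a shared space and doing nearest-neighbour search across modalities. The snippet below is only an illustration of that idea with random embeddings, not the paper's network:

```python
import torch
import torch.nn.functional as F

image_emb = F.normalize(torch.randn(1000, 128), dim=1)   # embeddings of 1000 frames
audio_emb = F.normalize(torch.randn(1, 128), dim=1)      # embedding of one sound clip

scores = audio_emb @ image_emb.t()                        # cosine similarity (unit vectors)
best_frames = scores.topk(5).indices                      # frames that best match the sound
print(best_frames)
```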

[7 March, 13:00] Let's Talk ML

Markéta Jůzlová - Hyperband (slides)

Hyperband is a multi-armed bandit strategy proposed for hyper-parameter optimization of learning algorithms. Despite its conceptual simplicity, the authors report results competitive with state-of-the-art hyper-parameter optimization methods such as Bayesian optimization.
I will describe the main principle of the method and a possible extension.
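The building block of Hyperband is successive halving: start many randomly sampled configurations on a small budget, keep the best fraction, and re-run the survivors with proportionally more resources; Hyperband then repeats this with several trade-offs between the number of configurations and the starting budget. A rough sketch of the subroutine (toy problem, illustrative parameters):

```python
import random

def successive_halving(sample_config, evaluate, n=27, min_budget=1, eta=3):
    """evaluate(config, budget) -> validation loss (lower is better)."""
    configs = [sample_config() for _ in range(n)]
    budget = min_budget
    while len(configs) > 1:
        losses = [(evaluate(c, budget), c) for c in configs]
        losses.sort(key=lambda pair: pair[0])
        configs = [c for _, c in losses[:max(1, len(configs) // eta)]]  # keep top 1/eta
        budget *= eta                                                   # survivors get more budget
    return configs[0]

# Toy usage: a "config" is just a number and the "loss" improves with budget.
best = successive_halving(
    sample_config=lambda: random.uniform(0, 1),
    evaluate=lambda c, b: abs(c - 0.5) + 1.0 / b,
)
print(best)
```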

Ondra Bíža - Visualizing Deep Neural Networks (slides)

Techniques for visualizing deep neural networks have seen significant improvements in the last year. I will explain a novel algorithm for visualizing convolutional filters and use it to analyze a deep residual network.
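As background (this is a generic activation-maximization sketch, not the specific algorithm from the talk): one common way to visualize a convolutional filter is to synthesize an input that maximizes that filter's activation by gradient ascent on the image itself.

```python
import torch
import torchvision.models as models

model = models.resnet18(pretrained=True).eval()   # newer torchvision uses weights=... instead
for p in model.parameters():
    p.requires_grad_(False)

activ = {}
hook = model.layer3.register_forward_hook(lambda m, i, o: activ.update(out=o))

img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    opt.zero_grad()
    model(img)
    loss = -activ['out'][0, 42].mean()   # maximize the mean activation of filter 42
    loss.backward()
    opt.step()
hook.remove()
```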

[21 February, 13:00] Let's Talk ML

Radek Bartyzal - Born Again Neural Networks (slides)

Knowledge distillation is the process of training a compact model (the student) to approximate the outputs of a previously trained, more complex model (the teacher).
The authors of this paper take this idea further and train a student of the same complexity as its teacher, finding that the student surpasses the teacher in many cases. They also experiment with training a student whose architecture differs from the teacher's, with interesting results.
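For reference, a common formulation of the distillation objective has the student match both the hard labels and the teacher's softened outputs. This sketch shows that standard loss (temperature and weighting are illustrative), not the paper's exact training recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    hard = F.cross_entropy(student_logits, labels)                 # match the true labels
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),      # match softened teacher outputs
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * T * T
    return alpha * hard + (1 - alpha) * soft

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```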

This will be one longer (40 min) talk in which I will also describe the relevant architectures used in the paper (DenseNet, Wide ResNet).
