Deep Learning Master Class

Tel Aviv University | Wednesday-Thursday, November 5-6, 2014

Thursday Nov. 6

8:30-9:30: Gathering

9:30-10:30: Rob Fergus (Facebook)
Learning to Discover Efficient Mathematical Identities
In this talk, I will describe how machine learning techniques can be applied to the discovery of efficient mathematical identities. We introduce an attribute grammar framework for representing symbolic expressions. Given a set of grammar rules, we build trees that combine different rules, looking for branches that yield compositions analytically equivalent to a target expression but of lower computational complexity. However, since the size of the trees grows exponentially with the complexity of the target expression, brute-force search is impractical for all but the simplest expressions. Consequently, we explore two learning approaches that learn from simpler expressions to guide the tree search: a simple n-gram model and a recursive neural network. We show how these approaches enable us to derive complex identities beyond the reach of brute-force search or human derivation.
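
A minimal sketch (not from the paper) of the general idea of using an n-gram model to prioritize which branch of an expression-tree search to expand next. The rule names and the "training derivations" below are invented purely for illustration.

    # Fit a bigram model over grammar-rule names taken from derivations of
    # simpler identities, then rank candidate expansions by that model.
    # All rule names and training sequences here are illustrative.
    from collections import Counter

    train_derivations = [
        ["multiply", "sum_rows", "transpose"],
        ["multiply", "sum_rows", "elementwise_square"],
        ["transpose", "multiply", "sum_rows"],
    ]

    bigrams, unigrams = Counter(), Counter()
    for seq in train_derivations:
        for prev, nxt in zip(seq, seq[1:]):
            bigrams[(prev, nxt)] += 1
            unigrams[prev] += 1

    def score(prev_rule, next_rule):
        # Add-one smoothing so unseen rule pairs still get a small score.
        return (bigrams[(prev_rule, next_rule)] + 1) / (unigrams[prev_rule] + len(unigrams))

    # During search, candidate expansions are ordered by the learned score.
    candidates = ["sum_rows", "transpose", "elementwise_square"]
    print(sorted(candidates, key=lambda r: -score("multiply", r)))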

10:30-11:30: Shai Shalev-Shwartz (HUJI/ICRI-CI)
Accelerating Stochastic Optimization
Stochastic optimization techniques such as SGD and its variants are currently the methods of choice for shallow and deep learning from big data. The two main advantages of SGD are the constant cost of each iteration (which does not depend on the number of examples) and the ability to escape local minima. However, a major disadvantage of SGD is its slow convergence. I will describe new stochastic optimization algorithms that converge exponentially faster than SGD.
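
As a rough illustration, here is a sketch of one well-known variance-reduced stochastic method, SVRG, on a least-squares problem; the talk's own algorithms may differ, and the data, step size, and epoch counts below are made up for illustration.

    # Sketch of SVRG on least squares: min_w ||Xw - y||^2 / (2n).
    # Each epoch computes one full gradient at a snapshot point, then runs
    # cheap per-example updates whose variance shrinks as w approaches w_snap.
    import numpy as np

    def svrg(X, y, step=0.1, epochs=20):
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            w_snap = w.copy()
            full_grad = X.T @ (X @ w_snap - y) / n   # full gradient, once per epoch
            for _ in range(n):
                i = np.random.randint(n)
                g_i = X[i] * (X[i] @ w - y[i])           # per-example gradient at w
                g_i_snap = X[i] * (X[i] @ w_snap - y[i])  # same example at the snapshot
                # Variance-reduced update: unbiased estimate of the full gradient.
                w -= step * (g_i - g_i_snap + full_grad)
        return w

    rng = np.random.RandomState(0)
    X = rng.randn(200, 5)
    w_true = rng.randn(5)
    y = X @ w_true
    print(np.linalg.norm(svrg(X, y) - w_true))  # should be close to 0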

11:30-12:00: Break

12:00-13:00: Ilya Sutskever (Google)
Supervised Learning with Deep Neural Networks
Deep neural networks have achieved great results in speech, vision, and language problems. But why do they work so well? I will argue that large deep neural networks can solve almost any problem, no matter how difficult, provided we have a large yet often feasible number of input-output examples. I will then present my recent work with Oriol Vinyals and Quoc Le on applying recurrent neural networks to generic sequence-to-sequence problems, and report its performance on a large-scale machine translation task.
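
To make the sequence-to-sequence setup concrete, here is a toy, untrained forward pass of an encoder-decoder RNN: one RNN reads the source sequence into a fixed-size vector, and a second RNN generates output tokens from it. Parameters are random and the sizes and token ids are illustrative only; this is not the talk's actual model.

    import numpy as np

    rng = np.random.RandomState(0)
    vocab, hidden = 10, 16
    E = rng.randn(vocab, hidden) * 0.1      # token embeddings
    W_h = rng.randn(hidden, hidden) * 0.1   # recurrent weights (shared for brevity)
    W_out = rng.randn(hidden, vocab) * 0.1  # hidden state -> vocabulary logits

    def rnn_step(h, x_vec):
        return np.tanh(h @ W_h + x_vec)

    # Encoder: fold the whole source sequence into one fixed-size vector.
    src = [3, 1, 4, 1, 5]
    h = np.zeros(hidden)
    for tok in src:
        h = rnn_step(h, E[tok])

    # Decoder: emit target tokens one at a time, feeding back its own output.
    out, tok = [], 0                        # 0 acts as a start symbol here
    for _ in range(6):
        h = rnn_step(h, E[tok])
        tok = int(np.argmax(h @ W_out))     # greedy choice of the next token
        out.append(tok)
    print(out)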

13:00-14:00: Lunch

14:00-15:00: Nati Srebro (Technion/ICRI-CI)
Approximation, Generalization and Computation: Neural Networks as Universal Learners
In this talk, we will look at learning deep networks from a learning-theoretic perspective, surveying what we do and do not understand about them. We will consider neural networks as powerful, even universal, models, discuss their approximation and generalization abilities, and stress the importance of not separating these statistical issues from computational and optimization issues. Since our theoretical understanding of deep learning is still limited, we will then focus on two-layer networks with linear transfer (i.e., matrix factorization) and see how our more complete understanding there can inform us about deeper and more complex networks.
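
A small numeric illustration of the equivalence mentioned above: a two-layer network with linear activations computes x -> x @ W1 @ W2, a linear map whose rank is at most the hidden width, so fitting it amounts to a low-rank matrix factorization. The sizes below are arbitrary.

    import numpy as np

    rng = np.random.RandomState(0)
    d_in, d_hidden, d_out = 20, 3, 15
    W1 = rng.randn(d_in, d_hidden)   # first linear layer
    W2 = rng.randn(d_hidden, d_out)  # second linear layer

    W_effective = W1 @ W2                       # the network's end-to-end linear map
    print(W_effective.shape)                    # (20, 15): looks like one big linear layer
    print(np.linalg.matrix_rank(W_effective))   # but its rank is at most 3 (the hidden width)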

15:00-16:00: Yoshua Bengio (Université de Montréal)
Fundamentals of Deep Learning of Representations
Deep learning has become very popular because of successes in speech recognition and computer vision, but it remains unclear to many why it works. To help build that understanding, this lecture focuses on the basic motivations behind representation learning, in particular the distributed representations used in deep learning, and behind the idea of depth, i.e., composing features at multiple levels to capture a hierarchy of abstractions. It will use manifold learning and the geometric point of view on statistical learning and generalization to illustrate what deep learning of representations can bring, backed by theoretical results about the potentially exponential gain of deeper networks. This view aims to shed light on the question of what makes a good representation, based on the notion of unfolding manifolds and disentangling the underlying factors of variation. It is anchored in the perspective of broad priors as the fundamental enablers of generalization in high-dimensional spaces, which help reduce or tame the curse of dimensionality.
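
A small numeric illustration (not from the lecture) of one aspect of the exponential-gain argument: k binary features in a distributed representation can distinguish up to 2**k input regions, whereas a local (one-hot) code needs one unit per region.

    from itertools import product

    k = 5
    distributed_codes = list(product([0, 1], repeat=k))
    print(len(distributed_codes), "regions distinguishable with", k, "binary features")  # 32 with 5

    # A local (one-hot) code for the same number of regions needs 2**k units.
    print(2 ** k, "units needed by a one-hot code for the same regions")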

16:00-16:30: Break

16:30-17:30: Demos, Q&A panel