Nur Lan

I am a PhD candidate in linguistics and cognitive science at the École Normale Supérieure in Paris and Tel Aviv University, co-advised by Emmanuel Chemla (ENS) and Roni Katzir (TAU). My PhD is funded by the ED3C.

I am mainly interested in building computational models that try to learn natural language the way we think humans do. I am also interested in language evolution, formal language theory, and information theory, as well as animal linguistics and comparative cognition, and I enjoy combining these fields when thinking about the origins of the human language capacity.

I did my master's in computational linguistics under the supervision of Roni Katzir, after completing a bachelor's degree in Computer Science (a double major with a BA in Film), both at Tel Aviv University. In between, I worked for several tech companies in Tel Aviv and as an editor and writer in the Israeli press.

Minimum Description Length Recurrent Neural Networks
With Michal Geyer, Emmanuel Chemla & Roni Katzir

Neural networks are remarkably successful at many tasks, such as image recognition and generation, but they fail on tasks that are very easy for humans, like recognizing simple regularities such as 10101010... or aaabbb..., or learning basic arithmetic operations like addition and multiplication.

The reason is that neural networks are typically overparameterized and have little incentive to generalize, so they end up memorizing rather than generalizing the way humans do. To make networks generalize well, we use a computable relative of Kolmogorov complexity, the Minimum Description Length (MDL) principle, which balances the size of the network's architecture against its accuracy on the data.
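As a rough illustration of the two-part MDL trade-off (not the actual network encoding used in the paper), consider scoring a hypothesis that memorizes a repeating string against a tiny generative rule; the model-cost numbers here are made up for the sketch:

```python
def mdl_score(model_bits, data_bits):
    """Two-part MDL score: bits to describe the model itself,
    plus bits to encode the data given the model."""
    return model_bits + data_bits

data = "10" * 32  # 64 symbols of the regularity 1010...

# Hypothesis A: memorize the string verbatim.
# Model cost ~1 bit per stored symbol; the data is then free.
memorizer = mdl_score(model_bits=len(data), data_bits=0)

# Hypothesis B: a tiny rule "repeat '10'".
# Model cost is a small constant (say ~16 bits for the rule);
# the rule predicts every symbol deterministically, so the data is free.
generalizer = mdl_score(model_bits=16, data_bits=0)

print(memorizer, generalizer)  # → 64 16
```

Both hypotheses fit the data perfectly, but the small rule wins on total description length; an inaccurate model would instead pay in `data_bits`, which is how MDL trades architecture size against accuracy.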

Using MDL we find small, perfect networks that handle tasks which are notoriously hard for traditional networks, such as basic addition and recognition of formal languages like aⁿbⁿ, aⁿb²ⁿ, aⁿbᵐcⁿ⁺ᵐ, and aⁿbⁿcⁿ. MDL networks are very small, often containing only one or two hidden units, which makes it possible to prove that they are correct for any well-formed input. We know of no other neural network that has been proven correct in this way.
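To see why one hidden unit can suffice, here is a hand-built single-counter recognizer for aⁿbⁿ (a plain automaton, not one of the MDL networks themselves): the counter plays the role of a hidden unit that increments on 'a' and decrements on 'b':

```python
def accepts_anbn(s):
    """Recognize a^n b^n for n >= 1 with a single counter:
    increment on 'a', decrement on 'b', reject any 'a' after
    the first 'b' and any negative or nonzero final count."""
    count, phase = 0, 'a'
    for ch in s:
        if ch == 'a':
            if phase != 'a':
                return False  # an 'a' after we switched to 'b'
            count += 1
        elif ch == 'b':
            phase = 'b'
            count -= 1
            if count < 0:
                return False  # more b's than a's so far
        else:
            return False  # symbol outside the alphabet
    return phase == 'b' and count == 0

print([accepts_anbn(s) for s in ["ab", "aabb", "aaabbb", "aab", "abab"]])
# → [True, True, True, False, False]
```

Because the state is a single counter with a fixed update rule, correctness for every input length can be verified by a short inductive argument; aⁿbⁿcⁿ requires a second counter but admits the same style of proof.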

Addition network
aⁿbⁿcⁿ network
Large Language Models and the Argument From the Poverty of the Stimulus
With Emmanuel Chemla & Roni Katzir

Modern language models are trained on huge corpora that amount to years or even lifetimes of human linguistic experience. Can we use this fact to learn about the initial state of a human child acquiring language?

We examine the knowledge of four state-of-the-art language models, including GPT-2 and GPT-3, regarding important syntactic constraints. We find that all of these models fail to acquire adequate knowledge of these phenomena, delivering predictions that clash with the judgments of human speakers. Since these models are trained on data that go above and beyond the linguistic experience of children, our findings support the claim that children are equipped with innate linguistic biases that these models lack.
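The comparison relies on surprisal, the standard information-theoretic measure of how unexpected a word is under a model: −log₂ p. A minimal sketch, with illustrative probabilities rather than actual GPT-3 outputs:

```python
import math

def surprisal(prob):
    """Surprisal in bits: -log2(p). Lower probability -> higher surprisal."""
    return -math.log2(prob)

# Made-up probabilities a model might assign to the word after
# "Who did the fact that Mary remembered surprise ..." for the
# grammatical and ungrammatical continuations:
p_grammatical = 0.05     # "yesterday" (licensed continuation)
p_ungrammatical = 0.001  # "you" (violates the syntactic constraint)

print(round(surprisal(p_grammatical), 2))    # → 4.32
print(round(surprisal(p_ungrammatical), 2))  # → 9.97
```

A model with human-like syntactic knowledge should assign markedly higher surprisal to the ungrammatical continuation; the failures reported here are cases where that expected gap does not appear.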

GPT-3's surprisal values for the grammatical parasitic-gap sentence 'Who did the fact that Mary remembered surprise yesterday' (blue) and its ungrammatical variant ending in '*you' (orange)
Papers and manuscripts