I’m a fourth-year PhD student in Linguistics at Harvard University, with a focus on computational linguistics and semantics.

I’m interested in how we can use linguistics and experimental data to pin down language models’ capabilities and limitations, and to develop ways to improve them. Language models like ChatGPT have taken the world by storm, but we’re still far from understanding how they actually achieve such impressive performance on a wide range of real-life tasks – and why they still fail to handle important aspects of meaning, ranging from seemingly simple negation to subtler problems like entity tracking. Comparing their behaviour with human performance is the first step in identifying where they fall short, what kinds of in-built biases they have (or don’t have), and how we can improve them. This may involve “inoculating” their pretraining data with small sets of carefully designed sentences showing the requisite phenomena, fine-tuning, or even building appropriate biases into the network architecture itself. But we won’t know what we need to fix until we map out where they fall short.

In particular, I’m studying whether language models capture the compositionality of meaning: humans effortlessly understand sentences they’ve never seen before by putting together the individual words and phrases in a systematic way. Can language models do the same? How do you test what a language model “understands” – and what does that even mean? I’m currently looking at this through the lens of adjective-noun composition, and in particular privativity: how some adjective-noun combinations, like “counterfeit money”, denote something that is precisely not money, while others, like “counterfeit sneakers”, still qualify as sneakers. How do humans and language models handle combinations like these that they’ve never seen before?

As for the debate over whether we need linguistics for natural language processing, I would argue that we do! Linguistics helps us understand the strengths and limitations of large neural language models by giving us a well-understood set of phenomena against which to measure them, and it can help us pinpoint ways in which language models are not, in fact, modelling human language the way we do. Moreover, linguistic typology gives us a principled way to structure cross-lingual transfer learning. I’m also very interested in whether NLP and language acquisition will converge: will the trending “more data and bigger models” approach to NLP ever produce a human-like approach to language, or will we turn back to building hybrid systems that enforce syntactic, semantic or pragmatic structure, so as to learn language with something closer to the mechanisms children use, from vastly less data than their neural network counterparts? Is it possible to learn human-like language without being immersed in a human-like world, using only distributional information and no hard-and-fast rules? Or are humans in fact using more distributional knowledge than current linguistic theories typically account for?

Language models also provide an intriguing window into human linguistic behaviour. In cases where human “snap” judgements or online processing measures diverge from people’s final conclusions, language model surprisal can provide a remarkably good account of human behaviour, suggesting that humans may be using distributional information in these cases. Language models might also provide insight where human judgements are context-sensitive or otherwise fuzzy, rather than following strict theoretical rules.

When I’m not thinking about language models, I work on experimental semantics, specifically crossover: whether a pronoun can be coconstrued with a referent (a quantifier, wh-word, indefinite or proper name) that is hierarchically above it in a sentence. Crossover has been the subject of a large theoretical literature, with remarkably divided opinions on whether it applies to non-standard cases like indefinites or relative clauses. Together with Kathryn Davidson and Gennaro Chierchia, I’m developing a robust experimental design to clarify the circumstances under which crossover does and doesn’t occur, with ramifications for theories of dynamic semantics and situation semantics.

I also like to indulge in reading new theoretical accounts of indefinites and other areas of semantics. Hiding under the experimental facade, I have a deep love for theory and formalisms, with a soft spot for lambda calculus, fancy operators and continuation semantics. (Before coming to linguistics, my background was in mathematics, specifically logic and set theory, with a little computer science.)

Outside of my research, you can find me reading, hiking (ideally up mountains) and spoiling my fluffy black cat, Aurora. If you ever need a native Swiss German (or British English) speaker for your research, let me know.