Current Projects
Can we design news recommenders to nudge users towards diverse consumption of topics and perspectives? The growing role of news recommenders raises the question of how news diversity can be safeguarded in a digital news landscape. Many existing studies look at either the supply diversity of recommendations, or the effects of (decreased) exposure diversity on e.g. polarization and filter bubbles. Research on how users choose from the available supply is lacking, making it difficult to understand the relation between algorithm design and possible adverse effects on citizens.
We directly address the question of how news recommender algorithms can be designed to optimize exposure diversity. This innovative and interdisciplinary project builds on our extensive expertise on news diversity, news consumption behavior, text analysis and recommender design by providing: (WP1) a normative framework for assessing diversity; (WP2) a language model to automatically measure diversity; (WP3) a model of news consumption choices based on supply, presentation, and individual characteristics; and (WP4) a concrete prototype implementation of a recommender algorithm that optimizes exposure diversity, which will be externally validated in a unique field experiment with our media partners.
The project will bridge the gap between differing understandings of news diversity in computer science, communication science, and media law. This will increase our understanding of contemporary news behavior, yield new language models for identifying topics and perspectives, and offer concrete considerations for designing recommenders that optimize exposure diversity. Together with media companies and regulators we turn these scientific insights into concrete recommendations.
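As a rough illustration of the kind of algorithm the WP4 prototype could build on, the sketch below shows a greedy, MMR-style re-ranker that trades off a base recommender's relevance score against topical novelty. This is a minimal sketch of the general technique only, not the project's method; the `Article` fields, the topic representation and the `diversity_weight` parameter are all hypothetical.

```python
# Minimal sketch of a greedy diversity-aware re-ranker (MMR-style).
# Illustrative only: field names, topic sets and weights are hypothetical,
# not the project's actual design.
from dataclasses import dataclass, field

@dataclass
class Article:
    id: str
    relevance: float                          # score from a base recommender
    topics: set = field(default_factory=set)  # e.g. {"politics", "climate"}

def rerank(candidates: list[Article], k: int,
           diversity_weight: float = 0.5) -> list[Article]:
    """Greedily pick k articles, trading off relevance against topic novelty."""
    selected: list[Article] = []
    covered: set = set()
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(a: Article) -> float:
            # fraction of an article's topics the user has not yet been shown
            novelty = len(a.topics - covered) / max(len(a.topics), 1)
            return (1 - diversity_weight) * a.relevance + diversity_weight * novelty
        best = max(pool, key=score)
        pool.remove(best)
        selected.append(best)
        covered |= best.topics
    return selected
```

Raising `diversity_weight` pushes the list towards unseen topics at the cost of raw relevance; how such trade-offs play out with real users is the kind of question the WP4 field experiment can examine.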
Team:
- Wouter van Atteveldt (VU)
- Antske Fokkens (VU)
- Natali Helberger (UvA)
- Marijn Sax (postdoc ethics UvA)
- Myrthe Reuver (PhD student Computational Linguistics, VU)
- Nicolas Mattis (PhD student Communication Science)
- Sanne Vrijenhoek (scientific programmer, UvA)
- Suzan Verberne (U Leiden)
- Nava Tintarev (TU Delft)
- Johan Oomen (VU / B&G)
- Media partners: Beeld & Geluid; NPO; VPRO; RTL; SvdJ
Understanding of Language by Machines (ULM): NWO Spinoza project
The goal of the Spinoza project "Understanding of Language by Machines" (ULM) is to develop computer models that can assign deeper meaning to language, approximating human understanding, and to use these models to automatically read and understand text. Current approaches to natural language understanding treat language as a closed world of relations between words. Words and texts are, however, highly ambiguous and vague. People do not notice this ambiguity when using language within their social communicative context. This project tries to get a better understanding of the scope and complexity of this ambiguity, and of how to model the social communicative context to help resolve it. The project is divided into four subprojects, each investigating a different aspect of assigning meaning:
- ULM-1: The borders of ambiguity: ULM-1 will explore the closed world of language as a system of word relations. The goal is to more properly define the problem and find the optimal solution given the vast volumes of textual data that are available. This project starts from the results obtained in the DutchSemCor project.
- ULM-2: Word, concept, perception and brain: ULM-2 will cross the borders of language and relate words and their meanings to perceptual data and brain activation patterns.
- ULM-3: Stories and world views as a key to understanding language: ULM-3 will consider the interpretation of text built up from words as a function of our ways of interacting with the changing world around us. We interpret changes from our world views of the here and now and of the future, and we structure these changes as stories along explanatory motivations. This project builds on the results of the European project NewsReader.
- ULM-4: A quantum model of text understanding: ULM-4 is a technical project that investigates a new model of natural language processing. Current approaches are based on a pipeline architecture, in which the complete problem is divided into a series of smaller, isolated tasks, e.g. tokenization, part-of-speech tagging, lemmatisation, syntactic parsing, recognition of entities, and detection of word meanings (a sketch of such a pipeline follows below). In the new model, none of these tasks is decisive and the final interpretation is left to higher-order semantic and contextual models. This project also builds on previous (KYOTO) and ongoing (OpeNER, NewsReader) European projects and the national project BiographyNet carried out at VU University Amsterdam. The goal is to develop a new model of natural language processing in which text is interpreted in a combined top-down and bottom-up process.
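The pipeline architecture that ULM-4 questions can be made concrete in a few lines. The sketch below uses spaCy purely for illustration (it is not the ULM project's tooling): each component commits to one analysis that downstream components cannot revise, which is exactly the rigidity the subproject wants to overcome.

```python
# Sketch of the classical NLP pipeline architecture, using spaCy for
# illustration only. Each attribute printed below is the hard output of
# one isolated pipeline step (tokenization, lemmatisation, tagging,
# parsing, named-entity recognition).
import spacy

nlp = spacy.load("en_core_web_sm")  # tokenizer -> tagger -> parser -> NER
doc = nlp("The VU in Amsterdam builds models that understand language.")

for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)

for ent in doc.ents:  # entities recognized by the final pipeline step
    print(ent.text, ent.label_)
```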
Hybrid Intelligence (HI): Augmenting Human Intellect: NWO Zwaartekracht 2019-2029
A Gravitation project that combines human and artificial intelligence. Six Dutch universities will develop theories and methods for intelligent systems that cooperate with humans, adapt to dynamic circumstances, and can explain their actions. Ethical and legal values, such as transparency, accountability and trust, will be taken into account during the design of such HI systems. We will demonstrate applications of HI systems in healthcare, education and science to show the potential of artificial intelligence to amplify human intelligence instead of replacing it.
The Hybrid Intelligence Centre
Developing HI needs fundamentally new solutions to core research problems in AI: current AI technology surpasses humans in many pattern recognition and machine learning tasks, but it falls short on general world knowledge, common sense, and the human capabilities of (i) Collaboration, (ii) Adaptivity, (iii) Responsibility and (iv) Explainability of norms and values (CARE). These challenges are being addressed in four interconnected research lines:
Collaborative HI: How to design and build intelligent agents that work in synergy with humans, with awareness of each other’s strengths and limitations? We develop shared mental models for communication between humans and agents, computational theories of mind to enable collaboration, and exploit multimodal interaction for seamless dialogues.
Coordinators: Dr. Hayley Hung, h.hung@tudelft.nl, and Prof. Koen Hindriks, k.v.hindriks@vu.nl
Adaptive HI: The world in which Hybrid Intelligent systems operate is dynamic, as are the teams of humans and agents that make up such HI systems. HI systems thus need to operate in situations not anticipated by their designers, and cope with variable team configurations, preferences and roles. This requires progress in online reinforcement learning, AutoML, and the integration of learning and reasoning.
Coordinators: Dr. Herke van Hoof, h.c.vanhoof@uva.nl, and Prof. Guszti Eiben, a.e.eiben@vu.nl
Responsible HI: Addressing and mitigating some of the perceived risks of Artificial Intelligence technologies requires ethical and legal concerns to be an integral part of the design and operation of HI systems. Values such as transparency, accountability, trust, privacy and fairness can no longer be relegated to regulations that apply only after a system's deployment. We develop methods to include ethical, legal and societal considerations in the design process ("ethics in design") and in the performance ("ethics by design") of HI systems.
Coordinators: Dr. M. Birna van Riemsdijk, m.b.vanriemsdijk@utwente.nl, and Prof. Bart Verheij, bart.verheij@rug.nl
Explainable HI: Intelligent agents and humans need to be able to mutually explain to each other what is happening (shared awareness), what they want to achieve (shared goals), and what collaborative ways they see of achieving their goals (shared plans and strategies). The challenge is to generate appropriate explanations in different circumstances and for different purposes, even for systems whose internal representations are vastly different from human cognitive concepts. We use causal models for shared representations, develop methods for contrastive, selective and interactive explanations, and combine symbolic and statistical representations.
Coordinators: Dr. Antske Fokkens, antske.fokkens@vu.nl, and Prof. Piek Vossen, piek.vossen@vu.nl
The Spinoza project Understanding of Language by Machines has funded a follow-up project, "Make Robots Talk and Think" (2020-2024), with two PhD students who work on our robot project Leolani.
Leolani uses communication to learn about us and the world, but she also needs to learn our language at the same time. Communicating and reasoning about the physical world and the people she encounters is a real challenge.
The PhD students will define their research topics within the Leolani framework and become part of the team of researchers that further develops the platform.
More information will follow.
Framing Situations in the Dutch Language (Dutch FrameNet): NWO Vrije Competitie Geesteswetenschappen (2019-2023)
Language plays a central role in framing, as we choose every day which nouns and verbs describe, or frame, a given situation. For some languages, researchers have created databases (called FrameNets) containing rich collections of conceptual schemas (frames) that describe situations from a certain perspective. These frames are connected to the words and sentences that express them; an illustration follows the objectives below. Several lexical resources exist for Dutch, but no FrameNet. Moreover, we have limited knowledge of how framing varies in Dutch and how this compares to other languages.
The project’s objectives are:
- to create a unique data set where similar situations are framed by many different sources and texts using a newly developed data-to-text method;
- to capture the variation in framing these situations in Dutch and other languages;
- to capture semantic-pragmatic factors underlying the usage of different frames for similar situations, and
- to develop semantic frame and role annotation software.
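To make the notion of a frame concrete, the sketch below queries the English Berkeley FrameNet through NLTK. The Dutch FrameNet developed in this project is a separate resource; this example only shows the kind of data a FrameNet contains: a frame's definition, its roles (frame elements) and the lexical units that evoke it.

```python
# Illustration of what a FrameNet contains, using NLTK's interface to
# the English Berkeley FrameNet (not the Dutch FrameNet built in this
# project, which is still to be created).
import nltk
nltk.download("framenet_v17", quiet=True)
from nltk.corpus import framenet as fn

frame = fn.frame("Commerce_buy")   # a conceptual schema (frame)
print(frame.definition)            # the situation the frame describes
print(sorted(frame.FE))            # roles: Buyer, Goods, Money, Seller, ...
print(sorted(frame.lexUnit))       # words evoking it: buy.v, purchase.v, ...
```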
CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities): NWO National Roadmap for Large-Scale Research Facilities programme (2015-2018)
General info
CLARIAH is developing a digital infrastructure that brings together large collections of data and software from different humanities disciplines. This will enable humanities researchers – from historians, literature experts and archaeologists to linguists, speech technologists and media scholars – to investigate cross-disciplinary questions, for example about culture and societal change. CLARIAH has received 12 million euros for the development of research instruments and the training of scientists. This project is vitally important for the development of the humanities in the Netherlands: a digital revolution is taking place that will drastically change how humanities research is done, and its potential societal impact is considerable.
Organisations involved
Organisations involved (applicants) are: Huygens ING, International Institute for Social History, Meertens Institute, Netherlands Institute for Sound and Vision, DANS, Radboud University Nijmegen, Utrecht University, University of Amsterdam and VU University Amsterdam. Project leader: Prof. A.F. (Lex) Heerma van Voss.
CLARIAH | NWO-programma Nationale Roadmap on YouTube (in Dutch)
CLARIAH kickoff 2015 on YouTube (in Dutch)
QuPiD2: Quality and Perspectives in Deep Data. AAA-DS (Amsterdam Academic Alliance – Data Science) project (2015-2019)
About
QuPiD2 comprises three interdisciplinary projects, embedded in the Humanities and Computer Science research groups of VU University Amsterdam and the University of Amsterdam, and aiming at synergy with other Amsterdam Data Science use cases.
The CLTL is involved in the third project, From Text to Deep Data.
Summary of QuPiD2
80% of digital data is in unstructured textual form. Textual data is rich and complex. Not only does it contain massive amounts of statements but, more importantly, it also reflects our perspective on these statements: our emotions, opinions, the interpersonal, as well as the current social debate. Textual data is therefore not only big but also deep, which adds several layers of complexity.
The QuPiD2 programme aims to deliver a framework for deep data representation that makes data provenance, quality and perspective explicit in the way such data is described and consumed. This will ultimately help to reveal bias in factual statements, and it will make it possible to track variation over time, enhancing our understanding of data and its reliability.
QuPiD2 will apply this framework to a variety of textual sources: social media, newspapers, biographies, encyclopedias, and literary texts such as novels and songs.
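As a purely hypothetical illustration of what a "deep data" record could look like, the sketch below attaches provenance and perspective to a factual statement. The field names are invented for this example and are not the QuPiD2 framework's actual schema.

```python
# Hypothetical sketch of a deep data record: the statement itself plus
# explicit provenance and perspective. Field names are illustrative,
# not the QuPiD2 framework's actual schema.
from dataclasses import dataclass

@dataclass
class Statement:
    subject: str
    predicate: str
    obj: str

@dataclass
class DeepDataRecord:
    statement: Statement
    source: str        # provenance: who said it, and where
    date: str          # when it was said
    attitude: str      # perspective: e.g. "asserts", "doubts", "denies"
    certainty: float   # the source's expressed confidence, 0..1

record = DeepDataRecord(
    statement=Statement("company_X", "acquired", "company_Y"),
    source="newspaper_Z",
    date="2015-03-02",
    attitude="doubts",
    certainty=0.4,
)
```

Keeping the perspective separate from the statement is what lets two sources assert the same triple while disagreeing about it, which is exactly the variation over time the programme wants to track.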
QuPiD2's aims are four-fold:
- modeling of quality and perspectives by providing transparency and reliability measures, and allowing reasoning within social and historic contexts;
- machine-crowd empowered processing of textual sources for populating the QuPiD2 model;
- collection and analysis of quality factors and perspectives through crowd-expert data interpretation;
- demonstrating the value of data perspectives and quality analysis.
Modeling data quality and perspective variation, common in the humanities, is useful for various data science paradigms. In their recent History Manifesto, the historians Armitage and Guldi sound the alarm about the "ghost of short-termism": policy makers and scientists base their analyses and decisions on limited data sets that cover incredibly short periods of time. They make a case for longue durée perspectives for policy makers, entrepreneurs, and scientists. Data Science will become an important instrument for bridging the gap between the humanities and other sciences, providing long-term and 'deep' perspectives.
The QuPiD2 Team
- Lora Aroyo, Computer Science, Faculty of Science, VU University Amsterdam
- Rens Bod, Computational and Digital Humanities, Faculty of Humanities & Faculty of Science, University of Amsterdam
- Inger Leemans, Cultural History, Faculty of Humanities, VU University Amsterdam
- Julia Noordegraaf, Digital Heritage, Faculty of Humanities, University of Amsterdam
- Piek Vossen, Computational Linguistics, Faculty of Humanities, VU University Amsterdam
- Serge ter Braake, Media and Culture, Faculty of Humanities, University of Amsterdam
- Davide Ceolin, Computer Science, Faculty of Science, VU University Amsterdam
- Chantal van Son, Computational Linguistics, Faculty of Humanities, VU University Amsterdam
Centre for Digital Humanities Amsterdam (2013-ongoing)
Project leader on behalf of VU University Amsterdam in the Centre for Digital Humanities Amsterdam. Within the field of Digital Humanities, researchers and students focus on digital or digitized sources and methods of research. Digital data concerning language, art, music, literature and media allow researchers to discover new patterns, concepts and motifs, eventually raising new research questions.
The Centre for Digital Humanities is a collaboration between the University of Amsterdam, VU University Amsterdam and the Royal Netherlands Academy of Arts and Sciences, joining forces with the Netherlands eScience Center.
The Centre for Digital Humanities Amsterdam facilitates so-called embedded research projects, in which research questions from the humanities are approached using techniques and concepts from the field of Digital Humanities. In these short and intensive projects, which last between 6 and 12 months, researchers collaborate with private partners and deliver proof-of-concepts. The centre preferably initiates embedded research projects in the context of larger projects in which expertise from the humanities and industry is brought together.
Open Source Dutch WordNet
Open Source Dutch WordNet is a Dutch lexical semantic database.
It was created by removing the proprietary content from Cornetto (http://www2.let.vu.nl/oz/cltl/cornetto) and replacing it with open source resources.
Open Source Dutch WordNet contains 116,992 synsets, of which 95,356 originate from WordNet 3.0 and 21,636 are new synsets. Of the synsets taken from WordNet 3.0, 60,743 have no Dutch synonyms yet, meaning that 34,613 WordNet 3.0 synsets have been filled with at least one Dutch synonym.
This project has been co-funded by the Nederlandse Taalunie (2013-2014).
CLTL is the project coordinator of the translation of the English WordNet into Dutch.
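Dutch wordnet data can also be queried alongside the English WordNet through NLTK's Open Multilingual Wordnet. A minimal sketch, assuming the omw-1.4 data package exposes Dutch under language code "nld" (as recent versions do):

```python
# Sketch: querying Dutch synonyms via NLTK's Open Multilingual Wordnet,
# assuming the omw-1.4 data package includes Dutch under code "nld".
import nltk
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)
from nltk.corpus import wordnet as wn

for synset in wn.synsets("dog", pos=wn.NOUN)[:3]:
    # English synset name plus its Dutch lemmas, if any have been filled in
    print(synset.name(), synset.lemma_names("nld"))
```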
Global WordNet Grid: a GWA Project (2006-ongoing)
In 2006, the Global WordNet Association launched the Global WordNet Grid: the building of a complete, free, worldwide wordnet grid. This grid will be built around a shared set of concepts, such as the Common Base Concepts used in many wordnet projects. These concepts will be expressed in terms of WordNet synsets and SUMO definitions. People from all language communities are invited to upload synsets from their languages to the Grid. Gradually, the Grid will then come to represent all languages. The Grid will be available to everybody and will be distributed completely free of charge.
Global WordNet Association (2000-ongoing)
- Vossen is founder and president of the Global WordNet Association. He founded the GWA (with Christiane Fellbaum of Princeton University) in 2000 as a public, non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world. For more information, see:
- Global WordNet Association
- Eighth Global WordNet Conference 2016 in Bucharest, Romania, January 27-30, 2016
- Seventh Global WordNet Conference 2014 in Tartu, Estonia, January 25-29, 2014
- Sixth Global WordNet Conference 2012 in Matsue, Japan, January 9-13, 2012
- Fifth Global WordNet Conference 2010 in Mumbai, India, January 31 – February 4, 2010
- Fourth Global WordNet Conference 2008 in Szeged, Hungary, January 22-25, 2008
- Third Global WordNet Conference 2006 in Jeju Island, Korea, January 22-26, 2006
- Second Global Wordnet Conference 2004 in Brno, Czech Republic, January 20-23, 2004
- First Global WordNet Conference 2002 in Mysore, India, January 21-25, 2002