CLTL

Research Overview

Video explaining 5 years of research in the Spinoza project “Understanding Language by Machines”, 2019
Video explaining NewsReader’s Reading Machine, an example of one of CLTL’s projects
Lecture (in Dutch) by Prof. dr. Piek Vossen at Paradiso Amsterdam: ‘To Communicate with an Imperfect Robot - Get It?’, March 25, 2018

Our research targets not only Dutch and English but also many other languages. An important project is the Global Wordnet Association and its Global Wordnet Grid, which aims to represent the vocabularies of many languages as semantic networks (wordnets) and to combine them through a universal index of meaning. Building and studying this grid will tell us more about what is universal and what is idiosyncratic across languages, and about the roles and functions of words and expressions.
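To make the idea of a shared index concrete, here is a minimal sketch in Python. It is not the actual Global Wordnet Grid data model: the concept identifiers (such as "c:dog") and the tiny English and Dutch lexicons are invented for illustration. Each wordnet maps words to language-independent concept identifiers, and inverting those mappings lets us move from a word in one language to words in another.

```python
from collections import defaultdict

# Hypothetical toy data: concept IDs and lexicon entries are invented.
# Each wordnet maps a word to the language-independent concepts it can express.
WORDNETS = {
    "en": {
        "dog": ["c:dog"],
        "bank": ["c:riverbank", "c:financial_institution"],
    },
    "nl": {
        "hond": ["c:dog"],
        "oever": ["c:riverbank"],
        "bank": ["c:financial_institution", "c:sofa"],
    },
}


def build_index(wordnets):
    """Invert the wordnets into: concept ID -> language -> set of words."""
    index = defaultdict(lambda: defaultdict(set))
    for lang, lexicon in wordnets.items():
        for word, concepts in lexicon.items():
            for concept in concepts:
                index[concept][lang].add(word)
    return index


def translate(word, source, target, wordnets, index):
    """Return target-language words that share at least one concept with `word`."""
    result = set()
    for concept in wordnets[source].get(word, []):
        result |= index[concept].get(target, set())
    return result


def unshared_concepts(index, languages):
    """Return concepts that are not lexicalized in every listed language."""
    return sorted(c for c, by_lang in index.items()
                  if any(lang not in by_lang for lang in languages))


index = build_index(WORDNETS)
print(translate("bank", "en", "nl", WORDNETS, index))
print(unshared_concepts(index, ["en", "nl"]))
```

In this toy data, the concept "c:sofa" is expressed only by the Dutch word "bank", so it shows up as unshared. Differences of this kind, at a much larger scale, are what a grid of linked wordnets makes visible.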

We see language as a reference system that connects people and systems to their perception of the world. Identity, reference and perspectives are central themes in our research and are studied in combination. In our research on conversational robots, many of these ideas come together (http://makerobotstalk.nl). In this project, we try to build robots that communicate with people in real-world situations, taking into account their perception of the context and the common ground they share.

General

CLTL is the Computational Lexicology and Terminology Lab, headed by Piek Vossen. We study computational linguistics, also known as natural language processing (NLP). We are interested in how language works and how we can analyse it using computers, and we work on automatically extracting knowledge from text. This field is becoming more and more popular, as all the large technology companies (e.g. Google, IBM, Microsoft and Facebook) are investing in big data and language technology. At the same time, natural language processing is one of the core aspects of digital humanities research. We are collaborating with literature, history and social science researchers to explore the potential of NLP tools in their line of work, automatically analysing thousands of documents. Just imagine what you can do with all that data!

Computational linguistics operates at the interface between computer science and linguistics. We have topics that require different levels of technical skill as well as different levels of linguistic knowledge. Feel free to come and have a chat if any of the topics below appeal to you.

Topics focusing on Natural Language Processing

How does automatic text analysis work? Which tools are available and what can they do? Do they deliver what they promise on new text? Can state-of-the-art results be replicated? How can existing technology be improved?

We work on several technologies that can be adapted to a new domain or to Dutch, or simply tested and improved. Topics with an NLP focus are mainly interesting for people with a strong technical background and programming skills, but it is also possible to study the output of tools and analyze what mistakes they make and why.

Topics focusing on Linguistics and Language Resources

How does language work and how can we model it in such a way that a computer can work with it? And what does computational linguistics have to offer linguists, for example by verifying theories through implementation or corpus study?

Topics in this area are interesting both for people with a strong linguistic background and for people who like to build interfaces and resources.

Topics focusing on Knowledge Representation and Reasoning

We have various projects where we mine text, extract information and represent it formally using RDF. This allows us to link information extracted from text to other resources, and it allows end users to query the data we extract. Research related to these topics involves ontology design and evaluation as well as evaluating and improving the results of our NLP analyses.
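As a rough illustration of what representing extracted information in RDF means, the sketch below stores a few subject-predicate-object triples in plain Python, serializes them as N-Triples, and answers a simple pattern query. It is not our actual pipeline or ontology: the namespace http://example.org/ and all event, property and entity names are hypothetical. In practice one would use an RDF library and a triple store queried with SPARQL.

```python
# Hypothetical namespace; every name below is invented for this sketch.
EX = "http://example.org/"


def uri(local_name):
    """Wrap a local name as a full URI in N-Triples syntax."""
    return f"<{EX}{local_name}>"


def literal(text):
    """Encode a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"'


# Facts that an NLP pipeline might have extracted from two news sentences.
triples = [
    (uri("event1"), uri("type"), uri("Acquisition")),
    (uri("event1"), uri("buyer"), uri("CompanyA")),
    (uri("event1"), uri("sourceText"), literal("Company A buys Company B")),
    (uri("event2"), uri("type"), uri("Acquisition")),
    (uri("event2"), uri("buyer"), uri("CompanyC")),
]


def to_ntriples(triples):
    """Serialize triples in N-Triples format: one 'subject predicate object .' per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Query: which events are acquisitions, and who is the buyer in each?
for event, _, _ in match(triples, p=uri("type"), o=uri("Acquisition")):
    for _, _, buyer in match(triples, s=event, p=uri("buyer")):
        print(event, "has buyer", buyer)

print(to_ntriples(triples))
```

Because every entity is identified by a URI, the same identifiers can be reused to link the extracted facts to other resources, which is what makes this representation useful beyond a single project.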

These topics are mainly interesting for students with some background in data representation. Topics with a higher or lower technical component can be found.

Research topics in this area include:

Topics focusing on Digital Humanities (and Social Sciences)

There are many digitized resources that are relevant for researchers in the humanities. We have various projects where we apply NLP technologies to automatically analyze text. The output of these analyses can be used by historians, specialists in language and literature, philosophers, communication scientists, sociologists and many others.

Topics in this area can be of interest to students of various backgrounds. People with a strong background in computer science or linguistics who are interested in other domains of the humanities or social sciences can work on a topic where they use their expertise to support researchers in these fields. Students with a background in other fields of the humanities or social sciences who are interested in text analysis can work on a topic where they investigate what NLP has to offer them.

Here are a few examples of possible projects in this domain:

...

...

...

...