Recent & Upcoming Talks

A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

We explore the impact of the training corpus on contextualized word embeddings in five mid-resource languages.

Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures

We propose a new pipeline to filter, clean, and classify Common Crawl by language, and we publish the final corpus under the name OSCAR.