Recent & Upcoming Talks
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages
We explore the impact of the training corpus on contextualized word embeddings in five mid-resource languages.
Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures
We propose a new pipeline to filter, clean and classify Common Crawl by language, we publish the final corpus under the name OSCAR.