Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus

Julien Abadji
Research Engineer

I’m a research engineer at the ALMAnaCH research team at Inria

Pedro Ortiz Suarez
PhD Student

I’m a PhD student in Computer Science at Sorbonne Université and in the ALMAnaCH research team at Inria

Laurent Romary
Senior researcher

Inria Senior Researcher; director, DARIAH EU infrastructure; chair, ISO/TC 37

Benoît Sagot
Senior researcher

Inria Senior Researcher in Natural Language Processing and Computational Linguistics