Benoît Sagot

Inria Senior Researcher in Natural Language Processing and Computational Linguistics
Head of the ALMAnaCH research team
Holder of a chair in the PRAIRIE institute
Holder of the annual chair on Informatics and Digital Sciences at the Collège de France (2023-2024)
Elected member of Inria's “Commission d'Evaluation” (Evaluation Committee)


Current areas of interest and research domains
  • Neural language models: architectures, cross-lingual transfer, interpretability, high-performing model training
  • Development of large raw corpora for language model training, with a focus on French, languages of France and low-resource languages
  • Machine translation, text simplification
  • Multimodal NLP (speech, image)
  • Development of lexical resources (morphological, syntactic, semantic, etymological), for French and other languages
  • Interpretability of neural networks
  • Computational and quantitative morphology
  • “Classical” and computational etymology (Indo-European…)
  • Computational historical linguistics (Indo-European, French…)
  • Applications of NLP (opinion mining, computational œnology…)

Tools and Resources

Lefff

Morphological and syntactic lexicon for French

Alexina

Morphological (and sometimes syntactic) lexicons other than Lefff

UDLexicons

Morphological lexicons in the CoNLL-UL format

Etymology

EtymDB

Etymological database extracted from wiktionary

WOLF

Free Wordnet for French

MElt

Part-of-speech tagger

SxPipe

Shallow language processing chain

OSCAR

Huge multilingual web-based corpus

CAMEMBERT

Neural language model for French

Publications

Only the most recent publications are listed below.

Projects

Ongoing projects

  • 3IA PRAIRIE (2019-): Director: I. Ryl. The Prairie Institute (PaRis AI Research InstitutE) is one of the four French Institutes of Artificial Intelligence, which were created as part of the national French initiative on AI. More information on the PRAIRIE web site.
  • ANR BASNUM (2019-2023): PI: G. Williams. Other participants: Litt&Arts, LATTICE. Topic: digitisation and automatic processing of Furetière's Dictionnaire universel in its 1701 version updated by Basnage de Beauval.

Past projects

  • ANR ParSiTi (2016-2021): PI: D. Seddah. Other participants: LIMSI, LIPN. Topic: parsing and machine translation for user-generated content using contextual information.
  • ANR Profiterole (2016-2020): PI: Sophie Prévost (LATTICE). Topic: modelling and analysis of Medieval French and its evolution.
  • ANR SoSweet (2015-2019): PI: J.-P. Magué. Resp. for ALMAnaCH: D. Seddah. Other participants: ICAR (ENS Lyon, CNRS), Dante (Inria). Topic: studying sociolinguistic variation on Twitter by comparing linguistic/NLP and graph-based approaches.
  • ANR EDyLex (project PI). Topic: dynamic extension of lexical resources. Other participants: LIF (Marseille), LIMSI, AFP, Vecsys Research, Syllabs.
  • ANR Séquoïa (in charge for Alpage). PI: A. Nasr. Topic: probabilistic parsers for French. Main participant besides Alpage: LIF (Marseille).
  • ANR PerGram. PI: Pollet Samvelian. Topic: linguistic description and HPSG implementation of Persian syntax.
  • SCRIBO (“pôle de compétitivité” System@tic). Topic: Semi-automatic and Collaborative Retrieval of Information Based on Ontologies
  • ANR Passage. PI: É. de La Clergerie. Topic: automatic construction of a very large syntactically annotated corpus by merging the annotations produced by several parsers; linguistic information extraction from this corpus. Other participants: LIMSI, CEA, ELDA/ELRA.
  • ANR Rhapsodie
  • ARC INRIA Mosaïque. Topic: high-level syntactic formalisms.
  • ILF LexSynt project. PI: S. Kahane. Topic: syntactic lexicons.
  • EASy Technolangue project. Topic: evaluation of parsers for French.

Curriculum Vitæ

You can download a reasonably recent version of my CV here; personal information has been removed from it.

Contact

Postal address
Inria Paris (équipe ALMAnaCH)
2 rue Simone Iff
CS 42112
75589 Paris Cedex 12
FRANCE

  +33 1 80 49 43 14
  benoit.sagot.at.inria.fr
Twitter
Google Scholar