If you use this resource, please cite:
Sagot, Benoît and Fišer, Darja (2008). Building a free French wordnet from
multilingual resources. In Ontolex 2008, Marrakech, Morocco.
The WOLF (Wordnet Libre du Français, Free French Wordnet) is a free
semantic lexical resource (wordnet) for French.
The WOLF has been built from the Princeton WordNet (PWN) and various
multilingual resources (Sagot and Fišer 2008a, Sagot and Fišer 2008b, Fišer and Sagot 2008). Polysemous
literals have been dealt with by an approach based on word-aligning a
parallel corpus in five languages. The extracted multilingual lexicon has
been semantically disambiguated using wordnets for the languages
involved. Moreover, a bilingual approach was sufficient for building
new entries for monosemous words. To achieve this, we extracted
bilingual lexicons from Wikipedia and thesauri. The resulting wordnet
has been evaluated against the French wordnet developed during the
EuroWordNet project.
In 2009, specific work was carried out on adverbial synsets (Sagot, Fort and Venant
2009a, Sagot, Fort and Venant 2009b).
Since then, several efforts have extended WOLF's coverage and reduced its
noise. First, a disambiguation technique for translation pairs extracted from freely available resources led to
version 0.2 (Sagot and Fišer 2011, 2012a). An approach targeting nominalisations extracted from parsed
corpora (version 0.2.1, Gábor et al. 2012) and another one based on word clusters extracted from aligned
corpora (version 0.2.2, Apidianaki and Sagot 2012) were used to further extend the resource. Version 0.2.5 is
the result of merging WOLF 0.2.2 with another wordnet extracted automatically using a new graph-based
approach based on translation pairs extracted from wiktionaries (Hanoka and Sagot 2012).
An error identification approach was also developed (Sagot and Fišer 2012b), followed by a manual
validation of several thousand candidate errors. In parallel, most verbal Basic Concept Set synsets were validated
and extended manually. Finally, we performed a manual filtering of a large number of (literal, synset) pairs that
were inconsistent with POS information from the Lefff lexicon, which allowed for an additional reduction of the
noise in the resource. The result of these semi-manual efforts is WOLF version 1.0b4.
The WOLF contains all PWN synsets, including those for which no French literal is known.
The WOLF is in the XML format used by the DebVisDic tool, which is an updated version of the XML format used in the BalkaNet project. For now,
SENSE elements are filled with information on the sources and approaches through
which the lexeme was found, and not with sense numbers. Among those, a tag starting with "ManVal" indicates a
manually validated (literal, synset) pair, and a tag starting with "ManAdd" indicates a pair that was manually added.
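As an illustration of how the "ManVal" and "ManAdd" tags could be used, here is a minimal Python sketch that labels each (literal, synset) pair by its SENSE content. Only the SENSE element name and the two tag prefixes come from the description above. The sample document, the WN, SYNSET, ID, SYNONYM and LITERAL element names, the placement of SENSE inside LITERAL, the tag suffixes and the separators between tags are all assumptions made for the example and should be checked against the actual file.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample. Element names other than SENSE, the nesting, the tag
# suffixes ("_x", "src_a", ...) and the ';' separator are assumptions made
# for illustration, not taken from the WOLF distribution.
SAMPLE = """<WN>
  <SYNSET>
    <ID>example-1-n</ID>
    <SYNONYM>
      <LITERAL>chien<SENSE>ManVal_x;src_a</SENSE></LITERAL>
      <LITERAL>toutou<SENSE>ManAdd_y</SENSE></LITERAL>
      <LITERAL>cabot<SENSE>src_b</SENSE></LITERAL>
    </SYNONYM>
  </SYNSET>
</WN>"""


def classify(sense_text):
    """Label a SENSE string by the prefix of the tags it contains."""
    # Split on ';', ',' or whitespace (assumed separators); drop empty pieces.
    tags = [t for t in re.split(r"[;,\s]+", sense_text or "") if t]
    if any(t.startswith("ManAdd") for t in tags):
        return "manually added"
    if any(t.startswith("ManVal") for t in tags):
        return "manually validated"
    return "other"


def manual_status(xml_text):
    """Return (synset id, literal, status) triples for every literal."""
    root = ET.fromstring(xml_text)
    rows = []
    for synset in root.iter("SYNSET"):
        synset_id = synset.findtext("ID")
        for literal in synset.iter("LITERAL"):
            # literal.text is the text that precedes the SENSE child element.
            word = (literal.text or "").strip()
            rows.append((synset_id, word, classify(literal.findtext("SENSE"))))
    return rows


if __name__ == "__main__":
    for row in manual_status(SAMPLE):
        print(row)
```

If the real file turns out to be too large to load at once, the same classification could be applied while streaming it with `xml.etree.ElementTree.iterparse`.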
The WOLF is a free resource, distributed under the CeCILL-C license (LGPL-compatible).