Second French parsing evaluation campaign Passage
News

[24/11/2009] Campaign closed: 11 participants successfully returned results.

[18/11/2009] New version (v2.2) of the annotation guide.

[29/10/2009] Campaign closing date postponed to 24 November 2009.

[19/10/2009] New version (v2) of the annotation guide, and a list of compound forms made available.

[19/09/2009] Campaign opened.

[31/08/2009] Campaign start postponed to 15 September 2009. Closing on 30 October 2009.

Context

The French ANR Passage action is organizing its second French parsing evaluation campaign, following the very first one, organized by EASy (Technolangue EVALDA action) in 2004.

We invite developers of parsers for French, from both academia and industry, to participate in the second PASSAGE evaluation campaign. The campaign contributes to the two main objectives of the PASSAGE project:

  1. building a large treebank for French and making it available to the community
  2. investigating lexical acquisition for parser improvement by merging parser outputs

Participation is free and voluntary (registration is required, see below) and gives access to all the resources that PASSAGE has built (corpora, annotation editors, evaluation toolkits). To have a look at the PASSAGE annotations, you can try EasyRef, the interactive web annotation editor, now freely open to all potential participants. The annotation guide and the Passage format specification report are also available.

The data

The corpus to be parsed is a 100-million-word collection of material freely available on the web, supplemented with a small amount of copyrighted newspaper material. Word and sentence segmentation is not imposed: participant data will be mapped onto the reference data using a dynamic programming algorithm.
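
As an illustration of how two different segmentations of the same text can be mapped onto each other, here is a minimal sketch that aligns tokens through a character-level longest common subsequence computed by dynamic programming. It is not the campaign's actual alignment tool: the function names `char_index` and `align_tokens` are hypothetical, and the quadratic table is only practical for sentence-sized inputs.

```python
def char_index(tokens):
    """Flatten tokens into characters, remembering each character's token index."""
    chars, owners = [], []
    for i, tok in enumerate(tokens):
        for ch in tok:
            chars.append(ch)
            owners.append(i)
    return chars, owners


def align_tokens(participant, reference):
    """Return (participant_index, reference_index) links between tokens.

    Hypothetical helper: two tokens are linked when they share at least one
    character in a character-level longest common subsequence.
    """
    p_chars, p_owner = char_index(participant)
    r_chars, r_owner = char_index(reference)
    n, m = len(p_chars), len(r_chars)

    # lcs[i][j] = LCS length of the suffixes p_chars[i:] and r_chars[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if p_chars[i] == r_chars[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start and record the tokens of matched characters.
    links = []
    i = j = 0
    while i < n and j < m:
        if p_chars[i] == r_chars[j]:
            pair = (p_owner[i], r_owner[j])
            if not links or links[-1] != pair:
                links.append(pair)
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            i += 1
        else:
            j += 1
    return links


# A participant that splits "l'arbre" into two tokens still maps onto the
# single reference token.
print(align_tokens(["l'", "arbre", "vert"], ["l'arbre", "vert"]))
# [(0, 0), (1, 0), (2, 1)]
```

Working at the character level means the mapping does not depend on either side's tokenization choices, which is the property needed when segmentation is left free.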

Syntactic Annotations

The syntactic annotations used in PASSAGE are derived from those used during the EASY campaign. The annotations are described in various papers. Documentation and software support are available on the PASSAGE site, along with EasyRef, an open web annotation editor, and an evaluation server that automatically compares one's parser output against the PASSAGE development data (approximately 85,600 words) from the previous PASSAGE campaign and the EASY campaign.

Although they are not evaluated, participants are strongly encouraged to return annotations that include lemma and part-of-speech information.

Schedule

Both the development corpus resulting from the EASY evaluation campaign and the PASSAGE-2 test corpus of 100 million words are now available. They will be sent to participants as soon as they are registered. Participants in PASSAGE-2 are expected to return the PASSAGE-2 corpus completely parsed with the PASSAGE annotations between September 15, 2009 and October 30, 2009.

Evaluation tracks
PASSAGE-2 will have two evaluation tracks:
  1. the manual reference track, with a gold standard of 400,000 hand-annotated words
  2. the automatic reference track, where the gold standard will be the result of combining the outputs of the participating parsers.
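
The method used to combine parser outputs is not specified here. Purely as an illustration of one common approach, the sketch below keeps a relation when a majority of parsers propose it. Everything in it is an assumption: the function name `vote`, the `min_votes` parameter, the representation of a relation as a `(label, source, target)` tuple, and the labels used in the example.

```python
from collections import Counter


def vote(parser_outputs, min_votes=None):
    """Keep the relations proposed by at least `min_votes` parsers.

    `parser_outputs` is a list with one collection of relations per parser.
    By default a strict majority is required (assumption, not the campaign's rule).
    """
    if min_votes is None:
        min_votes = len(parser_outputs) // 2 + 1
    counts = Counter()
    for relations in parser_outputs:
        # A parser votes at most once for a given relation.
        counts.update(set(relations))
    return {rel for rel, n in counts.items() if n >= min_votes}


# Illustrative relations as (label, source token, target token).
outputs = [
    {("SUJ-V", 1, 2), ("COD-V", 3, 2)},
    {("SUJ-V", 1, 2)},
    {("SUJ-V", 1, 2), ("COD-V", 3, 2), ("MOD-N", 4, 3)},
]
combined = vote(outputs)
```

With three parsers, the default threshold is two votes, so a relation proposed by a single parser is left out of the combined reference.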

For all registered participants, performance results of the first track will be published with the participant identified, while performance results of the second track will be published anonymously because of the exploratory nature of the reference data.

Link with EVALITA dependency parsing task

To explore possible links with parsing evaluation for other languages, the PASSAGE-2 campaign shares a tiny development and test corpus with EVALITA, the Italian campaign on dependency parsing (http://evalita.fbk.eu/parsing.html). Aligned French and Italian data have been hand-annotated (200 sentences for development and 50 sentences for test), with PASSAGE annotations for the French part and TUT annotations for the Italian part.

Conditions for participation

Registration is now open and requires signing a participation agreement available from ELDA. Participants are required to return the parsed test corpus according to the schedule above and to agree to the publication of their identified performance results by the PASSAGE-2 organizer. Please contact Olivier Hamon at ELDA to obtain the participation agreement.

Annotations produced by participants should normally be uploaded to the evaluation server set up by ELDA. If there is a problem, a DVD or USB key may be sent to ELDA by post. Annotations must be delivered as a compressed archive (gzip or bzip2) containing one directory per corpus and, within each directory, one result file per original file. For example, the results for frwiki/frwikipedia_001.txt.bz2 go in the file frwiki/frwikipedia_001.xml. The results are estimated to take up 2 to 3 GB.
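
The naming rule and archive layout can be sketched as follows. This is only an illustration: the helper names `result_name` and `pack_results` are hypothetical, and the use of a tar container for the gzip-compressed archive is an assumption, since the page only states that the archive is compressed with gzip or bzip2.

```python
import tarfile
from pathlib import Path, PurePosixPath


def result_name(original: str) -> str:
    """Map an original corpus file to its result file, keeping the corpus directory."""
    path = PurePosixPath(original)
    suffix = ".txt.bz2"
    if not path.name.endswith(suffix):
        raise ValueError(f"unexpected corpus file name: {original}")
    stem = path.name[: -len(suffix)]
    return str(path.with_name(stem + ".xml"))


def pack_results(results_root: str, archive_path: str) -> None:
    """Pack every corpus directory under `results_root` into one gzip-compressed tar.

    Assumption: a tar container is acceptable; the page does not say so.
    """
    root = Path(results_root)
    with tarfile.open(archive_path, "w:gz") as archive:
        for corpus_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            # One top-level directory per corpus, e.g. "frwiki/".
            archive.add(str(corpus_dir), arcname=corpus_dir.name)


print(result_name("frwiki/frwikipedia_001.txt.bz2"))
# frwiki/frwikipedia_001.xml
```

Passing `"w:bz2"` instead of `"w:gz"` to `tarfile.open` would produce a bzip2-compressed archive.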

Participants may submit several annotation sets during the campaign, but only the last one submitted will be considered official. In addition, only the last two submitted sets will be kept on the ELDA server.

Participants are invited to check that their result files conform to the Passage DTD.

Organizing Committee
  • Patrick Paroubek (LIMSI-CNRS, pap@limsi.fr)
  • Anne Vilnat (LIMSI-CNRS)
  • Eric de la Clergerie (INRIA-ATOLL)
  • Olivier Hamon (ELDA)
Links