The aim of this course is to introduce chart-based (or tabular) parsing techniques found in Natural Language Processing and to illustrate them on several large classes of syntactic formalisms (Context-Free Grammars, Unification-based grammars, Tree Adjoining Grammars [TAGs] with, possibly, generalizations to other Mildly-Context Sensitive formalisms). The course will focus on designing such tabular algorithms for new formalisms by providing some intuition about how they work and by examining complexity issues (both on theoretical and practical aspects). The course will also emphasize the notion of shared derivation forest as parsing output and its potential use, during or after parsing.
This paper shows how a careful understanding of some linguistic phenomena and their modeling in the lexicon-grammar tables can help enriching the Lefff, a large-coverage NLP-oriented syntactic lexicon for French. We show how we appied this approach to handle satisfyingly all kind of impersonal constructions, and present a first draft of operational modeling for verbal idioms.
In this paper, we introduce a generic approach to elliptic coordination modeling through the parsing of LTAG grammars. We show that erased lexical items can be replaced during parsing by informations gathered in the other member of the coordinate structure and used as a guide at the derivation level. Moreover, we show how this approach can be indeed implemented as a light extension of the LTAG formalism throuh a so-called "fusion" operation and by the use of tree schemas during parsing in order to obtain a dependency graph.
Nous présentons une méthode de fouille d'erreurs pour détecter automatiquement des erreurs dans les ressources utilisées par les systèmes d'analyse syntaxique. Nous avons mis en oeuvre cette méthode sur le résultat de l'analyse de plusieurs millions de mots par deux systèmes d'analyse différents qui ont toutefois en commun le lexique syntaxique et la chaîne de traitement pré-syntaxique. Nous avons pu identifier ainsi des inexactitudes et des incomplétudes dans les ressources utilisées. En particulier, la comparaison des résultats obtenus sur les sorties des deux analyseurs sur un même corpus nous a permis d'isoler les problèmes issus des ressources partagées de ceux issus des grammaires.
A preliminary proposal to standardize morphosyntactic annotations.
A preliminary proposal to standardize morphosyntactic annotations.
Ce mémoire de stage d'Ingénieur de l'École Polytechnique explore l'utilisation active des techniques de tabulation dans DyALog pour traiter certains phénomènes de coordination en TAL.
Ce mémoire de stage DEA s'inscrit dans le cadre d'un projet d'acquisition de connaissances et a pour objectif de représenter le domaine étudié, à savoir la botanique par l'intégration d'une terminologie à la chaîne de traitement existante et par le développement de méthodes d'acquisition de relations
Ce mémoire de stage Ingénieur X concerne le développement d'un outil générique pour structurer avec des balises un document à partir d'un modèle du document.
Cet article survole les fonctionnalités offertes par le système DyALog pour construire des analyseurs syntaxiques tabulaires. Offrant la richesse d'un environnement de programmation en logique, DyALog facilite l'écriture de grammaires, couvre plusieurs formalismes et permet le paramétrage de stratégies d'analyse.
This paper is a survey of the functionalities provided by system DyALog to build tabular parsers. Providing the expressiveness of logic programming, DyALog eases the development of grammars, covers several linguistic formalisms, and allows the parametrization of parsing strategies.
This paper investigates several refinements of a generic tabular parser for Tree Adjoining Grammars. The resulting parser is simpler and more efficient in practice, even though the worst case complexity is not optimal.
Cet article présente l'environnement de travail que nous développons au sein de l'équipe ATOLL pour les grammaires d'arbres adjoints. Cet environnement comprend plusieurs outils et ressources fondés sur l'emploi du langage de balisage XML. Ce langage facilite la mise en forme et l'échange de ressources linguistiques.
This paper presents a workbench for Tree Adjoining Grammars that we are currently developing. This workbench includes several tools and resources based on the markup language XML, used as a convenient language to format and exchange linguistic resources.
Mémoire relatant la mise en place d'un serveur pour accéder à des grammaire d'arbres adjoints (TAG) stockées dans un BD relationnelle. L'accès se fait par l'intermédiaire d'un langage de requêtes spécialisé. L'ensemble est intégré dans un serveur WEB Apache grâce à des Java Servlet et à Coccon.
Mémoire de DESS relatant une expérience de découverte de la structure logique d'un corpus botanique et représentation de cette structure en XML. Ce travail a été complété par une interface WEB de navigation.
Cet article explicite la nature des forêts d'analyse dans le cadre de grammaires d'unification, de programmes logiques et des grammaires TAG. Nous montrons comment ces forêts peuvent être représentées et extraites d'une execution avec le système DyALog.
This paper describes an experiment we have made in order to illustrate a promising method to speed up parsing. A first tabular parser, built from the syntactic skeleton of a unification grammar, parses a sentence and produces a shared forest. This forest is used to guide and speed-up a second tabular parser which performs the unification checks.
We develop a set of new tabular parsing algorithms for Linear Indexed Grammars, including bottom-up algorithms and Earley-like algorithms with and without the valid prefix property, creating a continuum in which one algorithm can in turn be derived from another. The output of these algorithms is a shared forest in the form of a context-free grammar that encodes all possible derivations for a given input string.
This paper describes the extension of the system DyALog to compile tabular parsers from Feature Tree Adjoining Grammars. The compilation process uses intermediary 2-stack automata to encode various parsing strategies and a dynamic programming interpretation to break automata derivations into tabulable fragments.
Slide for an invited presentation at SEPLN'00 (Vigo, Spain).
Le Traitement Automatique des Langues Naturelles (TALN) est un domaine de recherche en pleine expansion. DyALog, analyseur syntaxique généralisé s'inscrit dans ce cadre. L'ambiguité des grammaires à traiter dans le cas de telles applications conduit à un délai de réponse très long. L'objectif de l'étude de parallélisation était de réduire la complexité du logiciel en vue d'une application à des grammaires du Langage Naturel. L'étude complète de parallélisation de DyALog montre qu'il existe des algorithmes capables d'accélérer considérablement son exécution.
This paper provides a common framework to clarify the relationships between different automata and their associated tabulation techniques for Tree Adjoning Languages, a subclass of Mildly Context-Sensitive Languages. We have chosen Logic Push-down Automata working with Linear Indexed Grammars as a starting point. Several tabulation techniques for different parsing strategies are proposed and compared with previous approaches.
This paper describe several tabular algorithms for Tree Adjoining Grammar parsing, creating a continuum from simple pure bottom-up algorithms to complex predictive algorithms and showing what transformations must be applied to each one in order to obtain the next one in the continuum.
également disponible en allemand, anglais et espagnol
This paper presents a general framework for deriving tabular algorithms for a very large class of stack-based computations, not only in context-free parsing but in logic programming as well and more generally for all kinds of “information” domains (abstract domains, constraint domains). Tabular algorithms store traces of computations in a table to achieve computation sharing, which is most useful when dealing with non-deterministic computations. By considering what can be naively described as partial information on stack elements, we interpret these traces as stack fragments. Tuning the exact amount of information present in these traces allows us to improve tabular evaluation of stack-based computations, both by increasing the sharing of partial computations and by unifying different tabular algorithms within the same framework.
The paper presents a tabular interpretation for Bottom-up 2-Stack Automata, which can describe parsing strategies for TAG and LIG where predictions are forbidden in one of the stacks. The results are also useful for tabulating other existing bottom-up automata models for this kind of languages.
The paper presents a tabular interpretation for a kind of 2-Stack Automata. These automata may be used to describe various parsing strategies, ranging from purely top-down to purely bottom-up, for LIGs and TAGs. The tabular interpretation ensures, for all strategies, a time complexity in O(n6) and space complexity in O(n5) where n is the length of the input string.
L'objectif final de ce stage de DEA était d'apporter un mécanisme efficace d'indexation de termes logiques à DyALog, un évaluateur tabulaire de programmes logiques. En effet, l'efficacité d'algorithmes tabulaires repose en grande partie sur l'utilisation de systèmes performants d'indexation en vue de recherches rapides dans une table dynamique d'objets. Dans le cadre de programmes logiques, ces recherches se font sur des termes logiques, modulo des opérations coûteuses, telles l'unification ou la subsomption.
The work presented here attempts to bring out some fundamental concepts that underlie some known parsing algorithms, usually called chart or dynamic programming parsers, in the hope of guiding the design of similar algorithms for other formalisms that could be considered for describing the “surface” syntax of languages. The key idea is that chart parsing is essentially equivalent to a simple construction of the intersection of the language (represented by its grammar) with a regular set containing only the input sentence to be parsed (represented by a finite state machine). The resulting grammar for that intersection is precisely what is usually called a shared forest: it represent all parses of a syntactically ambiguous sentence. Since most techniques for processing ill-formed input can be modeled by considering a non-singleton regular set of input sentences, we can expect to generalize these ill-formed input processing techniques to all parsers describable with our approach.
The Logic Push-Down Automaton (LPDA) is introduced as an abstract operational model for the evaluation of logic programs. The LPDA can be used to describe a significant number of evaluation strategies, ranging from the top-down OLD strategy to bottom-up strategies, with or without prediction. Two types of dynamic programming, i.e. tabular, interpretation are defined, one being more efficient but restricted to a subclass of LPDAs. We propose to evaluate a logic program by first compiling it into a LPDA according to some chosen evaluation strategy, and then applying a tabular interpreter to this LPDA. This approach offers great flexibility and generalizes the magic-set transformations. It explains in a more intuitive way some known magic-set variants and their limits, and also suggests new developments.
We present in this paper a structure–sharing framework originally developed for a Dynamic Programming interpreter of Logic Programs called DyALog. This mechanism should be of interest for alternative execution models of PROLOG which maintain multiple computation branches and reuse sub–computations in various contexts (computation sharing). This category includes, besides our Dynamic Programming model, the tabular models (OLDT, SLDAL, XWAM), the “magic-set” models, and the independent AND and OR parallelism with solution sharing models. These models raise the problem of storing vast amount of data, motivating us to discard copying mechanisms in favor of structure–sharing mechanisms. Unfortunately, computation sharing requires joining computation branches and possibly renaming some variables, which generally leads to complex structure–sharing mechanisms. The proposed “layer–sharing” framework succeeds however in remaining understandable and easy to implement.
DyALog, a tabular logic program evaluator with a modular architecture, is used to develop quickly an efficient interpreter for logic programs with boolean constraints [CLP(Bool) programs]. This is done by a coupling DyALOg with PECOS, a constraint solver over finite domains. This experiment, done in the context of Abstract Interpretation of logic programs, illustrates the “genericity” of our approach for developing interpretations on various abstract domains and for various semantics.
This paper presents Subsumption–oriented Push–Down Automata (SPDA), a very general stack formalism used to describe forest (“AND–OR” tree) traversals. These automata may be used for parsing or the interpretation of logic programs. SPDA allow a Dynamic Programming execution which breaks computations into combinable, sharable and storable sub–computations. They provide computation sharing and operational completeness and solves some of the problems posed by the usual depth–first, left–to–right traversals (as implemented in PROLOG). We give an axiomatization of SPDA and two examples of their use: the evaluation of logic programs and parsing with Tree Adjoining Grammars. SPDA may also serve in other areas such as Constraint Logic Programming, Abstract Interpretations, or Contextual parsing.
We show that the informal concept of packed shared forest often used to represent compactly the set of all parses of a sentence w.r.t. a Context-Free Grammar can be seen itself as a CF grammar that generates only thoses parses. This approach gives a better understanding of the real nature of these forest representation, and of the parsing process itself. The approach can be extended to the parsinf of ill-formed input, and to more complex formalisms such as Horn Clauses or Definite Clause Grammars.
We briefly recall principal abstract interpretation evaluation algorithms for Logic Programs before presenting DyALog, a Logic Programs Dynamic Programming interpreter likely to be an interesting generic abstract interpretation framework. DyALog ensures answer completeness, termination for finite domains, and computation sharing. Major resolution strategies can be simulated.