5 Terms and definitions
For the purposes of this International Standard, the terms and definitions given in ISO 12620:200?, ISO 24610-1, ISO 24610-2, and the following apply:
- associative relation
- relation by which a linguistic unit is associated with other units. It is a virtual association that does not require their effective presence, and it differs from a paradigmatic relation in that the latter only refers to linguistic units associated by substitutability.
- closed data category
- data category whose content is constrained by a list of permissible values which comprise its conceptual domain
NOTE: A typical closed data category might be /grammaticalNumber/, which can have as its content the values: /singular/, /plural/ or /dual/.
- conceptual domain
- finite list of simple data categories that may be the values of a complex data category
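EXAMPLE: The relation between a closed data category and its conceptual domain can be sketched as a membership check. This is a minimal illustration only; the names `CONCEPTUAL_DOMAINS` and `is_permissible` are hypothetical and are not defined by this standard or by ISO 12620, and treating an unlisted category as open is an assumption made for the sketch.

```python
# Hypothetical table: each closed data category maps to its conceptual domain,
# i.e. the finite list of simple data categories it may take as values.
CONCEPTUAL_DOMAINS = {
    "grammaticalNumber": frozenset({"singular", "plural", "dual"}),
}


def is_permissible(category: str, value: str) -> bool:
    """Return True if `value` may be the content of `category`.

    Assumption for this sketch: a category with no listed domain is treated
    as an open data category, so any value is accepted for it.
    """
    domain = CONCEPTUAL_DOMAINS.get(category)
    return domain is None or value in domain
```

Under these assumptions, `is_permissible("grammaticalNumber", "dual")` holds, while a value outside the listed domain is rejected.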
- data category
- result of the specification of a given data field or the content of a closed data field
NOTE: A data category is to be used as an elementary descriptor in a linguistic structure or an annotation scheme. Examples are: /term/, /definition/, /part of speech/ and /grammaticalGender/. Data categories for the management of lexical resources and terminology are comparable to data element concepts in ISO/IEC 11179-3:2003.
- directed acyclic graph
- DAG
graph with directed edges and no cycle
- discourse
- feature specification
- the assignment of a value to a feature. In MAF, a feature shall denote a morpho-syntactic feature of a linguistic unit, such as the mood or tense of a verb.
- feature structure
- a set of feature specifications, used in MAF to express morpho-syntactic content.
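EXAMPLE: A feature structure can be pictured as a mapping from features to values, built from individual feature specifications. The sketch below is illustrative only: it assumes a flat (non-nested) structure with string values, and the function name `make_feature_structure` is hypothetical rather than part of MAF or ISO 24610.

```python
def make_feature_structure(specifications):
    """Build a flat feature structure from (feature, value) specifications.

    Each pair is one feature specification: the assignment of a value to a
    feature. Assigning two different values to the same feature is rejected.
    """
    structure = {}
    for feature, value in specifications:
        if feature in structure and structure[feature] != value:
            raise ValueError(f"conflicting values for feature {feature!r}")
        structure[feature] = value
    return structure


# Morpho-syntactic content of a verb form, using the features named above.
fs = make_feature_structure([("mood", "indicative"), ("tense", "present")])
```

A dictionary is used here because it makes the "one value per feature" property explicit; nested or typed feature structures would need a richer representation.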
- finite state automaton
- FSA
finite set of transitions from state to state, with an initial state and a final one
See also DAG.
- form
- any sequence of letters, pictograms and numerals used to write or pronounce a word
- inflection
- modification or marking of a word so that it reflects grammatical (i.e. relational) information, such as grammatical gender, tense, person, etc.
- inflection paradigm
- a table illustrating the forms of an inflected word
- inflected form
- form that a word can take when used in a sentence or a phrase
NOTE: An inflected form of a word is associated with a combination of morphological features, such as grammatical number or case.
- lattice
- term often used in the NLP community to denote (with some slight confusion with the notion of algebraic lattice) a directed acyclic graph with an initial node and a final node.
See also DAG
See also FSA
- lemma
- lemmatised form
conventional form chosen to represent a lexeme (e.g., the infinitive form for French verbs).
- lexeme
- fundamental unit, generally associated with a set of forms sharing a common meaning.
- lexical entry
- container for managing a set of forms and possibly one or several meanings to describe a lexeme.
- lexicon
- resource comprising a collection of lexical entries for a language.
- morpheme
- smallest linguistic unit that bears meaning in a discourse and cannot be divided into smaller meaningful units. A morpheme is either grammatical (grammeme) or lexical (lexeme).
- morphological feature
- morpho-syntactic feature
category induced from the inflected form of a word
NOTE: ISO 12620 provides a comprehensive list of values for European languages. An example of a morphological feature is: /grammaticalGender/.
- morphology of a word
- morpho-syntax of a word
description comprising the lemmatized form or forms of a word, plus additional information on its /part of speech/ data categories, possibly its inflectional paradigm or paradigms, and possibly its explicitly listed inflected forms.
NOTE: The term morpho-syntax is often used in place of morphology as it describes such features as number, gender, case etc. which are essential for syntactic agreement.
- multi-word expression
- MWE
an expression composed of an ordered group of words that has properties that are not predictable from the properties of the individual words or of their normal mode of combination.
NOTE: The group of words making up an MWE can be continuous or discontinuous.
EXAMPLE: "father in law" or "to be over the moon", which mean something different from what they appear to mean.
- natural language processing
- NLP
the field of study covering knowledge and techniques which allow computerized processing of linguistic data. This field combines a variety of skills including linguistics, mathematical logic, statistics, and algorithms.
- open data category
- data category whose content cannot be fully enumerated due to the organic nature of language
EXAMPLE: Typical open data categories might include /term/ and /lemma/.
- syntagmatic relation
- relation by which linguistic units in a discourse are associated.
- morpho-syntactic tag
- each associative relation corresponds to a feature for which the related entities share the same value. The morpho-syntactic tag lists some of these features (part of speech, grammatical category, etc.).
- part of speech
- grammatical category
lexical category
word class
category assigned to a word based on its grammatical and semantic properties
NOTE: ISO 12620 provides a comprehensive list of values for European languages. Examples of such values are: /noun/ and /verb/.
- token
- non-empty contiguous discourse sequence identified as such by a morpho-phonological analysis or an automatic processing of the discourse.
This can involve the recognition of a regular or algebraic language (matching of the separators), or a lexicological analysis (recognition of roots, morphological derivation and inflection, etc.).
- tokenization
- the process of identifying tokens
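EXAMPLE: One simple way to identify tokens is to recognize a regular language over the character stream. The sketch below is an assumption-laden illustration, not a tokenization method prescribed by MAF: the pattern `TOKEN_PATTERN` and the function `tokenize` are hypothetical, and real tokenizers typically also rely on lexical analysis.

```python
import re

# Assumed pattern for this sketch: a token is either a run of word characters
# or a single non-space punctuation character. Whitespace acts as a separator.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return tokens as (string, start, end) triples.

    Each token is a non-empty contiguous span of `text`; the offsets make the
    position of the token in the discourse explicit.
    """
    return [(m.group(), m.start(), m.end()) for m in TOKEN_PATTERN.finditer(text)]
```

Keeping the character offsets alongside each token string lets later steps relate word-forms back to the spans of discourse they cover.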
- word-form
- morpho-syntactic unit
contiguous or non-contiguous entity from a speech or text sequence identified as such in an associative relation. This identification is the basis of morpho-syntactic tagging (part-of-speech, grammatical category, agreement feature, etc.). Morpho-syntactic units may have no acoustic or graphic realization, or correspond to one or more tokens.
- romanization
- transliteration from a non-Latin script into a Latin script.
- script
- set of graphic characters used for the written form of one or more languages (ISO/IEC 10646-1, 4.14)
- simple data category
- data category that may be the possible content of a closed data category, but that cannot itself be further sub-divided
EXAMPLE: /masculine/, /feminine/, and /neuter/ are possible simple data categories associated with the conceptual domain of the closed data category /grammaticalGender/ as it is associated with the German language.
- transcription
- form resulting from a coherent method of writing down speech sounds
- transliteration
- form resulting from the conversion of one writing system into another
- word
- in the context of a given language, a description composed of at least a part of speech and a lemmatized form
NOTE: The description can include more morphological information and/or syntactic and semantic information. A word is either a single word or a multi-word expression.
- word class
See also part of speech
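EXAMPLE: The lattice and DAG entries in this clause describe a graph with directed edges and no cycle. The sketch below checks that property with Kahn's topological-sort algorithm; the function name `is_dag` and the edge-list representation are assumptions made for illustration and are not part of MAF.

```python
from collections import deque


def is_dag(edges):
    """Return True if the directed graph given as (source, target) edges has no cycle."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    successors = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)
        indegree[target] += 1

    # Repeatedly remove nodes that have no remaining incoming edge.
    queue = deque(node for node in nodes if indegree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # If a cycle exists, its nodes never reach indegree 0 and are never removed.
    return removed == len(nodes)


# A small lattice: initial node 0, final node 3, and two alternative paths.
lattice_edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
```

In this illustration the two paths from node 0 to node 3 could stand for two competing segmentations of the same stretch of discourse.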