5 Terms and definitions

For the purposes of this International Standard, the terms and definitions given in ISO 12620:200?, ISO 24610-1, ISO 24610-2, and the following apply:

associative relation
relation by which a linguistic unit is associated with other units. It is a virtual association which does not require their effective presence, and it differs from a paradigmatic relation in that the latter refers only to linguistic units associated by substitutability.
closed data category
data category whose content is constrained by a list of permissible values which comprise its conceptual domain
NOTE: A typical closed data category might be /grammaticalNumber/, which can have as its content the values: /singular/, /plural/ or /dual/.
conceptual domain
finite list of simple data categories that may be the values of a complex data category
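As an illustration only, the link between a closed data category and its conceptual domain might be sketched as follows. The class `ClosedDataCategory` and its `accepts` method are hypothetical names, not part of ISO 12620 or MAF; the permissible values are the ones listed in the NOTE on /grammaticalNumber/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosedDataCategory:
    """Hypothetical model of a closed data category (not an ISO-defined API)."""
    name: str
    conceptual_domain: frozenset  # finite set of permissible simple data categories

    def accepts(self, value: str) -> bool:
        # A value is permissible only if it belongs to the conceptual domain.
        return value in self.conceptual_domain


grammatical_number = ClosedDataCategory(
    name="grammaticalNumber",
    conceptual_domain=frozenset({"singular", "plural", "dual"}),
)

print(grammatical_number.accepts("plural"))  # True
print(grammatical_number.accepts("paucal"))  # False
```

An open data category would have no such finite domain, so a membership test of this kind would not apply to it.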
data category
result of the specification of a given data field or the content of a closed data field
NOTE: A data category is to be used as an elementary descriptor in a linguistic structure or an annotation scheme. Examples are: /term/, /definition/, /part of speech/ and /grammaticalGender/. Data categories for the management of lexical resources and terminology are comparable to data element concepts in ISO/IEC 11179-3:2003.
directed acyclic graph
DAG
graph with directed edges and no cycle
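The "no cycle" condition can be checked mechanically. The sketch below uses Kahn's algorithm, which is one common way of doing so and is not prescribed by this document; the function name `is_dag` and the edge representation as (source, target) pairs are assumptions made for the example.

```python
from collections import deque


def is_dag(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Start from the nodes that have no incoming edge.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Every node gets removed exactly when the graph contains no cycle.
    return visited == len(nodes)


print(is_dag([0, 1, 2], [(0, 1), (1, 2), (0, 2)]))  # True
print(is_dag([0, 1, 2], [(0, 1), (1, 2), (2, 0)]))  # False
```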
discourse
feature specification
the assignment of a value to a feature. In MAF, a feature shall denote a morpho-syntactic feature of a linguistic unit, such as the mood or tense of a verb.
feature structure
a set of feature specifications, used in MAF to express morpho-syntactic content.
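A minimal sketch of these two notions, assuming that a feature structure can be held as a mapping in which each feature receives at most one value. The helper `make_feature_structure` and the feature names used here are hypothetical; they are not taken from ISO 24610-1 or ISO 12620.

```python
def make_feature_structure(*specifications):
    """Build a feature structure from (feature, value) specifications.

    Assumption of this sketch: a feature carries a single value, so a
    second, different value for the same feature raises ValueError.
    """
    structure = {}
    for feature, value in specifications:
        if structure.get(feature, value) != value:
            raise ValueError(f"conflicting values for {feature!r}")
        structure[feature] = value
    return structure


# Hypothetical feature names and values, chosen for illustration only.
verb_form = make_feature_structure(
    ("partOfSpeech", "verb"),
    ("mood", "indicative"),
    ("tense", "present"),
)
print(verb_form["tense"])
```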
finite state automaton
FSA
finite set of transitions from state to state, with an initial state and a final one
See also DAG.
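To make the definition concrete, the following sketch runs a small automaton over an input sequence. It assumes a deterministic automaton with a single final state, which is narrower than the general notion; the function `accepts` and the transition table are hypothetical.

```python
def accepts(transitions, initial, final, symbols):
    """Return True if reading `symbols` from `initial` ends in `final`."""
    state = initial
    for symbol in symbols:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition defined: the input is rejected
        state = transitions[key]
    return state == final


# Hypothetical automaton: one or more "a" followed by exactly one "b".
transitions = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2}

print(accepts(transitions, 0, 2, "aab"))   # True
print(accepts(transitions, 0, 2, "aabb"))  # False
```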
form
any sequence of letters, pictograms and numerals used to write or pronounce a word
inflection
modification or marking of a word so that it reflects grammatical (i.e. relational) information, such as grammatical gender, tense, person, etc.
inflection paradigm
a table illustrating the forms of an inflected word
inflected form
form that a word can take when used in a sentence or a phrase
NOTE: An inflected form of a word is associated with a combination of morphological features, such as grammatical number or case.
lattice
term often used in the NLP community to denote (with some slight confusion with the notion of algebraic lattice) a directed acyclic graph with an initial node and a final node.
See also DAG
See also FSA
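A lattice in this sense is often used to hold competing analyses of the same input, each analysis being one path from the initial node to the final node. The sketch below enumerates those paths; the function `paths`, the (source, label, target) edge format and the French example are assumptions made for illustration.

```python
def paths(edges, initial, final):
    """Yield every label sequence from the initial to the final node of a DAG."""
    outgoing = {}
    for src, label, dst in edges:
        outgoing.setdefault(src, []).append((label, dst))

    def walk(node, prefix):
        if node == final:
            yield prefix
            return
        for label, dst in outgoing.get(node, []):
            yield from walk(dst, prefix + [label])

    yield from walk(initial, [])


# Hypothetical lattice with two competing segmentations of the same input.
edges = [
    (0, "pomme", 1),
    (1, "de", 2),
    (2, "terre", 3),
    (0, "pomme de terre", 3),
]

for path in paths(edges, 0, 3):
    print(path)
```

Because the graph is acyclic, the recursion always terminates.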
lemma
lemmatized form
conventional form chosen to represent a lexeme (e.g., the infinitive form for French verbs).
lexeme
fundamental unit, generally associated with a set of forms sharing a common meaning.
lexical entry
container for managing a set of forms and possibly one or several meanings to describe a lexeme.
lexicon
resource comprising a collection of lexical entries for a language.
morpheme
smallest linguistic unit bearing a signification in a discourse and that cannot be divided into smaller meaningful units. A morpheme is either grammatical (grammeme) or lexical (lexeme).
morphological feature
morpho-syntactic feature
category induced from the inflected form of a word
NOTE: ISO 12620 provides a comprehensive list of values for European languages. An example of a morphological feature is: /grammaticalGender/.
morphology of a word
morpho-syntax of a word
description comprising the lemmatized form or forms of a word, plus additional information on its /part of speech/ data categories, possibly its inflectional paradigm or paradigms, and possibly its explicitly listed inflected forms.
NOTE: The term morpho-syntax is often used in place of morphology as it describes such features as number, gender, case etc. which are essential for syntactic agreement.
multi-word expression
MWE
an expression composed of an ordered group of words that has properties that are not predictable from the properties of the individual words or of their normal mode of combination.
NOTE: The group of words making up an MWE can be continuous or discontinuous.
EXAMPLE: "father in law" and "to be over the moon" each mean something different from what their individual words appear to mean.
natural language processing
NLP
the field of study covering knowledge and techniques which allow computerized processing of linguistic data. This field combines a variety of skills including linguistics, mathematical logic, statistics, and algorithms.
open data category
data category whose content cannot be fully enumerated due to the organic nature of language
EXAMPLE: Typical open data categories might include /term/, /lemma/.
syntagmatic relation
relation by which linguistic units in a discourse are associated.
morpho-syntactic tag
a feature corresponds to each associative relation, and the related entities share the same value for that feature. The morpho-syntactic tag lists some of these features (part-of-speech, grammatical category, etc.).
part of speech
grammatical category
lexical category
word class
category assigned to a word based on its grammatical and semantic properties
NOTE: ISO 12620 provides a comprehensive list of values for European languages. Examples of such values are: /noun/ and /verb/.
token
non-empty contiguous discourse sequence identified as such by a morpho-phonological analysis or an automatic processing of the discourse.

This can involve the recognition of a regular or algebraic language (matching of the separators), or a lexicological analysis (recognition of roots, morphological derivation and inflection, etc.).

tokenization
the process of identifying tokens
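The simplest case mentioned under token, recognition of a regular language by matching separators, might look like the sketch below. The pattern is an assumption made for illustration (a token is a run of word characters or a single punctuation mark, and whitespace separates tokens); real tokenizers for a given language are usually more elaborate.

```python
import re

# Assumption: a token is either a run of word characters or a single
# punctuation mark; whitespace acts as a separator and is never a token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return (start, end, token) triples for each non-empty contiguous span."""
    return [(m.start(), m.end(), m.group()) for m in TOKEN_PATTERN.finditer(text)]


for start, end, token in tokenize("Hello, world!"):
    print(start, end, token)
```

Keeping the character offsets alongside each token makes it possible to refer back to the original discourse sequence.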
word-form
morpho-syntactic unit
contiguous or non-contiguous entity from a speech or text sequence identified as such in an associative relation. This identification is the basis of morpho-syntactic tagging (part-of-speech, grammatical category, agreement feature, etc.). Morpho-syntactic units may have no acoustic or graphic realization, or correspond to one or more tokens.
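The many-to-many correspondence between word-forms and tokens can be sketched as follows. The `WordForm` record, its field names and the French example are hypothetical and are not MAF's own notation: the three tokens "pommes de terre" are treated as one word-form, while the single token "du" is treated as two word-forms ("de" and "le").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordForm:
    """Hypothetical record linking a morpho-syntactic unit to token positions."""
    lemma: str
    part_of_speech: str
    token_indices: tuple  # an empty tuple would mean no graphic realization


tokens = ["pommes", "de", "terre", "du", "jardin"]

word_forms = [
    WordForm("pomme de terre", "noun", (0, 1, 2)),  # one word-form, three tokens
    WordForm("de", "preposition", (3,)),            # "du" carries two word-forms
    WordForm("le", "determiner", (3,)),
    WordForm("jardin", "noun", (4,)),
]


def surface(word_form, tokens):
    """Return the tokens that realize a word-form, in discourse order."""
    return [tokens[i] for i in word_form.token_indices]


print(surface(word_forms[0], tokens))
```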
romanization
transliteration from a non-Latin script into a Latin script.
script
set of graphic characters used for the written form of one or more languages (ISO/IEC 10646-1, 4.14)
simple data category
data category that may be the possible content of a closed data category, but that cannot itself be further sub-divided
EXAMPLE: /masculine/, /feminine/, and /neuter/ are possible simple data categories associated with the conceptual domain of the closed data category /grammaticalGender/ as it is associated with the German language.
transcription
form resulting from a coherent method of writing down speech sounds
transliteration
form resulting from the conversion of one writing system into another
word
in the context of a given language, a description composed of at least a part of speech and a lemmatized form
NOTE: The description can include more morphological information and/or syntactic and semantic information. A word is either a single word or a multi-word expression.
word class

See also part of speech

Copyright ISO 2007
Version rev4 -- This page generated on 2008-01-18T13:33:52+01:00