Multiple Context-Free Languages
If you already know what multiple context-free languages and grammars are, then you can head over to the ADP for MCFL page if you’re interested in how MCFGs were merged with ADP.
Mildly context-sensitive languages
Mild context-sensitivity is defined in terms of sets of languages. A set of languages is mildly context-sensitive if and only if
- it contains all context-free languages,
- it admits limited cross-serial dependencies,
- all the languages are parsable in polynomial time, and
- all the languages have constant growth; this means that the distribution of string lengths should be linear rather than supralinear.
(Wikipedia)
One example of such a language formalism is the class of multiple context-free languages (MCFL) which are generated by multiple context-free grammars (MCFG). MCFG are a restricted instance of generalized context-free grammars.
Generalized context-free grammars (GCFG)
A GCFG $G = (N, T, F, P, S)$ consists of a set of composition functions $F$ and a set of productions $P$. $N$ are non-terminals, $T$ terminals, and $S$ the start symbol.
Composition functions have the form
, where $\gamma$ is either a string tuple or an application of a (possibly different) composition function.
Productions have the form $X \rightarrow f[Y, Z, …]$, where $Y, Z, …$ are non-terminal symbols and $f$ is a composition function.
Example
When the composition functions are unrestricted, then the class of languages generated by such grammars is equivalent to that of unrestricted grammars which belong to type-0 of the Chomsky hierarchy and are thus equivalent to turing machines.
A GCFG grammar is called a multiple context-free grammar when the functions are only defined as concatenations of constant strings and the components of the function arguments.
Multiple context-free grammars (MCFG)
A MCFG is a GCFG $G = (N, T, F, P, S)$ with the following properties:
- Functions are only defined as concatenations of constant strings and the components of the function arguments.
- Each component of a single function argument must only appear at most once in the right-hand side
- For all $A \in N$, every tuple derivable from $A$ must have the same size.
- All tuples derivable from the start symbol $S$ are 1-tuples.
The class of languages generated by MCFG is called the class of multiple context-free languages.
Example from Kato (2005) - Copy language
Alternative representation (Wild (2010), Nebel and Weinberg (2012))
The formalism provided by GCFG (and with that MCFG) makes use of rewriting functions which allows for an efficient representation in the case that functions are actually reused.
Instead of regarding MCFG as a specialization of GCFG, they can also be regarded as a direct generalization of regular CFG by adding vectors to rules. As we will later see, this is equivalent to inlining functions and can make the whole representation more compact (if functions couldn’t have been reused anyway) and understandable.
An MCFG is a tuple $G = (N, d, T, P, S)$ comprised of
$N$, a finite set of non-terminals,
$d$, a function from $N$ to $N$, assigning a dimension $d(A)$ to each $A \in N$,
$T$, a finite set of terminals,
$P$, a finite set of rules and
the start symbol $S \in N$ with $d(S) = 1$.
This representation of MCFG will further be called inlined MCFG, or short iMCFG.
Example from Kato (2005) translated to alternative representation
Another example
To distinguish between multiple uses of a nonterminal $N$ with $dim(N) > 1$, an upper index in parentheses is added.
Both representations, MCFG and iMCFG, can be automatically transformed into each other.
Transformation steps from MCFG to iMCFG
Apply all functions symbolically and remove them afterwards. E.g.
gets
Transformation steps from iMCFG to MCFG
Reverse the inlining by:
1. All terminating rules are left unchanged
2. For all non-terminating rules:
- Let $n$ be the number of nonterminal symbols on the right side (a symbol $N$ with $dim(N) > 1$ counts as one)
- Create a new $n$-ary function with the counted symbols in 2. and use them in the rule
E.g.
becomes
3. Eliminate identical functions
It is obvious that in some cases the original MCFG form (if there was one) cannot be reconstructed if it is not in a canonical form, e.g. if it has identical functions or functions which don’t use an argument. In all other cases, the original form can be reconstructed by eliminating identical functions of the generated MCFG.