Computational Structure of Human Language
E. S. Ristad, 1990
The central thesis of this report is that human language is NP-complete. That is, the process of comprehending and producing utterances is bounded above by the class NP, and below by NP-hardness. This constructive complexity thesis has two empirical consequences. The first is to predict that a language model outside NP is unnaturally powerful. The second is to predict that a language model less complex than NP-hard is descriptively inadequate.
To prove the lower bound, I show that the following three subproblems of language comprehension are all NP-hard: decide whether a given sound is a possible sound of a given language; disambiguate a sequence of words; and compute the antecedents of pronouns. The complexity proofs are based directly on the empirical facts of the language user's knowledge, under an appropriate idealization. Therefore, they are invariant across linguistic theories. No knowledge of linguistic theory is needed to understand the proofs, only knowledge of English.
To illustrate the usefulness of the upper bound, I show that two widely-accepted analyses of the language user's knowledge (of syntactic ellipsis and phonological dependencies) lead to complexity outside NP (PSPACE-hard and undecidable, respectively). Next, guided by the complexity proofs, I construct alternate linguistic analyses that are strictly superior on descriptive grounds, as well as being less complex computationally (in NP).
The report also presents a new framework for linguistic theorizing that resolves important puzzles in generative linguistics and guides the mathematical investigation of human language.
Artificial Intelligence Laboratory, MIT
Table of Contents
1 Foundation of the Investigation 1
1.1 The conceptual framework 2
1.2 The constructive complexity thesis 8
1.3 Direct complexity analysis 9
1.4 Summary of the report 10
2 Structure of Phonological Knowledge 13
2.1 Segmental phonology 15
2.1.1 Complexity of segmental recognition and generation 18
2.1.2 Restricting the segmental model 20
2.1.3 The SPE evaluation metric 30
2.1.4 Modeling phonological dependencies 34
2.2 Autosegmental phonology 35
2.2.1 Complexity of autosegmental recognition 35
2.2.2 Suprasegmental dependencies 38
3 Syntactic Agreement and Lexical Ambiguity 42
3.1 Morpho-syntactic dependencies 44
3.2 Complexity of linguistic transforms 51
3.3 Complexity of agreement interactions 57
3.4 Conclusions 62
3.4.1 Locality in linguistic theory 62
3.4.2 The search for uniform mechanisms 65
4 Complexity of Anaphora 68
4.1 Preliminaries 69
4.2 Two models of the anaphora problem 71
4.2.1 The agreement model 71
4.2.2 The referential dependence model 76
4.2.3 From satisfiability to referential dependence 84
4.3 Evidence for an NP upper bound 86
4.3.1 Copy theory of ellipsis 89
4.3.2 Complexity of anaphora in the copy theory 92
4.3.3 Ellipsis reconsidered 101
4.4 Analysis of linguistic theories 106
5 References 110
A Philosophical Issues 118
A.1 Implications of a complexity classification 118
A.2 Unbounded idealizations 123
A.2.1 Unbounded agreement features 125
A.2.2 Unbounded phonological features 127
A.2.3 Limiting unbounded idealizations 129
B Structure of Elliptical Dependencies 130
B.1 Previous work reconsidered 132
B.1.1 Covariance reduced to prediction of subjects 133
B.1.2 The problem of nonsubject covariance 134
B.1.3 Covariance reduced to bound anaphora 137
B.2 Invisible obviation 140
B.3 Necessary structure of an explicit theory 145
B.4 The proposed system of representation 146
B.4.1 Segregate phrase and thematic structure 146
B.4.2 Two relations of referential dependency 148
B.4.3 Ellipsis as a shared thematic-function 150
B.5 The space of elliptical structures 155
B.5.1 Possible domains of ellipsis 155
B.5.2 Invisible crossover 157
B.5.3 Invisible obviation 158
B.5.4 Recursive ellipsis 160
B.6 Conclusions 161