Can connectionist models of word pronunciation account for phonology?
Department of Psychology, Florida Atlantic University
In the past decade we have witnessed a heated controversy between eliminative connectionism and symbolic accounts of cognition. Despite dozens of connectionist models in diverse areas of cognition, the consequences of their architectural choices are not fully understood. How is a model's ability to generalize constrained by its architecture? Are different domains comparable in the class of generalizations they support? Does the ability of an architecture to generalize in a given domain guarantee generalization in a different domain?
Marcus's book (Marcus, in press) provides a lucid discussion of these issues and some formal tools for their evaluation. He demonstrates a systematic link between a model's representational choices and the scope of the generalizations it can acquire: Eliminative connectionist networks can generalize only within the set of features on which they were trained, whereas networks implementing variables can extend their generalization outside the training space. This conclusion has some interesting methodological implications: if the class of attained generalizations is constrained by representational choices, then one can infer these representations by observing the class of generalizations manifested in behavior. Furthermore, one can systematically evaluate the principled adequacy of an architecture across domains by examining the type of generalizations exhibited in these domains.
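The notion of a training space can be made concrete with a toy sketch. The code below is an illustration written for this commentary, not a reproduction of any simulation in Marcus's book: the network size, the learning rate, the number of epochs, and the function names `train_identity` and `predict` are all assumptions. A single layer of sigmoid units is trained with the delta rule to copy 4-bit inputs to its outputs, but only on inputs whose final bit is 0. Because the final input feature is never active during training, the weights leaving it never change, and the output unit for the final bit only ever learns to stay off.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_identity(patterns, n_bits=4, epochs=500, lr=0.5):
    """Train one sigmoid output unit per bit to copy its input (delta rule)."""
    # weights[j][i] connects input unit i to output unit j; all start at zero.
    weights = [[0.0] * n_bits for _ in range(n_bits)]
    biases = [0.0] * n_bits
    for _ in range(epochs):
        for x in patterns:
            for j in range(n_bits):
                net = biases[j] + sum(w * xi for w, xi in zip(weights[j], x))
                err = x[j] - sigmoid(net)  # the target is the input itself
                biases[j] += lr * err
                for i in range(n_bits):
                    # No change when x[i] == 0: an input feature that is
                    # never active in training keeps its initial weights.
                    weights[j][i] += lr * err * x[i]
    return weights, biases


def predict(weights, biases, x):
    """Threshold each output unit's activation at 0.5."""
    return [
        int(sigmoid(b + sum(w * xi for w, xi in zip(ws, x))) > 0.5)
        for ws, b in zip(weights, biases)
    ]


# Training space: every 4-bit pattern whose final bit is 0.
even_patterns = [[a, b, c, 0] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
weights, biases = train_identity(even_patterns)

print(predict(weights, biases, [1, 0, 1, 0]))  # trained item: copied correctly
print(predict(weights, biases, [1, 0, 1, 1]))  # novel final bit: [1, 0, 1, 0]
```

The network copies every trained pattern, yet it maps the novel input 1011 onto 1010: the identity function is acquired only for the feature values that appeared in training.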
I would like to illustrate these conclusions by comparing the achievements of eliminative connectionism in two areas: word pronunciation and word formation. Eliminative connectionism has gained considerable success in modeling the acquisition of some aspects of word pronunciation. This success is sometimes considered evidence for its adequacy in modeling language. I will evaluate this claim by examining the scope of the generalizations attained in these two domains. I conclude that these domains differ in the scope of their generalizations; hence, the success of connectionist accounts of word pronunciation is not indicative of their adequacy as models of language.
1. Connectionist accounts of word pronunciation
The rise of eliminative connectionist models of word pronunciation is best understood in reference to dual route accounts. According to dual route models (e.g., Coltheart, 1978; Coltheart, Curtis, Atkins & Haller, 1993), word pronunciation is achieved by two mechanisms: a direct route, retrieving a word's pronunciation from the lexicon, and an assembly mechanism, computing the pronunciation by mapping graphemes to phonemes. Central to this proposal is the assumption that the two routes differ in their computational properties: the direct route is implemented by an associative process, whereas the assembly of phonology is achieved by a symbolic mechanism. The view of grapheme to phoneme mapping as a symbolic process has met with substantial criticism. First, the form of possible outputs of the assembly mechanism (regular words) is indistinguishable from the form of impossible outputs (irregular words). Second, there is strong empirical evidence for the sensitivity of grapheme to phoneme correspondences (GPCs) to the contents of neighboring units, a finding incompatible with the view that they are represented by variables (for a review, see Van Orden, Pennington & Stone, 1990). Finally, significant aspects of phonology assembly are successfully captured by eliminative connectionist models (e.g., Seidenberg & McClelland, 1989; Plaut, McClelland, Seidenberg, & Patterson, 1996; Van Orden et al., 1990). Eliminativist reading models have further led to some new predictions. For instance, following the principle that dynamical systems organize through recurrent feedback, Van Orden and Stone predicted that the speed with which activation settles should depend not only on forward consistency, i.e., the consistency of mapping spelling to sound, but also on backward consistency, namely, the consistency of mapping sound back to spelling (Stone, Vanhoy & Van Orden, 1997; Van Orden & Goldinger, 1994). This prediction has recently received ample experimental support (Frost, Fowler & Rueckl, 1998; Ziegler & Ferrand, 1998; Stone et al., 1997).
2. Connectionist accounts of word formation
For proponents of eliminative connectionism, the rejection of the symbolic account of GPCs speaks to the principled inadequacy of symbolic accounts of cognition: if rules are unnecessary for the assembly of phonology, then they may also be superfluous in other domains of cognition, including language (e.g., Plaut et al., 1996; Van Orden et al., 1990). Marcus's thesis offers the means to evaluate this claim. The ability of a pattern associator to account for a domain of human cognition should depend on the level of generalizations attained in this domain. Specifically, the success of eliminative connectionism must reflect a domain in which generalizations are limited to the training space. An inspection of the correspondence between spelling and sound supports this prediction. The mapping of letters to sounds is arbitrary. No child is expected to infer the pronunciation of a letter he has never seen before (e.g., D) from existing knowledge (e.g., O and I). Indeed, the "rules" specified in Coltheart et al. (1993) are primarily correspondences between specific tokens. Although some correspondences (e.g., aCe--A) may be expressed by linking either tokens or variables, the latter formulation is likely to reduce the observational adequacy of the model, since it would preclude capturing the strong content sensitivity of these correspondences. Thus, the success of eliminative connectionism in generalizing within this domain does not indicate that the mind eliminates variables, nor does it demonstrate that generalizations outside the training space may be acquired without their implementation: it simply reflects the fact that the acquisition of grapheme to phoneme correspondences does not require such knowledge.
To examine whether the success of eliminative connectionism in modeling GPCs is indicative of its viability as an account of language, we next need to examine whether linguistic generalizations are limited to the training space as well. The work on inflectional morphology suggests this is not the case (e.g., Kim, Pinker, Prince & Prasada, 1991; Marcus, Brinkmann, Clahsen, Wiese, & Pinker, 1995; Pinker, 1991, 1994; Prasada & Pinker, 1993). I would like to corroborate this conclusion with a relatively new example from morphophonology.
A new example: Evidence from the OCP.
My example concerns co-occurrence restrictions on the structure of Hebrew roots. Hebrew words are formed by inserting a root, a series of generally three consonants, in a word pattern, which provides vowels and affixes. For instance, the verb bided (he isolated) is derived by inserting the root bdd into the _i_e_ word pattern. The root bdd contains a geminate, dd. The presence of gemination is characteristic of Hebrew roots. However, Hebrew exhibits an interesting asymmetry in the location of geminates: there are numerous roots whose form is XYY, but almost no roots of the form YYX (e.g., *bbd or *ddb). This asymmetry is explicable by a constraint on root structure (the Obligatory Contour Principle, OCP; McCarthy, 1986). On this account, root final geminates are derived from a biconsonantal lexical representation by rightwards reduplication. For instance, the root bdd is lexically represented as bd. This lexical representation, bd, reduplicates during word formation, prior to its insertion in the word pattern, yielding the triconsonantal root, bdd. Reduplication, however, is the copying of a variable: X->XX. Furthermore, according to the OCP, reduplication is constrained by the root, a second variable. Neither the reduplication base nor the root is representable, according to eliminative connectionism.
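The two-step derivation described above (rightwards reduplication of a biconsonantal base, followed by insertion of the resulting root into a word pattern) can be sketched as operations over variables. This is a minimal sketch under simplifying assumptions: each consonant is written as a single character in transliteration, and the function names `reduplicate` and `insert_root` are hypothetical, introduced here only for illustration.

```python
def reduplicate(base):
    """Rightwards reduplication of a biconsonantal base: XY -> XYY."""
    if len(base) != 2:
        raise ValueError("expected a biconsonantal base")
    # The copied element is whatever fills the final slot; the operation
    # refers to the position, never to a particular consonant.
    return base + base[-1]


def insert_root(root, pattern):
    """Fill each '_' slot of the word pattern with the next root consonant."""
    if pattern.count("_") != len(root):
        raise ValueError("pattern slots and root consonants do not match")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "_" else ch for ch in pattern)


# The familiar root bd and the novel root bg are handled identically.
print(insert_root(reduplicate("bd"), "_i_e_"))  # bided
print(insert_root(reduplicate("bg"), "_i_e_"))  # bigeg
```

Because neither function mentions any specific segment, both extend to bases containing consonants that were never encountered before, which is the sense in which reduplication copies a variable.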
If mental representations eliminate variables, then the constraint on root structure is unlearnable. My work with Joseph Shimron and Dan Everett challenges this assertion (Berent & Shimron, 1997; Berent, Everett & Shimron, 1998). We demonstrated that Hebrew speakers constrain the location of geminates in the production and rating of new words. For instance, given a new root, bg, speakers form a word by reduplicating the second root radical (e.g., bigeg), but never the first (e.g., *bibeg). The behavior we observe is inexplicable by appealing to word structure: Hebrew frequently exhibits word initial geminates formed by concatenating distinct morphemes (e.g., titfor, she will sew, from the root tfr). Furthermore, speakers avoid root initial gemination regardless of the position of geminates in the word. For instance, root initial gemination is rejected in words like bibeg, hitbabagti, and mabbigim, despite the contrast in the surface location of the geminates. Likewise, reduplication cannot be attributed to an associative process that simply adds segments. If geminates were formed by segment addition, then the production of geminates vs. nongeminates should have reflected their relative frequency in the language. Our findings are strongly incompatible with this view. If geminates were formed by segment addition, then the expected frequency of geminate vs. nongeminate responses for our materials would be 4.9%. In contrast, the observed frequency of geminates relative to the nongeminate responses is 329%. Thus, the behavior we observe is inexplicable by the distribution of tokens in the language. Instead, it reflects a generalization that links two variables: one is the reduplication base (X-->XX) and the other is the root.
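The contrast between a constraint stated over the root and one stated over the surface word can be illustrated with a short sketch. The roots are supplied by hand here, since no morphological parser is assumed; the function name `has_root_initial_gemination` and the single-character transliteration of consonants are likewise assumptions made only for this illustration.

```python
def has_root_initial_gemination(root):
    """True if the first two root consonants are identical (the YYX type)."""
    return len(root) >= 2 and root[0] == root[1]


# Word forms paired with hand-supplied roots, following the examples above.
examples = {
    "bigeg": "bgg",       # root final gemination: the acceptable type
    "bibeg": "bbg",       # root initial gemination
    "hitbabagti": "bbg",  # same root, geminates in a different surface position
    "mabbigim": "bbg",    # same root again
    "titfor": "tfr",      # word begins with t_t, but the root has no geminate
}

for word, root in examples.items():
    print(word, root, has_root_initial_gemination(root))
```

The check gives the same verdict for bibeg, hitbabagti, and mabbigim because it inspects the root rather than the position of the geminates in the word, and it passes titfor because the two t's belong to different morphemes.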
3. Conclusions and Caveats
Connectionist accounts of word pronunciation have gained considerable success, a success often viewed as evidence for their adequacy as models of language. An inspection of the generalizations exhibited in these two domains reveals some marked differences: The mapping of graphemes to phonemes is limited to the training space, whereas the generalizations reflected in phonological knowledge exceed the training space. The scope of phonological knowledge is thus fundamentally different from the acquisition of correspondences between graphemes and phonemes. Given Marcus's proof that pattern associators cannot acquire functions linking variables by training on their instantiations, it is unlikely that eliminative connectionism can account for the acquisition of the root structure constraint described here.
I would like to conclude my commentary by expressing a few caveats. My discussion of word pronunciation was intentionally limited to a single aspect of dual route models, namely, the assumption that GPCs are achieved by a symbolic process. Evidently, this assumption was not necessary in the first place. It should be recognized, however, that precisely because this assumption is unnecessary, its rejection does not imply the rejection of dual route models. Dual route models embody additional assumptions regarding the segregation of the routes and the contrast between lexical and nonlexical representations, assumptions supported by independent empirical evidence. Their evaluation falls beyond the scope of this commentary.
The exclusive focus on GPCs also limits the evaluation of connectionist accounts of word pronunciation. The conclusion that GPCs are learnable by an associative process falls short of demonstrating that phonological representations may be assembled without appealing to variables. Indeed, the formation of adequate phonological representations goes far beyond the mapping of letters to sound. In fact, Plaut et al. (1996) clearly acknowledge that the success of their model largely stems from their representational scheme: the distinction between the units assigned to a segment depending on its syllabic position (onset, nucleus or coda). Because the implementation of syllable structure is critical for this model's success, that success cannot be considered evidence for the absence of variables in the representation of printed words. More generally, reading ability may not be independent of linguistic competence (Berent & Perfetti, 1995; Perfetti, 1985). To the extent that linguistic knowledge appeals to variables, the phonological representations assembled in reading may be highly structured as well. A full account of reading may ultimately require the acquisition of generalizations similar to those required to explain phonology. Marcus's work demonstrates that this outcome is unlikely to be achieved if mental variables are eliminated.
Berent, I., Everett, D., & Shimron, J. (1998). Do phonological representations specify variables? Evidence from the Obligatory Contour Principle. Manuscript submitted for publication.
Berent, I., & Perfetti, C. A. (1995). A rose is a REEZ: The two-cycles model of phonology assembly in reading English. Psychological Review, 102, 146-184.
Berent, I., & Shimron, J. (1997). The representation of Hebrew words: Evidence from the Obligatory Contour Principle. Cognition, 64, 39-72.
Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies of information processing, (pp. 151-216). London: Academic Press.
Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: Dual route and parallel distributed processing approaches. Psychological Review, 100, 589-608.
Frost, S. J., Fowler, C. A., & Rueckl, J. G. (1998). Bidirectional consistency: Effects of phonology common to speech and reading. Manuscript submitted for publication.
Kim, J., Pinker, S., Prince, A., & Prasada, S. (1991). Why no mere mortal has ever flown out to center field. Cognitive Science, 15, 173-218.
Marcus, G. (in press). The algebraic mind: Integrating connectionism and cognitive science. Cambridge: MIT Press.
Marcus, G., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29, 189-256.
McCarthy, J. (1986). OCP effects: Gemination and antigemination. Linguistic Inquiry, 17, 207-263.
Perfetti, C. A. (1985). Reading ability. New York: Oxford University Press.
Pinker, S. (1991). Rules of language. Science, 253(2), 530-535.
Pinker, S. (1994). The language instinct. New York: Morrow.
Plaut, D., McClelland, J., Seidenberg, M., & Patterson, K. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56-115.
Prasada, S., & Pinker, S. (1993). Generalization of regular and irregular morphological patterns. Language and Cognitive Processes, 8, 1-56.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed developmental model of word recognition and naming. Psychological Review, 96, 523-568.
Stone, G. O., Vanhoy, M., & Van Orden, G. C. (1997). Perception is a two-way street: Feedforward and feedback phonology in visual word recognition. Journal of Memory and Language, 36, 337-359.
Van Orden, G. C., & Goldinger, S. D. (1994). Interdependence of form and function in cognitive systems explains perception of printed words. Journal of Experimental Psychology: Human Perception and Performance, 20, 1269-1291.
Van Orden, G. C., Pennington, B. F., & Stone, G. O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488-522.
Ziegler, J. C., & Ferrand, L. (1998). Orthography shapes the perception of speech: The consistency effect in auditory word recognition. Manuscript submitted for publication.