The Algebraic Mind, by Gary Marcus,
School of Computing Science
Judging from excerpts from his forthcoming book, The Algebraic Mind, Gary Marcus has produced a decidedly exciting, important work which will doubtless prove both influential and controversial. In his chapter, "Rules and Variables", Marcus makes a strong case that certain connectionist learning models (those which plausibly qualify as representationally eliminative) fail to exhibit some fundamental forms of generalization which humans demonstrably exhibit. Yet, this is the very class of models which has led some to hail connectionism as a new paradigm -- a viable alternative to classically symbolic models.
Moreover, Marcus offers a tightly argued account of why eliminative connectionist models fail at important generalization tasks, viz., they do not perform variable binding. (Variable binding occurs when we associate constants with the variables that occur in our mental representations.) In the case of certain of his examples (e.g., example 1), I think that even casual introspection reveals Marcus' explanation to be correct. (Introspection, I think, should be given some weight in cases where our thought processes are readily available to the conscious mind.) In the case of example 1, humans clearly do make comparisons between the given input-output training examples, and they notice that whatever input is given will be reproduced as output. Humans could not notice this unless they employed variables of some kind. Marcus' remaining examples also confirm his explanatory thesis, in the sense that they satisfy the predictions of his theory. Importantly, though, Marcus produces direct, powerful arguments in favor of the variable binding account for these introspectively-opaque examples.
Having established a strong case that humans do perform variable binding in some widespread forms of generalization, Marcus stresses the notorious inability of eliminative connectionist networks to match this human capacity. In addition, he calls attention to a further problem when he notes:
"What these demonstrations also show, though, is that the network does not in fact possess or learn any abstract representation of relationships such as "sameness" or "mother". Instead, what the model learns is always piecemeal, a kind of learning that is node-by-node instead of across nodes. If the network does not have direct experience telling it what to do with some node when that node is activated, the network will not "know" what to do when that node is activated. Whereas people notice a generalization that holds across all features, back-propagating multilayer perceptrons learn their generalizations feature-by-feature."
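The piecemeal, node-by-node learning described in the quoted passage can be illustrated with a minimal sketch. It is not a reproduction of Marcus' simulations: he discusses back-propagating multilayer perceptrons, whereas the sketch below assumes a single-layer sigmoid network trained by the delta rule, four binary input and output units, and a training set in which the last input unit is never active. The function names, learning rate, and epoch count are all hypothetical choices made for illustration.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity_rule(x):
    """A rule stated over a variable: whatever x is bound to, return it."""
    return list(x)


# Assumed training set: every 4-bit pattern whose last bit is 0.
TRAIN_PATTERNS = [list(bits) + [0] for bits in product([0, 1], repeat=3)]


def run_net(weights, biases, x):
    """Activation of each output unit for input pattern x."""
    n = len(x)
    return [
        sigmoid(biases[j] + sum(weights[j][i] * x[i] for i in range(n)))
        for j in range(n)
    ]


def train_identity_net(patterns, epochs=2000, lr=0.2):
    """Batch delta-rule training on the identity task (target == input)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]  # weights[j][i]: input i -> output j
    biases = [0.0] * n
    for _ in range(epochs):
        grad_w = [[0.0] * n for _ in range(n)]
        grad_b = [0.0] * n
        for x in patterns:
            y = run_net(weights, biases, x)
            for j in range(n):
                err = y[j] - x[j]
                grad_b[j] += err
                for i in range(n):
                    # An input unit that is never active contributes no gradient.
                    grad_w[j][i] += err * x[i]
        for j in range(n):
            biases[j] -= lr * grad_b[j]
            for i in range(n):
                weights[j][i] -= lr * grad_w[j][i]
    return weights, biases


if __name__ == "__main__":
    weights, biases = train_identity_net(TRAIN_PATTERNS)
    novel = [1, 1, 1, 1]  # last unit active: outside the training space
    print("rule:   ", identity_rule(novel))
    print("network:", [round(a, 2) for a in run_net(weights, biases, novel)])
```

Under these assumptions, the weight from the last input unit to the last output unit never receives a training signal, and the last output unit has only ever been told to stay off. The network therefore copies the first three bits of the novel pattern but keeps the last output below threshold, while the rule stated over a variable copies all four.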
Now, this passage raises several points of interest, which include: 1) Some connectionists complain that Marcus expects too much of a simple network. One objection in particular focuses on the claim that "If the network does not have direct experience telling it what to do with some node ... the network will not `know' what to do when that node is activated." The objection (voiced by Michael Spivey-Knowlton at a recent conference) runs as follows: "Of course, what else would you expect? It's as though you expect a human to know what to do when a third eye suddenly appears in that person's forehead."
Now, this objection, taken by itself, seems silly. After all, Marcus has produced ample experimental evidence that humans are able to generalize outside their training spaces. However, the objection could be buttressed by noting that humans do not employ just a single connectionist module when they perform such strong generalizations; they call upon a much larger system, which arguably contains many separate modules or networks. I imagine that Marcus would concede this last point, but would note that this ``larger system'' might very well have a classical structure and perform variable binding. (In personal communication, Marcus affirms my suggestion.) To this I would add that I have provided extensive arguments to demonstrate that any modular architecture capable of matching human cognitive power would necessarily instantiate some form of classical architecture (technical report available from firstname.lastname@example.org).
2) Marcus makes a crucial point when he says that "the network does not in fact possess or learn any abstract representation of relationships such as `sameness' or `mother'." The ability to notice that two distinct elements are qualitatively the same is clearly critical in cases such as Marcus' sentence continuation examples (A rose is a rose, A dweezil is a ... ).
I find Marcus' point here of special interest, because it is problematic (at least) whether a connectionist network could acquire, via learning, the concept of ``qualitative sameness''. (This is not to say that a network could not be provided with innate wiring to support such a concept.) As Plato (in the Theaetetus) observed long ago, ``sameness'' is not a sensory property to be found in any given physical object. It concerns an abstract, non-sensory relation between objects. It is far from clear that such a relation could be induced by connectionist learning methods, in the absence of prior "wiring" which strongly predisposed the network to acquire such a concept.
3) The quoted passage raises the question of whether other kinds of connectionist networks, apart from the backpropagation networks Marcus considers, might possess the requisite generalization power and still avoid variable binding. For example, certain competitive learning models require that interactions occur between active nodes in hidden layers. Such interactions entail that certain kinds of comparisons between nodes, within a single layer, do enter into the learning process. Admittedly, it's uncertain whether the interactions are of the right kind. In particular, it's hard to see that comparisons of qualitative sameness are being made, except in terms of whether two nodes have equal activation. Nevertheless, the existence of learning algorithms that are quite unlike backpropagation and primitive Hebbian learning raises the stakes in this debate.
I agree with Marcus' stance (which he sketches near the end of "Rethinking Eliminative Connectionism", in press, Cognitive Psychology) that though there may conceivably be alternative learning models which both qualify as "eliminative" and possess the requisite generalization power, the burden of proof must rest with anyone who posits their existence. Not all connectionists view matters this way, though. There are influential connectionists who insist that, since we have no proof one way or the other on issues of this kind, the situation is epistemically symmetrical, and we should remain open-minded. I favor open-mindedness in the sense that I agree we cannot rule out certain possibilities. However, I think it's a mistake to suggest that matters are even nearly symmetrical. Here is why:
Any successful learning algorithm, in this domain, will necessarily have highly specific mathematical properties, just as a proof for a mathematical proposition must have very special properties. If one arbitrarily pinpoints some extremely complex mathematical proposition and asserts, `It is just as likely that this proposition is provable as that it is not', one may expect a (figurative) barrage of rotten tomatoes before an audience of mathematicians. Likewise, it would be imprudent to claim that the existence of the required learning algorithm is just as likely as its non-existence. More to the point, perhaps, connectionists (and I number myself among them) should not kid themselves about just how difficult some of the challenges are that Marcus has posed.