Learning in Human Communication
We can acquire sophisticated communication skills, including
gestures and the use of human languages, in a surprisingly short
time after birth. The goal of our research group is to elucidate
the neural mechanism of this most mysterious human ability by focusing
on computational models and non-invasive brain imaging techniques.
Research Topics
1. Neural Mechanism of Reward-based Behavioral Learning
In many cases of cognitive learning, no explicit teacher signal
(error) is provided. However, we can achieve efficient learning
through interactions with the external world. The theory of reinforcement
learning aims to learn appropriate behaviors with only a rough evaluation
(reward) of actions provided from the external world. The basic
idea of the theory is to change behaviors in proportion to the difference
between the prediction of a reward and an actual reward.
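The proportional update described above can be sketched as a simple delta rule. This is only a minimal illustration, not the model used in the study; the function name `update_prediction`, the learning rate `alpha`, and the reward sequence are assumptions made for the example.

```python
def update_prediction(predicted_reward, actual_reward, alpha=0.1):
    """Move the reward prediction toward the actual reward.

    The change is proportional to the prediction error.
    alpha (the learning rate) is an assumed, illustrative value.
    """
    prediction_error = actual_reward - predicted_reward
    return predicted_reward + alpha * prediction_error


# Hypothetical sequence of rewards (+1) and penalties (-1).
rewards = [1.0, 1.0, -1.0, 1.0]
prediction = 0.0
history = []
for r in rewards:
    prediction = update_prediction(prediction, r)
    history.append(prediction)
```

When the prediction already matches the reward, the error is zero and nothing changes; the larger the mismatch, the larger the adjustment.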
To investigate the neural mechanism of human reinforcement learning,
we conducted an fMRI experiment using a stochastic decision task, in which
a start disk and a goal disk appeared in either of two boxes on the computer
screen. The subject was required to move the start disk to the goal
by pushing either the left or the right button. The subject received
a monetary reward for a success and lost the same
amount of money for a failure. The disk movement produced
by a button push was governed by probabilities. Therefore, through trial and error,
the subjects had to maximize their rewards by learning the stochastic regularities
that controlled the disk movements.
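A task of this kind can be sketched as a toy simulation. The success probabilities in `SUCCESS_PROB`, the epsilon-greedy choice rule, and all parameter values below are assumptions for illustration; the text does not give the probabilities used in the experiment or a model of how subjects chose.

```python
import random

# Assumed success probabilities for each button (hypothetical values).
SUCCESS_PROB = {"left": 0.8, "right": 0.2}
REWARD = 1.0  # a success pays +REWARD, a failure costs the same amount


def press(button, rng):
    """Return +REWARD on a successful move, -REWARD on a failure."""
    return REWARD if rng.random() < SUCCESS_PROB[button] else -REWARD


def run_trials(n_trials=500, alpha=0.1, epsilon=0.1, seed=0):
    """Learn button values by trial and error (epsilon-greedy, assumed)."""
    rng = random.Random(seed)
    values = {"left": 0.0, "right": 0.0}
    total = 0.0
    for _ in range(n_trials):
        if rng.random() < epsilon:
            button = rng.choice(["left", "right"])  # explore
        else:
            button = max(values, key=values.get)    # exploit
        outcome = press(button, rng)
        # Delta-rule update of the chosen button's value.
        values[button] += alpha * (outcome - values[button])
        total += outcome
    return values, total


values, total = run_trials()
```

The learner is never told the probabilities; it can only estimate them from the rewards and penalties it receives.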
We investigated four types of information processing that
are central to reinforcement learning theory. The activity in the
caudate nucleus (1,2) and prefrontal cortex (7) was correlated with
how subjects changed behaviors in the early phase of learning (red,
learning rate index). The activity of the caudate nucleus was also
correlated with the short-term reward (blue). These observations
are the first piece of experimental evidence that the caudate nucleus plays
a central role in reinforcement learning that changes behaviors
guided by reward information processing. In contrast, the activity
in the dorsal premotor cortex (3, 4), supplementary motor area (5)
and lateral cerebellum (6) was correlated with how learning progressed
and converged (green, learning convergence index). In summary, the
caudate nucleus was involved in the early phase of learning that
requires a large amount of behavioral change, while the dorsal premotor
cortex, supplementary motor area and lateral cerebellum were involved
in the later phase of learning, which draws on learned memory while learning
is still in progress. The activity in the orbitofrontal cortex (8) was correlated
with the accumulated reward (yellow), suggesting that this area
is involved in monitoring cumulative reward.
2. Computational Model for Generation and Recognition of Hierarchical
Motor Sequences
Another important aspect of cognitive learning is how to extract
and utilize hierarchical structures that exist in the external world
and the internal representation used in our behavioral selection.
For example, we can generate a variety of structured motor sequences
such as writing or speech, and learn to combine elemental actions
in novel orders.
We proposed a computational model called HMOSAIC (Hierarchical
MOdular Selection and Identification for Control) to explain such
hierarchical information processing. Each layer of HMOSAIC consists
of a set of paired control and predictive models. At the lowest
level, the control model computes a motor command, while the predictive
model predicts the consequence of the ongoing command. A responsibility
signal (posterior probability) of a module represents the accuracy
of the prediction generated by that module's predictive (forward) model.
These responsibility signals are used not only to weight the outputs of
the control models, but also to guide competitive learning of
the predictive and control models, resulting in the self-organization
of elementary movements.
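The role of the responsibility signals can be sketched as follows. This is a minimal illustration that assumes a Gaussian likelihood over each module's prediction error; the function names, the scaling parameter `sigma`, and the three example modules are hypothetical and are not taken from the HMOSAIC implementation.

```python
import math


def responsibilities(predictions, observed, sigma=1.0):
    """Normalized Gaussian likelihoods of each module's prediction.

    sigma is an assumed scaling parameter for the prediction error.
    """
    log_lik = [-((observed - p) ** 2) / (2.0 * sigma ** 2) for p in predictions]
    m = max(log_lik)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in log_lik]
    total = sum(weights)
    return [w / total for w in weights]


def blended_command(commands, resp):
    """Weight each module's motor command by its responsibility."""
    return sum(r * u for r, u in zip(resp, commands))


# Three hypothetical modules: predicted next state and proposed command.
predictions = [0.0, 1.0, 2.0]
commands = [-1.0, 0.0, 1.0]
resp = responsibilities(predictions, observed=1.1, sigma=0.5)
command = blended_command(commands, resp)
```

The module whose prediction is closest to the observation receives the largest responsibility, so it dominates the blended command. The same values could scale each module's learning step, which is what makes the learning competitive.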
In contrast, the higher level receives
two inputs: an abstract (symbolic) desired trajectory and the posterior
probabilities of its subordinate level, which indicate which modules
are playing a crucial role at the lower level in the current
behavioral situation. The higher control model generates, as a motor
command, prior probabilities for the lower-level modules, and therefore
prioritizes which lower-level modules should be selected. The higher
predictive model learns to estimate the posterior probability at
the next time step. Both the outputs of the controllers and the
learning of the predictors and controllers are weighted by the
precision of the prediction. Thus, the lower and higher-level modules
interact bi-directionally during the learning and control of hierarchically
organized movements. Our simulation confirmed that HMOSAIC can automatically
learn both elementary movements and their hierarchical temporal
order through sensorimotor learning. The sequence-specific
neural firing pattern at the higher level was similar to the neural
activity of the monkey supplementary motor area in sequential motor
control tasks.
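The bi-directional exchange between levels can be sketched with Bayes' rule: the higher level sends prior probabilities down, and the lower level returns posterior probabilities. The function name `posterior` and all numerical values are assumptions for illustration, not results from the simulation.

```python
def posterior(priors, likelihoods):
    """Combine top-down priors with bottom-up likelihoods (Bayes' rule)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("priors and likelihoods have no overlapping support")
    return [j / total for j in joint]


# Hypothetical values for three lower-level modules.
priors = [0.7, 0.2, 0.1]       # sent down by the higher control model
likelihoods = [0.1, 0.6, 0.3]  # from lower-level prediction accuracy
post = posterior(priors, likelihoods)  # passed back up to the higher level
```

In this example the higher level favors the first module, but the sensory evidence favors the second, and the posterior shifts toward the second module. The higher predictive model would then learn to anticipate such posteriors at the next time step.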
People Involved in the above Topics
- Masahiko Haruno
- Satoshi Tada
- Brian Coe
- Mitsuo Kawato
Collaborator
- Daniel Wolpert