Œ€‹†ƒe[ƒ}

Kenji Doya (銅谷賢治), ATR Computational Neuroscience Laboratories

Mail to:
doya@atr.jp

Here are outlines of my research topics, which have expanded, and keep expanding, because of my deep curiosity about how we can learn to make our bodies move in such enjoyable ways.

Reinforcement Learning

How can we learn a variety of behaviors from trial and error? For my undergraduate thesis project, in the heyday of microcomputers, I built a small robot that learned to walk by correlating random movements of its legs with feedback from a speed sensor. It was a fun project, but I later found that a computational theory of 'learning from reward and punishment' had already been formulated under the name of 'reinforcement learning.'

Most researchers in reinforcement learning assume that the states of the world, the agent's actions, and time itself are all discrete, a setting known as the Markov decision problem (MDP). Since I am more interested in our actions and movements in the real physical world, I formulated a continuous version of the reinforcement learning paradigm by extending the classic theory of optimal control to unknown, nonlinear environments.

For more details, see:

  • Doya 2000, Neural Computation.

  • Morimoto and Doya 2001, NIPS13.

  • Doya 2001, IEEE Control System Magazine.
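
The central quantity of the continuous formulation can be illustrated with a small example. As we read Doya 2000, the value V is the exponentially discounted integral of future reward with time constant tau, and the temporal difference (TD) error becomes delta(t) = r(t) - V(t)/tau + dV/dt. The Python sketch below approximates dV/dt with a backward difference and learns a value function by semi-gradient TD(0). The toy system (dx/dt = -x, reward rate r = -x², a single feature x²), the function name learn_value, and all parameter values are assumptions chosen so that the exact answer, V(x) = -x²/(2 + 1/tau), is known; this is not the setup used in the papers.

```python
import random


def learn_value(tau=1.0, dt=0.01, lr=2.0, episodes=200, horizon=3.0, seed=0):
    """Continuous-time TD(0) on a toy system (an assumption for illustration):
    dynamics dx/dt = -x, reward rate r(x) = -x**2, value V(x) = w * x**2.
    Returns the learned weight w."""
    rng = random.Random(seed)
    w = 0.0
    steps = int(horizon / dt)
    for _ in range(episodes):
        x_prev = rng.uniform(-1.0, 1.0)        # random start state
        for _ in range(steps):
            x = x_prev + dt * (-x_prev)        # Euler step of dx/dt = -x
            r = -x * x                         # reward rate at time t
            v, v_prev = w * x * x, w * x_prev * x_prev
            # Continuous TD error, with dV/dt replaced by a backward difference:
            # delta = r - V/tau + (V(t) - V(t - dt)) / dt
            delta = r - v / tau + (v - v_prev) / dt
            # Semi-gradient step on V(x_prev); dV/dw = x_prev**2.
            w += lr * dt * delta * x_prev * x_prev
            x_prev = x
    return w


if __name__ == "__main__":
    tau = 1.0
    print("learned w:", learn_value(tau=tau))
    print("exact w:  ", -1.0 / (2.0 + 1.0 / tau))
```

The learned weight should end close to the exact coefficient -1/3 for tau = 1; the small remaining gap comes from the Euler discretization. Multiplying delta by dt recovers an ordinary discrete TD error with discount factor 1 - dt/tau, which is one way to see how the continuous and discrete formulations connect.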

A Robot that Learns to Stand Up

These days there are fancy humanoid robots walking on the street (well, not quite yet), but it's a shame that most of them cannot get back up without human assistance once they fall. It's a further shame that human operators have to program every detail of their movements.

So Jun Morimoto and I applied our continuous reinforcement learning algorithm to let a robot, admittedly a simple three-link one, learn to stand up on its own. It first learned to stand up after several hundred trials in simulation. We then connected the learned 'brain' to the real hardware, where it successfully learned to stand up after about one hundred trials, i.e., bumps and falls.

Early trials [QuickTime movie (6.1MB)]

After learning [QuickTime movie (3.3MB)]

For more details, see:

  • Morimoto and Doya 2001, Robotics and Autonomous Systems.

Parallel and Hierarchical Architectures

Reinforcement learning is a quite versatile framework, but we know from experience that learning is very hard in nonlinear and/or nonstationary environments. We brought the idea of using multiple prediction models to segment nonlinear or nonstationary dynamics into the reinforcement learning paradigm.

For more details, see:

  • Doya, Samejima, Katagiri, and Kawato 2002, Neural Computation.
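
The key ingredient of such a modular architecture is a soft competition among the prediction models. The Python sketch below assumes scalar states and Gaussian prediction errors with a fixed width sigma (both our simplifications): each module's 'responsibility' for the current situation is a softmax of its negative squared prediction error, so the module that predicts the environment best receives most of the weight. These weights can then gate both the learning and the outputs of the corresponding modules. The names responsibilities and blend and the value of sigma are illustrative assumptions, not taken from the paper.

```python
import math


def responsibilities(observed, predictions, sigma=0.1):
    """Soft competition among prediction models (illustrative sketch).

    observed:    the state actually observed (a scalar here, for simplicity)
    predictions: each module's prediction of that state
    sigma:       assumed width of the Gaussian prediction error
    Returns weights proportional to exp(-err**2 / (2 * sigma**2)) that sum to 1.
    """
    scores = [-(observed - p) ** 2 / (2.0 * sigma ** 2) for p in predictions]
    top = max(scores)                      # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [wgt / total for wgt in weights]


def blend(outputs, weights):
    """Combine module outputs (e.g., motor commands) by their responsibilities."""
    return sum(o * wgt for o, wgt in zip(outputs, weights))


if __name__ == "__main__":
    lam = responsibilities(1.0, [0.98, 0.5])
    print(lam)                             # the first module gets almost all the weight
    print(blend([0.2, -0.7], lam))
```

A small sigma makes the competition nearly winner-take-all, while a large sigma lets several modules share responsibility; how sharp the switching should be is itself a design choice.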

Metalearning and Communication

Thanks to recent advances in machine learning research, it is no longer rare to hear about robots and programs that learn on their own. However, most learning algorithms have a few parameters that critically affect the performance of learning, or even what gets learned and what does not. The difficulty of setting such 'metaparameters' is the major hurdle that keeps fancy robots and algorithms from going out of the lab and into the market.

Our brain is a highly flexible learning system. It doesn't usually require an external operator to tune the metaparameters of its learning. This suggests that our brain has certain mechanisms of 'metalearning' that adjust its own learning procedure depending on the environment. Understanding such mechanisms is the main research subject of my 'Creating the Brain' project, supported by CREST, JST.

For more details, please visit our Metalearning Project Home Page.

Neuromodulators

One candidate for a metalearning mechanism in the brain is the neuromodulator system. Neuromodulators are a kind of neurotransmitter, but their projections are spatially diffuse and their effects can be temporally extended. Typical neuromodulators that you'll find in any neuroscience textbook are dopamine, serotonin, noradrenaline, and acetylcholine.

I proposed a comprehensive framework for how they support metalearning in the brain. Specific hypotheses are:

    • Serotonin controls the time scale of reward prediction.

    • Noradrenaline controls the randomness in action selection.

    • Acetylcholine controls what is to be learned and what is to be neglected.

For more details, see:

  • Doya 2000, Affective Minds.

  • Doya 2002, Neural Networks.
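
To make the hypotheses concrete, the Python sketch below shows where each metaparameter enters a standard discrete-time reinforcement learning agent: the discount factor gamma sets the time scale of reward prediction (the serotonin hypothesis), and the inverse temperature beta of softmax action selection sets its randomness (the noradrenaline hypothesis). We take the learning rate alpha, which sets how quickly old estimates are overwritten, as one reading of the acetylcholine hypothesis; in Doya 2002, dopamine is associated with the TD error itself. The function names and numbers are illustrative assumptions.

```python
import math


def softmax_policy(q_values, beta):
    """Action probabilities P(a) proportional to exp(beta * Q(a)), beta >= 0.
    beta (inverse temperature) sets the randomness of action selection."""
    top = max(q_values)
    exps = [math.exp(beta * (q - top)) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def discounted_value(reward, delay, gamma):
    """Present value of a reward delivered `delay` steps from now.
    gamma (discount factor) sets the time scale of reward prediction."""
    return reward * gamma ** delay


def td_update(v, target, alpha):
    """Move the estimate v toward the target by a fraction alpha (learning rate).
    The term (target - v) plays the role of the TD error."""
    return v + alpha * (target - v)


if __name__ == "__main__":
    for gamma in (0.9, 0.5):
        soon, late = discounted_value(1.0, 1, gamma), discounted_value(4.0, 10, gamma)
        print(f"gamma={gamma}: small-soon={soon:.3f}, large-late={late:.3f}")
    for beta in (0.0, 10.0):
        print(f"beta={beta}: P(action)={softmax_policy([1.0, 0.0], beta)}")
```

With gamma = 0.9, a reward of 4 after 10 steps (about 1.39) is preferred to a reward of 1 after one step (0.9); with gamma = 0.5 the preference flips (about 0.004 versus 0.5). Likewise, beta = 0 gives uniformly random choices, while a large beta makes the agent nearly greedy.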

Cyber Rodent Project

In order to test all these ideas about what kinds of mechanisms are necessary for efficient learning in the real physical environment, we are building an experimental platform that we call 'Cyber Rodents.' They are small mobile robots that have the same basic constraints, and hopefully desires, as animals do: for example, finding energy sources and recharging themselves; defending their territory; learning a variety of behaviors on their own, or more efficiently by sharing experience with their peers; and finding a mate to exchange 'genes' with for possible evolution.

We have conducted some preliminary experiments on learning and evolution using a colony of four robots.

For more details, see:

  • Elfwing, Uchibe, and Doya 2003, GECCO.

Cerebellum, Basal Ganglia, and Cerebral Cortex

The cerebellum, a 'small brain' at the back of our head, and the basal ganglia, a yo-yo-like structure between the brain stem and the cerebral cortex, are both believed to be essential for motor control. Damage to these structures causes pronounced motor deficits. However, it has been far from clear what these brain parts are doing under normal conditions. Furthermore, recent brain imaging data show that their role is not limited to motor control. Recent experiments and models suggest that the cerebellum learns and provides 'internal models' of our body and the environment. The basal ganglia are also suggested to be involved in selecting appropriate actions based on the prediction of reward.

Based on these and other theoretical works, I proposed a radically simple-minded view of the organization of the brain: phylogenetically newer parts of the brain are optimized for running different learning algorithms, namely, the cerebellum for supervised learning, the basal ganglia for reinforcement learning, and the cerebral cortex for unsupervised learning.

For more details, see:

  • Doya 1999, Neural Networks.

  • Doya 2000, Current Opinion in Neurobiology.
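
The three learning paradigms differ mainly in what teaching signal is available. As an illustration only, the Python sketch below applies three textbook rules to a single linear unit: the delta rule, where a teacher supplies the error (supervised); a reward-modulated node-perturbation rule, where only a scalar reward is given (reinforcement); and Oja's Hebbian rule, which uses input statistics alone (unsupervised). These particular rules, their names, and their parameters are our assumptions for illustration, not the specific models of the cerebellum, basal ganglia, or cortex proposed in the papers.

```python
def output(w, x):
    """Output of a single linear unit."""
    return sum(wi * xi for wi, xi in zip(w, x))


def supervised_update(w, x, target, lr=0.1):
    """Delta rule: a teacher provides the desired output, hence an error signal."""
    err = target - output(w, x)
    return [wi + lr * err * xi for wi, xi in zip(w, x)]


def reinforcement_update(w, x, action_noise, reward, baseline, lr=0.1):
    """Node perturbation: the output was perturbed by `action_noise`; the weights
    move to correlate that perturbation with reward above a baseline."""
    return [wi + lr * (reward - baseline) * action_noise * xi for wi, xi in zip(w, x)]


def unsupervised_update(w, x, lr=0.1):
    """Oja's rule: Hebbian growth (y * x) with a decay term that keeps |w| bounded."""
    y = output(w, x)
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


if __name__ == "__main__":
    w = [0.0, 0.0]
    for _ in range(200):
        w = supervised_update(w, [1.0, 0.5], target=2.0)
    print("supervised output:", output(w, [1.0, 0.5]))

    w = [0.5, 0.1]
    for _ in range(1000):
        w = unsupervised_update(w, [1.0, 1.0], lr=0.05)
    print("Oja weights:", w)
```

Repeating supervised_update drives the output toward the target, while repeating unsupervised_update with a fixed input drives the weight vector toward that input's direction with unit length; the reinforcement rule needs no target at all, only a scalar evaluation of each perturbed action.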

Sequence Learning

I was deeply impressed by the series of experiments conducted by Okihide Hikosaka's group, showing that monkeys are good computer game players. Moreover, they nicely showed that different parts of the basal ganglia are involved in different aspects of sequence learning: the frontal part in quick learning of new sequences and the middle-to-rear part in quick execution of well-learned sequences.

In order to give a computational account of this intriguing finding, Hiro Nakahara and I came up with a model in which different parts of the cortico-basal ganglia loops use different representations of sequential movement: the frontal part uses visual coordinates and the rear part uses motor coordinates. Our network model nicely replicated the Hikosaka group's experimental results. Furthermore, the model's hypotheses were confirmed in behavioral experiments.

For more details, see:

  • Nakahara, Doya, and Hikosaka 2001, Journal of Cognitive Neuroscience.

  • Bapi, Doya, and Harner 2000, Experimental Brain Research.

Chaos in the Inferior Olive

The inferior olive is believed to send the 'error signal' that guides learning in the cerebellum. What is intriguing about inferior olive neurons is that, first, they fire at very low frequencies and, second, they are coupled by 'gap junctions,' through which ions can flow directly from cell to cell. Based on all the known electrophysiological data, Nicolas Schweighofer constructed a biophysical model of inferior olive neurons and showed in simulation that they can fire chaotically when connected by gap junctions. We are exploring the idea that such quasi-stochastic firing of inferior olive neurons actually helps in encoding critical error signals at a very low firing rate.

For more details, see:

  • Schweighofer, Doya, and Kawato 1999, Journal of Neurophysiology.