Reinforcement learning explained

Posted on 06-06-2019 by admin

You have probably heard about Google DeepMind’s AlphaGo program, which attracted significant news coverage when it beat a 2-dan professional Go player in 2015. Later, improved versions of AlphaGo went on to beat a 9-dan (the highest rank) professional Go player in 2016 and the #1-ranked Go player in the world in May 2017. A new generation of the software, AlphaZero, was significantly stronger than AlphaGo in late 2017, and it learned not only Go but also chess and shogi (Japanese chess).

AlphaGo and AlphaZero both rely on reinforcement learning for training. They also use deep neural networks as part of the reinforcement learning system to predict move probabilities and likely game outcomes.

In this article, I’ll explain a little about reinforcement learning: what it is, how it has been used, and how it works at a high level. I won’t dig into the math, Markov decision processes, or the gory details of the algorithms. After that, I’ll return to AlphaGo and AlphaZero.

What is reinforcement learning?

There are three main kinds of machine learning: unsupervised learning, supervised learning, and reinforcement learning. Each is good at solving a different set of problems.

Unsupervised learning, which works on a complete data set without labels, is good at uncovering structure in the data. It is used for clustering, dimensionality reduction, feature learning, and density estimation, among other tasks.
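To make clustering a little more concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one common clustering method. The function name `kmeans` and the toy points are illustrative assumptions, not anything from a particular library. The important thing to notice is that the code never sees a label: it finds the two groups from the positions of the points alone.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group unlabeled 2-D points into k clusters using Lloyd's algorithm."""
    rng = random.Random(seed)
    # Start from k randomly chosen data points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # Update step: move each centroid to the mean of the points assigned to it.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(px for px, _ in members) / len(members),
                    sum(py for _, py in members) / len(members),
                )
    return centroids, clusters


# Toy, unlabeled data (made up for illustration): two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
print(sorted(sorted(c) for c in clusters))
# prints [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
```

In practice you would reach for a library implementation rather than hand-rolling this, but the two alternating steps (assign points, then move centroids) are the whole idea.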

Supervised learning, which works on a complete labeled data set, is good at creating classification models for discrete labels and regression models for continuous values. The machine learning or neural network model produced by supervised learning is usually used for prediction, for example to answer “What is the probability that this borrower will default on the loan?” or “How many widgets should we stock next month?”
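As a minimal sketch of the regression case, the code below fits a straight line to a handful of labeled examples with ordinary least squares, then uses that line to answer a question like the widget one. The function name `fit_line` and the sales figures are hypothetical, made up purely for illustration; a real demand forecast would use far more data and features. A classification model such as the loan-default example follows the same learn-from-labeled-examples pattern, except that the labels are discrete categories rather than numbers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept to labeled examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the shared 1/n factor cancels out.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled training data: each month (the input) is paired with
# the number of widgets actually sold that month (the label).
months = [1, 2, 3, 4, 5, 6]
widgets_sold = [112, 118, 131, 139, 152, 158]

slope, intercept = fit_line(months, widgets_sold)
forecast = slope * 7 + intercept  # predict demand for month 7
print(round(forecast))  # prints 169
```

The labels (`widgets_sold`) are what drive the fit: the model learns from known answers and then predicts an unknown one.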