Your Turn!
Now it’s your turn. Download this document for a recap of the blocks we covered in this lesson and some hands-on exercises for you to explore.
What You Have Learned in This Lesson
In this video, you learned the basics of artificial neural networks, from Rosenblatt's early perceptrons to multi-layered networks.
The "Sales Slip Algorithm"
The mathematical basis for the predecessor of modern neural networks, the perceptron (which can be used for binary classification), can be traced back to the late 19th century and the work of the mathematician and electrical engineer Oliver Heaviside.
He made contributions to linear algebra and invented what we called the "Sales Slip Algorithm" in the video. On a sales slip, you calculate the total sum a customer must pay by first multiplying the unit price of each product by the quantity purchased and then adding up those values to get the "Total".
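As a minimal sketch, this is what the sales slip calculation looks like in Python (the prices and quantities are made up for illustration):

```python
# Hypothetical sales slip: (unit price, quantity) for each product
items = [(1.99, 3), (0.49, 12), (5.00, 1)]

# Multiply each unit price by the quantity purchased,
# then add the results up to get the "Total"
total = sum(price * quantity for price, quantity in items)
print(f"Total: {total:.2f}")  # Total: 16.85
```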
In a perceptron, instead of unit prices and quantities, you multiply the value of each feature by its corresponding weight before adding those products up to what is often called a weighted sum.
Finally, you pass this "Total" value through a step function to find out whether the neuron is supposed to "fire" or not. The step function used in this lesson returns 0 for all values smaller than 0 and 1 for all other values. It, too, is named after Heaviside: the "Heaviside step function".
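Putting both steps together, a perceptron's prediction can be sketched in a few lines of Python. This is only an illustration, not the code from the lesson: the feature values and weights are invented, and a bias term is included, which is a common convention.

```python
def heaviside(value):
    """Heaviside step function: 0 for values below 0, 1 otherwise."""
    return 1 if value >= 0 else 0

def predict(features, weights, bias):
    """Weighted sum of the inputs, passed through the step function."""
    total = sum(f * w for f, w in zip(features, weights)) + bias
    return heaviside(total)

# Invented example: two features with hand-picked weights
print(predict([1.0, 0.0], weights=[0.6, -0.4], bias=-0.5))  # prints 1
```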

Making a Perceptron Learn
So far, your perceptron produces a prediction based on the weighted sum of its inputs. However, since the weights are initialized with random values or zeros, the predicted output is also random or always zero.
To make a perceptron learn, you therefore need to include an error correction. This is done by adjusting the weights based on the so-called "delta rule" for perceptrons, or through a process called "backpropagation" in multi-layered networks.
This is how it works:
You compute the delta between the desired result (the label of your sample) and the current prediction; it is 0 if the prediction was correct and -1 or 1 if the prediction was wrong. You then multiply that delta by the learning rate and, for each weight, by the corresponding input value, and adjust the weights by the result.
Doing this repeatedly over several "epochs" will make your perceptron learn to solve certain binary classification problems.
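Here is a minimal sketch of that training loop, assuming the usual form of the delta rule in which the update is also multiplied by the input value. The data set (logical AND), learning rate, and epoch count are made up for illustration:

```python
def heaviside(value):
    return 1 if value >= 0 else 0

def train(samples, labels, learning_rate=0.1, epochs=10):
    weights = [0.0] * len(samples[0])  # weights initialized with zeros
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            total = sum(f * w for f, w in zip(features, weights)) + bias
            delta = label - heaviside(total)  # 0 if correct, else -1 or 1
            # Delta rule: adjust each weight by learning rate * delta * input
            weights = [w + learning_rate * delta * f
                       for w, f in zip(weights, features)]
            bias += learning_rate * delta  # the bias input is implicitly 1
    return weights, bias

# Logical AND is linearly separable, so the perceptron converges
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]
print(train(samples, labels))
```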
Multilayer Networks
For some data sets, a single-layer perceptron is enough. We could predict whether a day is a good day with 100% accuracy using a single-layer perceptron.
Switching to the pancakes problem, we were unable to achieve similar results. The pancakes data set contains what is called an XOR (exclusive or) problem. Looking at the plot of the pancake data, it is not linearly separable, meaning there is no single line you can draw through the graph that cleanly separates the categories "yummy pancake" and "meh pancake".
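As a small illustration of why that matters, the sketch below uses the bare XOR truth table as a stand-in for the pancake data and brute-forces a grid of candidate weights for a single perceptron; the grid itself is an arbitrary choice:

```python
import itertools
import numpy as np

# XOR truth table: no single straight line separates the 1s from the 0s
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

best_accuracy = 0.0
# Try every weight/bias combination on a coarse grid
for w1, w2, b in itertools.product(np.linspace(-2, 2, 41), repeat=3):
    predictions = (X @ np.array([w1, w2]) + b >= 0).astype(int)
    best_accuracy = max(best_accuracy, (predictions == y).mean())

print(best_accuracy)  # 0.75 at best: one point is always misclassified
```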

Such data sets require deeper networks. By adding "hidden layers", a network can learn to solve more complex problems.
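To make that concrete, here is a minimal sketch of a network with one hidden layer that solves XOR. The weights are hand-picked for illustration; a real network would learn them during training:

```python
def heaviside(value):
    return 1 if value >= 0 else 0

def xor_network(x1, x2):
    """Two hidden neurons feed one output neuron."""
    hidden_or = heaviside(x1 + x2 - 0.5)   # fires if x1 OR x2 is 1
    hidden_and = heaviside(x1 + x2 - 1.5)  # fires if x1 AND x2 are 1
    # The output fires if OR fired but AND did not: exactly XOR
    return heaviside(hidden_or - hidden_and - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", xor_network(x1, x2))  # 0, 1, 1, 0
```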