JuliaML — Flux a (very) beginner example — Part 2

Sébastien Dejean
3 min read · Jan 23, 2021

After Part 1, in which we created a basic neural network to fit a regression line, let's explore what happens when we vary its parameters.

How long should we train the network?

In the first test we trained 100 times, but can we lower that number, or should we increase it to improve the result?

Let's try to create a plot covering training runs from 1 to 10,000 iterations to see the evolution.

ColorSchemes

This Julia package will help us color our plots.

As usual, we need to install it the first time:

using Pkg; Pkg.add("ColorSchemes");

We will be able to use predefined color palettes in our graph.
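For example, to pick a color from a predefined palette, we ask for a position between 0 and 1. A small sketch (Blues_9 is one of the ColorBrewer palettes bundled with the package):

using ColorSchemes

c = get(ColorSchemes.Blues_9, 0.7)   # an RGB color about 70% into the palette, a fairly dark blue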

Let's go!

We loop N times; on each iteration n we reset our model and train it n*10 times.

We color the plotted lines light blue for low n and dark blue for high n. The last loop is drawn in thick red so we can see where the final training ends up:
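The full code isn't reproduced here, but a minimal sketch of the idea looks like this. The data points on y = 4x + 2 are hypothetical, standing in for Part 1's setup:

using Flux, Plots, ColorSchemes

# Hypothetical training data: points on the line y = 4x + 2
x_train = Float32[0 1 2 3 4 5]
y_train = 4 .* x_train .+ 2

N = 1000   # the final loop trains 10_000 times; lower N for a quick test
plt = scatter(vec(x_train), vec(y_train), color=:black, legend=false)

for n in 1:N
    m = Dense(1, 1)                      # reset the model on every loop
    loss(x, y) = sum((m(x) .- y).^2)     # the sum-of-squares loss from Part 1
    opt = ADAM()
    for _ in 1:(n * 10)                  # train n*10 times
        Flux.train!(loss, Flux.params(m), [(x_train, y_train)], opt)
    end
    # light blue for low n, dark blue for high n; the last run in thick red
    c  = n == N ? :red : get(ColorSchemes.Blues_9, n / N)
    lw = n == N ? 3 : 1
    plot!(plt, vec(x_train), vec(m(x_train)), color=c, linewidth=lw)
end

display(plt)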

We see that the model converges quickly, with little oscillation.

Gradient (Optimiser)

In our example, we used the ADAM optimiser.

It's a well-known optimiser, but is it the best one (at least in our case)?

So let's explore others…

Descent

Descent is the simplest optimiser: plain gradient descent. It can converge very quickly, but not in all cases, and it takes one parameter, the learning rate, which strongly influences the process. Its default value is 0.1.
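Under the hood, Descent applies the plain update rule p ← p − η·∇p to every parameter. A minimal sketch of a single training step, first with the optimiser and then written out by hand (model, loss, and data are the hypothetical ones used above):

using Flux

m = Dense(1, 1)
loss(x, y) = sum((m(x) .- y).^2)
x, y = Float32[0 1 2], Float32[2 6 10]           # hypothetical points on y = 4x + 2

# One step with the optimiser:
Flux.train!(loss, Flux.params(m), [(x, y)], Descent(0.01))

# The same step written out by hand:
gs = gradient(() -> loss(x, y), Flux.params(m))  # compute the gradients
for p in Flux.params(m)
    p .-= 0.01 .* gs[p]                          # p ← p − η·∇p, exactly what Descent does
end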

If we replace ADAM() with Descent(0.01), with N=10:

If we replace ADAM() with Descent(0.01), with N=20:

If we replace ADAM() with Descent(0.01), with N=50:

As we can see, even with N=10 and a descent parameter of 0.01, we have converged. We can also notice that the difference in the final step between ADAM() and Descent(0.01) is significant.
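To reproduce this comparison, here is a quick sketch that trains twice on the same (hypothetical) data and prints the final losses:

using Flux

x = Float32[0 1 2 3 4 5]
y = 4 .* x .+ 2                              # hypothetical points on y = 4x + 2

for opt in (ADAM(), Descent(0.01))
    m = Dense(1, 1)
    loss(a, b) = sum((m(a) .- b).^2)
    for _ in 1:100                           # N=10, i.e. the last loop trains 10*10 times
        Flux.train!(loss, Flux.params(m), [(x, y)], opt)
    end
    println(typeof(opt), " final loss: ", loss(x, y))
end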

Explore all the other optimisers by yourself, or build your own!

Loss functions

For our loss function, we use “sum of squares”: loss(x, y) = sum((m(x) .- y).^2)

As this loss function is minimized during training, we end up close to a regression line in the LMS (Least Mean Squares) sense.

But there are plenty of other strategies, and the right one depends on the application:
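For example, Flux bundles several ready-made losses in its Flux.Losses module. A few alternatives worth trying; only the loss definition changes, the rest of the training loop stays the same:

using Flux

m = Dense(1, 1)

loss_sse(x, y)   = sum((m(x) .- y).^2)              # our "sum of squares" from above
loss_mse(x, y)   = Flux.Losses.mse(m(x), y)         # mean squared error
loss_mae(x, y)   = Flux.Losses.mae(m(x), y)         # mean absolute error, less sensitive to outliers
loss_huber(x, y) = Flux.Losses.huber_loss(m(x), y)  # quadratic for small errors, linear for big ones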

(Plot: Descent(0.01) and N=50)

As we can see, the default “sum of squares” loss converges very quickly with Descent(0.01).

See you soon!
