# Neural Network in Never

## Introduction

Never is a functional programming language which includes matrices as first-class objects. It is quite likely that you are hearing about Never for the first time, so I will demonstrate its major features by implementing a simple neural network. In fact, the example is a basic neural network component known as a perceptron. I will try to explain everything step by step so that you can follow the article easily.

## Sigmoid

```
func sigmoid(x -> float) -> float
{
    1.0 / (1.0 + exp(-x))
}
```

Almost all programming languages include the concept of a function. In the above example, function `sigmoid` takes `x` as its formal parameter and returns the function value. The function maps large negative arguments to values near zero and large positive arguments to values near one. It is used in neural networks to squash results into the zero-one range.
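
For readers without a Never toolchain, the same function can be sketched in Python, purely to illustrate the saturating behaviour:

```python
import math

def sigmoid(x: float) -> float:
    # Squashes any real input into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-6.0))  # close to 0
print(sigmoid(0.0))   # exactly 0.5
print(sigmoid(6.0))   # close to 1
```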

## Linear Congruential Generator

```
func randomize(seed -> int) -> () -> int
{
    let v = seed;
    func rand() -> int
    {
        v = (v * 1103515245 + 12345) % 2147483648
    }
    rand
}
```

Functions in Never can be nested to any level. They can also access parameters
or local variables defined in their syntactic scope. In the above example,
function `rand` can access and modify variable `v` defined in the scope of
function `randomize`. Each time the function is invoked, variable `v` is
modified, so it can be treated as a local counter.

Functions are also first-class objects, which means they can be accepted
or returned by other functions. In the above example, function `rand`
is returned by function `randomize`. What is also noticeable is that
`v` becomes closed over in the scope of the returned function. Because of that,
functions such as `rand` are called closures.

In particular, function `rand` is a linear congruential generator, or in simple
words, a function which generates pseudo-random values. It will be used later
to initialize the neural network.
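
A Python sketch of the same closure may make the captured state easier to see; here `nonlocal` plays the role that lexical scoping plays in Never:

```python
def randomize(seed: int):
    # Returns a closure; each call advances the captured state `v`.
    v = seed
    def rand() -> int:
        nonlocal v
        v = (v * 1103515245 + 12345) % 2147483648
        return v
    return rand

rand = randomize(165)
print(rand())  # first pseudo-random value
print(rand())  # a different value: `v` persists between calls
```

Note that the same seed always reproduces the same sequence, which is convenient for repeatable experiments.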

## Matrix Algebra

```
func print_matrix(W[D1, D2] -> float) -> int
{
    let r = 0;
    let c = 0;
    for (r = 0; r < D1; r = r + 1)
    {
        for (c = 0; c < D2; c = c + 1)
        {
            prints(W[r, c] + " ")
        };
        prints("\n")
    }
}
```

Never supports matrices, which can be passed to and returned from functions like
any other type. In the above example, function `print_matrix` takes a
two-dimensional array `W` as its formal parameter. The matrix dimensions are
accessible via the parameters `D1` and `D2`. This matrix-passing style is known
as conformant arrays and helps avoid accessing matrix elements out of their
bounds. The above function prints the matrix elements in row order: first all
elements in the columns of row 0, then of row 1, and so on. Function `prints`
takes a string as its parameter and prints it to the console. The addition
operator is overloaded (which means it can take arguments of different types)
and in the above case adds a float and a string by converting the float into
its string representation.

```
func one_matrix(W[D1, D2] -> float) -> int
{
    let r = 0;
    let c = 0;
    for (r = 0; r < D1; r = r + 1)
    {
        for (c = 0; c < D2; c = c + 1)
        {
            W[r, c] = 1.0
        }
    }
}
```

Functions can also modify their parameters. The above function initializes all matrix elements to one; this matrix will later be used to perform matrix subtraction.

```
func rand_matrix(W[D1, D2] -> float, rand() -> int) -> int
{
    let r = 0;
    let c = 0;
    for (r = 0; r < D1; r = r + 1)
    {
        for (c = 0; c < D2; c = c + 1)
        {
            W[r, c] = (rand() % 1000) / 1000.0
        }
    }
}
```

```
func sigmoid_matrix(W[D1, D2] -> float) -> [_,_] -> float
{
    let r = 0;
    let c = 0;
    let S = {[ D1, D2 ]} -> float;
    for (r = 0; r < D1; r = r + 1)
    {
        for (c = 0; c < D2; c = c + 1)
        {
            S[r, c] = sigmoid(W[r, c])
        }
    };
    S
}
```

Of course we may initialize matrix elements to any value. Function `rand_matrix`
assigns values generated by function `rand` to the matrix elements. The example
also shows how function `rand` is passed as a parameter. The second example
shows how function `sigmoid_matrix` calculates the sigmoid value of each
matrix element.

```
func T_matrix(W[D1, D2] -> float) -> [_,_] -> float
{
    let r = 0;
    let c = 0;
    let T = {[ D2, D1 ]} -> float;
    for (r = 0; r < D1; r = r + 1)
    {
        for (c = 0; c < D2; c = c + 1)
        {
            T[c, r] = W[r, c]
        }
    };
    T
}
```

One of the most common matrix operations is transposition. Function `T_matrix`
returns a matrix in which the element of `W` at index `[r, c]` is placed at
index `[c, r]`.

```
func Hadamard_matrix(W1[D1, D2] -> float, W2[D3, D4] -> float) -> [_,_] -> float
{
    let r = 0;
    let c = 0;
    let H = {[ D1, D2 ]} -> float;
    for (r = 0; r < D1; r = r + 1)
    {
        for (c = 0; c < D2; c = c + 1)
        {
            H[r, c] = W1[r, c] * W2[r, c]
        }
    };
    H
}
```

Matrices can also be multiplied. Hadamard multiplication is a special kind
of multiplication in which each element of matrix `W1` is multiplied
by the corresponding element of matrix `W2`. Of course both matrices must
have the same dimensions. Ordinary matrix multiplication, which uses the dot
product to calculate its elements, is implemented in Never by the `*` operator.
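
The distinction can be illustrated with NumPy, where (in contrast to Never's convention) `*` is the element-wise Hadamard product and `@` is the dot-product matrix multiplication:

```python
import numpy as np

W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
W2 = np.array([[5.0, 6.0], [7.0, 8.0]])

hadamard = W1 * W2  # element-wise product
product = W1 @ W2   # rows of W1 dotted with columns of W2

print(hadamard)  # [[ 5. 12.] [21. 32.]]
print(product)   # [[19. 22.] [43. 50.]]
```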

With all the basic building blocks described, we can implement the neural network.

## Forward Propagation

Supervised learning is one of the methods used to train a neural network. The algorithm has a set of inputs and a set of expected results. The difference between the expected result and the output generated by the network is used to adjust the network's parameters. The algorithm can be formalized in matrix algebra.

As usual, let's start with the basic components which are needed. As stated
above, we need a set of inputs `x` and expected results `y`. They may be
expressed in Never as follows:

```
let x = [ [0, 1, 0],
          [1, 0, 0],
          [1, 1, 1],
          [0, 0, 1] ] -> float;
```

```
let y = [ [1, 0, 1, 0] ] -> float;
let yT = T_matrix(y);
```

Expected results `y` are single values. To simplify notation, instead of typing
`y = [ [1], [0], [1], [0] ]` it is more convenient to define the vector
`y = [ [1, 0, 1, 0] ]` and transpose it. As you may notice, the network is
expected to ignore the boundary values `x0` and `x2` and return only the value
of `x1`.

Input weights `W` are first initialized to random values. The following code
sets up a `[3, 1]` matrix.

```
let W = {[ 3, 1 ]} -> float;
let rand = randomize(165);
rand_matrix(W, rand);
```

Now the output calculation formula is simple. What is also worth noticing is
that matrix `s` contains four values, one for each input sample.

```
let z = {[ 4, 1 ]} -> float;
let s = {[ 4, 1 ]} -> float;
z = x * W;
s = sigmoid_matrix(z);
```
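
To sanity-check the shapes, the forward pass above can be mirrored in Python with NumPy (the seed 165 echoes the article, although NumPy's generator yields a different sequence than the LCG defined earlier):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [0, 0, 1]], dtype=float)

rng = np.random.default_rng(165)
W = rng.random((3, 1))  # random weights in [0, 1), like rand_matrix

z = x @ W               # dot product: a [4, 1] matrix, one row per sample
s = sigmoid(z)          # squashed outputs in (0, 1)
print(s.shape)          # (4, 1)
```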

This step finishes forward propagation. Actually, if the values in matrix `W`
were already correct, then we would also get proper output values.
Unfortunately, they need to be adjusted in the backpropagation phase.

## Backpropagation

The network needs to pass through several learning cycles to change the random
values of `W` set in the initial phase into ones which give the smallest error
and thus the expected results.

Let's recall that matrix `s` contains the output value for each input sample.
The error can be calculated by subtracting each element of `s` from `yT`.
Of course we keep these values in a matrix.

```
let err = {[ 4, 1 ]} -> float;
err = yT - s;
```

Now we should correct the weights in matrix `W` so that the values for each
input lead to the smallest error. For that, the sigmoid derivative is used.
One may easily check that the first derivative of the function
`s = 1.0 / (1.0 + exp(-x))` is `sD = s * (1 - s)`. The first derivative
multiplied by the error gives the gradient, and moving against the gradient
of the error leads towards its minimum. This method is known as gradient
descent. Please note that we move in the gradient's opposite direction; since
the error was defined as `yT - s`, this yields the plus sign in the equation.
To summarize, the backpropagation equation is `W = W + xT * err * sD`.
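
A single learning step can be checked numerically in Python; starting from zero weights (instead of the article's random initialization) makes the arithmetic easy to follow by hand. Note that `err * sD` below is the Hadamard product:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1]], dtype=float)
yT = np.array([[1], [0], [1], [0]], dtype=float)
W = np.zeros((3, 1))        # zero weights, for clarity of the arithmetic

s = sigmoid(x @ W)          # forward pass: all outputs are 0.5
err = yT - s                # 0.5 for targets of 1, -0.5 for targets of 0
sD = s * (1 - s)            # all elements are 0.25
W = W + x.T @ (err * sD)    # W = W + xT * Hadamard(err, sD)
print(W)                    # only the middle weight grows, to 0.25
```

Already after one step, all of the correction accumulates on the weight of `x1`, hinting that the network will learn to return `x1`.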

All calculations are done for all input samples at once. The matrix of input
values is given by `xT`, the transpose of `x`. To multiply values by their
corresponding elements, Hadamard matrix multiplication is used.
The value `1` is replaced by a matrix with all elements initialized to `1`.

```
let sD = {[ 4, 1 ]} -> float;
let one = {[ 4, 1 ]} -> float;
one_matrix(one);
sD = Hadamard_matrix(s, one - s);
W = W + xT * Hadamard_matrix(err, sD)
```

Learning cycles are executed by looping over the forward propagation and backpropagation phases.

```
let i = 0;
for (i = 0; i < 1000; i = i + 1)
```

## Listing

```
func nn() -> int
{
    let x = [ [0, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [0, 0, 1] ] -> float;
    let xT = T_matrix(x);
    let y = [ [1, 0, 1, 0] ] -> float;
    let yT = T_matrix(y);
    let W = {[ 3, 1 ]} -> float;
    let z = {[ 4, 1 ]} -> float;
    let s = {[ 4, 1 ]} -> float;
    let sD = {[ 4, 1 ]} -> float;
    let err = {[ 4, 1 ]} -> float;
    let one = {[ 4, 1 ]} -> float;
    let rand = randomize(165);
    let i = 0;
    one_matrix(one);
    rand_matrix(W, rand);
    for (i = 0; i < 1000; i = i + 1)
    {
        z = x * W;
        s = sigmoid_matrix(z);
        err = yT - s;
        sD = Hadamard_matrix(s, one - s);
        W = W + xT * Hadamard_matrix(err, sD)
    };
    z = ([[ 0, 1, 0 ]] -> float) * W;
    s = sigmoid_matrix(z);
    print_matrix(s);
    z = ([[ 0, 1, 1 ]] -> float) * W;
    s = sigmoid_matrix(z);
    print_matrix(s);
    0
}
```

The above code presents the whole network learning algorithm in a single listing.
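
For comparison, the whole listing condenses to a few lines of NumPy. This is a sketch that is equivalent in spirit, not a literal translation (in particular, the random initialization differs from Never's LCG):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1]], dtype=float)
yT = np.array([[1], [0], [1], [0]], dtype=float)

W = np.random.default_rng(165).random((3, 1))

for _ in range(1000):
    s = sigmoid(x @ W)           # forward propagation
    err = yT - s                 # error against expected results
    sD = s * (1 - s)             # sigmoid derivative
    W = W + x.T @ (err * sD)     # backpropagation weight update

print(sigmoid(np.array([[0.0, 1.0, 0.0]]) @ W))  # close to 1
print(sigmoid(np.array([[0.0, 0.0, 1.0]]) @ W))  # close to 0
```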

## Beginning at the End

```
func main() -> int
{
    nn();
}
```

Execution of a Never program begins in the `main` function. Thus we only
execute function `nn` inside it.

[ X ] | y | expected
---|---|---
[ 1, 1, 1 ] | 0.96 | 1.00
[ 1, 1, 0 ] | 1.00 | 1.00
[ 1, 0, 1 ] | 0.00 | 0.00
[ 1, 0, 0 ] | 0.05 | 0.00
[ 0, 1, 1 ] | 1.00 | 1.00
[ 0, 1, 0 ] | 1.00 | 1.00
[ 0, 0, 1 ] | 0.05 | 0.00
[ 0, 0, 0 ] | 0.50 | 0.00

Of course it is interesting to verify the neural network's responses. I prepared code which outputs the value for every possible input. As you can see, in seven out of eight cases the network returns the expected value.

## Summary

This article demonstrated how the Never language can be used to define a simple neural network, together with a supervised learning algorithm. I hope you liked the article and learned something useful about neural networks, matrix algebra, or the Never programming language.