#### About

At Ekkono we do Machine Learning for embedded systems. Our SDK is small enough to fit into almost any system, yet powerful enough to help you predict the future.

As an example, we can predict which button you will randomly click next. Statistically, we have a 50 % chance of guessing correctly. In reality, you may not be as random as you think.

Start by pressing any button you want 10 times, to let our SDK get to know you. Then we will predict what you will press next. After 100 presses we will display our predictions on screen, so that you know that we’re not cheating. The longer you play, the better we get.

Ekkono offers a variety of machine learning algorithms. Use the settings to play around with some of them. You can also learn more about the machine learning technologies under the *Theory* section in the settings.

**Tip**: You can also use the digits 1 and 0 on your keyboard.

#### Ekkono buttons demo

At Ekkono, we do machine learning for IoT with special emphasis on solutions at the edge. This means that Ekkono’s machine-learning tool is especially well equipped for use cases where you need to act instantly on local data and individual conditions, without depending on constant connectivity or being limited by the latency of a connection to the cloud.

If you want more information about Ekkono, please visit our website.

This demo shows the capability of machine learning. The Ekkono SDK contains a lot of functionality. We are especially proud of our incremental learning, which gets better the more you use it. This is a short introduction to the technologies used on this page.

### Random Forest

Random forest is a collection of decision trees. Trees, forest. Get it? The machine learning community is well known for its excellent sense of humour.

A decision tree works like a finite set of questions. Answering one question leads to the next, until you finally know the answer. It is much like the game twenty questions, where you try to deduce what someone is thinking about just by asking 20 *yes* or *no* questions.

Each decision tree is constructed based on a subset of your previous actions. The subset part is important here. If every tree were constructed from exactly the same information, they would all look the same. Because each tree only uses some of the information, every individual tree is slightly wrong, yet by combining their knowledge, the forest is surprisingly accurate.

In our demo, we predict what you will do next by looking at what you have done before. The decision tree may look something like this:

- Did the user press 1 in the last key? (Answer: **No**)
- Ah, then perhaps the user pressed 0 four keys back? (Answer: **Yes**)
- Could it be that the total number of 1's during the last five strokes is less than two? (Answer: **Yes**)
- Then you will most definitely press 0 the next time!
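Written as code, the example tree above is just a few nested questions. The sketch below is an illustration only: the function name `predict_next_key` and the fallback branches (which the example does not spell out) are assumptions, not part of the Ekkono SDK.

```python
def predict_next_key(history):
    """Walk the example tree.

    `history` is a list of 0/1 key presses, oldest first, and must
    contain at least five presses.
    """
    if history[-1] == 1:
        return 1  # assumed branch: the example only follows the "No" answer
    if history[-4] != 0:
        return 1  # assumed branch: the example only follows the "Yes" answer
    if sum(history[-5:]) < 2:
        return 0  # the path described in the example
    return 1      # assumed branch


# The last key was 0, the key four presses back was 0, and there is
# only a single 1 among the last five presses.
prediction = predict_next_key([1, 0, 0, 0, 0])
```

A real tree is learned from your key presses, so both the questions and the answers change as you play.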

Decision trees are very useful in machine learning, since they are easy to read. The actual decision tree can be interpreted just like the text above. When many decision trees are combined into a random forest, the predictions are usually very accurate. Random forests are often underestimated because they are so easy to use.
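The idea of building each tree from a random subset and then letting the trees vote can be sketched in a few lines. This is a simplified illustration, not the Ekkono SDK: each "tree" here asks only one question (a so-called stump), and all function names are made up for the example.

```python
import random
from collections import Counter


def bootstrap_subset(history, rng):
    """Draw a random subset (with replacement) of past (features, label) samples."""
    return [rng.choice(history) for _ in history]


def train_stump(samples, feature_index):
    """A one-question tree: for each value of one feature, remember the most common label."""
    counts = {}
    for features, label in samples:
        counts.setdefault(features[feature_index], Counter())[label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def train_forest(history, n_trees, seed=0):
    """Train every stump on its own random subset and its own random feature."""
    rng = random.Random(seed)
    n_features = len(history[0][0])
    forest = []
    for _ in range(n_trees):
        subset = bootstrap_subset(history, rng)
        feature_index = rng.randrange(n_features)
        forest.append((feature_index, train_stump(subset, feature_index)))
    return forest


def predict(forest, features, default=0):
    """Let every stump vote and return the most common answer."""
    votes = Counter(stump.get(features[i], default) for i, stump in forest)
    return votes.most_common(1)[0][0]
```

Each sample is a pair of (features, next key), where the features could be, for instance, the last few keys pressed. Using an odd number of trees avoids tied votes when there are only two buttons.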

Random forests are fairly quick to train. The downside is that a random forest with many trees uses a lot of memory.

#### Settings

**Number of trees** - Sets the number of trees in the forest. More trees usually mean a more accurate model, but also more training time and more memory consumption. Setting this value to 1 generates a single decision tree.

### Linear Regression

Linear regression uses regression to fit a line to the supplied data. Let's read that line again and try to understand what it means.

First - *Regression*. Looking the word up in a dictionary, we find that it means "The act or an instance of going back to an earlier and lower level especially of intelligence or behaviour". What does that have to do with line fitting? Well, the term "regression" was used by Francis Galton in his 1886 paper "Regression towards mediocrity in hereditary stature", where he talked about regression toward the mean. Then the name stuck.

OK, so what about fitting a line? Well, imagine that you sell Volvo cars. As a general rule of thumb, a newer car will fetch a higher price.

Of course, there are many factors that set the value of the car. The type of engine, the equipment level, the mileage and the overall state of the car, to mention a few. In this case, however, we will only look at the price vs the manufacturing year.

Here is a graph of prices of Volvo cars that your competitors sell. The cars listed here are all manufactured in the years 2012 to 2020. Now, you are about to sell an older car, manufactured in 2011. What price should you ask for that car?

Enter Linear Regression. The first thing that linear regression does is to place a line anywhere on the graph. Then it calculates how well that line explains **all points** in the graph.

As you can see, this line explains the prices for cars manufactured in 2018 and 2019 rather well. The other year models are far off, however.

In linear regression we talk about a **loss function** that measures how well or badly a line explains the data points. In this case, the loss function is the sum of the distances between each point in the graph and the line. In 2012, all of the prices are in the range from 100'000 to 200'000, while the line suggests 400'000. This means that the **loss function** will return a big value.
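As a rough sketch, a loss function of this kind can be written as follows. The prices are made up for the illustration and are not the data from the graph, and the function name `loss` is an assumption.

```python
def loss(k, m, points):
    """Sum of the vertical distances between each (x, y) point and the line y = k*x + m."""
    return sum(abs(y - (k * x + m)) for x, y in points)


# Made-up (year, price) pairs.
points = [(2012, 100_000), (2012, 200_000), (2019, 400_000)]

# A flat line at 400'000 misses both 2012 cars by a wide margin...
high_line = loss(0, 400_000, points)

# ...while a flat line at 200'000 gives a smaller total distance.
low_line = loss(0, 200_000, points)
```

The smaller the returned value, the better the line explains the points.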

The task for our regression function is to minimize the loss function by changing the slope of the line as well as the point where the line intercepts the Y axis. Remember from school that every straight line in a 2D plane can be expressed in the form *y = kx + m*? Well, *k* is the slope and *m* is the intercept point.

By changing the *k* and *m* parameters gradually, using the derivative (which you also remember from school), we can alter the line until every point is as close to the line as possible. From this point, changing *k* or *m* in any way would make the loss function worse. When this happens, we have found the optimal setting.
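The gradual adjustment of *k* and *m* is usually called gradient descent. Below is a minimal sketch of it, with two assumptions that are not from the text: it uses the mean of the squared distances as the loss (squared distances have a simpler derivative than plain distances), and the function name, learning rate and number of iterations are chosen for the toy data only.

```python
def fit_line(points, learning_rate=0.05, iterations=2000):
    """Fit y = k*x + m by gradient descent on the mean squared error."""
    k, m = 0.0, 0.0
    n = len(points)
    for _ in range(iterations):
        # Derivatives of the mean squared error with respect to k and m.
        grad_k = sum(2 * (k * x + m - y) * x for x, y in points) / n
        grad_m = sum(2 * (k * x + m - y) for x, y in points) / n
        # Move both parameters a small step in the downhill direction.
        k -= learning_rate * grad_k
        m -= learning_rate * grad_m
    return k, m


# Toy points that lie exactly on the line y = 2x + 1.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]
k, m = fit_line(points)  # k ends up close to 2 and m close to 1
```

Once the derivatives are (almost) zero, further steps no longer change *k* and *m*, which is the optimal setting described above.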

The advantage of using linear regression is that it can describe a function using only two values. If we want to calculate the price for a car manufactured in 2011, we can easily use the function above. To do this, we multiply the manufacturing year 2011 (*x*) by the slope of the line, 36'139 (*k*). Then we add the intercept, which is -72'566'789 (*m*).

The answer is that 108'740 is a suitable price for that car, according to this graph. As you can see, the full function of linear regression can be expressed by two values, the *slope* and the *intercept*. This makes the model very memory efficient. However, training can be really slow.

Also, as you may have guessed, linear regression is not suitable everywhere. Why is that? Well, in the example with the cars above, if you sell a car that was made in 2007 or earlier, you end up with a negative price. For a 2007 car, you would have to pay someone 35'816 SEK to take it. There are ways to compensate for this, of course, like expressing the price with an *x*² term to get a more suitable curve, or subtracting 2000 from the year to get a much smaller value of *m*.
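The arithmetic behind these prices fits in a few lines. The slope and the resulting prices are the numbers from the text; note that the intercept has to be negative for those prices to come out. The function name `predict_price` is made up for this sketch.

```python
SLOPE = 36_139            # k, the slope from the text
INTERCEPT = -72_566_789   # m, negative so that the prices in the text come out


def predict_price(year):
    """Price according to the fitted line: price = k * year + m."""
    return SLOPE * year + INTERCEPT


price_2011 = predict_price(2011)  # 108'740, the price suggested in the text
price_2007 = predict_price(2007)  # -35'816, the negative price mentioned in the text
```

The two constants are the entire model, which is why linear regression needs so little memory.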

In the same way, which button you will press next may be a difficult thing to explain using only linear regression. Try it out!

#### Settings

**Learning rate** - How much the *slope* and *intercept* of the line will change in every iteration. Setting this value too high may crash the model, as the line will overcompensate for every error.

**Ridge factor** - This will restrict the final sizes of the *slope* and *intercept*. In general, smaller values of your *weights* are better.
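How these two settings might enter the training loop is sketched below. This is an assumption about a typical implementation, not a description of the Ekkono SDK: the loss is the mean squared error, the ridge factor adds a penalty on the squared size of both *k* and *m*, and all names are made up.

```python
def ridge_step(k, m, points, learning_rate, ridge):
    """One gradient step on: mean squared error + ridge * (k**2 + m**2)."""
    n = len(points)
    grad_k = sum(2 * (k * x + m - y) * x for x, y in points) / n + 2 * ridge * k
    grad_m = sum(2 * (k * x + m - y) for x, y in points) / n + 2 * ridge * m
    return k - learning_rate * grad_k, m - learning_rate * grad_m


def fit(points, learning_rate, ridge, iterations):
    k, m = 0.0, 0.0
    for _ in range(iterations):
        k, m = ridge_step(k, m, points, learning_rate, ridge)
    return k, m


# Toy points on the line y = 2x + 1.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]

plain = fit(points, learning_rate=0.05, ridge=0.0, iterations=2000)
shrunk = fit(points, learning_rate=0.05, ridge=1.0, iterations=2000)

# With a learning rate that is too high, every step overshoots and the
# values grow instead of settling down.
diverged = fit(points, learning_rate=1.0, ridge=0.0, iterations=20)
```

On this toy data the ridge factor pulls both weights towards zero, while the large learning rate makes them explode after only a few iterations.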


### Neural Networks

Neural networks are an old technology that dates back to the 1940s. They have gained in popularity in recent years as computers have increased in speed and memory while prices have decreased.

The implementation of neural networks that we use in the Ekkono SDK is called a Multilayer Perceptron (MLP).

A neural network consists of a number of layers with a number of nodes. The leftmost layer is called the *input layer*, and the rightmost one is called the *output layer*. Any layers in between are called *hidden layers*. Each node in every layer is connected to every node in the previous layer, as well as every node in the next layer.

To simplify the description, we can say that each node consists of a linear regression, but every time the linear regression predicts a negative number, it returns the value zero.
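In code, this simplified picture of a single node could look like the sketch below. The function name and the example weights are made up for illustration.

```python
def node(inputs, weights, bias):
    """A linear regression over the inputs, with negative results replaced by zero."""
    linear = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, linear)


negative_case = node([1, 2], [0.5, -1.0], 0.0)  # 0.5 - 2.0 is negative, so the node returns 0.0
positive_case = node([1, 2], [1.0, 1.0], -1.0)  # 1.0 + 2.0 - 1.0 = 2.0 is passed through
```

Here the weights play the role of the *slope* (one per input) and the bias plays the role of the *intercept*.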

Turning negative values into 0 makes the full neural network a non-linear function, which is actually the key to its success. It has been proved that a neural network can approximate any continuous function, if it is just big enough. As an example, the XOR function can be described using only 2 hidden nodes.

Visualizing a neural network can be tricky, which is a major disadvantage compared to Random Forest. Even this small neural network with only two hidden nodes is quite difficult to understand. Look at the table below. If we insert the X₀ and X₁ values from the table into the nodes, and then calculate the corresponding H₀ and H₁ values, you see that the output is indeed the XOR logic gate.

When you look at the table, all **blue** fields are greater than 0, and should be interpreted as 1, while the other fields should be interpreted as 0.

(Saying that numbers lower than zero are zero, and that larger numbers are one, is what we call the *activation function*. The *activation function* may look different in different applications.)
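One way to build XOR with two hidden nodes is sketched below. The weights are a common textbook choice and not necessarily the ones used in the table on this page. The activation function here only turns negative values into zero, as described for the nodes above.

```python
def relu(value):
    """Activation function: negative values become zero."""
    return max(0, value)


def xor_network(x0, x1):
    h0 = relu(x0 + x1)        # hidden node 0: weights (1, 1), bias 0
    h1 = relu(x0 + x1 - 1)    # hidden node 1: weights (1, 1), bias -1
    return h0 - 2 * h1        # output node: weights (1, -2), bias 0


truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```

Without the activation function, the three nodes would collapse into a single linear function, and no single line can separate the XOR outputs.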

You may be disappointed that the Linear Regression section was so much larger than this Neural Network one, but if you understand the basics of linear regression, neural networks are just a continuation of the same theory.

#### Settings

**Layer** - The number of neurons in each layer of the network. The bigger the network is, the more complex functions it can explain. A larger network will take more time to train. The setting [1,0,0] will basically be a linear regression.

**Learning rate** - How much the *slope* and *intercept* of each line in the neural network will change in every iteration. Setting this value too high will crash the model, as the line will overcompensate for every error.
