It feels logical and intuitive to say that market fluctuations are not driven by time series alone, but by other factors such as fundamentals, world markets, and the sentiment of people, operators, bankers, politicians and so on.
The first question we faced was how to feed all these potential inputs into an algorithm that can use them to predict future movements. Mathematics and statistics alone are not enough: it's clear to everyone that we can't add pears to apples! How do we merge data from Twitter with time series? How do we compare sentiment with market indexes?
Before the introduction of AI to financial calculations, this was possible only by creating multiple indexes and examining their internal and external correlations, and the mathematical correlation between them is usually poor. But our common sense says no, that can't be true. There must be some correlation!
The answer we want to provide is driven by Machine Learning algorithms and AI. These are more than mathematical algorithms: they can start learning (by creating weights for variables) and keep learning, day after day (by repeating calculations n times and progressively reducing the error), a behaviour that fairly closely simulates that of our brain.
Our brain knows that there must be some correlation between those entities, and AI reaches the same conclusion. The difference is that we cannot feed an algorithm the weights created by our brains (our synapses), but we can do so with the ones created by our computers.
Here's where we started.
AI algorithms make use of trained models.
Training data can be of the most varied types, e.g. historical data on share price fluctuations, market indexes, fundamental data, sentiment data, etc., retrieved from the most relevant and reliable sources. Once collected, the data has to be organized into feature vectors, or better feature tensors, which means they are normally n-dimensional matrices.
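As a minimal sketch of this step, the snippet below normalizes a few heterogeneous daily series and stacks them into a 3-dimensional feature tensor. The column names and values are illustrative, not our actual feature set:

```python
import numpy as np
import pandas as pd

# Hypothetical daily records for one instrument: closing price,
# a market index level and a sentiment score (all illustrative).
raw = pd.DataFrame({
    "close":     [101.2, 102.8, 101.9, 103.5, 104.1],
    "index_lvl": [3001.0, 3012.5, 2998.7, 3020.1, 3025.4],
    "sentiment": [0.12, 0.35, -0.08, 0.41, 0.27],
})

# Normalize each column to zero mean / unit variance so that
# heterogeneous quantities (prices, index levels, sentiment)
# become comparable -- how "pears" and "apples" can be added.
normalized = (raw - raw.mean()) / raw.std()

# Slide a 3-day window over the rows to build the feature tensor:
# shape (n_samples, window_length, n_features).
window = 3
tensor = np.stack([
    normalized.values[i:i + window]
    for i in range(len(normalized) - window + 1)
])
print(tensor.shape)  # (3, 3, 3)
```

Each sample is then a small matrix of recent history across all features, which is the natural input shape for a recurrent or convolutional network.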
Learning is then done by running the neural network(s) on the training dataset, either for a fixed number of epochs or for an indeterminate number of epochs, for example continuing until the model has reached the desired level of accuracy.
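The two stopping criteria can be illustrated with a toy training loop. This is a single linear neuron trained by gradient descent on synthetic data, not our production training code; it simply shows a bounded epoch count combined with an accuracy-based early stop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: y is a noisy linear function of x
# (a stand-in for real market features and targets).
X = rng.normal(size=(200, 3))
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w + rng.normal(scale=0.01, size=200)

# Gradient descent, run for at most max_epochs but stopping
# early once the error drops below the target level.
w = np.zeros(3)
lr, max_epochs, target_mse = 0.1, 500, 1e-3
for epoch in range(max_epochs):
    err = X @ w - y
    mse = float(np.mean(err ** 2))
    if mse < target_mse:              # accuracy-based stopping
        break
    w -= lr * (X.T @ err) / len(y)    # weight update: reduce the error
```

Epoch after epoch the weights move toward the values that minimize the error, which is the "progressive error reduction" described above.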
Once the model is well trained (meaning that the error curve is minimized), e.g. on historical data, predictions can be run. At this point calculations are far faster, because the output of the learning phase is a model that can essentially be reduced to a mathematical formula made of additions and weighted multiplications, where the weights are given by the training phase (what we earlier compared to our synapses).
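To make that concrete, here is the forward pass of a tiny network written out as nothing but weighted multiplications and additions. The weight values are illustrative placeholders, not trained parameters:

```python
import numpy as np

# After training, prediction is just matrix products plus biases.
W1 = np.array([[0.4, -0.2],
               [0.1,  0.3],
               [-0.5, 0.7]])   # input (3 features) -> hidden (2 units)
b1 = np.array([0.05, -0.1])
W2 = np.array([0.8, -0.6])     # hidden (2 units) -> output (1 value)
b2 = 0.02

def predict(x):
    # Weighted sums, a nonlinearity, then another weighted sum.
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU activation
    return float(hidden @ W2 + b2)

# A normalized feature vector: [price change, index change, sentiment]
x = np.array([0.3, -0.1, 0.5])
prediction = predict(x)
```

No iteration or error correction happens here, which is why inference is so much cheaper than training.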
The final output is the prediction itself, which can be a scenario covering one or more days, according to how the training was run and how the feature set was built.
The architecture is based on over a dozen different neural networks, which recalculate the data every day, and several times per day.
Our data is constantly retrieved from various sources on the web and stored in high-performance non-relational databases. All our algorithms are built on open source languages, mostly Python, and libraries such as TensorFlow, Pandas and NumPy.
One server is dedicated to data gathering and organization; another is dedicated to training the networks, which is repeated on a weekly and daily basis, according to the type of network, its target and the weekly schedule. The whole system is made redundant on a mirrored backup architecture. The two architectures are kept separate in order to shield the entire system from security issues.
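The gathering side of this split can be sketched as follows. An in-memory dictionary stands in for the actual non-relational database, and the source names, dates and record layout are purely illustrative:

```python
import json
from datetime import date

# Stand-in for the document store: JSON documents keyed by
# (source, day). In production this would be a real database.
document_store = {}

def ingest(source, day, payload):
    """Normalize a retrieved record into a JSON document and store it."""
    doc = {"source": source, "day": day.isoformat(), "data": payload}
    document_store[(source, doc["day"])] = json.dumps(doc)

# The gathering server ingests heterogeneous feeds...
ingest("market_index", date(2024, 1, 5), {"level": 3020.1})
ingest("sentiment",    date(2024, 1, 5), {"score": 0.41})

# ...while the training server only ever reads them back,
# keeping the two roles (and machines) cleanly separated.
for key in sorted(document_store):
    record = json.loads(document_store[key])
    print(record["source"], record["data"])
```

Keeping ingestion write-only and training read-only is one simple way to limit how far a compromise of either machine can propagate.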
Our intention is to provide a very resilient and well-structured service right away, and over time to constantly improve the algorithms and introduce at least two new networks every semester, in order to guarantee a constant improvement in performance.