Bayesian Filtering & Financial Applications

by Clay vanSchalkwijk on March 27, 2009

A friend and I recently started a new project. After kicking around several ideas, we settled on applying software prediction to financial data. This has been pursued heavily already, but we wanted to come at it from a home-brew standpoint: build competitive, working software by mashing up data and technology that are already available on the internet.

We intend to predict the movement of stocks based on real-time content analysis. This requires a good deal of machine learning and historical data, and even good content analysis is not enough on its own. Using Bayesian filtering with noise-word reduction, we plan to process historical data and assign each piece of content to one of three categories: moveup, movedown, or nomove. To train the filters, we will feed in past press releases paired with the stock data from the same period, so the filter can track how the market reacted to the content. Over time, the software should learn to recognize keywords that trigger positive versus negative emotion in the market strong enough to drive the price one way or the other. A score can then be applied much as spam scores are, and that number can be used as one input to a larger overall algorithm that determines an action.
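The training process described above can be sketched in Python. This is a minimal sketch under several assumptions, not our actual implementation: the noise-word list, the 1% price-change threshold that separates moveup/movedown from nomove, and the names tokenize, label_move and MovementFilter are all hypothetical. It uses a plain multinomial naive Bayes classifier with add-one smoothing to choose among the three categories.

```python
import math
import re
from collections import Counter

# Hypothetical noise-word list; a real filter would use a much longer one.
NOISE_WORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "on", "is", "will"}
CATEGORIES = ("moveup", "movedown", "nomove")


def tokenize(text):
    """Lowercase the text, split it into words and drop noise words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in NOISE_WORDS]


def label_move(price_before, price_after, threshold=0.01):
    """Label a release by the relative price change that followed it.
    The 1% threshold is an arbitrary assumption."""
    change = (price_after - price_before) / price_before
    if change > threshold:
        return "moveup"
    if change < -threshold:
        return "movedown"
    return "nomove"


class MovementFilter:
    """Three-category multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {c: Counter() for c in CATEGORIES}
        self.doc_counts = Counter()

    def train(self, text, price_before, price_after):
        category = label_move(price_before, price_after)
        self.word_counts[category].update(tokenize(text))
        self.doc_counts[category] += 1
        return category

    def classify(self, text):
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in CATEGORIES:
            if self.doc_counts[c] == 0:
                continue  # no training data for this category yet
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # log prior
            for w in tokenize(text):
                score += math.log((self.word_counts[c][w] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best, best_score = c, score
        return best


f = MovementFilter()
f.train("Company boosted capital and raised guidance", 10.0, 10.5)
f.train("Company missed earnings and cut guidance", 10.0, 9.2)
f.train("Company announced a board meeting date", 10.0, 10.02)
print(f.classify("Guidance raised, capital boosted"))  # moveup
```

The three training releases are made up; in practice each one would be a historical press release paired with the prices around its publication time.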

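To bring readers up to speed on exactly how this will be applied, take the standard formula a Bayesian spam filter uses to combine the probabilities of individual words:

p = (p1 p2 ... pN) / (p1 p2 ... pN + (1 - p1)(1 - p2) ... (1 - pN))

Rather than training the filter to recognize the probability of spam, we train it to recognize the probability that the content will trigger positive stock movement, where: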

  • p is the probability that the content will result in positive movement;
  • p1 is the probability p(S | W1) that the content is positive (S here stands for positive movement) given that it contains a first word (for example "capital");
  • p2 is the probability p(S | W2) that the content is positive given that it contains a second word (for example "boosted");
  • and so on, up to pN for the Nth word.
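As a sketch of how the per-word probabilities might be estimated and combined, the Python below assumes equal prior odds of positive and non-positive content, clamps each word's probability to [0.01, 0.99] (a common spam-filter convention, treated here as an assumption), and applies the standard naive-Bayes combining rule used by spam filters (written out in the combine docstring) in log space so long documents do not underflow. The function names are hypothetical and the counts in the example are made up.

```python
import math


def word_probability(pos_with_word, pos_total, other_with_word, other_total,
                     low=0.01, high=0.99):
    """Estimate p(S | W) from training counts, assuming equal priors for
    positive and non-positive content. Unseen words get a neutral 0.5."""
    freq_pos = pos_with_word / pos_total
    freq_other = other_with_word / other_total
    if freq_pos + freq_other == 0:
        return 0.5
    p = freq_pos / (freq_pos + freq_other)
    return min(max(p, low), high)  # keep p strictly inside (0, 1)


def combine(probabilities):
    """p = (p1...pN) / (p1...pN + (1-p1)...(1-pN)), computed in log space."""
    log_pos = sum(math.log(p) for p in probabilities)
    log_neg = sum(math.log(1.0 - p) for p in probabilities)
    d = log_pos - log_neg
    # Numerically stable logistic function of d: 1 / (1 + exp(-d)).
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


# Made-up counts: "capital" appeared in 30 of 100 positive releases and
# 10 of 100 other releases; "boosted" in 20 and 5 respectively.
p1 = word_probability(30, 100, 10, 100)  # 0.3 / (0.3 + 0.1), about 0.75
p2 = word_probability(20, 100, 5, 100)   # 0.2 / (0.2 + 0.05), about 0.8
print(combine([p1, p2]))
```

Working in log space matters once a press release contributes dozens of words, since a direct product of many probabilities below 1 quickly rounds to zero.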

The entire body of the content will be processed against a known database of words and the market's past reactions to the presence of those words. Basic Bayesian filtering will need to be extended to handle phrase recognition, but overall it is a solid, proven machine-learning technology to build from.
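One possible way to extend the filter toward phrase recognition, sketched below under our own assumptions, is to treat short word sequences (n-grams) as extra tokens so a phrase such as "raised guidance" gets its own probability alongside its individual words. The lookup step borrows the spam-filter habit of keeping only the tokens whose probabilities lie furthest from neutral. The table contents, the default limit of 15 tokens, and the function names are hypothetical.

```python
import re

# Hypothetical noise-word list.
NOISE_WORDS = {"the", "a", "an", "and", "of", "to", "in", "its", "for"}


def features(text, max_n=2):
    """Return single words plus word sequences up to max_n long,
    after noise-word removal."""
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in NOISE_WORDS]
    return [" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]


def interesting_probabilities(feats, table, limit=15, neutral=0.5):
    """Look up each distinct feature in a word/phrase table (unknown ones
    count as neutral) and keep the `limit` probabilities furthest from it."""
    probs = [table.get(f, neutral) for f in set(feats)]
    probs.sort(key=lambda p: abs(p - neutral), reverse=True)
    return probs[:limit]


table = {"raised guidance": 0.92, "guidance": 0.55, "company": 0.5}
feats = features("The company raised its guidance")
print(feats)  # ['company', 'raised', 'guidance', 'company raised', 'raised guidance']
print(interesting_probabilities(feats, table, limit=2))  # [0.92, 0.55]
```

The selected probabilities would then be fed into the same combining rule as single words. Using set(feats) counts each token once per document, which is one design choice among several.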

This information by itself is nothing revolutionary, but combined with strong pattern analysis, such as candlestick pattern recognition, and other market indicators, it could be used to build an accurate trading platform aimed at marginal gains, which over time can add up to pretty high returns. There is certainly a lot of potential here if it works, but everything depends on it working accurately, and there will be a lot of trial and error along the way.
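To illustrate how the Bayesian score might be paired with a candlestick pattern, the sketch below checks for bullish and bearish engulfing patterns (using one common definition, in which the current candle's body fully engulfs the previous candle's opposite-colored body) and only acts when the content score and the pattern agree. The 0.9 threshold, the Candle type and the decide rule are hypothetical stand-ins for the larger algorithm, not a tested strategy.

```python
from typing import NamedTuple


class Candle(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def bullish_engulfing(prev, cur):
    """Bearish candle followed by a bullish candle whose body engulfs it."""
    return (prev.close < prev.open and cur.close > cur.open
            and cur.open < prev.close and cur.close > prev.open)


def bearish_engulfing(prev, cur):
    """Bullish candle followed by a bearish candle whose body engulfs it."""
    return (prev.close > prev.open and cur.close < cur.open
            and cur.open > prev.close and cur.close < prev.open)


def decide(score, prev, cur, threshold=0.9):
    """Act only when the content score and the price pattern agree."""
    if score >= threshold and bullish_engulfing(prev, cur):
        return "buy"
    if score <= 1 - threshold and bearish_engulfing(prev, cur):
        return "sell"
    return "hold"


prev = Candle(open=10.0, high=10.1, low=9.5, close=9.6)  # bearish day
cur = Candle(open=9.5, high=10.3, low=9.4, close=10.2)   # bullish, engulfs prev
print(decide(0.95, prev, cur))  # buy
print(decide(0.50, prev, cur))  # hold
```

Requiring both signals to agree trades fewer opportunities for fewer false positives, which fits the goal of steady marginal gains.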

For more reading on the concepts and components behind this idea, check out:

The nice part is that all the historical data is available around the internet, which makes back-testing and scoring easy to do, and there will need to be a lot of testing.
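Because the historical data is readily available, a walk-forward back-test is one straightforward way to score the filter. In the sketch below, each event is predicted using only events that came before it, which avoids leaking future information into training. The event format, the walk_forward function and the MajorityBaseline stand-in (which simply predicts the most common category seen so far) are hypothetical, and the events are made up. Any model exposing train and predict methods could be plugged in, and a useful filter should beat such a baseline.

```python
from collections import Counter


def label_move(price_before, price_after, threshold=0.01):
    """Actual outcome of an event; the 1% threshold is an assumption."""
    change = (price_after - price_before) / price_before
    if change > threshold:
        return "moveup"
    if change < -threshold:
        return "movedown"
    return "nomove"


class MajorityBaseline:
    """Hypothetical baseline: predicts the most common outcome seen so far."""

    def __init__(self):
        self.counts = Counter()

    def train(self, text, price_before, price_after):
        self.counts[label_move(price_before, price_after)] += 1

    def predict(self, text):
        return self.counts.most_common(1)[0][0] if self.counts else "nomove"


def walk_forward(model, events, warmup=1):
    """Hit rate of a model on chronologically ordered (text, before, after)
    events. Each event is predicted before the model is trained on it, so no
    prediction ever sees its own outcome or any later one."""
    hits = total = 0
    for i, (text, before, after) in enumerate(events):
        if i >= warmup:
            total += 1
            hits += model.predict(text) == label_move(before, after)
        model.train(text, before, after)
    return hits / total if total else 0.0


events = [
    ("Board meeting scheduled", 10.0, 10.02),  # nomove
    ("Record quarterly revenue", 10.0, 10.5),  # moveup
    ("New office opened", 10.0, 10.01),        # nomove
    ("CEO resigns unexpectedly", 10.0, 9.0),   # movedown
]
print(walk_forward(MajorityBaseline(), events))
```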
