Genetic Programming for Price Estimation

Genetic programming applies the principles of Darwinian evolution to improve candidate programs that solve a particular problem. The key step is to define a fitness function that quantifies the quality of each candidate solution. Starting from basic building blocks such as statistical or mathematical functions, it can automatically generate complex models; to do so, it uses one set of data for training and another for validation.
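To make the idea of a fitness function concrete, here is a minimal sketch in Python. It is not how Eureqa works internally; the error metric (mean absolute error), the hold-out rule in `split_train_validation`, and the toy data are all assumptions chosen for illustration.

```python
def mean_absolute_error(model, data):
    """Average absolute gap between model(t) and the observed price."""
    return sum(abs(model(t) - price) for t, price in data) / len(data)

def split_train_validation(data, every=4):
    """Hold out every `every`-th point for validation (hypothetical rule)."""
    train = [p for i, p in enumerate(data) if i % every != 0]
    validation = [p for i, p in enumerate(data) if i % every == 0]
    return train, validation

# Toy data, NOT the copper series: price = 2*t + 1
data = [(t, 2.0 * t + 1.0) for t in range(1, 21)]
train, validation = split_train_validation(data)

# Two hand-written candidates standing in for evolved solutions
candidates = {
    "constant": lambda t: 20.0,
    "linear": lambda t: 2.0 * t + 1.0,
}

# Lower score means a fitter candidate; scored on data not used for training
scores = {name: mean_absolute_error(f, validation) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In a real run the candidates would be fitted on `train` and ranked on `validation`, so that a model is rewarded for generalizing rather than memorizing.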

For this approach I'm going to use Eureqa as the genetic programming platform. I'll run the experiment on the yearly average price of copper between 1908 and 2010, with all prices expressed in 2012 c/lb (the data is here).

Determining a future price, for speculative purposes, is a complex task that usually involves studying the macroeconomic environment to project supply and demand. To show the power of this technique, the models are given only one data series: the price itself.

The model is a time-driven price function, price = f(t), where t is normalized to the range 0 < t < 104, covering 1908 to 2010. The building blocks allowed in the model are:

  1. Constant
  2. Addition (+)
  3. Subtraction (-)
  4. Multiplication (*)
  5. Division (/)
  6. Square root (sqrt())
  7. Exponential (exp())
  8. Logarithm (log())
  9. Sine (sin())
  10. Cosine (cos())
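One common way to represent candidates built from these blocks is an expression tree. The sketch below is an assumption about representation, not a description of Eureqa: the nested-tuple format, the "protected" versions of division, square root, exponential and logarithm, and the helper names `random_tree` and `evaluate` are all hypothetical.

```python
import math
import random

# The ten building blocks listed above. Protected variants avoid domain
# errors; these protection rules are an assumption made for this sketch.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,
}
UNARY = {
    "sqrt": lambda a: math.sqrt(abs(a)),
    "exp": lambda a: math.exp(min(a, 50.0)),
    "log": lambda a: math.log(abs(a)) if abs(a) > 1e-12 else 0.0,
    "sin": math.sin,
    "cos": math.cos,
}

def random_tree(rng, depth):
    """Grow a random expression: leaves are the variable t or a constant."""
    if depth == 0 or rng.random() < 0.3:
        return "t" if rng.random() < 0.5 else round(rng.uniform(-10, 10), 3)
    if rng.random() < 0.5:
        op = rng.choice(sorted(BINARY))
        return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))
    op = rng.choice(sorted(UNARY))
    return (op, random_tree(rng, depth - 1))

def evaluate(tree, t):
    """Recursively compute the value of an expression tree at time t."""
    if isinstance(tree, tuple):
        op, *args = tree
        values = [evaluate(arg, t) for arg in args]
        return BINARY[op](*values) if len(values) == 2 else UNARY[op](*values)
    return t if tree == "t" else float(tree)

# A hand-built tree for 160 + 2*t, plus one random individual
example = ("+", 160.0, ("*", 2.0, "t"))
print(evaluate(example, 3))
print(random_tree(random.Random(0), 3))
```

An evolutionary search would create many such random trees, score them, and recombine or mutate the better ones.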

The model evolves from these simple functions into more complex and robust forms:

f(t) = 201.52

____________________________________________________________________

f(t) = 198.37224 + exp(t - 97.702019)

____________________________________________________________________

f(t) = 160.41377 + t + 100.30139*cos(3382842.3 + 270026.06*t)

____________________________________________________________________

f(t) = 161.58005 + 0.76148409*t + 83.019974*cos(3387900.3 + 270026.06*t) + (16.248167*cos(247608.23*t) - 6.5617723)/(sin(0.14720485*t - 0.21756491) - 1.1168232)

____________________________________________________________________

f(t) = 153.59105 + t + 60.296329*cos(3308336.3 + 270026.06*t) + 0.50310683*t*cos(3308336.3 + 270026.06*t) + (12.723234*cos(247608.23*t) - 5.730453)/(sin(0.1461011*t) - 1.0796438)

____________________________________________________________________

f(t) = 158.66541 + 0.89893383*t + 63.091679*cos(3485792.3 + 270026.06*t) + 0.43873021*t*cos(3485792.3 + 270026.06*t) + (11.811392*cos(247608.23*t) - 4.8052697)/(sin(0.14508452*t) - 1.0796713)

____________________________________________________________________
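For readers who want to experiment, the last model above can be transcribed into Python roughly as follows. Two caveats: the mapping from calendar year to t in `year_to_t` is my assumption (t = 1 for 1908, which puts 2010 at t = 103, inside 0 < t < 104), and the constants are copied as printed. Because the cosine terms multiply t by very large coefficients, the rounding in the printed constants can shift the phase noticeably, so the output will not necessarily match figures produced from the full-precision model.

```python
import math

def copper_price_model(t):
    """Last model listed above, with constants copied as printed (rounded)."""
    phase = 3485792.3 + 270026.06 * t
    return (
        158.66541
        + 0.89893383 * t
        + 63.091679 * math.cos(phase)
        + 0.43873021 * t * math.cos(phase)
        + (11.811392 * math.cos(247608.23 * t) - 4.8052697)
        / (math.sin(0.14508452 * t) - 1.0796713)
    )

def year_to_t(year, first_year=1908):
    # Assumption: t = 1 corresponds to 1908, so t = 103 corresponds to 2010
    return year - first_year + 1

for year in (2011, 2012, 2013):
    print(year, round(copper_price_model(year_to_t(year)), 3))
```

Note that the denominator sin(...) - 1.0796713 is always negative, so the division never hits zero.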

The sequence shows how the models grow in complexity to fit the data more closely. That growth is kept in check because the search weighs accuracy against complexity: a “size of solution” term is integrated into the fitness function so that solutions do not grow without limit. Finally, the model statistics for the price function indicate that this model has strong predictive qualities.
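A simple way to picture the “size of solution” idea is a fitness score that adds a penalty per node of the expression. This is only a sketch: the nested-tuple expression format, the `size_weight` value, and the error numbers below are made up for illustration, and Eureqa's actual scoring is not described here.

```python
def tree_size(tree):
    """Count nodes in a nested-tuple expression such as ("+", "t", 1.0)."""
    if isinstance(tree, tuple):
        return 1 + sum(tree_size(child) for child in tree[1:])
    return 1

def penalized_fitness(error, tree, size_weight=0.5):
    """Lower is better: fit error plus a cost for every node in the tree."""
    return error + size_weight * tree_size(tree)

small = ("+", 160.0, "t")                                      # 3 nodes
large = ("+", ("+", 160.0, "t"), ("*", 0.5, ("cos", "t")))     # 8 nodes

# Hypothetical errors: the larger model fits slightly better
small_score = penalized_fitness(30.0, small)
large_score = penalized_fitness(29.0, large)
print(small_score, large_score)
```

With these numbers the small model wins even though its raw error is higher, because the extra five nodes of the large model buy only one unit of accuracy.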

Now, here are the predicted average prices for 2011 through 2031.

  Price (2012 c/lb)   Year
  341.080             2011
  347.513             2012
  354.029             2013
  353.202             2014
  362.227             2015
  367.775             2016
  369.755             2017
  368.177             2018
  272.395             2019
  155.904             2020
  313.730             2021
  157.618             2022
  343.032             2023
  163.853             2024
  366.379             2025
  187.156             2026
  378.768             2027
  220.690             2028
  378.730             2029
  260.870             2030
  377.019             2031
