Genetic Programming for Price Estimation
August 22, 2011
Genetic programming applies the principles of Darwinian evolution to improve candidate programs that solve a particular problem. The key is to define a fitness function that quantifies the quality of each candidate solution. This makes it possible to automatically generate complex models from basic building blocks such as statistical or mathematical functions. To do so, the method uses one set of data for training and another for validation.
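To make the idea of a fitness function concrete, here is a minimal sketch that scores candidate models on a training set and a separate validation set. The error metric (mean absolute error), the 70/30 split, and the toy data are assumptions for illustration only; they are not the copper series and not necessarily what Eureqa uses internally.

```python
import random

def mean_absolute_error(model, samples):
    """Average absolute gap between model(t) and the observed price."""
    return sum(abs(model(t) - price) for t, price in samples) / len(samples)

def split_samples(samples, validation_fraction=0.3, seed=0):
    """Shuffle a copy of the samples and split it into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - validation_fraction)))
    return shuffled[:cut], shuffled[cut:]

# Toy data (NOT the copper series): price = 2*t + 1
samples = [(t, 2.0 * t + 1.0) for t in range(1, 11)]
training, validation = split_samples(samples)

# Two hypothetical candidate solutions to compare.
candidates = {
    "exact": lambda t: 2.0 * t + 1.0,
    "rough": lambda t: 2.0 * t,
}
for name, model in candidates.items():
    print(name,
          mean_absolute_error(model, training),
          mean_absolute_error(model, validation))
```

A lower score means a fitter candidate. The validation score is what tells you whether a candidate only memorized the training points.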
For this approach I’m going to use Eureqa as the genetic programming platform. I’ll run the experiment on the yearly average price of copper between 1908 and 2010, with all prices expressed in 2012 c/lb (the data is here).
Determining a future price, by way of speculation, is a complex task that usually involves studying the macroeconomic environment to project supply and demand. To show the power of this technique, only one variable is going to be fed into the models: the price.
The model is the time-driven function price = f(t), where the variable t is normalized to 0 < t < 104 for the years 1908 to 2010. The building blocks available to the model are:
- Constant
- Addition (+)
- Subtraction (-)
- Multiplication (*)
- Division (/)
- Square root (sqrt())
- Exponential (exp())
- Logarithm (log())
- Sine (sin())
- Cosine (cos())
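The building blocks above can be sketched as a small primitive set from which random expressions are assembled. This is only an illustration of the idea, not Eureqa's implementation. The nested-tuple representation, the "safe" wrappers, and the mapping `t = year - 1907` (one way to satisfy 0 < t < 104) are all assumptions.

```python
import math
import random

def year_to_t(year):
    # Assumption: 1908 -> t = 1 and 2010 -> t = 103, which keeps 0 < t < 104.
    return year - 1907

# "Safe" versions keep every primitive defined for any real input.
def safe_div(a, b):
    return a / b if abs(b) > 1e-12 else 1.0

def safe_sqrt(a):
    return math.sqrt(abs(a))

def safe_log(a):
    return math.log(abs(a)) if abs(a) > 1e-12 else 0.0

def safe_exp(a):
    return math.exp(min(a, 50.0))  # clamp the argument to avoid overflow

BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": safe_div,
}
UNARY = {"sqrt": safe_sqrt, "exp": safe_exp, "log": safe_log,
         "sin": math.sin, "cos": math.cos}

def random_tree(depth, rng):
    """Build a random expression as nested tuples; leaves are 't' or a constant."""
    if depth == 0 or rng.random() < 0.3:
        return "t" if rng.random() < 0.5 else round(rng.uniform(-10.0, 10.0), 3)
    if rng.random() < 0.5:
        op = rng.choice(sorted(BINARY))
        return (op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))
    op = rng.choice(sorted(UNARY))
    return (op, random_tree(depth - 1, rng))

def evaluate(tree, t):
    """Evaluate an expression tree at the normalized time t."""
    if isinstance(tree, tuple):
        op, *children = tree
        values = [evaluate(child, t) for child in children]
        return BINARY[op](*values) if op in BINARY else UNARY[op](*values)
    return float(t) if tree == "t" else float(tree)

# The expression 198.37224 + exp(t - 97.702019) written as a tree.
first_model = ("+", 198.37224, ("exp", ("-", "t", 97.702019)))
print(evaluate(first_model, year_to_t(2010)))
print(random_tree(3, random.Random(1)))
```

An evolutionary search would generate many such trees, score each with a fitness function, and recombine the best ones.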
Starting from these simple functions, the model evolves into progressively more complex and robust expressions.
____________________________________________________________________
f(t) = 198.37224 + exp(t - 97.702019)
____________________________________________________________________
f(t) = 160.41377 + t + 100.30139*cos(3382842.3 + 270026.06*t)
____________________________________________________________________
f(t) = 161.58005 + 0.76148409*t + 83.019974*cos(3387900.3 + 270026.06*t) + (16.248167*cos(247608.23*t) - 6.5617723)/(sin(0.14720485*t - 0.21756491) - 1.1168232)
____________________________________________________________________
f(t) = 153.59105 + t + 60.296329*cos(3308336.3 + 270026.06*t) + 0.50310683*t*cos(3308336.3 + 270026.06*t) + (12.723234*cos(247608.23*t) - 5.730453)/(sin(0.1461011*t) - 1.0796438)
____________________________________________________________________
f(t) = 158.66541 + 0.89893383*t + 63.091679*cos(3485792.3 + 270026.06*t) + 0.43873021*t*cos(3485792.3 + 270026.06*t) + (11.811392*cos(247608.23*t) - 4.8052697)/(sin(0.14508452*t) - 1.0796713)
____________________________________________________________________
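For readers who want to experiment, the last expression above can be transcribed directly into Python. This is a sketch under two assumptions: that the year maps to t as `t = year - 1907`, and that the printed coefficients are precise enough. The second point matters: the cosine arguments are very large, so rounding in the printed constants shifts the phase noticeably, and the output may not reproduce the figures Eureqa reported.

```python
import math

def final_model(t):
    """Transcription of the last expression listed above; t is the normalized year."""
    carrier = math.cos(3485792.3 + 270026.06 * t)
    # The denominator is at most -0.0796713 because sin(x) <= 1, so it never hits zero.
    ratio = ((11.811392 * math.cos(247608.23 * t) - 4.8052697)
             / (math.sin(0.14508452 * t) - 1.0796713))
    return (158.66541
            + 0.89893383 * t
            + 63.091679 * carrier
            + 0.43873021 * t * carrier
            + ratio)

def year_to_t(year):
    # Assumption: 1908 -> t = 1, so 2011 -> t = 104.
    return year - 1907

for year in range(2011, 2016):
    print(year, round(final_model(year_to_t(year)), 3))
```

The same pattern works for any of the intermediate expressions, which makes it easy to compare how each added term changes the curve.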
You can see that the models grow in complexity to fit the data more closely. That growth is kept in check because the fitness function accounts for a “size of solution” alongside accuracy, so a larger model has to earn its extra complexity. Finally, the fit statistics for the price function suggest that this model has strong predictive qualities.
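One simple way to fold a “size of solution” into the fitness is to add a penalty proportional to the number of nodes in the expression. The sketch below assumes that additive form, a hypothetical weight of 0.5, and made-up error values. How Eureqa actually balances accuracy against size is not described here.

```python
def tree_size(tree):
    """Count the nodes (operators, constants, variables) in a nested-tuple expression."""
    if isinstance(tree, tuple):
        return 1 + sum(tree_size(child) for child in tree[1:])
    return 1

def penalized_fitness(error, tree, size_weight=0.5):
    """Lower is better: raw error plus a charge for every node in the expression."""
    return error + size_weight * tree_size(tree)

# A small expression: 198.37224 + exp(t - 97.702019)
small = ("+", 198.37224, ("exp", ("-", "t", 97.702019)))
# A larger, hypothetical expression that adds a cosine term to the small one.
large = ("+", small,
         ("*", 100.30139, ("cos", ("+", 3382842.3, ("*", 270026.06, "t")))))

# Hypothetical raw errors: the larger expression fits slightly better.
small_error, large_error = 40.0, 36.0

print(tree_size(small), penalized_fitness(small_error, small))
print(tree_size(large), penalized_fitness(large_error, large))
```

With these made-up numbers the smaller expression still wins after the penalty. The larger one is only preferred when its accuracy gain outweighs its extra nodes.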
Here are the model's average prices for 2011 through 2031.
Price in 2012 c/lb, by year:
- 2011: 341.080
- 2012: 347.513
- 2013: 354.029
- 2014: 353.202
- 2015: 362.227
- 2016: 367.775
- 2017: 369.755
- 2018: 368.177
- 2019: 272.395
- 2020: 155.904
- 2021: 313.730
- 2022: 157.618
- 2023: 343.032
- 2024: 163.853
- 2025: 366.379
- 2026: 187.156
- 2027: 378.768
- 2028: 220.690
- 2029: 378.730
- 2030: 260.870
- 2031: 377.019




