Halfbakery: Survivor Bias Future Predictor

Business: Financial: Stock Market
Survivor Bias Future Predictor (+7)
*Past Performance Is Not Indicative Of Future Results

Run trillions of pseudo-random simulations against historical data for whatever it is you're trying to predict: stock market prices, temperature, match outcomes, election outcomes, etc.

Rank each seed by success rate. Keep trying more random simulations until you find one that has a 99% historical success rate. Make a big announcement in large letters:

Our new automated analyst is right 99% of the time*. Get yours today while quantities last.

And in the smallest letters you can get your hands on:

*Past Performance Is Not Indicative Of Future Results
-- ixnaum, Apr 25 2017

Survival Bias https://en.wikipedi...i/Survivorship_bias
[ixnaum, Apr 26 2017]

Gambler's Fallacy https://en.wikipedi...i/Gambler's_fallacy
[ixnaum, Apr 26 2017]

Overfitting https://en.wikipedia.org/wiki/Overfitting
[ixnaum, Apr 26 2017]

Become Astoundingly Rich
An outline for a simplified physical implementation. [zen_tom, Apr 26 2017]

.. so I crunched some numbers in a simulation to see how many tries I needed to find a survivor that matched at least 99% of the flips

for 5 random coin flips, it takes ~50 tries

for 10 random coin flips, it takes ~1000 tries

for 20 random coin flips, it takes ~1 million tries

I guess I could have saved myself the simulation and used a little math instead ... oh well.
-- ixnaum, Apr 25 2017
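
For reference, the "little math": if each try matches n independent fair coin flips with probability 2^-n, the number of tries until the first perfect match is geometric with mean 2^n, i.e. 32, 1024 and 1,048,576 for 5, 10 and 20 flips (with so few flips, "at least 99%" can only mean all of them). That is the same order of magnitude as the figures above. Below is a minimal Python sketch of such a seed search. The function names and the use of Python's random module are assumptions made for illustration, not the code that produced the numbers above.

```python
import random

def flips_for_seed(seed, n):
    """Return the n pseudo-random coin flips (0 or 1) produced by this seed."""
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(n)]

def tries_until_survivor(history, max_tries=10_000_000):
    """Try seeds 0, 1, 2, ... and return (tries, seed) for the first seed
    whose flips reproduce `history` exactly."""
    n = len(history)
    for seed in range(max_tries):
        if flips_for_seed(seed, n) == history:
            return seed + 1, seed
    raise RuntimeError("no surviving seed found within max_tries")

def average_tries(n, trials=200):
    """Average number of tries needed over several random n-flip histories."""
    history_rng = random.Random(12345)  # arbitrary seed for the "real" flips
    total = 0
    for _ in range(trials):
        history = [history_rng.randrange(2) for _ in range(n)]
        tries, _seed = tries_until_survivor(history)
        total += tries
    return total / trials

if __name__ == "__main__":
    for n in (5, 10):
        print(n, "flips: average tries", average_tries(n), "vs 2**n =", 2 ** n)
```

Running it for 20 flips works the same way but takes about a million tries per history, so it is left out of the demo loop.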

This is just a very inefficient way of designing a very good model of a phenomenon.

You could do the same thing with any engineering challenge. Bolt together random parts and see how fast it goes. Keep trying different random parts and eventually you will have a really fast car.
-- pocmloc, Apr 26 2017

Not only is it inefficient - it's plain wrong.

I was hoping that it would be apparent, but this is mostly one of those sarcastic ideas. It's poking fun at models that promise to predict the future just because they succeeded in the past. I'm hoping it highlights how powerful survival bias can be (even though it's wrong). There is a bit of Gambler's fallacy thrown in for good measure too. See links.
-- ixnaum, Apr 26 2017

I recognise the sarcasm. However, consider this: suppose you have a pseudorandom number generator algorithm which after various combinations of functions produces an accurate set of results up until now. Does that still completely rule out the possibility that you have in fact discovered an accurate algorithm which will continue to work?
-- nineteenthly, Apr 26 2017

Yes, why is it wrong?
-- pocmloc, Apr 26 2017

Yes, it is completely ruled out by the same mechanism that drives gambler's fallacy (link).

Basically it boils down to this: the subsequent outcomes are completely independent of the ones preceding them (even though it really doesn't seem so, hence the fallacy).
-- ixnaum, Apr 26 2017
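
The independence claim can be checked with a small experiment. In this hedged Python sketch (all names are hypothetical, and a fair coin stands in for the thing being predicted), seeds are selected because they reproduce a 10-flip "past" perfectly and are then scored on 20 "future" flips they were never selected on. If the outcomes really are independent, the survivors should do no better than chance on the future.

```python
import random

def predictions(seed, n):
    """The n pseudo-random 0/1 'predictions' generated by this seed."""
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(n)]

def survivor_future_accuracy(n_past=10, n_future=20, n_survivors=30):
    """Find seeds that match the past perfectly, then return their
    average hit rate on the future flips."""
    world = random.Random(2017)  # assumed stand-in for reality: a fair coin
    past = [world.randrange(2) for _ in range(n_past)]
    future = [world.randrange(2) for _ in range(n_future)]

    hits = 0
    found = 0
    seed = 0
    while found < n_survivors:
        guess = predictions(seed, n_past + n_future)
        if guess[:n_past] == past:  # 100% historical success rate
            found += 1
            hits += sum(g == f for g, f in zip(guess[n_past:], future))
        seed += 1
    return hits / (n_survivors * n_future)

if __name__ == "__main__":
    print("future accuracy of perfect-past survivors:",
          survivor_future_accuracy())
```

Every survivor here has a flawless track record by construction, so any gap between that record and the printed future accuracy comes from the selection step alone.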

Here is another way to explain it. Let's say that we were not talking about random number generators but instead we had a neural network that was trained to do the predictions. There is a well known problem called "overfitting" (link). What it means is that instead of making generalizations about how something works, the machine essentially learns to parrot back what it saw verbatim. This is useless, because unless it sees exactly the same thing in the future (which is unlikely) it will be wrong.

What I've done with this bad invention is invent the most extreme example of over-fitting.

So why bother inventing this? I find it useful as an illustration of the pitfalls of predictive models and how even bad models seem trustworthy. I believe the world is full of this snake oil where fake models are passed off as the real deal. I'm hoping that the croissants given for this idea are for the sarcasm of it, not the practicality.
-- ixnaum, Apr 26 2017

With an idea like this, you could become astoundingly rich!
-- zen_tom, Apr 26 2017

No kidding.
-- ixnaum, Apr 26 2017

I absolutely get where you're coming from, [ixnaum]. But I don't agree that it completely rules out the discovery of a successful algorithm. What you describe is pretty similar to the way evolution works. Arbitrary genetic mutation usually leads to reduced fitness but occasionally increases it. What this does is to throw out the mutations, as it were, which don't work. If there is some kind of deterministic process behind something, this usually won't, but occasionally will, successfully predict the future of the process long term.

How does it differ from natural selection? Incidentally, my croissant is indeed sarcastic, don't worry.
-- nineteenthly, Apr 27 2017

There are other ways to achieve the same outcome. I heard about people writing a high-cost subscription investment advice newsletter. They divided the US into East/West and sent out free samples saying "Gold will go UP in value next week" to the East and "Gold will go DOWN in value next week" to the West. After gold did whatever it did, they ignored the half of the country which got the wrong prediction, split the 'successful' half into North and South regions, and sent it another pair of newsletters saying "Now Gold is going to go DOWN" to one region and "Now Gold is going to go UP" to the other. Once again, they were right half the time, and they told the surviving quarter of the country (the NorthWest, say) that their prescience was proven and that if the prospective customers wanted to avail themselves of the newsletter's predictive powers in the future they needed to subscribe....
-- AusCan531, Apr 27 2017
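
The newsletter trick is the same survivor selection with people instead of seeds. A rough Python sketch, with a made-up function name and a coin toss standing in for the gold price: each round, half of the remaining recipients are told UP and half DOWN, and only the half that saw a correct call is kept.

```python
import random

def run_newsletter_scheme(n_recipients, rounds, seed=0):
    """Return the recipients who have seen only correct predictions
    after the given number of rounds."""
    rng = random.Random(seed)  # assumed stand-in for the gold market
    believers = list(range(n_recipients))
    for _ in range(rounds):
        half = len(believers) // 2
        told_up, told_down = believers[:half], believers[half:]
        gold_went_up = rng.random() < 0.5
        # Whatever gold does, one group saw a correct prediction.
        believers = told_up if gold_went_up else told_down
    return believers

if __name__ == "__main__":
    survivors = run_newsletter_scheme(1024, 2)
    print(len(survivors), "recipients have seen two correct calls in a row")
```

Starting from a power of two, each round leaves exactly half of the previous group, whichever way the market moves.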

Yeah, well, this gambler's fallacy thing is only believed by non-gamblers
-- theircompetitor, Apr 27 2017

//I don't agree that it completely rules out the discovery of a successful algorithm. What you describe is pretty similar to the way evolution works.//

Yes, this is the key difference. There are good algorithms and bad algorithms that exhibit identical results when predicting the past. A genetic algorithm with some randomness could be very useful.

I should have made it clear that in this idea the random simulations are performed by a pseudorandom number generator. The word "simulation" is a poor word choice because simulation connotes some kind of modeling that's happening under the hood. This is not the case with a pseudorandom number generator.

To be more specific, when I did my proof of concept I just used a rand(seed) function. I find it fascinating that even though the past results of a rand(seed) function may be identical or even superior to those of a useTheBestModelsOutThere() function, the results don't tell the whole story. In fact the results confuse us about which algorithm is superior.
-- ixnaum, Apr 27 2017
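
To illustrate how identical past results can hide very different algorithms, here is a toy Python comparison. Everything in it is an assumption made for the example: the deterministic process (a repeating 0, 1, 1 pattern), the "real" model (predict the value seen three steps earlier) and the helper names. A lucky seed is searched for until it reproduces the whole past, then both predictors are scored on the future.

```python
import random

def process(t):
    """Assumed toy deterministic process with period 3: 0, 1, 1, 0, 1, 1, ..."""
    return 0 if t % 3 == 0 else 1

def lucky_seed(past):
    """Return the first seed whose pseudo-random flips reproduce `past`."""
    seed = 0
    while True:
        rng = random.Random(seed)
        if [rng.randrange(2) for _ in past] == past:
            return seed
        seed += 1

def accuracy(pred, truth, start, stop):
    """Fraction of steps in [start, stop) where pred agrees with truth."""
    return sum(pred[t] == truth[t] for t in range(start, stop)) / (stop - start)

def compare(n_past=12, n_future=30):
    total = n_past + n_future
    truth = [process(t) for t in range(total)]

    # Predictor 1: a seed chosen only because it matched the past.
    rng = random.Random(lucky_seed(truth[:n_past]))
    seed_pred = [rng.randrange(2) for _ in range(total)]

    # Predictor 2: a model that captured the structure (repeat x[t-3]).
    # It cannot predict the first three steps, so they are not scored.
    model_pred = [None, None, None] + [truth[t - 3] for t in range(3, total)]

    return {
        "seed_past": accuracy(seed_pred, truth, 0, n_past),
        "model_past": accuracy(model_pred, truth, 3, n_past),
        "seed_future": accuracy(seed_pred, truth, n_past, total),
        "model_future": accuracy(model_pred, truth, n_past, total),
    }

if __name__ == "__main__":
    for name, value in compare().items():
        print(name, value)
```

On the past alone the two predictors are indistinguishable, which is the confusion described above. Only the future scores separate them.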
