Four years ago, just weeks before the first Penguin update, the MozCast project started collecting its first real data. Detecting and interpreting Google algorithm updates has been both a far more difficult and far more rewarding challenge than I ever expected, and I’ve learned a lot along the way, but there’s one nagging question that I’ve never been able to answer with any satisfaction. Can we use past Google data to predict future updates?

Before any analysis, I’ve always been a fan of using my eyes. What does Google algorithm “weather” look like over a long time period? Here’s a full year of MozCast temperatures:

Most of us know by now that Google isn’t a quiet machine that hums along until the occasional named update happens a few times a year. The algorithm is changing constantly and, even if it weren’t, the web is changing constantly around it. Finding the signal in the noise is hard enough, but what does any peak or valley in this graph tell you about when the next peak might arrive? Very little, at first glance.

It’s worse than that, though

Even before we dive into the data, there’s a fundamental problem with trying to predict future algorithm updates. To understand it, let’s look at a different problem: predicting real-world weather. Forecasting the weather is incredibly difficult and takes a massive amount of data to do well, but we know that weather follows a set of natural laws. Ultimately, no matter how complex the problem is, there is a chain of causality between today’s weather and tomorrow’s, and a pattern in the chaos.

The Google algorithm is built by people, driven by human motivations and politics, and is only constrained by the rules of what’s technologically possible. Granted, Google won’t replace the entire SERP with a picture of a cheese sandwich tomo…

