Benford’s Law and Google

Posted on September 24, 2009
Filed Under Growing Links

How would Google figure out whether the backlink growth rate of a website is NORMAL? Websites get link spikes of different sizes for all kinds of reasons, including press releases and social media, so how could anybody determine what is natural and what is not? It's like a link infusion from a medical syringe pump. What's going on in those internet tubes?

There is a simple computation that Google may use. It is called Benford’s Law.

Also called the first-digit law, Benford's Law states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. (from Wikipedia)

According to the law, about 30.1% of the numbers have 1 as the first digit, 17.6% have 2, and so on down to about 4.6% for 9. You can test the law in all sorts of situations, and many professionals do, especially for fraud detection.

For starters, let's take a look at what Benford's Law looks like on a graph.



Benford's Law



The graph above shows the proportion Benford's Law predicts for each possible first digit, 1 through 9, in a list of numbers.
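The curve comes from Benford's formula: the probability that a number's first digit is d equals log10(1 + 1/d). Here is a minimal Python sketch that computes the expected proportion for each digit; the function name `benford_expected` is just an illustrative choice.

```python
import math


def benford_expected():
    """Return {digit: expected proportion} for first digits 1-9 under Benford's Law."""
    # P(d) = log10(1 + 1/d); the nine terms telescope to log10(10) = 1.
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, p in benford_expected().items():
        print(f"{digit}: {p:.1%}")
```

Running it prints 30.1% for 1, 17.6% for 2, 12.5% for 3, and so on down to 4.6% for 9, which is the shape of the curve in the graph.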

To illustrate how the law can apply to link pattern detection, I took the top 60 Google search results for three key phrases, looked up each site's total backlink count with Yahoo's Site Explorer, and tallied how often each digit from 1 to 9 appeared as the first digit of that count.
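The tally itself is simple to script. Below is a minimal Python sketch, assuming the backlink totals have already been collected into a list of positive integers (no lookup code is shown). The helper names `first_digit` and `first_digit_counts` are hypothetical, and every total except 6380 in the usage example is a made-up placeholder.

```python
from collections import Counter


def first_digit(n: int) -> int:
    """Return the leading (most significant) digit of a positive integer."""
    if n <= 0:
        raise ValueError("backlink totals must be positive")
    while n >= 10:
        n //= 10  # strip the last digit until one digit remains
    return n


def first_digit_counts(backlink_totals):
    """Count how many totals start with each digit 1-9 (unseen digits get 0)."""
    counts = Counter(first_digit(n) for n in backlink_totals)
    return {d: counts.get(d, 0) for d in range(1, 10)}


if __name__ == "__main__":
    # Hypothetical totals: a site with 6380 backlinks counts once toward digit 6.
    totals = [6380, 1520, 312, 47, 19000]
    print(first_digit_counts(totals))
```

Repeating this for all 60 results of a search phrase gives the nine bars of one graph.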

The three search phrases were “Yacht Charters”, “Real Estate Agencies”, and “Wikipedia.”
The first search result for "Yacht Charters" returned a website with 6380 backlinks, so its first digit, "6", counted as one occurrence of the digit 6. Recording the first digit of the total backlink count for all 60 results produced the following graph.

Yacht Charters

The "Yacht Charters" search category revealed anomalies at digits 3 and 4.

The search phrase "Real Estate Agencies" came closer to the Benford distribution, but revealed an anomaly at the end, at digit 9.

Real Estate Agencies
Perhaps the anomaly at digit 9 is a telltale red flag that needs further investigation?

The search category that came closest to the Benford proportions was "Wikipedia."

Wiki

The examples above show one way Google could check different search categories for abnormal backlink distributions. I must admit I was a little surprised that the numbers from Yahoo Site Explorer matched the curve as closely as they did, even though I knew the law predicts it.
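How close is "close to the curve"? Nothing here says how Google would measure it. One standard choice, assumed purely for illustration, is Pearson's chi-square goodness-of-fit test, which compares observed first-digit counts with the counts Benford's Law expects. In this minimal Python sketch the function names and the 5% cutoff are hypothetical, not a documented Google method.

```python
import math

# Assumption: a chi-square goodness-of-fit test, chosen for illustration only.
# Critical value of the chi-square distribution with 8 degrees of freedom
# (9 digits - 1) at the 5% significance level.
CRITICAL_5_PERCENT = 15.507


def benford_chi_square(observed):
    """Pearson chi-square statistic for observed first-digit counts
    ({digit: count} for digits 1-9) against Benford's expected counts."""
    total = sum(observed.values())
    if total == 0:
        raise ValueError("need at least one observation")
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    return stat


def looks_unnatural(observed):
    """Flag a distribution whose statistic exceeds the 5% critical value."""
    return benford_chi_square(observed) > CRITICAL_5_PERCENT
```

With only 60 results per search phrase, Benford's Law expects fewer than 5 counts for the highest digits (about 2.7 for digit 9), so the chi-square approximation is rough at that sample size.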

Okay, so what does this have to do with link spikes? Well, your website could take on one HUGE link spike all at once or a series of spikes, and that by itself doesn't matter. What matters is whether Benford's Law reveals an unnatural link build. Even if the number of links built over time looks random or evenly spread at first glance, the law still works.

Let me demonstrate this in the case of an individual website.

Suppose a website gains the following hypothetical numbers of backlinks each week:

January
12 backlinks added - Week 1
27 backlinks added - Week 2
86 backlinks added - Week 3
97 backlinks added - Week 4
February
72 backlinks added - Week 1
81 backlinks added - Week 2
97 backlinks added - Week 3
93 backlinks added - Week 4
March
89 backlinks added - Week 1
87 backlinks added - Week 2
1600 LINK SPIKE ! - Week 3
79 backlinks added - Week 4
April
92 backlinks added - Week 1
96 backlinks added - Week 2
91 backlinks added - Week 3
84 backlinks added - Week 4
May
908 LINK SPIKE ! - Week 1
93 backlinks added - Week 2
88 backlinks added - Week 3
94 backlinks added - Week 4
June
96 backlinks added - Week 1
2072 LINK SPIKE ! - Week 2
88 backlinks added - Week 3
71 backlinks added - Week 4

In the example above, links were added weekly in what looks like a balanced, natural, steady build with a few link spikes thrown in, yet the first-digit pattern of the weekly amounts is almost the opposite of Benford's Law! 20 of the 24 weekly figures (about 83%) have 7, 8, or 9 as a first digit, where Benford's Law predicts only about 15%. Even if backlinks are spaced out in an attempt to make the numbers look balanced (or even random), they still fall outside a natural pattern.
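The digit count for this example can be checked with a short Python sketch. The list `weekly` simply copies the hypothetical January to June figures, and the helper `first_digit` is an illustrative name.

```python
import math
from collections import Counter

# Weekly backlink additions from the hypothetical January-June example.
weekly = [12, 27, 86, 97,      # January
          72, 81, 97, 93,      # February
          89, 87, 1600, 79,    # March
          92, 96, 91, 84,      # April
          908, 93, 88, 94,     # May
          96, 2072, 88, 71]    # June


def first_digit(n: int) -> int:
    """Leading digit of a positive integer, read from its decimal string."""
    return int(str(n)[0])


counts = Counter(first_digit(n) for n in weekly)
high_share = sum(counts[d] for d in (7, 8, 9)) / len(weekly)
# Benford's Law expects log10(10/7), about 15.5%, of first digits to be 7, 8 or 9.
benford_high_share = math.log10(10 / 7)

if __name__ == "__main__":
    print(dict(sorted(counts.items())))
    print(f"observed 7-9 share: {high_share:.1%}, Benford: {benford_high_share:.1%}")
```

This prints `{1: 2, 2: 2, 7: 3, 8: 7, 9: 10}` followed by an observed 7-9 share of 83.3% against Benford's 15.5%. The three link spikes (1600, 908, 2072) barely change the picture; the steady weeks in the 70s, 80s, and 90s are what skew it.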

Using Benford's Law, Google could check for anomalies in search categories. From there, Google could apply the law again to the backlinks each individual website added over time. Whether you had a few random link spikes or frequent ones over a set period, Google would still be looking at the overall hidden pattern. Benford's Law could flag an anomaly and indicate whether something looks strange. Link spikes by themselves, or as random events over time, reveal nothing.
