Google’s internal documentation with over 14K ranking features has been leaked to the public.

While Google’s lawyers are (most likely) busy cleaning up this mess, everyone involved in SEO is rushing to study the info inside.

And boy, oh boy, there’s a lot of great stuff to unpack!

Here’s the dealio

A few weeks ago, an anonymous source reached out to Rand Fishkin, the Moz co-founder and creator of the Domain Authority metric. Rand has been out of SEO for six years and now runs SparkToro, yet he is still very influential. The source claimed to have access to internal search documents and was motivated by frustration with Google’s dishonesty and the desire to expose the truth.

So, last Friday on May 24, Rand jumped on a video call with the anonymous source. And once it was verified that the leaker was indeed an insider, Rand was shown the aforementioned dataset.

Later on, Rand contacted some of the former Google employees he knew, showed them the docs, and got confirmation that the leaked data had all the necessary artifacts and did look authentic.

What’s inside?

You’ll find thousands of documents detailing the data Google collects and processes on websites. On top of that, there are also descriptions of various system functions, explanatory diagrams and charts.

This gem covers multiple search-related areas, including index organization, content evaluation, and ranking algorithms.

Unfortunately, the docs give no indication of how much weight each parameter carries in the algorithm. Moreover, some of the parameters are labeled as deprecated. Still, their mere presence says a lot.

The last leak of this magnitude involved Yandex, whose source code was leaked. Although some information on Google surfaced during last year’s court proceedings, it pales in comparison to this huge data leak.

What’s even more shocking than the list of parameters itself is how much of it actually contradicts Google’s official statements.

So, what did Google keep under wraps? Here are a few of its public claims that the leak calls into question:

  • The search giant does not use Domain Authority. Yet the leaked docs include a “siteAuthority” parameter that seems to influence site rankings.
  • There’s no Google Sandbox for new websites. Yet the leak suggests otherwise:

In the PerDocData module, the documentation indicates an attribute called hostAge that is used specifically “to sandbox fresh spam in serving time.”


  • User data from Chrome isn’t used for search-related purposes.

According to the docs, it definitely is! For example, Chrome data is used at least to generate the “Sitelinks” SERP feature.

But there’s mooooore!

Read up on the importance of NavBoost, PageRank, authors, links, and criteria that lower a site’s trustworthiness.

Furthermore, explore how Panda works, how embeddings are used to assess content topics, and how small sites are indeed neglected compared to big brands.

Check out the info on special whitelists for COVID, tourism and politics. For example, during elections, Google uses whitelists to promote or demote certain sites to supposedly prevent the spread of misinformation.

And this is just what Rand and Mike King managed to analyze over the weekend. I bet there’s enough data here to keep us busy all summer — and then some!

Let’s see what happens next 🤓

UPD: Erfan Azimi turned out to be the anonymous leaker. He published a video confession.

submitted by /u/SE_Ranking