Google TrustRank

by
Donna Warren

In March 2005, Google registered the name “TrustRank” with the United States Patent and Trademark Office.

Around the same time, a paper titled “Combating Web Spam with TrustRank” (www.vldb.org/conf/2004/RS15P3.PDF) was published by Zoltán Gyöngyi and Hector Garcia-Molina, both of the Computer Science Department at Stanford University, along with Jan Pedersen of Yahoo, Inc. The paper explains the algorithm they designed and tested to combat “Web Spam”.

Web Spam is the deliberate use of tactics that mislead search engines about the actual content of a web page. I’m sure we’ve all had the experience at least once of clicking on a search result only to find that the site has nothing to do with the search term we typed. These bad results come from “Black Hat SEO” tactics and are examples of Web Spam. Some commonly used spamming techniques are:
• Stuffing web page meta tags with irrelevant keywords.
• Using hidden text with irrelevant or false keywords embedded in the “white space” of the web page.
• Creating a “honey pot”: a website that gives away a useful item or contains good information but still employs “Black Hat” SEO tactics that are invisible to the visitor.
• Creating multiple doorway pages that generate bogus incoming links to a website in order to increase its PageRank.
• Creating a directory site where most, if not all, of the links point to a single website or a small group of websites.

Why All The Fuss?

Search engines compete on the relevancy of their search results. Google originally became the top search engine because its results were more relevant to the search terms typed than those of any other search engine. The competition between the major search engines is fierce, since their income depends on advertising revenue, which is determined by the number of users.

Google already employs human reviewers to check high-PageRank websites for Web Spam. Unfortunately, manually monitoring more than a small number of websites is far too expensive. The TrustRank algorithm would provide an automatic way to reasonably determine who the web spammers are.

TrustRank would be similar to the seals that agencies such as VeriSign give to online merchants. A seal testifies that the business is legitimate and reputable; the idea is to give online shoppers some measure of confidence, and VeriSign’s seal program has been very successful. TrustRank would provide a similar service to website owners deciding whether to link to another website. It offers a more relevant criterion than a simple link popularity rating such as PageRank, and it will be more difficult to manipulate than PageRank.

The basic concept behind TrustRank is quite simple. Start with a seed group of around 200 trusted websites that meet the editorial criteria set by Google. If a trusted site links to another site, that site is also considered trustworthy, because the probability of a seed site linking to a web-spamming site is very small. Because a high TrustRank can be a significant advantage, it should discourage website owners from linking to sites containing Web Spam or irrelevant content. TrustRank can also be used to flag sites with a high PageRank but a low TrustRank score for manual editorial review.
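To make the propagation idea in the paragraph above more concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not Google’s implementation: the web is modeled as a small dictionary mapping each site to the sites it links to, every seed gets a trust score of 1.0, and trust shrinks by a damping factor `beta` (an assumed value of 0.85) with each link hop away from the nearest seed, since a trusted site vouches less strongly for pages it only reaches indirectly. The function name and the `beta` and `max_hops` parameters are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List


def dampened_trust(links: Dict[str, List[str]], seeds: Iterable[str],
                   beta: float = 0.85, max_hops: int = 3) -> Dict[str, float]:
    """Hypothetical helper: seeds get trust 1.0, and a site k link-hops from the
    nearest seed gets beta ** k. Sites beyond max_hops, or unreachable, get 0.0."""
    # Every site that appears anywhere in the graph starts with zero trust.
    trust = {site: 0.0 for site in links}
    for targets in links.values():
        for target in targets:
            trust.setdefault(target, 0.0)

    # Breadth-first search from all seeds at once reaches each site first
    # at its shortest hop distance from any seed.
    seeds = list(seeds)
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    for seed in seeds:
        trust[seed] = 1.0

    while queue:
        site, hops = queue.popleft()
        if hops == max_hops:
            continue  # assumption: trust is not propagated past max_hops links
        for target in links.get(site, []):
            if target not in visited:
                visited.add(target)
                trust[target] = beta ** (hops + 1)  # one more hop, one more beta factor
                queue.append((target, hops + 1))
    return trust
```

Sites that no seed can reach, such as a ring of spam pages that only link to each other, keep a trust of 0.0 no matter how many links they exchange among themselves.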

One of the criteria for becoming a seed site is linking only to other reputable sites that meet Google’s criteria. The designers use seed sites because they realize that it is very difficult for a computer to distinguish legitimate site activity from several forms of Web Spam. The seed sites are selected by humans to give the computer rock-solid examples of both “good” and “bad” site practices.
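The human step in seed selection can also be sketched. In this hypothetical Python helper, candidate sites are ranked by a precomputed score (for example PageRank, which the researchers used to narrow their candidate list), only the top `review_budget` sites are sent for manual review, and the ones judged good become seeds. The `is_good` function is an assumption standing in for the editors’ judgment; the names are illustrative and do not come from the paper.

```python
from typing import Callable, Dict, List


def select_seeds(scores: Dict[str, float], is_good: Callable[[str], bool],
                 review_budget: int) -> List[str]:
    """Hypothetical helper: send the review_budget highest-scoring candidate
    sites to a human reviewer (is_good) and keep those judged good as seeds."""
    # Rank candidates from highest to lowest score, e.g. PageRank.
    ranked = sorted(scores, key=lambda site: scores[site], reverse=True)
    # Manual review is expensive, so only the top of the list is examined.
    shortlist = ranked[:review_budget]
    # The human judgment is the ground truth the rest of the algorithm relies on.
    return [site for site in shortlist if is_good(site)]
```

For example, with four candidates scored 0.4, 0.3, 0.2 and 0.1, a budget of 3, and a reviewer who rejects the second site, the function returns the first and third sites.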

The algorithm is a form of link analysis rather than set theory: starting from the seed sites, it propagates trust along outgoing links in a calculation similar to PageRank, and the resulting scores indicate how likely a page is to be good or bad. This information can then be used to order the search results presented to the user. The algorithm also takes into account that an otherwise good website may have a few unintentional links to a website guilty of spamming.
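A closer, though still simplified, picture of the calculation is a “biased” PageRank in which the random-jump share of the score is spread only over the trusted seeds, so trust can enter the link graph only through them. The Python sketch below assumes a damping factor of 0.85 and a fixed number of iterations, and has each site split its trust evenly among its outgoing links, so a good site with many links passes only a small share through any single accidental link to a spammer. The function names, including the `order_results` helper, are hypothetical; this is an illustration, not a production ranking system.

```python
from typing import Dict, Iterable, List, Set


def trustrank(links: Dict[str, List[str]], seeds: Set[str],
              alpha: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Simplified TrustRank sketch: a biased PageRank whose random-jump
    distribution is spread evenly over the trusted seed sites only."""
    sites = set(links) | set(seeds)
    for targets in links.values():
        sites.update(targets)

    # Static distribution d: nonzero only for seeds, summing to 1.
    d = {site: (1.0 / len(seeds) if site in seeds else 0.0) for site in sites}
    trust = dict(d)

    for _ in range(iterations):
        # Random-jump term: a (1 - alpha) share of trust is injected at the seeds.
        new_trust = {site: (1 - alpha) * d[site] for site in sites}
        # Link term: alpha of each site's trust is split evenly among its outlinks.
        # Sites without outlinks pass nothing on in this sketch.
        for site, targets in links.items():
            if targets:
                share = alpha * trust[site] / len(targets)
                for target in targets:
                    new_trust[target] += share
        trust = new_trust
    return trust


def order_results(results: Iterable[str], trust: Dict[str, float]) -> List[str]:
    """Hypothetical helper: list candidate result sites from most to least trusted."""
    return sorted(results, key=lambda site: trust.get(site, 0.0), reverse=True)
```

In a tiny graph where a trusted seed links to a news site and a blog, and two spam sites link only to each other, the spam sites end up with zero trust and are ordered after the news site, however many links they trade.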

In their experiments, the researchers used 31 million websites from the AltaVista index. Using PageRank as a primary criterion, they narrowed the list to 1,250 websites. Next, they selected 178 sites as “good seeds” and 135 websites as “bad seeds”. Finally, they ran the algorithm against a set of 748 websites that included the bad seeds. The results were very close to those obtained by having human editors evaluate the sites.

Summary

The TrustRank algorithm appears to be the next step in providing high-quality, relevant search engine results. I’ve heard some speculation that TrustRank will replace PageRank, but I doubt it: this version of the algorithm actually uses PageRank as part of its calculations, because its designers believe it is more important to accurately weed out spammers among higher-ranked sites than among lower-ranked ones.

It is quite possible that TrustRank will become the part of the search engines’ overall ranking algorithms that ultimately determines a website’s SERP (search engine results page) listings. In fact, I see a future where selling links from sites with high TrustRank scores is just as profitable as selling links from high-PageRank sites is today.

TrustRank is just the latest skirmish in the never-ending battle between any authority and the people determined to “get over” rather than play by the rules.

About the Author: Donna is a web designer and copywriter, and an adjunct instructor in Information Technology at a local college. She owns and operates DPW Enterprises Web Design & Copywriting Services and is a resident of New Jersey.

Copyright © Donna Warren 2005, All Rights Reserved
