THE VITAL IMPORTANCE OF ANCHOR LINK TEXT TO GOOGLE SUCCESS
Anchor link text is extremely important for SEO and web development purposes. This is an appropriate time to introduce Sergey Brin's and Lawrence Page's pioneering search engine work. If you really want to understand search engine success, the paper these two young men presented at the Seventh World Wide Web Conference in 1998 is a must-read. Today these men are billionaires.
As I write this section in July 2005, the share price of Google has just reached $300. After only seven years, Google is now the world's biggest media company, bigger even than Time Warner.
At the time the initial Google research was presented, Sergey Brin and Lawrence Page were PhD students at Stanford University. They had built a prototype search engine indexing some 24 million web pages, which they called Google (after the mathematical entity called a googol, the number 1 followed by 100 zeros).
Brin and Page were acutely aware that the major search engines at that time did not always return quality results and that the commercial world manipulated the SERPs to suit advertisers.
The two young men based their new search engine upon what they called citations, which had long been the essence of academic success. A citation is essentially a reference to a published academic paper: the more citations an academic work collects, the more valuable that work is judged to be.
The web equivalent, Brin and Page argued, was the collection of backward links. A very important part of the algorithm they developed was therefore based upon backward links and, later, the text associated with each backward link.
The immediate problem they faced was that every web page had, on average, about seven backward links, creating a need for ever more computing resources. Within the Google engine they developed the means to parse and store all the words from every web page, and to store the complete HTML of every page. They were able to separate words into these groups:
- Words in the URL
- Words in the anchor link text, both on the page itself and pointing in from other web pages
- Words in the Title
- Words in bold
- Words capitalized
- Words in the main body text… Amongst others
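The grouping above can be sketched in a few lines of standard-library Python. This is purely my own illustration of the idea, not Google's code; tag names and group labels are my assumptions, and real parsing would handle punctuation, nesting and many more tags.

```python
# A minimal sketch of separating a page's words into groups, as described
# above. Illustrative only: the tag-to-group mapping is my own assumption.
from html.parser import HTMLParser

class WordGrouper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.groups = {"title": [], "anchor": [], "bold": [], "body": []}
        self.stack = []          # open tags we are currently inside

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            self.stack.remove(tag)

    def handle_data(self, data):
        words = data.lower().split()
        if "title" in self.stack:
            self.groups["title"] += words
        elif "a" in self.stack:                       # anchor link text
            self.groups["anchor"] += words
        elif "b" in self.stack or "strong" in self.stack:
            self.groups["bold"] += words
        else:
            self.groups["body"] += words

parser = WordGrouper()
parser.feed("<html><head><title>Garden Tools</title></head>"
            "<body>Buy <b>garden</b> spades. <a href='/g'>garden links</a>"
            "</body></html>")
print(parser.groups["title"])    # the words found in the Title
```

Note that anchor text collected from *other* pages pointing at this one cannot be seen while parsing the page itself; Google had to credit those words to the destination page at index time.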
In addition, they were able to record how close each word was to every other word (they called this proximity), and to relate each link to its source web page and its destination web page. This link information is what they used to calculate PageRank, which we will leave until later.
They created word lists which they called Hits, and divided these Hits into fancy Hits and plain Hits. Fancy Hits were search words contained in TITLES and LINKS; these were kept in what were called short barrels, while all other Hits were kept in long barrels. These are the actual terms used by Brin and Page in their groundbreaking paper.
What all this meant in practice was that, in response to a search query, the short barrels were searched first. If sufficient results were returned to match the query limit set by Google, the long barrels were not even looked at, since it was deemed that the more relevant web pages had already been found (by searching the TITLE and LINK words alone). Here lies the reason for making sure your keyword is in the TITLE and in any LINK.
The following description is, of course, a very short paraphrase of the full paper. Brin and Page published the following list of steps describing how a search was carried out once a searcher typed in a query. In response to the query, Google would:
1. Parse the query.
2. Convert words into wordIDs.
3. Seek to the start of the doclist in the short barrel for every word.
4. Scan through the doclists until there is a document that matches all the search terms.
5. Compute the rank of that document for the query.
6. If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.
7. If we are not at the end of any doclist go to step 4.
8. Sort the documents that have matched by rank and return the top k.
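The eight steps above can be sketched as a short Python function. The barrel dictionaries, the stubbed `rank` function and the wordID mapping are all hypothetical stand-ins of my own; the point is only the control flow: short barrels first, falling through to the long ("full") barrels when the short ones run dry.

```python
# A sketch of the eight-step query evaluation above, with hypothetical
# data structures. Barrels map wordID -> doclist (list of document IDs).
def search(query, short_barrels, long_barrels, word_ids, k=10):
    # Steps 1-2: parse the query and convert words into wordIDs.
    ids = [word_ids[w] for w in query.lower().split()]
    results = []
    for barrels in (short_barrels, long_barrels):   # step 6: fall through
        # Step 3: fetch the doclist for every word from this barrel set.
        doclists = [set(barrels.get(i, [])) for i in ids]
        # Step 4: keep only documents that match ALL the search terms.
        matches = set.intersection(*doclists) if doclists else set()
        # Step 5: compute a rank for each matching document.
        results += [(rank(doc, ids), doc) for doc in matches]
        if len(results) >= k:    # enough matches: skip the long barrels
            break
    # Step 8: sort the matched documents by rank and return the top k.
    return [doc for _, doc in sorted(results, reverse=True)][:k]

def rank(doc, ids):
    # Placeholder ranking; the real one combines weighted Hit counts.
    return doc
```

The early `break` is the practical payoff: a page whose keyword sits only in body text lives in the long barrels, and may never even be scanned for a popular query.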
Note from the list that when all the query searching has been done, only the top k (top 1,000) SERPs are returned. Note also that each result is ranked (step 5 above) for that query.
There is not much doubt in my mind that Google takes the following parameters into account in its ranking algorithm, and there is little doubt that a limit is placed upon on-page factors like these (the paper clearly states this to be the case, just as clearly as it states that the terms below were all considered).
Every Hit list contained the following information:
- Every word
- Plain text large font
- Plain text small font
- Plain text bold font
- Capitalization information
- Word proximity or position in document
- Whether the same word appeared in the Title, anchor text or URL… amongst others
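One way to picture a single Hit record is as a small structure carrying the fields listed above. This layout is my own guess at a readable equivalent; the paper actually packs this information into a couple of bytes per Hit, and the exact fields and kinds here are assumptions for illustration.

```python
# A hypothetical, readable sketch of one Hit record carrying the fields
# listed above (the real engine bit-packed these for space).
from dataclasses import dataclass

@dataclass
class Hit:
    word_id: int
    position: int        # word position in the document (proximity data)
    capitalized: bool    # capitalization information
    font_size: int       # relative font size (0 = small, larger = bigger)
    kind: str            # "title", "anchor", "url", "bold", "plain", ...

    def is_fancy(self):
        # Fancy Hits (Title, link and URL words) go to the short barrels.
        return self.kind in ("title", "anchor", "url")
```

The `is_fancy` test is what routes a Hit into the short barrels, tying this record format back to the TITLE-and-LINK priority described earlier.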
Google allocated a weight to each type of Hit and counted the frequency of each Hit of each type. My table below is pure conjecture, but it serves to highlight what I believe I understand from reading the research paper. Take note that the weights are numbers I have allocated purely for illustration purposes.
Assume the search query word is garden.
Google counts every Hit of the query word garden in the following (and more of course) parameters.
Dotprod is the product of weight x frequency, and is used here only because the Brin and Page paper refers to the term dotprod.
If my assumptions are right, and using these purely fictitious numbers, then once a total of 388 is reached it would not matter how many more instances of the Hit were recorded: they would not influence the on-page score, because the MAXIMUM set by Google's algorithm had been reached at 388. This is why the point was made earlier that only so much optimization can be achieved by reference to on-page factors alone.
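The capped weight-times-frequency calculation can be written out in a few lines. All of the weights below, and the 388 maximum, are the same purely fictitious numbers used above, not anything Google has published.

```python
# Illustrative on-page score: dotprod = sum of (weight x frequency) per
# Hit type, then capped. Weights and the cap are fictitious, as above.
WEIGHTS = {"title": 100, "anchor": 60, "url": 40, "bold": 10, "plain": 1}
MAX_ONPAGE = 388   # my illustrative maximum, not a published figure

def onpage_score(hit_counts):
    dotprod = sum(WEIGHTS[kind] * freq for kind, freq in hit_counts.items())
    return min(dotprod, MAX_ONPAGE)   # past the cap, extra Hits do nothing

onpage_score({"title": 1, "bold": 5})     # 100 + 50 = 150
onpage_score({"title": 2, "anchor": 9})   # 740, but capped at 388
```

The cap is the whole point: stuffing ever more instances of "garden" into a page moves `dotprod` past `MAX_ONPAGE` and achieves nothing further.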
Without going into detail about PageRank, but for the sake of completeness: the above total of 255 scoring points is modified by multiplying that 255 total by the actual PageRank ("scoring points" is my term) to get the final rank mark used by Google in arranging its SERPs.
For the sake of clarity, let's assume that this web page had a PageRank of 1,000; the total score for ranking purposes would then be 255,000.
Let's assume that a competing web page scored 139 points for similar on-page factors but had a PageRank of 2,000. This latter page would be ranked higher than the first one, since it would have a total rank of 2,000 x 139 = 278,000.
If the PageRank of another competing web page were 10,000, then the single presence of the word garden in the TITLE and nowhere else (an on-page score of 100) would give a total score of 1,000,000 (100 x 10,000), ranking it above both the other examples.
If the PageRank of a competing page were close to 0 (in fact, let's assume zero, although a PageRank of zero is actually impossible), then even a maximum on-page score of 388 would give a total score of zero. This is how Google can punish spammers.
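The worked examples above reduce to one multiplication. This sketch simply replays those examples with the same fictitious numbers, to show why a modest on-page score with a strong PageRank beats a maxed-out on-page score with none.

```python
# The worked examples above: final rank = on-page score x PageRank.
# All numbers are the fictitious ones used in the text.
def final_rank(onpage_score, pagerank):
    return onpage_score * pagerank

pages = {
    "first":   final_rank(255, 1_000),    # 255,000
    "second":  final_rank(139, 2_000),    # 278,000 - outranks the first
    "third":   final_rank(100, 10_000),   # 1,000,000 - garden in TITLE only
    "spammer": final_rank(388, 0),        # 0 - maximum on-page, no PageRank
}
best = max(pages, key=pages.get)          # "third"
```

Because the two factors multiply rather than add, a zero in either one zeroes the whole score, which is exactly the spam-punishing behavior described above.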