How Search Engine Spiders See Your Website Content

You need to know how people see your website to know how to improve it. Fortunately, putting yourself in their shoes isn’t all that hard. What is difficult is putting yourself in the position of a search engine spider to know how it assesses your search engine optimisation (SEO) efforts.

Emulating Natural Means

The first thing you need to know is that search engines, especially industry leader Google, try to gauge how relevant and informative your website is to a search query by emulating how people would “naturally” judge it. For instance, real-world documents are considered important if they are referenced time and again by other documents and publications. These citations mark the relevance, importance, and significance of a document within a given field. In the same vein, websites are judged relevant and important to their niche based on how many and which other websites link to them.

Arriving Via Links

Indeed, links have become so important as to become the favoured way Google “finds” a webpage. In the old days of search you had to submit a URL to Google for the website to be crawled by its search engine spiders and indexed accordingly. Today, the submission option still exists as an outdated function. If you do not submit a URL, the only way for Google to find it is via links that point to it.

Most leading search engines behave this way, effectively making links the information superhighway that search engine spiders use to go to and from websites, crawl them, and index them. Of course, once they arrive from a certain link to a website, the assessment of organic SEO begins. In fact, it already started when the search engine analysed the anchor text of the link that pointed to the website — but more on that later.
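To make the idea of travelling along links concrete, here is a minimal sketch of how a crawler might collect the links (and their anchor text) from a page it has just fetched. It uses only Python's standard library; the class name LinkCollector, the sample page, and the example.com addresses are assumptions made up for illustration, not how any real search engine is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # (absolute_url, anchor_text) pairs
        self._href = None    # URL of the <a> tag currently open, if any
        self._text = []      # text pieces seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own address.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content and address, for illustration only.
page = (
    '<p>See the <a href="/blog/seo-basics">SEO basics</a> and '
    '<a href="http://www.example.org/tools">free tools</a>.</p>'
)
collector = LinkCollector("http://www.example.com/index.html")
collector.feed(page)
collector.close()

for url, anchor_text in collector.links:
    print(url, "|", anchor_text)
```

Each collected URL would then be queued for a later visit, which is how a page with no inbound links can stay undiscovered.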

One of the first things search engine spiders analyse is the URL of the webpage they land on (strictly speaking, it is the search engine's algorithms that analyse, since spiders have one sole purpose: crawling). URLs that contain targeted keywords carry more weight, and URLs that use human-readable slugs and a proper structure are also preferable, e.g.:

http://www.website.com/blog/category-name/page-1/properly-parsed-date/title-that-can-be-read-by-humans

is better than

http://www.website.com/blog/xyshhyt/00001/558-990/jyx-oik888-iro-post01

Again, search engine spiders only crawl; it’s the algorithms that perform the analysis. For the sake of our analogy of seeing things the way search engine spiders do, though, we’ll keep referring to spiders by default.
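As a small illustration of the human-readable slugs described above, the following sketch turns a post title into a lowercase, hyphen-separated slug. The function name slugify and its exact rules are assumptions for this example; content management systems each apply their own variations.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a post title into a lowercase, hyphen-separated URL slug."""
    # Strip accents so that "Café" becomes "Cafe" instead of losing a letter.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    # Drop any hyphen left at the start or end.
    return slug.strip("-")


print(slugify("How Search Engine Spiders See Your Website Content"))
```

The result can be appended to a path such as /blog/category-name/ to produce a URL that both people and spiders can read.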

Crawling Strings

Search engine spiders can only crawl strings, that is, text. They are effectively oblivious to everything else on the webpage. In the same vein, cached versions of a website are text-only. Now, here’s how spiders see the text in a webpage:

  • Text in Heading Tags: heading tags in HTML range from heading 1 to heading 6, represented by the HTML codes <h1>…</h1> to <h6>…</h6>. The text within these codes, replacing the ellipsis, is rendered larger and more prominent on the webpage. Heading tags are probably the simplest way to point out to both human readers and search engine spiders that some text is more important than the rest.
  • Text as Anchors: anchors are the text used to hyperlink to a different webpage. Any text within the HTML codes <a>…</a> is considered important by search engine spiders because it references or annotates another specific webpage, meaning that the website content is related to that webpage in some fashion. As mentioned earlier, even before search engine spiders land on a webpage via an external link, they have already taken note of that link. What they note is the anchor text and what it actually says (whether it is a keyword or not). This means that anchor text hints at what your website content is about and what the webpage you are linking to is about, making it a good gauge of what your content is discussing.
  • Formatted Text: bold and italicised texts in the real world are used to emphasise some words for the reader to pick out, assisting them in identifying context or skimming important parts of an article. As search engine spiders emulate natural means in an effort to make organic SEO work better, they also take note of these formatted texts, though not as much as header tags.
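The three kinds of emphasised text listed above can be picked out of a page with a few lines of code. The sketch below is only an approximation of the idea, written with Python's standard html.parser module; the grouping of tags in TAG_GROUPS, the class name EmphasisCollector, and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser

# Assumed mapping of HTML tags to the three categories discussed above.
TAG_GROUPS = {
    "h1": "heading", "h2": "heading", "h3": "heading",
    "h4": "heading", "h5": "heading", "h6": "heading",
    "a": "anchor",
    "b": "formatted", "strong": "formatted",
    "i": "formatted", "em": "formatted",
}


class EmphasisCollector(HTMLParser):
    """Collect text found inside heading, anchor, and formatting tags."""

    def __init__(self):
        super().__init__()
        # How many tags of each group are currently open.
        self.depth = {"heading": 0, "anchor": 0, "formatted": 0}
        self.found = {"heading": [], "anchor": [], "formatted": []}

    def handle_starttag(self, tag, attrs):
        group = TAG_GROUPS.get(tag)
        if group:
            self.depth[group] += 1

    def handle_endtag(self, tag):
        group = TAG_GROUPS.get(tag)
        if group and self.depth[group] > 0:
            self.depth[group] -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Record the text under every group whose tag is currently open.
        for group, depth in self.depth.items():
            if depth > 0:
                self.found[group].append(text)


# Hypothetical markup, for illustration only.
sample = (
    "<h1>Spider basics</h1>"
    "<p>Read our <a href='/guide'>crawling guide</a> "
    "for <b>important</b> tips.</p>"
)
emphasis = EmphasisCollector()
emphasis.feed(sample)
emphasis.close()
print(emphasis.found)
```

Plain paragraph text such as "Read our" is deliberately skipped here, since the point is to show which words stand out from the rest.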

Weighing and Counting

You already know some of the important parts of text that search engine spiders pay due attention to: headings, anchors, and formatted text. Of course, spiders crawl the rest of the website content too. As they do, they assign greater significance to the important parts, all the while counting occurrences of words: the more times a word is mentioned, the higher the probability that the word is the topic of the article. If a word that appears multiple times throughout an article is also found in heading tags, in anchors, and in formatted text, that is a strong signal that the word is a keyword.

This is where the basics of keyword optimisation come from; it is also the basis of keyword density and placement. However, to counteract “keyword stuffing,” search engines count occurrences of a word (or of an important form of the word, whether in headings, anchors, or formatted text) only up to a certain point. Beyond that, further occurrences are ignored, and sometimes too many can trigger search engine penalties.
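The weighing and capped counting described here can be modelled with a toy scoring function. Everything numeric in this sketch is an assumption: the weights in WEIGHTS, the cap MAX_COUNTED, and the function name keyword_scores are invented for illustration, and real search engines do not publish their formulas.

```python
from collections import Counter

# Hypothetical weights per location and a hypothetical cap.
WEIGHTS = {"body": 1, "formatted": 2, "anchor": 3, "heading": 4}
MAX_COUNTED = 5  # occurrences of a word beyond this add nothing


def keyword_scores(occurrences):
    """Score words from (word, location) pairs, ignoring stuffed repeats."""
    counted = Counter()  # how many occurrences of each word were counted
    scores = Counter()   # weighted score of each word
    for word, location in occurrences:
        word = word.lower()
        if counted[word] >= MAX_COUNTED:
            continue  # past the cap: further repetitions are ignored
        counted[word] += 1
        scores[word] += WEIGHTS[location]
    return scores


# "spider" appears once in a heading, once in an anchor, and ten times in
# the body; "link" appears twice in the body.
occurrences = (
    [("spider", "heading"), ("spider", "anchor")]
    + [("spider", "body")] * 10
    + [("link", "body")] * 2
)
scores = keyword_scores(occurrences)
print(scores.most_common())
```

Only five of the twelve occurrences of "spider" contribute to its score, which mirrors the point that repeating a word past a certain threshold brings no further benefit. This toy model simply ignores the excess and does not attempt to model penalties.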

We just discussed the fundamentals of organic SEO for your webpages, but take note that spiders crawl all of the text within a webpage, including:

  • Textual ads and text in ads
  • Alt descriptions, which are one way to make an image search engine friendly, and
  • File names, which are yet another way to make images (or anything else un-crawlable) more search engine friendly
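To see what text an image actually exposes, the sketch below pulls the alt description and the file name out of each image tag on a page. It relies only on Python's standard library; the class name ImageTextCollector and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse


class ImageTextCollector(HTMLParser):
    """Collect the alt text and file-name words of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        # Keep only the file name without its directory or extension.
        file_stem = PurePosixPath(urlparse(src).path).stem
        self.images.append({
            "alt": attributes.get("alt") or "",
            # Treat hyphens and underscores as word separators.
            "file_words": file_stem.replace("-", " ").replace("_", " "),
        })


# Hypothetical markup: one descriptive image and one camera-default name.
markup = (
    '<img src="/images/red-garden-spider.jpg" alt="Red garden spider on a web">'
    '<img src="/img/IMG_00123.png">'
)
image_text = ImageTextCollector()
image_text.feed(markup)
image_text.close()

for image in image_text.images:
    print(image)
```

The first image offers readable words in both places, while the second offers nothing useful, which is the difference a descriptive alt description and file name make.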

This basically means you also need to keep an eye on what you place in ads, alt descriptions, and file names, and leverage them for better organic SEO performance. This is the gist of how search engine spiders crawl your website content. Make good use of it as a basis for your SEO efforts.