Search engine spiders can only follow links from one page
to another and from one site to another. That is the primary reason why links to
your site (inbound links) are so important. Links to your website from other
websites give the search engine spiders more food to chew on. The more
times they find links to your site, the more times they will stop by and visit.
Google especially relies on its spiders to create its vast index of listings.
Nearly all search engines use spiders, also known by their
original name, robots, to go out and scour the web looking for web pages. These
search engine spiders then bring the data back to be indexed by the engine.
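To make the link-following idea concrete, here is a minimal sketch of how a spider might pull the links out of a page it has just fetched. It is an illustration only, not how any particular search engine works; the names LinkCollector and extract_links and the example.com addresses are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links a spider could follow from this page (hypothetical helper)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


page = '<p><a href="/about.html">About</a> <a href="http://example.org/">Partner</a></p>'
print(extract_links(page, "http://example.com/index.html"))
```

Each address returned is another page the spider can visit, which is why a page nobody links to is so hard for a spider to find.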
Since roughly 1996, meta commands have existed that can be placed on
individual web pages to modify how search engine spiders behave. While nearly
all search engines support these commands, there are still some that don't. The
ones in this article, however, are respected by almost all search engine
spiders, no matter where they originate. What follows is a list of some of the
more popular spider commands and instances in which you might want to use them.
<meta name="robots" content="index">
This meta command is one of the most common ones used – and it is also the least
necessary. It tells search engine spiders to come on in and put the page in
their index. However, all search engines do this by default anyway. Basically,
if you want to put it in there for fun, be my guest, but this command is not
giving you any special treatment. All search engines are going to index your
page, unless you specifically tell them otherwise.
<meta name="robots" content="follow">
The follow command is different from the index command. It basically requests
that the search engine spiders follow the links that are on a particular page.
Again, however, this piece of code is completely unnecessary because all search
engines are going to follow the links on a page, unless otherwise directed.
<meta name="robots" content="noindex">
The noindex command, the opposite of the index command, tells search engine
spiders not to index the content of a page. It's important to note, however,
that search engine spiders will still follow the links on a page that uses only
this command.
When not used for legitimate purposes, this tag can be dangerous because
it can put you at risk of penalization by most, if not all, search engines. This
is because a noindex tag can be used to keep a page stuffed with links out of
the search results, where visitors would see it, while spiders still follow
every link on it.
There are, however, some legitimate uses for the noindex command. For example, if
you have a dynamic site and you've created static pages to replace some of your
dynamic pages, which can make them easier for search engine spiders to access,
you could put a noindex tag on the dynamic version.
<meta name="robots" content="nofollow">
This tag tells search engine spiders that it's OK to go ahead and index a page
and list it, but that they shouldn't follow any of the links on the
page. This can be useful if, for example, some partners requested a
link on your site that you felt obligated to give, but you wanted to hold onto
as much PageRank as possible. Now this is of course between you and your
personal god, but you could in effect have a partners page, add the
nofollow meta tag to it, and pass on none of your PageRank to the sites to
which you are linking. The nofollow command in
effect tells search engines that this is the end of the line.
<meta name="robots" content="noindex,nofollow">
Obviously, noindex and nofollow are powerful tags, and in combination they can
make a page invisible to nearly all search engines, along with the pages it
links to if nothing else links to them. This combination command tells search
engine spiders, "Do not follow any of the links on this page; do not include
this page in your index."
This command has its beneficial uses. For example, it can be placed on pages on
a site that have duplicate content for legitimate reasons. A website might have
both a page for the United States and a page for England that cover the same
product with exactly the same content. However, nearly all search engines would
see this as duplicate content and could devalue both pages. So placing this
command on one of them means that search engine spiders will walk on by and you
won't be penalized.
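The index, follow, noindex and nofollow commands covered so far boil down to two yes/no questions: may the page be indexed, and may its links be followed? The sketch below shows one way a spider could read the content value of a robots meta tag, using the defaults described above (index and follow unless told otherwise). The function name parse_robots_content is invented for this illustration, and real spiders may recognize more values than the ones in this article.

```python
def parse_robots_content(content):
    """Turn a robots meta content string into (index, follow) booleans.

    Hypothetical helper: only the directives discussed in this article
    are handled, and anything not forbidden is allowed by default.
    """
    directives = {part.strip().lower() for part in content.split(",")}
    index = "noindex" not in directives
    follow = "nofollow" not in directives
    return index, follow


for content in ["index", "follow", "noindex", "nofollow", "noindex,nofollow"]:
    index, follow = parse_robots_content(content)
    print(f"{content!r}: index={index}, follow={follow}")
```

Note that "index" and "follow" give exactly the same result as an empty tag, which is why those two commands are unnecessary.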
<meta name="robots" content="noarchive">
Finally, almost all search engines today, including Google and Yahoo, provide a
cached version of a page alongside its listing, a snapshot of what the page
looked like when it was last crawled. The noarchive tag is meant for
time-sensitive content on your website that you might not want search engines
to cache and keep available to people later.
For example, a business might run a one-time special that has a
ridiculously low price to drum up some business while things are slow. The
business will want to be able to shut that sale down as soon as sales are back
up to a solid level. However, it is conceivable that someone could click on the
cached version of the business's site, see the old deal that was out there, and
insist on getting it for themselves. By using the noarchive tag, you are telling
search engine spiders, in effect, "This page is subject to frequent changes, and
I don't want my visitors to have access to some of this content at a later
time."
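As a rough illustration of how a spider might honor this request, the sketch below scans a page for a robots meta tag and decides whether a cached copy may be kept. The names RobotsMetaReader and may_cache are assumptions made up for this example; they do not describe any search engine's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Gathers the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when a tag is malformed, so fall back to "".
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def may_cache(html):
    """Return False if the page asks not to be cached (hypothetical helper)."""
    reader = RobotsMetaReader()
    reader.feed(html)
    return "noarchive" not in reader.directives


sale_page = '<html><head><meta name="robots" content="noarchive"></head></html>'
print(may_cache(sale_page))
```

A page with no robots meta tag at all passes the check, since caching is the default behavior.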