PageRank is one of the methods Google uses to determine a
page's relevance or importance. It is only one part of the story when it comes
to the Google listing, but the other aspects are discussed elsewhere and
PageRank is interesting enough to deserve a paper of its own. The PageRank
theory holds that even an imaginary surfer who is randomly clicking on links
will eventually stop clicking.
We looked at forums, blogs, and Google's own guidelines for increasing
the number of pages Google indexes, and came up with our best guesses. The
running consensus is that a webmaster shouldn't expect to get all of their pages
crawled and indexed, but there are ways to increase the number.
PageRank
How many pages get indexed depends a lot on PageRank. The higher your PageRank,
the more pages will be indexed. PageRank isn't a blanket number for all your
pages. Each page has its own PageRank. A high PageRank gives the Googlebot more
of a reason to return.
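To make the per-page idea concrete, here is a minimal sketch of the random-surfer calculation. It is not Google's actual implementation. The damping value of 0.85, the function name `pagerank`, and the four-page `site` graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate a score for every page in a dict of page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Base share: the surfer stops clicking and jumps to a random page.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Hypothetical site: "orphan" links out but nothing links to it.
site = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "orphan": ["home"],
}
scores = pagerank(site)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

In this toy graph the homepage, which every other page links to, ends up with the highest score, while the page with no inbound links gets only the base share.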
Links
Give the Googlebot something to follow. Links, especially deep links from
a high-PageRank site, are golden because the trust is already established.
Internal links can help, too. Link to important pages from your homepage. On
content pages, link to relevant content on other pages.
Sitemap
There is a lot of buzz around this one. Some report that a clear, well-structured
Sitemap helped get all of their pages indexed.
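For readers who have not built one, a Sitemap in the sitemaps.org XML format can be generated with a few lines of code. This is a minimal sketch, assuming a short list of placeholder `example.com` URLs and a hypothetical helper named `build_sitemap`. A real Sitemap might also carry optional fields such as last-modified dates.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs for illustration only.
sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/products/",
    "https://www.example.com/articles/getting-indexed.html",
])
print(sitemap_xml)
```

The resulting file is usually saved as sitemap.xml at the site root and submitted through Google's webmaster tools.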
Page load time and the ease with which the Googlebot can crawl a page may affect
how many pages are indexed. The logic is that the faster the Googlebot can
crawl, the greater the number of pages that can be indexed.
This could involve simplifying the structure and/or navigation of the site. The
spiders have difficulty with Flash and Ajax. A text version should be added in
those instances.
Google's crawl caching proxy can come into play at any website or blog. This was
part of the Big Daddy update to make the engine faster. Any one of three indexes
may crawl a site and send the information to a remote server, which is then
accessed by the remaining indexes, like the blog index or the AdSense index,
instead of the bots for those indexes physically visiting your site. They will
all use the mirror instead.
So the crawl caching proxy works like this: if service X fetches a page, and then
later service Y would have fetched the exact same page, Google will sometimes
use the page from the caching proxy. Joining service X (AdSense, blogsearch, News
crawl, any Google service that uses a bot) doesn’t queue up pages to be included
in our main web index. Also, note that robots.txt rules still apply to each
crawl service appropriately. If service X was allowed to fetch a page, but a
crawl service appropriately. If service X was allowed to fetch a page, but a
robots.txt file prevents service Y from fetching the page, service Y wouldn’t
get the page from the caching proxy. Finally, note that the crawl caching proxy
is not the same thing as the cached page that you see when clicking on the
“Cached” link by web results. Those cached pages are only updated when a new
page is added to our index. It’s more accurate to think of the crawl caching
proxy as a system that sits outside of webcrawl, and which can sometimes return
pages without putting extra load on external sites.
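The behavior described above can be sketched in a few lines. This is only a toy model and not how Google implements it. The class name `CrawlCachingProxy`, the service names, the `fake_fetch` function, and the sample robots.txt are all assumptions made for illustration. Unlike the real system, which only sometimes reuses a page, this sketch always serves from the cache when it can.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: service-y is kept out of /private/, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: service-y
Disallow: /private/

User-agent: *
Disallow:
"""


class CrawlCachingProxy:
    """Toy cache that sits between crawl services and the external site."""

    def __init__(self, fetcher, robots_lines):
        self._fetcher = fetcher      # function: url -> page content
        self._robots = RobotFileParser()
        self._robots.parse(robots_lines)
        self._cache = {}
        self.external_fetches = 0    # how often the real site was hit

    def get(self, service, url):
        # robots.txt is checked per service, even if the page is already cached.
        if not self._robots.can_fetch(service, url):
            return None
        if url not in self._cache:
            self._cache[url] = self._fetcher(url)
            self.external_fetches += 1
        return self._cache[url]


def fake_fetch(url):
    """Stand-in for a real HTTP request."""
    return f"<html>content of {url}</html>"


proxy = CrawlCachingProxy(fake_fetch, ROBOTS_TXT.splitlines())
url = "https://www.example.com/private/report.html"
first = proxy.get("service-x", url)    # fetched from the site
second = proxy.get("service-z", url)   # served from the cache, no extra load
blocked = proxy.get("service-y", url)  # robots.txt says no, so nothing is returned
print(proxy.external_fetches, blocked)
```

The site is hit once for three requests, and the service that robots.txt excludes never sees the cached copy.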
Verify
Verify the site with Google using Webmaster Tools. There are many PageRank
sites that will check this.
Content, content, content
Make sure content is original. If a page is a verbatim copy of another page, the
Googlebot may skip it. Update frequently. This will keep the content fresh.
Pages with an older timestamp might be viewed as static, outdated, or already
indexed.
Staggered launch
Launching a huge number of pages at once could send off spam signals, which
could be a costly mistake. In one forum, it is suggested that a webmaster
launch a maximum of 5,000 pages per week.
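As a rough illustration of staggering, the sketch below splits a list of new pages into weekly batches. The 5,000 cap is simply the forum suggestion quoted above, not a limit published by Google. The helper name `weekly_batches` and the page paths are hypothetical.

```python
def weekly_batches(pages, max_per_week=5000):
    """Split a list of pages into consecutive batches of at most max_per_week."""
    return [pages[i:i + max_per_week] for i in range(0, len(pages), max_per_week)]


# Hypothetical backlog of 12,000 new pages waiting to go live.
pages = [f"/articles/{n}.html" for n in range(12000)]
schedule = weekly_batches(pages)
for week, batch in enumerate(schedule, start=1):
    print(f"Week {week}: launch {len(batch)} pages")
```

With 12,000 pages and the default cap, the launch spreads over three weeks.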
Find the top queries that lead to your site, and remember that anchor text in
links helps. Use Google's tools to see which of your pages are indexed and
whether there are violations of some kind. Specify your preferred domain so
Google knows what to index.