Google for Webmasters Tutorial: Crawling and Indexing

Now that you know how to have Google find your site and how to block Googlebot from specific pages, the next step is making the appropriate content on your site accessible to users and Google. Accessible, in this context, means that both Googlebot and users (including those using screen readers or mobile devices) can navigate from page to page and, within reason, enjoy the core content throughout your site. It's important to make your site accessible to ensure a good experience for your users and also to help Google understand and list more of your pages.

In striving to make your pages accessible, it's helpful to understand what Googlebot can and cannot most effectively tackle. HTML files and other document types composed mostly of text are pretty straightforward for Googlebot. Music, images, and movies are harder for Googlebot to understand. Dynamic pages, those with frequently changing or on-the-fly-generated content, can also be problematic.

You can see your site almost as Googlebot does by viewing it in a text browser, like Lynx, or in a different browser with images, JavaScript, and Flash turned off.

As previously noted, images can be tough for Google to index. There are some things you can do, however, to help us better understand the images on your site. Annotate your image in alt text, and optionally in plain visible text near your image. Your visible comment before or after the image can be whatever you like, but it's best to stick with a concise version for the alt text; there's no need, for instance, to include the word "image" or "photo," since Googlebot already sees the image tag.
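To make the alt text advice concrete, here is a minimal sketch of a checker built on Python's standard `html.parser` module. The sample markup, the file names other than those mentioned in this tutorial, and the helper names `AltTextAuditor` and `audit_alt_text` are illustrative assumptions, not part of any Google tool. Note that the sketch treats an empty `alt` attribute as a problem, which is a simplification: purely decorative images can legitimately use an empty one.

```python
from html.parser import HTMLParser

# Words the tutorial suggests leaving out of alt text.
REDUNDANT_WORDS = {"image", "photo"}


class AltTextAuditor(HTMLParser):
    """Collects <img> tags whose alt text is missing or redundant."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (src, reason) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        alt = (attributes.get("alt") or "").strip()
        if not alt:
            self.problems.append((src, "missing alt text"))
        elif REDUNDANT_WORDS & set(alt.lower().split()):
            self.problems.append((src, "alt text contains a redundant word"))


def audit_alt_text(html):
    """Return a list of (src, reason) pairs for images that need attention."""
    auditor = AltTextAuditor()
    auditor.feed(html)
    return auditor.problems


# Hypothetical page fragment: one well-annotated image and two problem cases.
SAMPLE_HTML = """
<p>Googlebot waving hello at a conference booth.</p>
<img src="googlebot.jpg" alt="Googlebot waving hello">
<img src="photo.jpg">
<img src="team.jpg" alt="Photo of the team">
"""

for src, reason in audit_alt_text(SAMPLE_HTML):
    print(f"{src}: {reason}")
```

Run against the sample, the sketch flags `photo.jpg` for having no alt text and `team.jpg` for including the word "photo," while the concisely annotated `googlebot.jpg` passes.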

Using descriptive file names, such as “googlebot.jpg” instead of “photo.jpg,” can be helpful to Google and also to users who may download your images. By annotating your images in these ways, you're not only helping visually impaired users who may be accessing your site with a screen reader, but you're also giving Google a better understanding of the images and improving the chances of your images showing up for relevant queries in Google Image Search.

Along with images, many web designers like to integrate rich-media or interactive aspects into their sites, often using technologies like Flash or AJAX. While these can provide an engaging experience for users, Googlebot may have trouble discovering or following links on these sites. For example, textual content is sometimes stored in Flash as images, making it difficult for Google to capture the words, much less understand the meaning of the pages.

With careful planning, however, sites can include dynamic and media-rich elements while still remaining reasonably accessible to users and Googlebot. Consider structuring your site so that these elements are "extras," with your site's core information and navigation rendered in plain text for Googlebot and for all users without Flash. This approach is known as "graceful degradation." For additional useful suggestions, check out the two blog entries listed on this page.

After you've ensured that your site is both findable and accessible, don't let your great content languish with uninspired introductions. Think of the titles and descriptions on your pages together as an advertising billboard: you have just a few words to let people know what each page is about and convince them that it's worth a visit. The title tag of your page is likely to be displayed anytime Google shows your page in its search results, and it's also what people will typically see in various places in their web browser and even on social sharing sites. Therefore, it's important to have a concise, descriptive title for each page on your site.
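One way to check for a concise, descriptive title on each page is to audit the titles across your site. The sketch below is only an illustration: the 60-character threshold is an arbitrary assumption rather than a limit Google documents, and the page paths, sample markup, and helper names (`TitleExtractor`, `extract_title`, `audit_titles`) are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

MAX_TITLE_LENGTH = 60  # assumed threshold for "concise"; not a documented limit


class TitleExtractor(HTMLParser):
    """Captures the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace so line breaks don't affect comparisons.
        return " ".join("".join(self._parts).split())


def extract_title(html):
    parser = TitleExtractor()
    parser.feed(html)
    return parser.title


def audit_titles(pages):
    """Return {url: [issues]} for a mapping of URL -> HTML source."""
    titles = {url: extract_title(html) for url, html in pages.items()}
    urls_by_title = defaultdict(list)
    for url, title in titles.items():
        urls_by_title[title].append(url)

    issues = {}
    for url, title in titles.items():
        found = []
        if not title:
            found.append("missing title")
        else:
            if len(title) > MAX_TITLE_LENGTH:
                found.append("title longer than assumed limit")
            if len(urls_by_title[title]) > 1:
                found.append("title shared with another page")
        if found:
            issues[url] = found
    return issues


# Hypothetical pages keyed by path.
PAGES = {
    "/": "<html><head><title>Example Bakery: Fresh Bread Daily</title></head></html>",
    "/about": "<html><head><title>Untitled</title></head></html>",
    "/contact": "<html><head><title>Untitled</title></head></html>",
    "/menu": "<html><head></head><body>Menu</body></html>",
}

for url, found in audit_titles(PAGES).items():
    print(url, "->", ", ".join(found))
```

In this sample, the two pages sharing the title "Untitled" and the page with no title at all are reported, while the home page passes. Whether a title is actually descriptive still takes a human read.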

Google may draw from several different sources for the descriptive snippets in search listings, including meta descriptions, so you'll also want to make sure your meta descriptions are thoughtfully drafted for each page on your site. Note that you can use the “Content Analysis” feature in Google Webmaster Tools to help you optimize your page titles and descriptions.

It's great having your pages in Google, but what happens when you find copies of your pages, either indexed from your site or, with or without your permission, on other sites? This is known as duplicate content, and we know that most of the time it's unintentional. Your editorial, for example, ends up getting indexed on one of your site's topics pages, then on your monthly archives page, and perhaps then even on a syndicated partner's page.

In cases like this, there are steps you can take to help Google determine which is the best copy to show in search results. With duplicate content on your own site, your best bet is to minimize the duplication in the first place. Use 301 redirects to forward visitors to a preferred page, consistently link to this preferred version, and list it in place of other versions in your XML Sitemap. If you're syndicating your content, you may wish to ask your partners to include a link on each of their pages back to the original source on your domain. And lastly, if you find someone copying your site and you want it removed from Google’s search results, you can file a Digital Millennium Copyright Act notice, otherwise known as a “DMCA” takedown request. For additional tips, check out the Webmaster Central blog post referenced here.
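The advice on handling duplicates within your own site can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the `example.com` domain, the paths in the `PREFERRED` table, and the helper names `respond` and `build_sitemap`. A real site would usually configure redirects in its web server or framework rather than in a standalone function.

```python
import xml.etree.ElementTree as ET

SITE = "https://www.example.com"  # placeholder domain

# Hypothetical duplicate paths mapped to the single preferred path.
PREFERRED = {
    "/topics/editorial": "/editorials/why-crawling-matters",
    "/archive/monthly/editorial": "/editorials/why-crawling-matters",
}


def respond(path):
    """Return the (status, headers) a server could send for `path`."""
    target = PREFERRED.get(path)
    if target is not None:
        # 301 tells visitors and crawlers the move is permanent.
        return 301, {"Location": SITE + target}
    return 200, {}


def build_sitemap(paths):
    """Build sitemap XML that lists each preferred URL once, never a duplicate."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    seen = set()
    for path in paths:
        preferred = PREFERRED.get(path, path)
        if preferred in seen:
            continue
        seen.add(preferred)
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = SITE + preferred
    return ET.tostring(urlset, encoding="unicode")


print(respond("/topics/editorial"))
print(build_sitemap(["/", "/topics/editorial", "/editorials/why-crawling-matters"]))
```

Keeping a single table of preferred URLs means the redirects and the Sitemap cannot drift apart; the same table could also drive internal links so that they consistently point at the preferred version.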