
Friday, October 17, 2014

Google Recommends Using Both XML Sitemaps And RSS/Atom Feeds For Optimal Crawling

Google published a set of best practices for XML sitemaps and RSS/Atom feeds on its Webmaster Central Blog this week, explaining which fields in sitemaps are important, when to use XML sitemaps and RSS/Atom feeds, and how to optimize them for Google.
Google explains the difference between the formats this way: XML sitemaps describe the whole set of URLs within a site, while RSS/Atom feeds describe only the most recent changes. Because XML sitemaps contain more information, they are typically larger than RSS/Atom feeds, and Google downloads them less frequently.
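To make the "whole set of URLs" idea concrete, here is a minimal Python sketch that builds an XML sitemap listing every page of a site. Each page gets a `<loc>` (its URL) and a `<lastmod>` (its last modification time). The function name `build_sitemap`, the example.com URLs and the `(url, modified)` input shape are illustrative assumptions, not anything Google prescribes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return an XML sitemap (UTF-8 bytes) covering every (url, modified) pair.

    A sitemap is meant to describe the *whole* site, so every known page goes
    in, each with <loc> (the URL) and <lastmod> (last modification time).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # W3C Datetime format, e.g. 2014-10-17T09:30:00+00:00
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical site with two pages.
    site = [
        ("https://example.com/", datetime(2014, 10, 17, 9, 30, tzinfo=timezone.utc)),
        ("https://example.com/about", datetime(2014, 3, 2, tzinfo=timezone.utc)),
    ]
    print(build_sitemap(site).decode("utf-8"))
```

Timezone-aware datetimes are used so that `isoformat()` emits an explicit UTC offset, which keeps the `<lastmod>` values unambiguous.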
As a website owner, which format should you use if you want Google to crawl your site as thoroughly as possible? Google recommends using both, explaining that the formats complement each other.
XML sitemaps give Google information about all the pages on your site, while RSS/Atom feeds let Google know what has been most recently updated on your site. Google also adds that “submitting sitemaps or feeds does not guarantee the indexing of those URLs.”
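The feed side of the pairing can be sketched the same way. This hypothetical `build_atom_feed` function keeps only pages changed after a cutoff `since` and lists them newest first, each with its own `<updated>` time. Google's guidance is that a feed should hold every change since at least its last download, so in practice `since` would be set no later than that time. The function name, its parameters and the `(url, page_title, updated)` tuples are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(feed_id, title, author, changes, since):
    """Return an Atom feed (UTF-8 bytes) of pages changed after `since`.

    `changes` holds hypothetical (url, page_title, updated) tuples; only the
    recent ones are kept, newest first, each with its own <updated> time.
    """
    recent = sorted((c for c in changes if c[2] > since), key=lambda c: c[2], reverse=True)
    feed = ET.Element("feed", xmlns=ATOM_NS)
    ET.SubElement(feed, "id").text = feed_id
    ET.SubElement(feed, "title").text = title
    # Feed-level <updated>: time of the newest change (or the cutoff if none).
    ET.SubElement(feed, "updated").text = (recent[0][2] if recent else since).isoformat()
    ET.SubElement(ET.SubElement(feed, "author"), "name").text = author
    for url, page_title, updated in recent:
        entry = ET.SubElement(feed, "entry")
        ET.SubElement(entry, "id").text = url
        ET.SubElement(entry, "title").text = page_title
        ET.SubElement(entry, "link", href=url)
        ET.SubElement(entry, "updated").text = updated.isoformat()
    return ET.tostring(feed, encoding="utf-8", xml_declaration=True)
```

Using the page URL as the entry `<id>` is a simplification; a real feed might prefer a more permanent identifier.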


Here is a condensed version of the best practices Google laid out in its blog post:

  • The two most important pieces of information for Google are the URL itself and its last modification time.
  • Only include URLs that can be fetched by Googlebot (i.e., don’t include URLs blocked by robots.txt).
  • Only include canonical URLs.
  • Specify a last modification time for each URL in an XML sitemap or RSS/Atom feed.
  • For a single XML sitemap, update it at least once a day and ping Google each time.
  • For a set of XML sitemaps, maximize the number of URLs in each one. Each XML sitemap is limited to 50,000 URLs or 10MB uncompressed, whichever is reached first. Ping Google each time an XML sitemap is updated.
  • When a new page is added or an existing page is meaningfully changed, add the URL and its modification time to the RSS/Atom feed.
  • So that Google does not miss updates, the RSS/Atom feed should contain every update made since at least the last time Google downloaded it.
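Two of the rules above, dropping URLs blocked by robots.txt and packing each sitemap as full as the 50,000-URL / 10MB-uncompressed limits allow, can be sketched in Python as follows. All function names here are hypothetical. The robots.txt check uses the standard library's `urllib.robotparser`, which is only an approximation of Googlebot's behavior: it does plain prefix matching and does not understand Google's `*` and `$` wildcard extensions. Pinging Google after each update is left out.

```python
from urllib.robotparser import RobotFileParser
from xml.sax.saxutils import escape

MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10MB uncompressed = 10,485,760 bytes
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
HEADER = f'<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="{SITEMAP_NS}">\n'
FOOTER = "</urlset>\n"


def fetchable(pages, robots_lines, agent="Googlebot"):
    """Drop (url, lastmod) pairs that robots.txt blocks for `agent`.

    Approximation: urllib.robotparser does plain prefix matching only.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [(url, mod) for url, mod in pages if parser.can_fetch(agent, url)]


def url_entry(url, lastmod):
    """One <url> element; escape() handles characters such as '&'."""
    return f"<url><loc>{escape(url)}</loc><lastmod>{escape(lastmod)}</lastmod></url>\n"


def split_sitemaps(pages, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Greedily fill each sitemap up to the URL-count and byte limits."""
    overhead = len((HEADER + FOOTER).encode("utf-8"))
    files, current, size = [], [], overhead
    for url, lastmod in pages:
        entry = url_entry(url, lastmod)
        entry_size = len(entry.encode("utf-8"))
        # Close the current file if this entry would exceed either limit.
        if current and (len(current) >= max_urls or size + entry_size > max_bytes):
            files.append(HEADER + "".join(current) + FOOTER)
            current, size = [], overhead
        current.append(entry)
        size += entry_size
    if current:
        files.append(HEADER + "".join(current) + FOOTER)
    return files


def build_index(sitemap_urls, lastmod):
    """Sitemap index listing each generated sitemap file."""
    entries = "".join(
        f"<sitemap><loc>{escape(u)}</loc><lastmod>{escape(lastmod)}</lastmod></sitemap>\n"
        for u in sitemap_urls
    )
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n')
```

Sizes are measured on the uncompressed UTF-8 bytes because the 10MB limit applies before any gzip compression. A typical flow would be `split_sitemaps(fetchable(pages, robots_lines))`, writing each returned string to its own file and listing those files with `build_index`.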