I've got two websites: the main site and a blog that has posts that relate closely to the content of the main site. The blog is in the subdomain "blog."
www.example.com
blog.example.com
Content on the main site doesn't change that often, but content on the blog changes weekly. The blog is just a WordPress site and is capable of generating its own sitemap.
I'm struggling to figure out how to arrange the sitemaps.
I can think of two options, but I'm not sure what the best one is. Maybe there's a third option that I don't even know about.
Option 1
<sitemap>
<loc>http://www.example.com</loc>
<lastmod>2006-10-01T18:23:17+00:00</lastmod>
<loc>http://www.example.com/about</loc>
<lastmod>2006-10-01T18:23:17+00:00</lastmod>
<loc>http://blog.example.com</loc>
<lastmod>2006-10-01T18:23:17+00:00</lastmod>
</sitemap>
In this scenario, I suppose the search engines would crawl to the blog subdomain, but I'm not sure how they'd find the sitemap there. I'd just have to trust that the search engines find the sitemaps because I uploaded them directly via webmaster tools.
Option 2
The main sitemap could list multiple sitemaps.
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
<sitemap>
<loc>http://www.example.com/sitemaps/sitemap.xml</loc>
<lastmod>2006-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://blog.example.com/sitemap.xml</loc>
<lastmod>2006-10-01</lastmod>
</sitemap>
</sitemapindex>
What's the best practice here?
Neither of your options is allowed.
You need a separate Sitemap file for each host (e.g., subdomain). And you can’t link to these Sitemap files from the same Sitemap index file, i.e., each host also needs its own Sitemap index file (if you want to use one).
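To make the same-host rule concrete, here is a minimal Python sketch that flags `<loc>` entries pointing at a different host than the file that lists them. The function name `off_host_locs` and the index location `http://www.example.com/sitemap_index.xml` are made up for illustration, and the check only compares scheme and host; it does not cover the protocol's additional restriction on directory paths.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def off_host_locs(sitemap_url: str, xml_text: str) -> list[str]:
    """Return every <loc> whose scheme or host differs from the file's own URL."""
    own = urlsplit(sitemap_url)
    root = ET.fromstring(xml_text)
    bad = []
    for elem in root.iter():
        # Compare on the local tag name so any sitemap namespace is accepted.
        if elem.tag.rsplit("}", 1)[-1] != "loc" or not elem.text:
            continue
        loc = urlsplit(elem.text.strip())
        if (loc.scheme, loc.netloc.lower()) != (own.scheme, own.netloc.lower()):
            bad.append(elem.text.strip())
    return bad


# The index from Option 2, assumed (hypothetically) to live on www.example.com.
index = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/sitemaps/sitemap.xml</loc></sitemap>
  <sitemap><loc>http://blog.example.com/sitemap.xml</loc></sitemap>
</sitemapindex>"""

print(off_host_locs("http://www.example.com/sitemap_index.xml", index))
# → ['http://blog.example.com/sitemap.xml']
```

The blog entry is reported because it sits on `blog.example.com`, which is why Option 2 does not work as written. The same check applied to Option 1 would flag the `http://blog.example.com` URL.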
For how search engines can find the Sitemap (index) files, see my answer to the question "What should be the name of the sitemap file for Google SEO?"
Source
From the Sitemaps protocol specification about file location:
From the Sitemaps protocol specification about index files: