Hello here is a blog post I created outlining the process:
http://jasonthomascarter.blogspot.com/2013/08/the-friendly-developers-guide-to.html
Here we go! First we will start with Robots.txt file for the Windows Store website. http://apps.microsoft.com/robots.txt
Websites use robots.txt to guide web crawlers on how to behave, what they want them to see, and what they don't want them to see.
http://www.robotstxt.org/
Web Robots (also known as Web Wanderers, Crawlers, or Spiders), are programs that traverse the Web automatically. Search engines such as Google use them to index the web content, spammers use them to scan for email addresses, and they have many other uses.
In this case the Sitemaps.xml index file is what we are interested in. http://apps.microsoft.com/windows/sitemap_index.xml
http://www.sitemaps.org/
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.
The Sitemap Index file lists out each of the individual Sitemap XML files. As of this writing, the apps.microsoft.com website has 141 individual Sitemap XML files.
Inside the individual files are thousands of URL's to the web pages of Windows Store apps. Such as this URL to the PlayTo Receiver app:
http://apps.microsoft.com/windows/en-us/app/playto-receiver/72a6ba17-2d4e-4a1c-bcfb-cdc5d4b32d0e
These webpages for the apps do include a good bit of information that we could scrape from the HTML but we can do better.... The key information we are getting here is the App ID's and which geographies those App ID's are published to in the store.
For example:
72a6ba17-2d4e-4a1c-bcfb-cdc5d4b32d0e
and en-us etc...
And now to the good stuff, there a few web services we can call using our newly found App Id's and geography information.
https://services.apps.microsoft.com/browse/6.2.9200-1/615/en-US_en-US/c/US/cp/10005001/Apps/72a6ba17-2d4e-4a1c-bcfb-cdc5d4b32d0e
There is plenty of information available through this web service but it's not clearly spelled out by the element names. Here is my interpretation of many (not all) of them to start you off.
sSubCategoryName = rawXML.getElementsByTagName('Sc')[0].getElementsByTagName('N')[0].childNodes[0].nodeValue;
sSubCategoryId = rawXML.getElementsByTagName('Sc')[0].getElementsByTagName('I')[0].childNodes[0].nodeValue;
sHasTrial = rawXML.getElementsByTagName('Try')[0].childNodes[0].nodeValue;
sDescription = rawXML.getElementsByTagName('D')[0].childNodes[0].nodeValue;
sDeveloper = rawXML.getElementsByTagName('Dev')[0].childNodes[0].nodeValue;
sWebsite = rawXML.getElementsByTagName('Ws')[0].childNodes[0].nodeValue;
sSupport = rawXML.getElementsByTagName('Sws')[0].childNodes[0].nodeValue;
sPrivacy = rawXML.getElementsByTagName('Pu')[0].childNodes[0].nodeValue;
sCategoryName = rawXML.getElementsByTagName('C')[0].getElementsByTagName('N')[0].childNodes[0].nodeValue;
sCategoryId = rawXML.getElementsByTagName('C')[0].getElementsByTagName('I')[0].childNodes[0].nodeValue;
sPrice = rawXML.getElementsByTagName('P')[0].childNodes[0].nodeValue;
sForegroundColor = rawXML.getElementsByTagName('Fg')[0].childNodes[0].nodeValue;
sBackgroundColor = rawXML.getElementsByTagName('Bg')[0].childNodes[0].nodeValue;
sAppIcon = rawXML.getElementsByTagName('Ico')[0].childNodes[0].nodeValue;
sAppName = rawXML.getElementsByTagName('T')[0].childNodes[0].nodeValue;
sPackageFamilyName = rawXML.getElementsByTagName('Pfn')[0].childNodes[0].nodeValue;
sResourceId = rawXML.getElementsByTagName('R')[0].childNodes[0].nodeValue;
sId = rawXML.getElementsByTagName('I')[0].childNodes[0].nodeValue;
sCapabilities = sCapabilities + arrCapabilities[k].childNodes[0].nodeValue+",";
sUpdate = rawXML.getElementsByTagName('Ud')[0].childNodes[0].nodeValue;
sFeatures1 = rawXML.getElementsByTagName('Dbp')[0].childNodes[0].nodeValue;
sFeatures2 = rawXML.getElementsByTagName('Dbp')[1].childNodes[0].nodeValue;
sFeatures3 = rawXML.getElementsByTagName('Dbp')[2].childNodes[0].nodeValue;
sFeatures4 = rawXML.getElementsByTagName('Dbp')[3].childNodes[0].nodeValue;
sFeatures5 = rawXML.getElementsByTagName('Dbp')[4].childNodes[0].nodeValue;
sFeatures6 = rawXML.getElementsByTagName('Dbp')[5].childNodes[0].nodeValue;
sFeatures7 = rawXML.getElementsByTagName('Dbp')[6].childNodes[0].nodeValue;
sFeatures8 = rawXML.getElementsByTagName('Dbp')[7].childNodes[0].nodeValue;
sFeatures9 = rawXML.getElementsByTagName('Dbp')[8].childNodes[0].nodeValue;
sScreenshot1 = rawXML.getElementsByTagName('Ss')[0].getElementsByTagName('U')[0].childNodes[0].nodeValue
sScreenshot2 = rawXML.getElementsByTagName('Ss')[1].getElementsByTagName('U')[0].childNodes[0].nodeValue
sScreenshot3 = rawXML.getElementsByTagName('Ss')[2].getElementsByTagName('U')[0].childNodes[0].nodeValue
sScreenshot4 = rawXML.getElementsByTagName('Ss')[3].getElementsByTagName('U')[0].childNodes[0].nodeValue
sScreenshot5 = rawXML.getElementsByTagName('Ss')[4].getElementsByTagName('U')[0].childNodes[0].nodeValue
sScreenshot6 = rawXML.getElementsByTagName('Ss')[5].getElementsByTagName('U')[0].childNodes[0].nodeValue
sScreenshot7 = rawXML.getElementsByTagName('Ss')[6].getElementsByTagName('U')[0].childNodes[0].nodeValue
sScreenshot8 = rawXML.getElementsByTagName('Ss')[7].getElementsByTagName('U')[0].childNodes[0].nodeValue
sScreenshot9 = rawXML.getElementsByTagName('Ss')[8].getElementsByTagName('U')[0].childNodes[0].nodeValue
sCaption1 = rawXML.getElementsByTagName('Ss')[0].getElementsByTagName('Cap')[0].childNodes[0].nodeValue
sCaption2 = rawXML.getElementsByTagName('Ss')[1].getElementsByTagName('Cap')[0].childNodes[0].nodeValue
sCaption3 = rawXML.getElementsByTagName('Ss')[2].getElementsByTagName('Cap')[0].childNodes[0].nodeValue
sCaption4 = rawXML.getElementsByTagName('Ss')[3].getElementsByTagName('Cap')[0].childNodes[0].nodeValue
sCaption5 = rawXML.getElementsByTagName('Ss')[4].getElementsByTagName('Cap')[0].childNodes[0].nodeValue
sCaption6 = rawXML.getElementsByTagName('Ss')[5].getElementsByTagName('Cap')[0].childNodes[0].nodeValue
sCaption7 = rawXML.getElementsByTagName('Ss')[6].getElementsByTagName('Cap')[0].childNodes[0].nodeValue
sCaption8 = rawXML.getElementsByTagName('Ss')[7].getElementsByTagName('Cap')[0].childNodes[0].nodeValue
There is more to the Windows Store than just apps, there are also reviews, and lots of them. There are currently over 210,000 reviews for apps in the Windows Store. These reviews come from all over the world, so you'll need some country codes, your handy dandy App Id's and pay attention to the pn/1 at the end, you will find that this service only returns 10 reviews per page, just increment this value to pn/2, pn/3 etc.. until you run out of reviews.
var arrCounntryCodes = ["AE", "AR", "AT", "AU", "BE", "BG", "BH", "CA", "CH", "CL", "CN", "CO", "CR", "CY", "CZ", "DE", "DK", "DZ", "EE", "EG", "ES", "FI", "FR", "GB", "GR", "HK", "HR", "HU", "ID", "IE", "IL", "IN", "IQ", "IT", "JO", "JP", "KW", "KZ", "LB", "LK", "LT", "LU", "LV", "LY", "MA", "MT", "MX", "MY", "NL", "NO", "NZ", "OM", "PE", "PH", "PK", "PL", "QA", "RO", "RS", "RU", "SA", "SE", "SG", "SI", "SK", "TH", "TN", "TR", "TT", "UA", "US", "UY", "VE", "VN"];
https://services.apps.microsoft.com/4R/6.2.9200-1/1/en-US/m/US/Apps/f514d64b-8705-43b7-a400-c4f4f3dedfc0/Reviews/all/s/date/1/pn/1
This one is much more descriptive with the element names, so I don't see any need for further explaination of this. You can see the full name, display name, the image the user has chosen to represent themselves with.
Next up we can do a little bit of seaching...
https://services.apps.microsoft.com/search/6.2.9200-1/615/en-US_en-US/m/US/c/US/il/en-US/cp/10005001/query/cid/0/pf/1/pc/0/pt/x64/af/0/lf/0/s/0/2/pn/0?phrase=Software Developer
Here it gets a little cryptic again, but by now you should be used to it. You can get the App ID from the I element and take it from there back to the browse service f514d64b-8705-43b7-a400-c4f4f3dedfc0
So there you have it, the basics of pulling tons of information out of the Windows Store that you can then do what you please with. If you find this useful and/or make some apps utilizing the information, have some additional information to share or otherwise please let me know in the comments