Here's my dilemma, I need to parse a series of 606 weeks worth of Billboard 200 charts for the position of 36 different albums. Here's what I've got so far...
https://docs.google.com/file/d/0B_tgNfDq0kXAakR5eHZ3bzJQVkk/edit?usp=sharing
Billboard just redid their website, so now Excel's webquery returns a very pretty and clean table. I created two formulas in columns A and B of my worksheet, A has the list of relevant dates (specifically every Saturday from 8/18/2001 until this week), and B makes hyperlinks to the charts based on the dates. The rest of the chart is color-coded for the benefit of my advisors, who will also be reviewing the sheet.
I've also manually webqueried the first chart, dated 2001-08-18, into its own worksheet. This worksheet hasn't been touched - its exactly what the webquery returned.
As you can see, the table spans to column G, and each entry takes up 3 rows. My focus is on columns C and D. Column D's rows for each entry are (top-to-bottom) Title, Artist, and Imprint|Label. The first row in Column C for each entry contains that week's position from 1 to 200. The pattern that emerges is that every 3rd row starting with 4 (so 7, 10, 13...) contains a position and album title (col C & D, resp.), and every 3rd cell starting with 5 contains an artist.
I'm going to try to explain what I'm imagining in plain English, but be forewarned that this may fail miserably.
So the macro would take two cells as input (can they even take input?), album title and corresponding artist. This input should tell the macro both the string stored in each cell and the location - e.g, E1, C2 - of said cell. The macro should loop through each URL in column A from row 3 to 608, querying the URL into a new sheet. This new sheet should be made active, and then every 3rd row starting with 4 should be searched sequentially for the album title string. Upon finding a match, the cell one row beneath the matching query cell should be compared to the artist name string. Should both strings match the content of their corresponding query cells, the number (from 1 to 200) in column C and the same row as the matching album title cell should be copied to the cell in the 'bb200' sheet corresponding to the URL queried and the album title searched. The loop should now recur on the next URL in the sequence. In the event no match is found (the album didn't chart that week, or BB returned a wonky table), the corresponding cell should be left blank. The macro should exit once the list of URLs is exhausted.
My problem is twofold: first off, is my thought process regarding the macro fundamentally sound? And second-but-most-importantly, I haven't the faintest clue where to even start writing this in VBA. I have studied Java, C, and most recently C++ (OpenGL, specifically). I'm totally unfamiliar with the VBA syntax and API, and frankly, my time on this is far too short to sit down and formally learn the language. After this, I plan on doing so in short order, but this assignment is due Monday and I had no idea how massive an undertaking it was going to end up being.
FOR THE RECORD, the macro is not the assignment, but the data to be collected is integral to completing it. To those curious, the assignment is to produce a complete rough draft of my senior thesis by Monday. This data will be used to create several graphs that my advisors have instructed me to include with my writing. The paper itself is already written based on simply reading each album's sales performance off the site.
You'd be helping me go above-and-beyond, as most of the other graduating seniors are turning in some seriously half-baked graphical representation. The only other student that went this far invented an instrument and provided schematics and autoCAD drawings. However, on that I digress.
Thanks in advance for the help!!
I think this should get you about 90%. The only thing this won't do is the web query.
For that part, I propose that you use the macro recorder to do a web query, and then post that code in a revision and we'll add it in and tailor it to your needs. You've gotta do some work on this :)
Good luck!
Edit this also assumes that there is only going to be one match in each web query. In the event there is more than one, it would only return the last match. That seems like a safe assumption given the nature of the data, but if that's not the case, let me know and I can tweak it.