I have a data warehousing problem, needing to query over a large dataset. For the sake of this example lets say a typical state would have 30 million users with activity stats for each. Ideally I could buy a data warehousing tool (Vertica, Infobright, etc...) but that's not in the cards or the budget.
Right now I'm considering using Solr to query HBase. While I believe HBase could scale up to the needs, I worry about Solr. It's optimized as a search engine, i.e. the first pages of results return before the last and there's no support for something like a database cursor. Tests so far have shown that getting a large result set out of Solr have been slower than I would've liked. For instance comparing a query that would retrieve half of the available users (one which ultimately returned 500 mb of data) in the community version of Infobright finished in under a minute, for Solr it took 12 minutes.
Is there something other than Solr that's better suited to query this data? Are there any optimizations that would help with bulk data input and output?