I have a data processing system that generates very large reports on the data it processes. By "large" I mean that a "small" run of this system produces about 30 MB of reporting data when dumped to a CSV file, and a large dataset is about 130-150 MB (I'm sure someone out there has a bigger idea of "large", but that's not the point... ;)
Excel has the ideal interface for the report consumers in the form of its Data Lists: users can filter and segment the data on the fly to see the specific details they are interested in (they're not really interested in the many thousands of rows as a whole; they know how to apply multiple filters to get to the data they want). They can also add notes and markup to the reports, create charts, graphs, and so on. They know how to do all this, and it's much easier to let them do it if we just give them the data.
Excel was great for the small test datasets, but it cannot handle these large ones. Does anyone know of a tool that can provide a similar interface to Excel data lists - the ability to dynamically create and change filters on multiple fields - but can handle much larger files?
The next tool I tried was MS Access, and I found that the Access file bloats hugely (a 30 MB input file leads to an Access file of about 70 MB, and after I open the file, run a report and close it, the file is at 120-150 MB!). The import process is also slow and very manual (currently, the CSV files are created by the same PL/SQL script that runs the main process, so there's next to no intervention on my part). I also tried an Access database with tables linked to the database tables that store the report data, and that was many times slower (for some reason, SQL*Plus could query and generate the report file in a minute or so, while Access would take anywhere from 2-5 minutes for the same data).
(If it helps, the data processing system is written in PL/SQL and runs on Oracle 10g.)
Access would be a good tool to use in this case, as it has no practical row limit, unlike Excel. The hard part is weaning people off Excel when they are used to the power of custom filters. It is quite possible in Access to get something that approximates this, but it's never going to be exactly the same unless you embed an Excel control into your forms.
As for the manual part, you can script the database to import the files using VBA. For example, let's say the main task of yours dumps the files into a folder overnight, with a new file each night. You could make a "watchdog" Access database that keeps a form open with an OnTimer event that looks at that folder every few minutes; when it finds a new file, it starts the import. When your users get to work in the morning, the data is already loaded.
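To give a rough idea, here is a minimal sketch of that watchdog form's timer code. It assumes the nightly dump lands in C:\Reports\ as *.csv files, that the target table is called tblReportData, and that a done\ subfolder exists for processed files (all of those names are placeholders for whatever you actually use):

    ' Behind a form in the watchdog database; set the form's TimerInterval
    ' to e.g. 300000 (5 minutes) so this fires regularly.
    Private Sub Form_Timer()
        Dim strFolder As String
        Dim strFile As String

        strFolder = "C:\Reports\"            ' assumed drop folder
        strFile = Dir(strFolder & "*.csv")   ' first CSV waiting in the folder

        Do While Len(strFile) > 0
            ' Import the delimited file (first row = field names)
            DoCmd.TransferText acImportDelim, , "tblReportData", _
                strFolder & strFile, True

            ' Move the processed file aside so it isn't imported twice
            Name strFolder & strFile As strFolder & "done\" & strFile

            strFile = Dir                    ' next file, if any
        Loop
    End Sub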
As for the bloating, yes, it can be a problem, but all you need to do is a quick compact and repair on the file and it will shrink back down.
EDIT:
You can set an Access database to be compacted on close through the options. I can't remember exactly where it is, and at work we only have Access 97 (but, oddly enough, Office 2003). The other option is to compact through code. Here is a link that explains how:
http://forums.devarticles.com/microsoft-access-development-49/compact-database-via-vba-24958.html
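For reference, here is roughly what both routes look like in code (untested sketch; the paths and names are placeholders). The compact-on-close option can be switched on with Application.SetOption (Access 2000 and later, I believe), or the watchdog database can compact the report database itself with DBEngine.CompactDatabase, since a database can't compact itself while it's open:

    ' Option 1: turn on compact-on-close from inside the report database
    ' (Access 2000 and later, I believe).
    Public Sub EnableCompactOnClose()
        Application.SetOption "Auto Compact", True
    End Sub

    ' Option 2: compact from the watchdog database; the report database
    ' must be closed while this runs.
    Public Sub CompactReportDb()
        Dim strDb As String
        strDb = "C:\Reports\ReportData.mdb"   ' assumed path to the bloated file

        DBEngine.CompactDatabase strDb, strDb & ".tmp"
        Kill strDb                            ' remove the bloated original
        Name strDb & ".tmp" As strDb          ' swap the compacted copy into place
    End Sub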
Interesting one; there's not a lot in the mid-range for this kind of problem. Access should be it but, as you've found out, it's pretty terrible in several ways and is probably a bit too advanced for many end users.
On the other hand, if you've got a database server there, it seems a shame not to use its power. There are several tools of varying cost and complexity that let you set up fairly user-friendly, server-based reports: users set the parameters for their own reports, the filtering is done on the server, and the results can be exported to Excel. Examples are Oracle Discoverer and Microsoft Reporting Services (which can be set up to report directly against Oracle databases, even though it's SQL Server based).
We use Microsoft Reporting Services; I build reports in Developer Studio that allow the users to go to a web page, filter by any of several pre-defined criteria, run the report (with the hard work being done on the server) and export the results to Excel for further processing.
A halfway house, where you set up reports that filter the initial data down to fewer than a million rows and then export the result to Excel, might be the way to go...
What version of Excel are you using? It can handle pretty large volumes of data now: in Excel 2007, the worksheet size is 16,384 columns by 1,048,576 rows. Are you really sending more than a million records in a report? Why? Who would look at data like that?
I would suggest you use an Excel front-end to a shared RDB backend.
Build a custom filtering system for Excel (I would use a VBA & ADO XLA parked on a server, but there are several technologies available) that the end users drive and which generates the SQL to bring back to Excel only the subset of data they want to play with (chart, calculate, print, etc.).
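As a very rough sketch of what the XLA's core routine could look like: this assumes an Oracle OLE DB connection, a REPORT_DATA table with REGION and RUN_DATE columns (all placeholders; in practice the filter UI and the SQL building would be driven by whatever the users pick), and a reference set to the Microsoft ActiveX Data Objects library:

    Public Sub LoadFilteredReport(ByVal strRegion As String, ByVal dtFrom As Date)
        Dim cn As ADODB.Connection
        Dim rs As ADODB.Recordset
        Dim strSql As String
        Dim i As Long

        Set cn = New ADODB.Connection
        cn.Open "Provider=OraOLEDB.Oracle;Data Source=REPDB;" & _
                "User Id=report;Password=report;"      ' placeholder connection

        ' Build the SQL from the user's filter choices
        ' (use bind variables in real code rather than string concatenation)
        strSql = "SELECT * FROM report_data " & _
                 "WHERE region = '" & Replace(strRegion, "'", "''") & "' " & _
                 "AND run_date >= TO_DATE('" & Format(dtFrom, "yyyy-mm-dd") & _
                 "', 'YYYY-MM-DD')"

        Set rs = cn.Execute(strSql)

        ' Write the column headings, then dump the filtered subset onto the sheet
        ActiveSheet.Cells.ClearContents
        For i = 0 To rs.Fields.Count - 1
            ActiveSheet.Cells(1, i + 1).Value = rs.Fields(i).Name
        Next i
        ActiveSheet.Range("A2").CopyFromRecordset rs

        rs.Close
        cn.Close
    End Sub

The users only ever see a filter form; the SQL does the heavy lifting on the server, and Excel just receives the few thousand rows they actually asked for.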