I have been given the task of converting a huge table to a custom XML file. I will be using Java for this job.
If I simply issue a "SELECT * FROM customer", it may return a huge amount of data and eventually cause an OOM error. I wonder: is there a way I can process each record immediately once it becomes available, and remove it from memory right after, while the SQL retrieval is still in progress?
--- edited on 13 Jul 2009
Let me elaborate on my question. I have 1 DB server and 1 application server. When I issue a select query in the application, the data travels from the DB server to the app server.
I believe (correct me if I am wrong) that the ResultSet will need to wait until it has received all the records in the query. Even if we set the fetch size to 4, for a 1000-record table we still end up with 1000 records in the heap memory of the app server, is that correct? The fetch size only affects the number of round trips to/from the DB server.
My question is: how do I start processing those 4 (or however many) records immediately after they arrive at the app server, and then dispose of them to free up memory in the app server?
A concept for exporting the entire table.
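The code for the concept is not reproduced here, so the following is only a rough sketch of what such a whole-table export loop could look like in plain JDBC. The connection URL, table and column names are assumptions; the point is that each row is written to the file as soon as it is read, so nothing accumulates on the heap.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CustomerXmlExport {

    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, table and column names.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://dbserver/mydb", "user", "password");
             Statement st = con.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
             BufferedWriter out = new BufferedWriter(new FileWriter("customers.xml"))) {

            st.setFetchSize(100); // hint to the driver to fetch rows in small chunks

            out.write("<customers>\n");
            try (ResultSet rs = st.executeQuery("SELECT id, name FROM customer")) {
                while (rs.next()) {
                    // Write each row out immediately; nothing is accumulated in memory.
                    out.write("  <customer id=\"" + rs.getLong("id") + "\">"
                            + escape(rs.getString("name"))
                            + "</customer>\n");
                }
            }
            out.write("</customers>\n");
        }
    }

    // Minimal escaping of the characters XML is sensitive to.
    private static String escape(String s) {
        return s == null ? "" : s.replace("&", "&amp;")
                                 .replace("<", "&lt;")
                                 .replace(">", "&gt;");
    }
}
```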
(Note to experts: I'm aware of its shortcomings.)

Edit: the shortcomings (thanks @J.S.):
I think you could use the same solution as this one: a scrollable ResultSet.
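For reference, this is roughly how a scrollable ResultSet is requested in plain JDBC. Whether the driver still buffers the whole result on the client is driver-dependent, and the query and columns below are only placeholders.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class ScrollableExample {

    // Sketch of requesting a scrollable, read-only ResultSet.
    static void readScrollable(Connection con) throws Exception {
        try (Statement st = con.createStatement(
                     ResultSet.TYPE_SCROLL_INSENSITIVE,  // enables absolute()/relative()
                     ResultSet.CONCUR_READ_ONLY);
             ResultSet rs = st.executeQuery("SELECT id, name FROM customer")) {

            rs.absolute(500);                 // e.g. position the cursor at row 500
            while (rs.next()) {               // then iterate from row 501 onwards
                System.out.println(rs.getLong("id") + " " + rs.getString("name"));
            }
        }
    }
}
```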
With a little more information I could give a more helpful answer.
If you are using MySQL:
From http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html:
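The quoted FAQ text is missing above, but the MySQL-specific technique usually cited for this problem is Connector/J's streaming mode: a forward-only, read-only statement with the fetch size set to Integer.MIN_VALUE. A minimal sketch, with the query and column names as placeholders:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class MySqlStreamingExample {

    // MySQL Connector/J only streams rows one at a time when the statement is
    // forward-only and read-only and the fetch size is Integer.MIN_VALUE.
    static void streamCustomers(Connection con) throws Exception {
        try (Statement st = con.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(Integer.MIN_VALUE);   // the Connector/J streaming switch

            try (ResultSet rs = st.executeQuery("SELECT id, name FROM customer")) {
                while (rs.next()) {
                    // Each row is read off the wire as next() is called, so it can
                    // be processed and discarded immediately.
                    System.out.println(rs.getLong("id") + " " + rs.getString("name"));
                }
            }
        }
    }
}
```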
If you are using JDBC you can use a ResultSet with a cursor, which you iterate through one record at a time. You then need to make sure that you write your XML out to a file one record at a time, rather than using DOM to build the XML.
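One concrete way to do the "one record at a time, no DOM" part is a StAX XMLStreamWriter, which emits each element to the output as soon as it is produced. A sketch under hypothetical assumptions (query, columns, fetch size and output file name):

```java
import java.io.FileOutputStream;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class StaxExportExample {

    // Iterate the cursor and emit each row straight to the file with StAX,
    // so neither the rows nor a DOM tree ever build up in memory.
    static void export(Connection con) throws Exception {
        try (FileOutputStream out = new FileOutputStream("customers.xml");
             Statement st = con.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {

            st.setFetchSize(100);               // fetch rows in small chunks
            XMLStreamWriter xml = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(out, "UTF-8");

            try (ResultSet rs = st.executeQuery("SELECT id, name FROM customer")) {
                xml.writeStartDocument("UTF-8", "1.0");
                xml.writeStartElement("customers");
                while (rs.next()) {
                    xml.writeStartElement("customer");     // one element per record
                    xml.writeAttribute("id", String.valueOf(rs.getLong("id")));
                    xml.writeCharacters(rs.getString("name"));
                    xml.writeEndElement();
                }
                xml.writeEndElement();
                xml.writeEndDocument();
            }
            xml.close();   // flushes the writer; the stream is closed by try-with-resources
        }
    }
}
```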
At which stage is the OOM error occurring: during data retrieval, or while turning the data into the XML file?
If it's data retrieval, get the data in batches: get the total number of rows first, order the selects by the primary key, and limit the rows selected to chewable sizes (see the sketch after this answer).
If it's at creating the XML file, send the XML node of each customer to System.out.println and don't hold it in memory. Launch the program via the command line and redirect all output to a file.
As you loop through the records, everything ends up saved in the file.
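A sketch combining both suggestions, assuming MySQL-style LIMIT/OFFSET syntax and made-up connection details and column names: the selects are batched in primary-key order and every customer node goes straight to System.out. Stopping when a batch comes back short stands in for the explicit total-row count mentioned above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CustomerBatchExport {

    private static final int BATCH_SIZE = 1000;

    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; MySQL-style LIMIT/OFFSET assumed.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://dbserver/mydb", "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT id, name FROM customer ORDER BY id LIMIT ? OFFSET ?")) {

            System.out.println("<customers>");
            int offset = 0;
            while (true) {
                ps.setInt(1, BATCH_SIZE);
                ps.setInt(2, offset);
                int rows = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rows++;
                        // One XML node per customer, straight to stdout
                        // (escaping omitted for brevity).
                        System.out.println("  <customer id=\"" + rs.getLong("id") + "\">"
                                + rs.getString("name") + "</customer>");
                    }
                }
                if (rows < BATCH_SIZE) {
                    break;              // last, possibly partial, batch
                }
                offset += BATCH_SIZE;
            }
            System.out.println("</customers>");
        }
    }
}
```

Run it with output redirected, e.g. `java CustomerBatchExport > customers.xml`, and the file grows as the loop runs while the heap stays flat.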
One rule of thumb that I've learnt from experience is that you NEVER bring ALL the data from the database to your application server. One thing you can do is implement a procedure to page your data.
You can bring one page of data containing around 1,000 to 5,000 records, process it, and then fetch the next page.
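One way to implement that paging, sketched with a hypothetical numeric primary key and MySQL-style LIMIT syntax, is keyset paging: each page starts after the last id seen rather than using a growing OFFSET, so the cost per page stays roughly constant even deep into the table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PagedProcessing {

    // Fetch one page at a time, keyed on the last primary key seen.
    static void processInPages(Connection con, int pageSize) throws Exception {
        String sql = "SELECT id, name FROM customer WHERE id > ? ORDER BY id LIMIT ?";
        long lastId = 0;
        while (true) {
            int rowsInPage = 0;
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, lastId);
                ps.setInt(2, pageSize);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        rowsInPage++;
                        lastId = rs.getLong("id");   // remember where this page ended
                        processCustomer(lastId, rs.getString("name"));
                    }
                }
            }
            if (rowsInPage < pageSize) {
                break;                               // last page has been processed
            }
        }
    }

    // Placeholder for the real per-record work (e.g. appending to the XML file).
    static void processCustomer(long id, String name) {
        System.out.println("<customer id=\"" + id + "\">" + name + "</customer>");
    }
}
```

A page size in the 1,000 to 5,000 range, as suggested above, would simply be passed in as `pageSize`.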