First, I had a problem getting the data from the database: it took too much memory and failed. I've set -Xmx1500M and I'm using a scrolling ResultSet, so that is taken care of. Now I need to build an XML document from the data, but I can't fit it in one file. At the moment, I'm doing it like this:
while (rs.next()) {
    i++;
    xmlStringBuilder.append("\n\t<row>");
    xmlStringBuilder.append("\n\t\t<ID>" + Util.transformToHTML(rs.getInt("id")) + "</ID>");
    xmlStringBuilder.append("\n\t\t<JED_ID>" + Util.transformToHTML(rs.getInt("jed_id")) + "</JED_ID>");
    xmlStringBuilder.append("\n\t\t<IME_PJ>" + Util.transformToHTML(rs.getString("ime_pj")) + "</IME_PJ>");
    // etc.
    xmlStringBuilder.append("\n\t</row>");
    if (i % 100000 == 0) {
        // stores the data to a file with the name i.xml
        storeKBR(xmlStringBuilder.toString(), i);
        // start a fresh builder so the stored chunk can be garbage collected
        xmlStringBuilder = new StringBuilder();
    }
}
and it works; I get 12 files of 100 MB each. Now, what I'd like to do is have all that data in one file (which I then compress), but if I just remove the if part, I run out of memory. I thought about writing to a file, closing it, then reopening it, but that wouldn't gain me much since I'd have to load the file into memory when I open it.
You are assembling the complete file in memory: what you should be doing is writing the data directly to the file.
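To illustrate what writing directly to the file could look like, here is a minimal sketch. It is not the asker's code: the Row class, the escape helper (standing in for Util.transformToHTML, whose behavior is not shown in the question), and the surrounding <rows> root element are all assumptions made for the example. The point is only that each row goes to the Writer as soon as it is read, so nothing accumulates in a StringBuilder.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;

class DirectXmlWriter {

    // Hypothetical stand-in for one row of the ResultSet.
    static final class Row {
        final int id;
        final int jedId;
        final String imePj;

        Row(int id, int jedId, String imePj) {
            this.id = id;
            this.jedId = jedId;
            this.imePj = imePj;
        }
    }

    // Minimal escaping for element text; an assumed replacement for Util.transformToHTML.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Writes every row straight to the Writer; only one row is held in memory at a time.
    static long writeRows(Iterator<Row> rows, Writer out) throws IOException {
        long count = 0;
        out.write("<rows>"); // assumed root element, not shown in the question
        while (rows.hasNext()) {
            Row r = rows.next();
            out.write("\n\t<row>");
            out.write("\n\t\t<ID>" + r.id + "</ID>");
            out.write("\n\t\t<JED_ID>" + r.jedId + "</JED_ID>");
            out.write("\n\t\t<IME_PJ>" + escape(r.imePj) + "</IME_PJ>");
            out.write("\n\t</row>");
            count++;
        }
        out.write("\n</rows>\n");
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("rows", ".xml");
        Iterator<Row> sample = Arrays.asList(new Row(1, 10, "A & B"), new Row(2, 20, "C <D>")).iterator();
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            long n = writeRows(sample, out);
            System.out.println(n + " rows written to " + target);
        }
    }
}
```

In the real loop, the Iterator would be replaced by while (rs.next()) and the fields by the rs.getInt / rs.getString calls from the question.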
Additionally, you might consider using a proper XML API rather than assembling XML as a text file. A short tutorial is available here.
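One streaming XML API that ships with the JDK is StAX (javax.xml.stream). The sketch below shows how the rows might be emitted with XMLStreamWriter, which handles escaping and tag nesting and writes as it goes. The String[][] input and the <rows> root element are assumptions for the example; in practice the values would come from the ResultSet.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxRowWriter {

    // Writes <name>text</name>; writeCharacters escapes &, < and > in the text.
    static void writeElement(XMLStreamWriter xml, String name, String text) throws XMLStreamException {
        xml.writeStartElement(name);
        xml.writeCharacters(text);
        xml.writeEndElement();
    }

    // Streams the rows to the Writer without building the document in memory.
    static void writeRows(String[][] rows, Writer out) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        try {
            xml.writeStartDocument();
            xml.writeStartElement("rows"); // assumed root element
            for (String[] row : rows) {
                xml.writeStartElement("row");
                writeElement(xml, "ID", row[0]);
                writeElement(xml, "JED_ID", row[1]);
                writeElement(xml, "IME_PJ", row[2]);
                xml.writeEndElement(); // </row>
            }
            xml.writeEndElement(); // </rows>
            xml.writeEndDocument();
            xml.flush();
        } finally {
            // XMLStreamWriter is not AutoCloseable, so close it by hand.
            // Note that close() does not close the underlying Writer.
            xml.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        writeRows(new String[][] {{"1", "10", "A & B"}}, out);
        System.out.println(out);
    }
}
```

For a large export, the StringWriter in main would be swapped for a buffered file writer so the output goes to disk rather than to memory.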
Why not write all data to one file and open the file with the "append" option? There is no need to read in all the data in the file if you are just going to write to it.
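As a rough sketch of the append idea: opening a file with the APPEND option positions the writer at the end without reading the existing content. The appendChunk helper below is hypothetical; it could take the place of storeKBR in the question's loop so that every 100000-row chunk lands in the same file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class AppendChunks {

    // Appends one chunk to the end of the file, creating the file if needed.
    // The existing content is never loaded into memory.
    static void appendChunk(Path file, String chunk) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(chunk);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("rows", ".xml");
        appendChunk(file, "<row>1</row>\n");
        appendChunk(file, "<row>2</row>\n");
        System.out.println(Files.size(file) + " bytes in " + file);
    }
}
```

Reopening the file for each chunk is not required; keeping a single writer open for the whole export avoids the repeated open and close.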
However, this might be a better solution:
The BufferedOutputStream will buffer the data before writing it out, and you can specify the buffer size in the constructor if the default value does not suit your needs. See the Java API documentation for details: http://java.sun.com/javase/6/docs/api/.
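A minimal sketch of a BufferedOutputStream setup, assuming a PrintWriter on top of it and a 64 KB buffer (both choices are illustrative, as is the simplified row content):

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedXmlOutput {

    // Arbitrary buffer size chosen for the example; the default is smaller.
    static final int BUFFER_SIZE = 64 * 1024;

    // Wraps the target stream in a buffer and exposes it as a PrintWriter.
    static PrintWriter open(OutputStream target) {
        return new PrintWriter(new OutputStreamWriter(
                new BufferedOutputStream(target, BUFFER_SIZE), StandardCharsets.UTF_8));
    }

    // Prints rowCount simplified rows; each print goes to the buffer, not to a String in memory.
    static void writeRows(int rowCount, PrintWriter writer) {
        for (int i = 1; i <= rowCount; i++) {
            writer.print("\n\t<row>");
            writer.print("\n\t\t<ID>" + i + "</ID>");
            writer.print("\n\t</row>");
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("rows", ".xml");
        try (PrintWriter writer = open(Files.newOutputStream(target))) {
            writeRows(1000, writer);
            // PrintWriter swallows IOExceptions, so check for errors explicitly.
            if (writer.checkError()) {
                throw new IOException("write to " + target + " failed");
            }
        }
        System.out.println(Files.size(target) + " bytes written to " + target);
    }
}
```

Since the goal is a compressed file, the stream passed to open could also be a java.util.zip.GZIPOutputStream wrapped around the file stream, which compresses while writing.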
Ok, so the code is rewritten and I'll include the whole operation:
But the generateXML part still takes a lot of memory (if I'm guessing correctly, it gradually takes as much as it can), and I don't see how I could optimize it. Is there an alternative way to feed the writer.print function?
I have never encountered this use case, but I am pretty sure vtd-xml supports XML documents larger than 1 GB. It is worth checking out at http://vtd-xml.sourceforge.net
Or you can follow the article series "Output large XML documents" at http://www.ibm.com/developerworks/