ColdFusion JVM garbage collection not working

Published 2019-06-24 02:39

Question:

I have a CFC method that loops through a list, makes a series of SOAP calls via cfhttp, and then inserts the results into a database.

The process itself works great; the problem is that the Java memory slowly fills up and eventually (depending on the number of elements in the records being returned) the request just stops. There is no error or anything visible; it just stops. If I look at the application log file via the ColdFusion admin, I see one or both of the following errors:

GC overhead limit exceeded The specific sequence of files included or processed is:

or

Java heap space The specific sequence of files included or processed is:

Below is a simplified version of the code I am running:

<cfsetting requesttimeout="3600">
<cfset thisPass = 1>
<cfloop condition="thisPass lt 10">
    <cfscript>
        runtime = CreateObject("java", "java.lang.Runtime").getRuntime();
        objSystem = CreateObject("java", "java.lang.System");
        soapBody = '';
        soapResponse = '';
        thisStruct = '';
        lock scope='application' type='exclusive' timeout='60' {
            // This is where I am trying to manage the memory and call garbage collection
            try {
                freeMemory = runtime.freeMemory()/1024/1024;
                writeOutput("- fm = "&freeMemory);
                if (freeMemory < 200) {
                    objSystem.gc();
                    sleep(1000);
                    writeDump(' - dumping freeMemory');
                }
            }
            catch(any error) {
                writeDump(' - trying to dump GC as '&now()&' freeMemory = '&freeMemory);
            }
        }
    </cfscript>
    <cfsavecontent variable="soapBody">
        <?xml version="1.0" encoding="utf-8"?>
        [ BUILD SOAP ENVELOPE ]
    </cfsavecontent>

    <cfhttp url="[URL]" method="post" result="httpResponse"
            timeout="600" resolveurl="false">
        <cfhttpparam type="header" name="SOAPAction" value="[URL2]" />
        <cfhttpparam type="xml" value="#trim( soapBody )#"/>
    </cfhttp>


    <cfscript>
        soapBody = "";
        soapResponse = httpResponse.fileContent;
        soapResponse = xmlParse(soapResponse);
        thisStruct = xmlSearch(soapResponse,'/soap:Envelope/soap:Body/')[1].xmlChildren[1].xmlChildren[1].xmlChildren;
        writeOutput("-"&arrayLen(thisStruct)&' records');
        getPageContext().getOut().flush();
        if (arrayLen(thisStruct) == 2500) {
            thisPass = thisPass + 1;
        } else {
            writeOutput("- total records = "&(2500*(thisPass-1))+arrayLen(thisStruct));
            thisPass = 100; // since the loop runs while thisPass lt 10, this prevents the next iteration
        }
    </cfscript>

    <cfloop from="1" to="#arrayLen(thisStruct)#" index="i">
        [RUN PROC TO INSERT RECORDS]
    </cfloop>
</cfloop>

The GC sometimes seems to release a bit of memory, but not with any reliability. I understand that System.gc() is only a suggestion to the JVM to release some of the unused memory, but I am unsure how I can force it to release memory. It is possible that there is a leak somewhere, but I am not seeing it. I am hoping this is something obvious that I have overlooked, and I admit that my Java knowledge is extremely limited.

Is there a Java guru out there who can see my error?

UPDATE: Here is a sample of the output, in case it is helpful to see the decline in memory.

There are 236 lists to loop through:

  1. 88185 - fm = 293.564407349 -6 records- total records = 6
  2. 88389 - fm = 290.86995697 -116 records- total records = 116
  3. 88390 - fm = 308.382568359 -262 records- total records = 262
  4. 88839 - fm = 292.707099915 -2032 records- total records = 2032
  5. 91088 - fm = 290.711753845 -6 records- total records = 6
  6. 92998 - fm = 287.754066467 -5 records- total records = 5
  7. 95510 - fm = 309.919425964 -91 records- total records = 91
  8. 96478 - fm = 292.035064697 -1180 records- total records = 1180
  9. 96479 - fm = 259.001213074 -1113 records- total records = 1113
  10. 96480 - fm = 261.121406555 -110 records- total records = 110
  11. 96796 - fm = 267.235244751 -2 records- total records = 2
  12. 96799 - fm = 265.037582397 -0 records- total records = 0
  13. 97435 - fm = 263.589103699 -2500 records - fm = 227.629760742 -2500 records - fm = 200.85987854 -2500 records - fm = 202.156776428 -2500 records - fm = 166.366210938 - dumping freeMemory -656 records- total records = 10656
  14. 98173 - fm = 160.579734802 - dumping freeMemory -35 records- total records = 35
  15. 99111 - fm = 176.218482971 - dumping freeMemory -0 records- total records = 0
  16. 100998 - fm = 194.708694458 - dumping freeMemory -185 records- total records = 185
  17. 101811 - fm = 160.61415863 - dumping freeMemory -2500 records - fm = 112.862670898 - dumping freeMemory -2500 records - fm = 86.2071380615 - dumping freeMemory -2500 records - fm = 52.9639358521 - dumping freeMemory -1064 records- total records = 8564
  18. 105014 - fm = 56.1721343994 - dumping freeMemory -14 records- total records = 14
  19. 105992 - fm = 73.0022964478 - dumping freeMemory -14 records- total records = 14
  20. 107539 - fm = 75.9522399902 - dumping freeMemory -93 records- total records = 93
  21. 107580 - fm = 58.345199585 - dumping freeMemory -2500 records

Answer 1:

It's not a case of Java not managing GC very well; it's a case of you not managing what you're putting into (and taking out of) memory in such a way that things are available for the garbage collector to clean up. You're trying to treat the symptom rather than the problem here.

Have a look at what's going into memory and why the stuff you expect to be GCed isn't being GCed. It could be that you've unexpectedly got references to things in a shared scope, or something like that. Troubleshooting this sort of thing with ColdFusion on top of Java is a bit of a dark art, though.
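
As a hypothetical illustration of that kind of shared-scope reference (not the poster's actual code): a CFC method that forgets var stores its variables in the component's variables scope, and if the instance lives in a long-lived scope, the parsed XML stays reachable forever:

<!--- hypothetical sketch, assuming the instance is kept long-lived, e.g.
      application.soapClient = createObject("component", "SoapClient") --->
<cfcomponent displayname="SoapClient">
    <cffunction name="processResponse" returntype="void">
        <cfargument name="rawXml" type="string" required="true">
        <!--- Missing "var": soapDoc lands in this CFC's variables scope.
              Because the instance is held in application scope, every parsed
              document remains reachable and can never be garbage collected.
              The fix is <cfset var soapDoc = ""> at the top of the function. --->
        <cfset soapDoc = xmlParse(arguments.rawXml)>
        <!--- ... work with soapDoc ... --->
    </cffunction>
</cfcomponent>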

But don't try to "fix" the "problem" of things not being GCed by forcing a GC; fix the problem that is causing your memory to a) fill up, and b) not be GC-able when you think it ought to be.



Answer 2:

I can say that the forced-GC approach did not help me much when I tried it. In general, here is a list of things to try first:

  • Make sure that variables in the objects you are calling are declared with var or placed in the local scope
  • If you can move processing to the database, do so
  • Remove <cfdump>; it is very heavy. If you really need to use it, dump only the part of the variable that is needed, and use the text format
  • Change the JVM settings to give ColdFusion more memory
  • Use Java 1.7 and G1GC (do this with caution, as it may not yet be supported); see the flag example after this list
  • Use fewer query-of-queries
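
For the G1GC item, the collector is enabled with a standard HotSpot flag in jvm.config. A minimal sketch with illustrative heap sizes (G1 requires Java 7+, and it would replace -XX:+UseParallelGC if that is already set):

java.args=-server -Xms1024m -Xmx1024m -XX:+UseG1GC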

Here is what I would consider for your code above:

  • I would move everything inside the loop into a function and var-scope soapBody, soapResponse, and thisStruct (see the sketch after this list)
  • After I was done with thisStruct, I would call structClear(thisStruct)
  • Get rid of the writeDump() calls; they are as heavy as <cfdump>
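
A rough sketch of that refactor, keeping the poster's placeholders; the function name is invented for illustration:

<cffunction name="fetchAndInsertPage" returntype="numeric" output="true">
    <!--- var-scoped locals become unreachable when the function returns,
          which is exactly what the garbage collector needs --->
    <cfset var soapBody = "">
    <cfset var httpResponse = "">
    <cfset var soapResponse = "">
    <cfset var thisStruct = "">
    <cfset var recordCount = 0>
    <cfset var i = 0>

    <cfsavecontent variable="soapBody">
        <?xml version="1.0" encoding="utf-8"?>
        [ BUILD SOAP ENVELOPE ]
    </cfsavecontent>

    <cfhttp url="[URL]" method="post" result="httpResponse" timeout="600" resolveurl="false">
        <cfhttpparam type="header" name="SOAPAction" value="[URL2]" />
        <cfhttpparam type="xml" value="#trim( soapBody )#"/>
    </cfhttp>

    <cfset soapResponse = xmlParse(httpResponse.fileContent)>
    <cfset thisStruct = xmlSearch(soapResponse,'/soap:Envelope/soap:Body/')[1].xmlChildren[1].xmlChildren[1].xmlChildren>
    <cfset recordCount = arrayLen(thisStruct)>

    <cfloop from="1" to="#recordCount#" index="i">
        [RUN PROC TO INSERT RECORDS]
    </cfloop>

    <!--- drop the references so the parsed XML is collectable immediately --->
    <cfset thisStruct = "">
    <cfset soapResponse = "">
    <cfreturn recordCount>
</cffunction>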


Answer 3:

After a lot of searching, I found that the best "fix" for this particular problem was to remove the code that was attempting to force garbage collection and to increase the Java heap sizes in the /jrun/bin/jvm.config file,

by changing the VM arguments to:

java.args=-server -Xms2048m -Xmx2048m -Xmn1024m -Dsun.io.useCanonCaches=false -XX:MaxPermSize=192m -XX:+UseParallelGC -Xbatch -Dcoldfusion.rootDir={application.home}/ -Djava.security.policy={application.home}/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib/coldfusion.policy -Djava.security.auth.policy={application.home}/servers/cfusion/cfusion-ear/cfusion-war/WEB-INF/cfusion/lib/neo_jaas.policy

I increased the initial heap size (Xms) and the max heap size (Xmx) to 2048m, and the "young generation" heap size (Xmn) to 1024m, since it was suggested to me that the young generation should be smaller than the initial and max sizes for better garbage collection.
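
For reference, the heap-related flags from that java.args line, annotated:

-Xms2048m               initial heap size
-Xmx2048m               maximum heap size
-Xmn1024m               young-generation size (kept below Xms/Xmx, as suggested)
-XX:MaxPermSize=192m    permanent-generation cap (class metadata, pre-Java 8)
-XX:+UseParallelGC      selects the parallel (throughput) collector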

As suggested by James, I commented out the actual processes (which are in a function and var'd) and then uncommented them one at a time, running everything each time. What I learned from this was that the large SOAP responses were what was filling up the memory, not a leak as I had feared.

As Adam mentioned, it wasn't about Java managing the GC; rather, there wasn't enough room to manage what I was throwing at it (for some reason CF doesn't like to deal with 2500-record SOAP responses very well). Go figure.

Adam was also correct in saying that troubleshooting Java memory in CF is a "dark art". By using the server monitor (http://localhost/CFIDE/administrator/monitor/launch-monitor.cfm) and going to the Statistics tab, under Memory Usage -> Memory Usage Summary, I could watch the memory slowly fill up and dump itself even when, after a fresh reboot, no processes were running. I never was able to figure out why, but I could see what levels were running and that I was hitting the top.

The default memory allocated in the jvm.config file was 512m, which simply wasn't enough to handle what was going on. There may be a better way to handle this overall process; I may have to implement Henry's suggestion and put it in the database and chunk through it, although that sounds very clumsy to me.

I am sure that, other than CF now taking up a huge chunk of resources right out of the gate, there may be other problems with this solution, but (for now) it seems to be working as needed.

UPDATE: I changed the CFC function so that instead of inserting everything into the database directly, I wrote all the XML to files and then read the files back and inserted them into the database. Somehow, writing to file allowed Java to "breathe" long enough to do the GC itself.
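
Roughly, that staging pattern looks like this (a sketch of the idea, not the exact function; the file naming is illustrative, and fileWrite()/fileRead()/fileDelete() require CF8+):

<!--- pass 1: stage each raw SOAP response on disk instead of holding it on the heap --->
<cfset fileName = getTempDirectory() & "soapPage_" & thisPass & ".xml">
<cfset fileWrite(fileName, httpResponse.fileContent)>
<cfset httpResponse = "">

<!--- pass 2: read one staged file back at a time, parse, insert, discard --->
<cfset soapResponse = xmlParse(fileRead(fileName))>
[RUN PROC TO INSERT RECORDS]
<cfset soapResponse = "">
<cfset fileDelete(fileName)>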



Answer 4:

AFAIK CF is not good at running long requests. From what I have heard, memory is not released until the request is finished.

What we have tried is breaking a long request down into smaller requests that CF can manage; the memory is usually released once each request is done.

One legacy system we use inserts the task(s) into a DB table, and the CF Scheduler works on a batch at a time (see the sketch below). I hated that because of the latency, but that's what needed to be done back in the CF7 days, and it doesn't seem to have improved since.
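
A minimal sketch of that pattern (the table, column, and datasource names are made up, and SELECT TOP is SQL Server syntax): the long job only enqueues work, and a scheduled task processes a small batch per request, so memory is freed between runs:

<!--- scheduled-task template, run every few minutes by the CF Scheduler --->
<cfquery name="batch" datasource="myDSN">
    SELECT TOP 10 taskID, payload
    FROM soap_task_queue
    WHERE status = 'pending'
</cfquery>

<cfloop query="batch">
    <!--- one small unit of work per row --->
    [ MAKE SOAP CALL / INSERT RECORDS FOR #batch.taskID# ]
    <cfquery datasource="myDSN">
        UPDATE soap_task_queue
        SET status = 'done'
        WHERE taskID = <cfqueryparam value="#batch.taskID#" cfsqltype="cf_sql_integer">
    </cfquery>
</cfloop>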