I needed to parse the usage log file of my showTweets plugin. I got a bit scared, because the bloody thing was over 800 megabytes in size!
The logical solution seemed to be <cfloop file="x" />, because that way the whole file doesn't need to be read into memory at once. But when I ran the code, it still crashed with an OutOfMemoryError :-(
I only recently learned how the Java Virtual Machine (JVM) and Railo (CFML) interact. I used to think that if I ran the following code:
<cfset x = fileRead("someVeryBigFile.log") />
<cfset x = "" />
the memory previously used for the contents of "someVeryBigFile.log" would be freed instantly when I assigned a new value to x. That turns out to be wrong. Instead, the old contents simply become garbage, and the JVM only cleans house every now and then, especially when there are no visitors and it is quiet. So when I looped over the mega file and parsed it line by line, the garbage piled up seven feet high all over the JVM, even though a <cfdump eval=variables /> only showed 10 small variables.
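To see this for yourself, here is a minimal sketch that asks the JVM how much heap is in use before and after the reassignment. It assumes Railo's Java integration through createObject("java", ...) and the standard java.lang.Runtime methods totalMemory() and freeMemory(); the helper name usedMemoryMB is hypothetical. The numbers will differ on every run, because the garbage collector runs whenever the JVM decides to.

```cfml
<!--- hypothetical helper: heap memory currently in use by the JVM, in megabytes --->
<cffunction name="usedMemoryMB" returntype="numeric" output="false">
	<cfset var rt = createObject("java", "java.lang.Runtime").getRuntime() />
	<!--- used = allocated heap minus the free part of it (unreachable garbage still counts as used) --->
	<cfreturn (rt.totalMemory() - rt.freeMemory()) / 1024 / 1024 />
</cffunction>

<cfset before = usedMemoryMB() />
<cfset x = fileRead(expandPath("someVeryBigFile.log")) />
<cfset afterRead = usedMemoryMB() />
<!--- the old string becomes garbage here, but is not necessarily freed yet --->
<cfset x = "" />
<cfset afterReset = usedMemoryMB() />

<cfoutput>
	before fileRead: #numberFormat(before)# MB<br />
	after fileRead: #numberFormat(afterRead)# MB<br />
	after x = "": #numberFormat(afterReset)# MB<br />
</cfoutput>
```

If the "after x" figure stays close to the "after fileRead" figure, the old contents are typically still sitting on the heap, waiting for the collector.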
One great tip I got was to use <cfsleep time="300" /> inside the loop. Not on every iteration, but, let's say, on every 10,000th line. That way, Java's garbage collector gets some spare time to clean the house. Euh, the JVM. And I can say, it really works!
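A minimal sketch of that tip inside a file loop, assuming a Railo/Lucee server (where the <cfsleep> tag is available) and a log file named somefile.txt next to the page. The variables lineCount and totalChars are made up for the example; summing line lengths just stands in for the real parsing work. The pause only gives the collector a chance to run; when it actually runs is still up to the JVM.

```cfml
<cfset lineCount = 0 />
<cfset totalChars = 0 />
<cfloop file="#expandPath('somefile.txt')#" index="line">
	<cfset lineCount++ />
	<!--- stand-in for the real parsing work: add up the line lengths --->
	<cfset totalChars += len(line) />
	<!--- on every 10,000th line, pause for 300 ms to give the garbage collector some room --->
	<cfif lineCount mod 10000 eq 0>
		<cfsleep time="300" />
	</cfif>
</cfloop>
<cfoutput>#lineCount# lines, #totalChars# characters</cfoutput>
```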
Some other tips:
- Use <cfsilent> around heavy loops. Otherwise, you might be sending a million tabs and spaces of whitespace to the output.
- Save intermediate results to disk and run the page multiple times, handling one batch of lines per request. For example:
<cfset tempFile = expandPath("tempdata.log") />
<cfif fileExists(tempFile)>
	<!--- restore the counts saved by the previous run --->
	<cfset parsedData = evaluate(fileRead(tempFile)) />
<cfelse>
	<cfset parsedData = {} />
</cfif>
<!--- needs an application scope (Application.cfc or cfapplication) so the position survives between requests --->
<cfparam name="application.currentlogline" default="0" />
<cfset counter = 1 />
<h1>We will start at line <cfoutput>#application.currentlogline#</cfoutput></h1>
<cfloop file="#expandPath('somefile.txt')#" index="line">
	<!--- skip the lines that previous runs already handled --->
	<cfif application.currentlogline lt counter>
		<cfset key = listFirst(line, " ") />
		<cfif not structKeyExists(parsedData, key)>
			<cfset parsedData[key] = 1 />
		<cfelse>
			<cfset parsedData[key] += 1 />
		</cfif>
	</cfif>
	<cfset counter++ />
	<!--- get out of the loop after 20,000 new lines (write 20000: in CFML, 20.000 is just twenty) --->
	<cfif counter gt application.currentlogline + 20000>
		<cfbreak />
	</cfif>
</cfloop>
<!--- remember where we stopped; this also covers the last, shorter batch at the end of the file --->
<cfset application.currentlogline = counter - 1 />
<h1>We finished at line <cfoutput>#application.currentlogline#</cfoutput></h1>
<!--- write temporary data to disk --->
<cffile action="write" file="#tempFile#" output="#serialize(parsedData)#" />
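For the <cfsilent> tip, a minimal sketch, again assuming a log file named somefile.txt next to the page (lineCount is a made-up variable): everything the loop would print, including the whitespace between the tags, is discarded, while variables set inside the block stay available afterwards.

```cfml
<cfsilent>
	<!--- no output or whitespace generated inside this block reaches the response --->
	<cfset lineCount = 0 />
	<cfloop file="#expandPath('somefile.txt')#" index="line">
		<cfset lineCount++ />
	</cfloop>
</cfsilent>
<!--- variables set inside cfsilent are still available here --->
<cfoutput>#lineCount# lines read</cfoutput>
```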