CFCSV, a new CFML tag for Railo! - version 1.1, sept. 22, 2011

Ever had to work with CSV files? It stands for Comma Separated Values, and although the file format looks incredibly simple, it can be a pain to work with. An example of CSV data:

firstname,lastname,"lastname, firstname",age,description
Paul,Klinkenberg,"Klinkenberg, Paul",31,"CFML junkie, father, Railo Extension manager"
Luce,Klinkenberg,"Klinkenberg, Luce",0,"Main actions during the day:
- sleeping
- drinking
- crying
- creating dirty diapers"
"Emma ""my love""",Klinkenberg,"Klinkenberg, Emma ""my love""",30,Perfect :-)

Now how would you parse this data? Line by line probably, and then do a ListToArray(line, ","). But that would not work in this case, because you would get 5 items instead of 4. And you would get 8 rows, instead of 4. These caveats, and some other troubles, make sure that CSV files can be pretty hard to parse. But not anymore!!!

Thanks to Ben Nadel's extensive research into CSV parsing, and a bit of extra coding and tweaking on my part, you can now start using <cfcsv> in Railo!

Usage

The tag has 2 actions:

  • Parse: can take a file path or a string, and returns the results as a query (default), or as an array of arrays.
  • Create: takes a query, and returns a string of CSV data.

Attributes

Name Used whenRequired Type Default value Description
Action parse, create yes String
Must be one of the following: parse, create
Variable parse, create no String cfcsv Variable name which will hold the returned value
Textqualifier parse, create no String, 1 character
"
The character with which string values are 'quoted'. For example, when a string contains the csv file delimiter, a line break, or the textqualifier itself, then the string will be quoted to indicate it is one string.
Delimiter parse, create no String, 1 character
, The character used to delimit values. By default, this is a comma, but it can also be a TAB, a semicolon, or any other character.
Output parse no String query Must be one of the following: query, array.
File parse no (either File or Data attribute must be given)
String
An absolute or relative path to a csv file.
Data parse no (either File or Data attribute must be given) String

A string containing csv data.
Trimendoffile parse no Boolean true
Should we remove any line breaks at the end of the file. When set to false, and the last line ends in a line break, you will get an extra (empty) row of data.
Hascolumnnames parse no Boolean true Whether the data to parse has a first line with column names. When true, a returned query will use these headers as query column names. Otherwise, the query will have column names "col1", col2, etcetera. This setting is only used for query output.
Query create Yes, when action is create. No otherwise
Query

The query to create the csv data from.
IncludeColumnNames
create no Boolean true
Must we include the query column names as the first line of the returned csv data.

Examples

The output for these examples can be viewed here.

<!--- Parse a csv file, and return the resulting query into 'variables.myQuery'.
Also note the relative file path usage! --->
<cfcsv action="parse" file="testdata.csv" variable="myQuery" delimiter=";" />
<cfdump eval=myQuery />

<!--- some test data, ending with extra line breaks --->
<cfset myData = "test,test2
data 1,data 2
'quoted here, with a
line break',second col

" />
<!--- Parse a string of CSV data. The text qualifier is a single quote --->
<cfcsv action="parse" data="#myData#" variable="parsedDataQuery" textqualifier="'" />
<cfdump eval=parsedDataQuery />

<!--- Create CSV data from a query, returning it as 'variables.csvData' --->
<cfcsv action="create" query="#parsedDataQuery#" variable="csvData" />
<pre><cfoutput>#htmleditformat(csvData)#</cfoutput></pre>

<!--- Create CSV data from a query, without a first line of headers (query column names) --->
<cfcsv action="create" query="#parsedDataQuery#" variable="csvData" includeColumnNames=false />
<pre><cfoutput>#htmleditformat(csvData)#</cfoutput></pre>

<!--- Create CSV data from a query, using non-default delimiter and text-qualifier --->
<cfcsv action="create" query="#parsedDataQuery#" variable="csvData" delimiter="|" textqualifier="'" />
<pre><cfoutput>#htmleditformat(csvData)#</cfoutput></pre>

<!--- Parse a csv file, returning it as an array of arrays --->
<cfcsv action="parse" file="testdata.csv" variable="myArray" output="array" delimiter=";" />
<cfdump eval=myArray />

<!--- Return results as an array of arrays, and not removing optional trailing line breaks.
This will result in a few extra empty arrays, because every line will be seen as a row of data. --->
<cfcsv action="parse" file="testdata.csv" variable="myArray" output="array" trimendoffile=false delimiter=";" />
<cfdump eval=myArray />

<!--- Not removing optional trailing line breaks.
This will result in a few extra query rows, because every line will be seen as a row of data. --->
<cfcsv action="parse" file="testdata.csv" variable="myQuery" output="query" trimendoffile=false delimiter=";" />
<cfdump eval=myQuery />

<!--- Do not use first line as headers. This will give the query column names called 'col1', 'col2', etc. --->
<cfcsv action="parse" file="testdata.csv" variable="myQuery" hascolumnnames=false delimiter=";" />
<cfdump eval=myQuery />

Change log

Version 1.1, sept. 22 2011: Changed the way the regex pattern is built, so we can also use tabs /spaces as delimiters.

Installation

You can install the custom tag by using your Railo server admin.

Go to your Server admin (default is http://localhost/railo-context/admin/server.cfm), click on Extension > Applications, and install the CFCSV custom tag from the Railo Extension Store.

Lastly, you have to restart Railo to be able to use the tag.

The source code

You can view all source code from my subversion repository browser:

  • cfcsv.cfc: the code that does all the hard work
  • CSV.cfc: the Railo custom tag which invokes cfcsv.cfc

Thanks!

A big thank you goes out to Ben Nadel and Steven Levithan, who wrote the CSV parsing code. Ben never stops to amaze me with his incredible amount of blog posts and researches into all kinds of Coldfusion related subjects. You rock dude!

Questions? Complaints?

Leave a message; I appreciate it!

del.icio.us Digg StumbleUpon Facebook Technorati Fav reddit Google Bookmarks
| Viewed 7137 times
  1. Craig Kaminsky

    #1 by Craig Kaminsky - February 4, 2011 at 5:21 PM

    Paul: this looks way cool! Thanks, so much, for developing and posting it.
  2. Matt Woodward

    #2 by Matt Woodward - February 5, 2011 at 8:20 PM

    Nice work Paul!

    This prompted me to want to take a moment to plug the CFML Conventional Wisdom list, which is a public list where any CFML developer can go to discuss new additions to CFML:
    http://groups.google.com/group/cfml-conventional-wisdom

    One of the awesome things we're starting to see happen now that we're almost three years into the open source revolution in CFML is more community contributions to the engines by way of patches to the source as well as extensions like this. And it's really fantastic that an addition can go from idea to implementation so quickly.

    The only caveat I'd like to throw out there is we do run the risk of winding up with different implementations of a lot of features between Railo and OpenBD (and potentially Adobe CF if they decide to introduce some of the additions to the language that develop in the open source projects), which is why I'd recommend that people considering making additions to the engines open up discussion on the CFML Conventional Wisdom list to get feedback, as well as see if there's more or less an agreed upon standard that comes out of the discussions.

    For example, OpenBD already has CSVRead() and CSVWrite() functions, but no tag, so it'll be interesting to see if the tag you came up with more or less matches how the functions in OpenBD work.

    http://www.openbluedragon.org/manual/?/function/csvread
    http://www.openbluedragon.org/manual/?/function/csvwrite

    What's noteworthy is the fact that those functions started with a different name and some different functionality entirely, but after a discussion on the CFML Conventional Wisdom list things were adjusted:
    http://groups.google.com/group/cfml-conventional-wisdom/browse_thread/thread/c8e8e6e3e7aa4974/a5caadc9cb7cdaf8#

    Similarly with REEscape(), a discussion got started on the CFML Conventional Wisdom list for that as well, and there's already been some very good insight that will impact the end result that gets put into OpenBD (and I believe Peter Farrell already created the initial implementation):
    http://groups.google.com/group/cfml-conventional-wisdom/browse_thread/thread/c12d45d68889352f

    There were also been a host of IP-releated functions introduced to OpenBD a while that probably overlap quite a bit with CFDNS.

    My point isn't to stifle anyone's motivation to add new tags or functions. Really I'm just making a modest plea to go to the CFML Conventional Wisdom list and propose your idea so the new functionality can be discussed and, potentially at least, a standard or convention can be agreed upon. Let's just keep the lines of communication open!

    This is precisely the sort of issues we're going to be discussing heavily at OpenCF Summit, so hope to see a lot of you there.
    http://www.opencfsummit.org
  3. Michael Brennan-White

    #3 by Michael Brennan-White - February 6, 2011 at 1:22 AM

    Paul,

    I don't see this tag as an option to install though I was able to install the log analyzer.
  4. Paul Klinkenberg

    #4 by Paul Klinkenberg - February 6, 2011 at 12:29 PM

    @Michael: then you were in the Railo WEB admin. For these tags, you need to be in the server admin. Just change the "web.cfm" in the current url to "server.cfm".
    Cheers, Paul
  5. Paul Klinkenberg

    #5 by Paul Klinkenberg - February 6, 2011 at 1:53 PM

    @Matt: thanks for the extensive reply. I was kind of baffled to find out that nobody around me ever told me about this list, even though I have been coding things that were discussed on that list! Anyway, as you might have read on the list already, I now joined it, and will participate in the discussions :-)
    Thanks for pointing it out! Cheers, Paul
  6. Ben Nadel

    #6 by Ben Nadel - February 7, 2011 at 11:30 PM

    It's pretty badass how easy it is to augment the Railo code-base. Awesome stuff.
  7. Jean-Marc Bideaud

    #7 by Jean-Marc Bideaud - March 11, 2011 at 5:47 PM

    Paul I just discovered and tested your CFCSV Railo tag, it's great !
    I just have a problem when parsing large csv files, I get a Java heap space error message : java.lang.OutOfMemoryError.
    I understand I should adjust my Java memory settings but don't know how to do this : I just installed Windows Railo 3.2.1 Tomcat version from getRailo.org. It's running as 'Apache Tomcat Railo' service, I don't know how to modify the memory environment.
    Could you please explain me how to do this or direct me to a documentation fit for my problem ? thanks, Jean-Marc
  8. Jean-Marc Bideaud

    #8 by Jean-Marc Bideaud - March 11, 2011 at 6:15 PM

    @Jean-Marc Bideaud
    Paul I just found a way to increase JvmMx in Tomcat Windows registry and I am now able to parse a 12Mo csv file in a query of 110 000 records ! it's really nice for me, thanks for your work.
  9. Paul Klinkenberg

    #9 by Paul Klinkenberg - March 11, 2011 at 9:52 PM

    Hi Jean-Marc, that's all great to hear. I wanted to send you the following link as an answer, but see now that your problem is already fixed: http://groups.google.com/group/railo/msg/17583acaf9d06909
    Have a nice weekend! Paul
  10. Thomas Alexander

    #10 by Thomas Alexander - July 6, 2011 at 7:46 AM

    Hi
    I'm bit new to coldfusion.

    I was trying to parse a csv that i'm getting as an API response. For certain fields values i'm having extra ", characters, so the parser splits the csv from there also

    This is the format i'm having , can you please check if it could be parsed.

    "1","1","1297794014","11","1","0","0","0","0","Tom Alex","","","test@test.com","test","testekrjsktj","system \"ns1.exmaple.com\", has identified"
    "4","1","1298054610","21","2","0","5","1","0","Test2 User <>","","","Tom Alex ","Re: [#1] test","jdhfksjdhfkjsdhf",""
    "8","1","1298072657","23","2","1","5","7","0","Test 3 <>","","","","","Hi guys",""
  11. Arthur Blake

    #11 by Arthur Blake - September 16, 2011 at 8:59 PM

    I couldn't get it to work as is with TAB as a field delimiter, but if I change the regular expression escaping on line 73 to this:

          <cfif arguments.delimiter EQ chr(9)>
             <cfset local.escapedDelimiter = "\t" />
          <cfelse>
             <cfset local.escapedDelimiter = regExSafe(arguments.delimiter) />
          </cfif>

    Then it works.

    Also, I can't seem to use it with local scoped variables (or indeed variables with any scope) in the variable parameter, only unscoped variables seem to work... Not sure how to fix that one. Any ideas?
  12. Paul Klinkenberg

    #12 by Paul Klinkenberg - September 22, 2011 at 11:59 PM

    Hi Arthur, thanks for reporting it! After a bit of investigation, it turned out that the manner in which the regex pattern was built, caused the problem.
    It was written with the "verbose flag" on, which makes sure that whitespace in the regex is ignored. Normally no problem, except for when you're using a tab (or space) as delimiter. I uploaded version 1.1.3 now, which is installable from the Railo extension store.
    Regarding the unscoped variables: that's not working atm. If you reaally want it, the code is opensource :)
    Cheers, Paul
  13. Arthur Blake

    #13 by Arthur Blake - October 4, 2011 at 3:30 PM

    Thanks for the fix, Paul. Is your fix checked in? When I go to your SVN repository browser, I only see files that were last checked in in Feb...
    Also, I cannot seem to use a command line client to access your SVN repo. Is there an anonymous log in for view only access, that I could use?
  14. Paul Klinkenberg

    #14 by Paul Klinkenberg - October 4, 2011 at 10:47 PM

    Hi Arthur, the changes indeed hadn't been committed yet. Have done that now. Are you going to fix the unscoped variables issue?
    I want to move the svn projects to github, so other people can collaborate. But for now, I made the repos read-only for anonymous connections :-)
    Cheers, Paul
  15. Arthur Blake

    #15 by Arthur Blake - October 4, 2011 at 10:55 PM

    Yes, I may try to fix the unscoped variables issue. That would be great if you switch to github. Let me know if you do that.
  16. Brook

    #16 by Brook - October 11, 2011 at 1:09 AM

    I made a couple of modifications to this to read and parse a range of rows so that I could process large files in chunks. This only applies when a file argument is supplied. I added startrows and maxrows arguments:

    <cfargument name="startRow" type="numeric" default="1" hint="Row to start reading at" />
          <cfargument name="maxRows" type="numeric" default="0" hint="Maximum rows to read from the start row. 0 for all" />

    And then updated this code:

    <cfif len( arguments.file )>
       <!--- Read the file into Data. --->
       <cfset arguments.csv = fileRead( arguments.file ) />
    </cfif>

    to:

    <cfif len( arguments.file )>
    <cfif arguments.startRow is not 1 or arguments.maxRows is not 0>
          <!--- create the buffer so we can append one line at a time --->
          <cfset arguments.csv = createObject("java","java.lang.StringBuffer").Init()>
    <cfif arguments.maxRows is not 0>
    <cfset arguments.maxRows = arguments.maxRows>
              <cfelse>
             <cfset arguments.maxRows=0>
    </cfif>

          <!--- Read the file into Data. --->
          <cfloop file="#arguments.file#" index="local.line" from="#arguments.startRow#" to="#arguments.maxRows#">
           <cfset arguments.csv.append(local.line & chr(13) & chr(10))>
          </cfloop>
           <cfelse>
          <cfset arguments.csv = fileRead( arguments.file ) />
        </cfif>
    </cfif>
  17. Dino Edwards

    #17 by Dino Edwards - February 1, 2012 at 7:19 PM

    I'm having an issue using this tag attempting to parse a csv file with about 77,000 records. If I use query or array to output the results it always gives only one record.
  18. Paul Klinkenberg

    #18 by Paul Klinkenberg - February 1, 2012 at 8:04 PM

    Hi Dino, I assume you used an incorrect delimiter, or something is wrong with the line ends in the file. If it still fails, after you checked it again, then you can mail me an example of the csv file if you wish, and I will take a look. My address is paul@ this site.

    Good luck, Paul
  19. Dino Edwards

    #19 by Dino Edwards - February 1, 2012 at 9:26 PM

    The file is actually generated by logparser.exe which is a MS utility that converts .evt files to .csv so I'm assuming it's a comma delimited file, I looked at it and it looked fine to me but I probably missed something. I will attach a sample of the file since it's 57MB and e-mail it to ya.
  20. Hendrik Kramer

    #20 by Hendrik Kramer - March 22, 2012 at 11:23 AM

    Would it be possible to add a charset="..." attribute to this tag to control the charset for the file encoding? I could do

    <cfset data = fileRead( "test.csv", "utf-8" ) />
    <cfcsv action="parse" data="#data#" variable="q" hascolumnnames=true delimiter="," />

    but not provide a charset in the cfcsv tag itself.

    <cfcsv action="parse" file="test.csv" variable="q" hascolumnnames=true delimiter="," />

    Of course, I could set the default encoding in the railo administrator, but I have to work with CSV files in different encodings (Shift_JIS, ISO-8859-1, UTF-8 and more).

    Would be great!
  21. Paul Klinkenberg

    #21 by Paul Klinkenberg - March 22, 2012 at 11:58 AM

    Hendrik, I'll put it on the todo list :)
    Paul
  22. Colorado Techie

    #22 by Colorado Techie - August 21, 2012 at 6:17 PM

    Thank you for building this!
  23. Colorado Techie

    #23 by Colorado Techie - August 21, 2012 at 6:58 PM

    Question: do you have something that does the reverse? I have an array of arrays that I want to save to CSV. I wrote my own CSV builder, but it is brutally slow and inefficient. It would be awesome to have something like that built in to railo/cfml.
  24. Paul Klinkenberg

    #24 by Paul Klinkenberg - August 21, 2012 at 8:50 PM

    Hi Colorado, or should I call you Techie ;)
    It is a nice idea to be able to have an array of arrays as input, next to the current query input. I will put it on one of my todo lists...
    It would be fairly easy though, you could use something like:
    <cfset delimiter = "," />
    <cfsavecontent variable="csv"><cfloop array="#myArray#" index="arr"><cfloop array="#arr#" index="item"><cfif find(delimiter, item) or find('"', item) or find(chr(10), item)>"#replace(item, '"', '""', 'all')#"<cfelse>#item#</cfif>#delimiter#</cfloop>#chr(65535)#</cfloop></cfsavecontent>
    <cfset csv = trim(replace(csv, "#delimiter##chr(65535)#", chr(10), "all")) />

    Hope this helps :) Paul
  25. Colorado Techie

    #25 by Colorado Techie - March 9, 2013 at 6:46 AM

    Paul,

    I got the Array of Arrays to save back to CSV (using StringBuffer: http://www.bennadel.com/blog/305-ColdFusion-CFFile-vs-Java-java-io-BufferedOutputStream.htm ).

    Unfortunately, I'm having a problem with the cfcsv tag now... My file has 91,000 rows in it (it is a 32 MB text file) and cfcsv truncates the data at about 61,000 rows.

    I ran into a similar problem using my own code (with StringBuffer) and I ended up writing out my StringBuffer data to the file in increments of 10,000 rows and that fixed the problem.

    Unfortunately, I'm not sure what to do about the cfcsv issue. I'm not getting any OutOfMemory errors nor any stack traces... I just try to read in the file, then do ArrayLen(arrRows) and it outputs 61,488 instead of 91,509 (which is how many rows my file has).

    Any ideas?
  26. Colorado Techie

    #26 by Colorado Techie - March 9, 2013 at 8:47 AM

    Nevermind, I figured it out.... it turns out I was using | as a delimiter and double quote as a qualifier, but the file wasn't following the standard for a few lines and that is where it truncated my file.

    So, anyone having file truncation issues should make sure that their data is formed correctly.

    (I had the following line:

    element1|element2|ele "ment" 3|element4

    if you use a double quote as a qualifier, the third element needs the qualifiers to be escaped AND the entire element needs to be wrapped in the qualifier.

    Bleh. Anyhow, thanks again for this function, I use it all the time!)
  27. Paul Klinkenberg

    #27 by Paul Klinkenberg - March 9, 2013 at 3:20 PM

    Great to read you've got it sorted out already :)
  28. Brandon Culpepper

    #28 by Brandon Culpepper - March 14, 2013 at 9:50 PM

    Paul, I am using Railo 4.0.3.006 and in the Server Admin and I do not see this extension in the Applications. Screenhot: http://d.pr/i/Gj02

    Is cfcsv no longer available to be installed in this manor?
    Thanks for the help!
    Brandon
  29. Brandon Culpepper

    #29 by Brandon Culpepper - March 14, 2013 at 10:37 PM

    I decided to just use the cfcsv.cfc by itself. Works like a charm! Thank you so much for contributing to the community!

    Still curious as to why it isn't showing up in the extensions store though.
    Thanks again!

    Brandon
  30. Paul Klinkenberg

    #30 by Paul Klinkenberg - March 14, 2013 at 10:40 PM

    Hi Brandon,
    It is not showing up in the server admin anymore, because the backend Mura plugin we use in the extension store, only has 2 options for install type: Web or Server. it should have a third option: Both.
    Now every time I update a plugin in the backend, the install type "Both" gets reset to Web :-( Still need to fix that.

    The tag is available from the web admin though.

    Cheers, Paul
  31. Colby Litnak

    #31 by Colby Litnak - August 1, 2013 at 6:17 PM

    HI Paul,

    cfcsv works great... except I think one of the engineers I work with found a bug in it. It looks like if there is some tabs after a quoted row while parsing the csv file, it will stop processing.

    take the following csv, for example (replace the [tab] with the actual control character):
    col1,col2,col3
    "1val1","1val2","1val3"[tab][tab]      
    "2val1","2val2","2val3"

    It returns a query with just the first row and the 3rd value omitted.
    The engineer made a slight change to the regex in cfcsv.cfc to grab it and continue parsing, but it still leaves the third value blank. In any case, I thought you should know.

    Here is the new line 86 of cfcsv.cfc
    <cfsavecontent variable="local.regEx"><cfoutput>\G(?:#local.escapedTextqualifier#([^#local.escapedTextqualifier#]*+(?>#local.escapedTextqualifier##local.escapedTextqualifier#[^#local.escapedTextqualifier#]*+)*)#local.escapedTextqualifier#|([^#local.escapedTextqualifier##local.escapedDelimiter#\r\n]*+))(#local.escapedDelimiter#|\r\n?|[\t]*+\r\n|\n|$)</cfoutput></cfsavecontent>

    We might play around with it some more to get it to work a little bit better.
  32. Paul Klinkenberg

    #32 by Paul Klinkenberg - August 1, 2013 at 8:59 PM

    Hi Colby, thanks for the feedback. I wouldn't call it a bug, since a tab character is just like any other character. So you've got malformed csv data ;)
    You could remove the extra tabs before passing it to cfcsv: [cfset csvdata = rereplace(csvdata, '\t+([\r\n]|$)', '\1', 'all') /], or make sure there aren't any tabs at the end of a line.

    In the MS Access world we live in (assuming that's where it came from), it seems regular to have extra tabs, but it just isn't valid csv if Access outputs it as "string"[tab][tab]. It should have outputted it as "string",[tab],[tab]

    Anyway, in a future version I might optionally trim all lines, as in [cfcsv trimlines="true" .../]

    Cheers, Paul
(will not be published)
Leave this field empty