Re: Reading comma delimited data

To: REALbasic Network Users Group <realbasic-nug at lists dot realsoftware dot com>
Subject: Re: Reading comma delimited data
From: Richard Gaskin <ambassador at fourthworld dot com>
Date: Fri, 30 Aug 2002 10:32:15 -0700
On Thursday, August 29, 2002, at 03:37 PM, chris wrote:
> 
> Does anyone know an easy way to read comma delimited data "properly".
> That is, knowing that fields are delimited by commas, but to ignore
> commas that are inside quotes surrounding a field... BUT, I can't just
> scan for quotes as part of the delimiter because number fields will have
> no quotes.
> 
> So a typical record will look like the following:
> 
> 0124,"Chris","Dog, Cat",78,"Macintosh","27-44157"
> 
> note that the number only fields have no quotes, but a number MIGHT be
> inside quotes if it is being treated as text. And inside some quotes can
> be a comma.

Love that reliable consistency, eh? :)

> I had been doing this by reading the data as a TextInputStream. Read a
> record using ReadLine, then use NthField with a comma as a separator, and
> then use ReplaceAll to strip any quotes.
> 
> The snag I have hit is fields that have a comma as part of the text in
> the field are being picked up as a field break by NthField.

If you do this enough you'll eventually discover another snag:  line endings
within field data are perfectly conformant to MS's use of CSV.  If you're
reading line by line without regard to whether you're inside quotes, your
code can mistake an in-data return for the end of a record, breaking that
record and all subsequent records.
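
To make that snag concrete, here is a small sketch in Python (used only for
illustration, not REALbasic). The sample data is made up: two records, the
first with a line ending inside a quoted field. Splitting on line endings
without tracking quotes yields three pieces instead of two records.

```python
# Hypothetical sample: two records; the first has a line ending inside quotes.
data = '1,"Dog,\nCat",78\n2,"Bird",5\n'

# Naive approach: split on line endings with no regard for quote state.
naive_records = data.splitlines()

# The first record comes back broken in two.
for piece in naive_records:
    print(repr(piece))
```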

I spent an inordinate amount of time tossing CSV parsing around with some
pals a while back, and in spite of our desire for efficiency the most
reliable algorithm we found was walking through the characters, setting a
flag when you hit an unescaped quote and clearing it when you hit the next.
While the flag is set you just pour the data into an array element; once you
hit the closing quote you go back to checking for commas, putting everything
in between commas into an array element.
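
A rough sketch of that character walk, in Python for illustration rather
than REALbasic. The function name parse_csv is made up, and two details are
assumptions not stated above: a doubled quote ("") inside a quoted field is
treated as an escaped literal quote, and blank lines are skipped.

```python
def parse_csv(text):
    """Walk the text one character at a time, tracking an in-quotes flag."""
    records = []       # finished records, each a list of field strings
    fields = []        # fields of the record being built
    field = []         # characters of the field being built
    in_quotes = False  # the flag: set at an opening quote, cleared at the closing one
    started = False    # True once the current record has consumed any character
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if not in_quotes and ch in '\r\n':
            # A line ending outside quotes ends the record.
            if ch == '\r' and i + 1 < n and text[i + 1] == '\n':
                i += 1  # treat CRLF as a single line ending
            if started:  # assumption: skip blank lines
                fields.append(''.join(field))
                records.append(fields)
            fields, field, started = [], [], False
        else:
            started = True
            if in_quotes:
                if ch == '"':
                    if i + 1 < n and text[i + 1] == '"':
                        field.append('"')  # assumption: "" is an escaped quote
                        i += 1
                    else:
                        in_quotes = False  # closing quote: back to watching for commas
                else:
                    field.append(ch)  # commas and line endings are plain data here
            elif ch == '"':
                in_quotes = True
            elif ch == ',':
                fields.append(''.join(field))
                field = []
            else:
                field.append(ch)
        i += 1
    if started:  # last record had no trailing line ending
        fields.append(''.join(field))
        records.append(fields)
    return records


sample = '0124,"Chris","Dog, Cat",78,"Macintosh","27-44157"'
print(parse_csv(sample))
```

Every field comes back as text, so telling a quoted number from an unquoted
one would need an extra per-field marker that this sketch does not keep.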

CSV is a silly (er, "inherently inefficient") format.  If possible, use
anything else -  you'll find more efficient ways to parse nearly any other
format on the planet. :)

-- 
 Richard Gaskin 
 Fourth World Media Corporation
 Custom Software and Web Development for All Major Platforms
 Developer of WebMerge 2.0: Publish any database on any site
 ___________________________________________________________
 Ambassador at FourthWorld dot com       http://www.FourthWorld.com
 Tel: 323-225-3717                       AIM: FourthWorldInc
