Re: [CSV] Inconsistent record separator behavior


>Maybe I'm just not getting it, but it feels pretty messed up :-)


Mutual feeling, and +1 for consistency. From what I understood, users should be able to parse these crazy CSVs, but if they tried to re-create them with comments, they wouldn't be able to avoid the println/newline (so the output wouldn't be parseable later with the same reader).


We probably need a ticket to aggregate the discussion and, maybe, a possible solution.

Cheers

________________________________
From: Benedikt Ritter <britter@xxxxxxxxxx>
To: Commons Developers List <dev@xxxxxxxxxxxxxxxxxx>; brunodepaulak@xxxxxxxxxxxx 
Sent: Thursday, 23 August 2018 7:10 AM
Subject: Re: [CSV] Inconsistent record separator behavior



Hi Bruno,

On Wed, 22 Aug 2018 at 15:10, Bruno P. Kinoshita
<brunodepaulak@xxxxxxxxxxxx.invalid> wrote:

> Hi,
>
>
> I will try to look at the code and give a better answer during the weekend.
> But at the risk of asking a silly question: would it mean that users are not
> able to parse a CSV unless each CSV row is separated by LF or CRLF?


Yes.


> I remember getting a CSV from a government website some time ago that was
> formatted in a very strange way. If I remember correctly, it was a small
> file, but without LF or CRLF. I think it was using | to separate the rows,
> and , for columns.
>

I didn't know that there are formats that don't use a new line as line
separator.
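
For illustration, a file like the one Bruno describes (rows separated by |, columns by ,) could be read with a naive splitter such as the sketch below. It uses only the Java standard library, ignores quoting and escaping entirely, and the class and method names are made up for this example; it is not how Commons CSV parses input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Commons CSV.
final class PipeRecordsSketch {

    /**
     * Splits the input into records on recordSeparator and each record into
     * fields on delimiter. Quoted fields and escapes are NOT handled.
     */
    static List<String[]> parse(String input, String recordSeparator, String delimiter) {
        List<String[]> records = new ArrayList<>();
        for (String record : input.split(Pattern.quote(recordSeparator))) {
            if (!record.isEmpty()) {
                // Limit -1 keeps trailing empty fields such as "a,b,".
                records.add(record.split(Pattern.quote(delimiter), -1));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] fields : parse("id,name|1,Ann|2,Bob|", "|", ",")) {
            System.out.println(String.join(" / ", fields));
        }
    }
}
```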


>
>
> A quick search turned up at least one other person with a similar issue:
> https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator
>
>
> Not sure if I understood the problem well, but in case it makes sense...
> my suggestion would be to perhaps confirm if we could change
> CSVPrinter.printComment to accept other characters for line ending?
>

The inconsistency I'm seeing is that, on the one hand, we accept any
character sequence as a record separator. Comments are, in a way, like
special records to me. But our implementation seems to put them on a new
"line" using the println() method. The println() method in turn uses the
record separator to start a new record, so it's not necessarily a new line.
Nevertheless, while processing a comment, we look out for CR and LF and then
we call println() again. Maybe I'm just not getting it, but it feels pretty
messed up :-)
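
To make the behavior described above concrete, here is a simplified model of it that uses only the Java standard library. It is a sketch written from the description in this message, not a copy of the Commons CSV source; the class name, method signatures, and the choice of # as comment marker are assumptions made for the example.

```java
// Simplified model of the behavior described above.
// RecordSeparatorSketch and its methods are hypothetical, not Commons CSV API.
final class RecordSeparatorSketch {

    /** Ends the current record by appending the configured record separator. */
    static void println(StringBuilder out, String recordSeparator) {
        out.append(recordSeparator);
    }

    /** Writes a comment; every CR, LF, or CRLF in the text starts a new comment "line". */
    static void printComment(StringBuilder out, String comment, char marker, String recordSeparator) {
        out.append(marker).append(' ');
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '\r' || c == '\n') {
                // Treat CRLF as a single line break.
                if (c == '\r' && i + 1 < comment.length() && comment.charAt(i + 1) == '\n') {
                    i++;
                }
                // The break is emitted as the record separator, not as a real newline.
                println(out, recordSeparator);
                out.append(marker).append(' ');
            } else {
                out.append(c);
            }
        }
        println(out, recordSeparator);
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        printComment(out, "first\r\nsecond", '#', "!");
        out.append("a,b");
        println(out, "!");
        System.out.println(out); // prints "# first!# second!a,b!"
    }
}
```

With ! as the record separator, the CRLF inside the comment is looked for explicitly but written back out as !, so the comment and the following record all end up on one physical line.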

Regards,
Benedikt



>
>
> Thanks!
>
> Bruno
>
>
> ________________________________
> From: Benedikt Ritter <britter@xxxxxxxxxx>
> To: Commons Developers List <dev@xxxxxxxxxxxxxxxxxx>
> Sent: Tuesday, 21 August 2018 7:13 PM
> Subject: [CSV] Inconsistent record separator behavior
>
>
>
> Hi,
>
> we have this strange handling of record separators / line endings in CSV:
> Users can use whatever character sequence they like as a record separator.
> I could, for example, use the ! character to mark the end of a record.
> Then we have CSVPrinter.printComment(String). This inserts comments into a
> CSV output. It detects CRLF and calls println() on the CSVFormat, which in
> turn uses the record separator to indicate a new record...
>
> So now I'm thinking: Does it make sense to use anything else but LF or CRLF
> as record separator? Maybe we should deprecate
> CSVFormat.recordSeparator(String) and introduce a LineEnding enum where
> users can choose between LF and CRLF. This way we can make the behavior
> between parsing and printing consistent.
>
> Thoughts?
>
> Benedikt
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxx
> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxx
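
The LineEnding enum proposed in the original message might look roughly like the sketch below. Nothing in this thread confirms that such a type exists in Commons CSV; the enum, its accessor, and the small demo class are hypothetical and only show how restricting the choice to LF or CRLF would work.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed enum; not an existing Commons CSV type.
enum LineEnding {
    LF("\n"),
    CRLF("\r\n");

    private final String value;

    LineEnding(String value) {
        this.value = value;
    }

    /** Returns the character sequence written at the end of each record. */
    String getValue() {
        return value;
    }
}

// Hypothetical demo class showing the enum in use.
final class LineEndingDemo {

    /** Joins already formatted records, terminating each with the chosen line ending. */
    static String join(List<String> records, LineEnding ending) {
        StringBuilder out = new StringBuilder();
        for (String record : records) {
            out.append(record).append(ending.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(join(Arrays.asList("a,b", "c,d"), LineEnding.LF));
    }
}
```

Because only real line endings can be chosen, a printer that breaks comments with the same value would always produce output that a line-based reader can split again.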
