
Re: [CSV] Inconsistent record separator behavior


On 23 August 2018 at 07:10, Benedikt Ritter <britter@xxxxxxxxxx> wrote:
> Hey sebb,
>
> On Thu, 23 Aug 2018 at 01:23, sebb <sebbaz@xxxxxxxxx> wrote:
>
>> On 23 August 2018 at 00:01, Bruno P. Kinoshita
>> <brunodepaulak@xxxxxxxxxxxx.invalid> wrote:
>> >
>> >>Maybe I'm just not getting it, but it feels pretty messed up :-)
>> >
>> >
>> > Mutual feeling, and +1 for consistency. From what I understood, users
>> > should be able to parse these crazy CSVs, but if they tried to re-create
>> > them, with comments, then they wouldn't be able to avoid the
>> > println/newline (so it wouldn't be parseable later with the same reader).
>> >
>> >
>> > We probably need a ticket for it to aggregate the discussion and maybe a
>> > possible solution.
>>
>> I'm wondering whether we need to be as flexible when *creating* the CSV
>> files.
>>
>> "Be liberal in what you accept, and conservative in what you send" (Jon
>> Postel)
>>
>> In this case send == create, as it might be sent to other less liberal
>> readers.
>>
>> I don't have a problem with the output being less flexible, so long as
>> it is sufficiently flexible (which I think it likely is already).
>>
>> I don't think consistency is necessary - or even desirable - here.
>>
>
> okay, but wouldn't you expect that you can use a CSVFormat instance to read
> a file that you created with it? This is currently not the case.

Sorry, I misread the problem.

Yes, it should be able to read what it writes.

So the issue remains: should the reader be able to parse the unusual
format, or should the writer not be able to create it?
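To make the question concrete, here is a minimal, stdlib-only Java model of the round trip. It is not the Commons CSV API: `RoundTripModel`, `write` and `read` are hypothetical names, quoting and comments are ignored, and the reader is assumed (as described elsewhere in this thread) to end records only on CR, LF or CRLF, while the writer appends whatever separator it is given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the write/read round trip discussed here.
// NOT the Commons CSV implementation: quoting, escaping and comments are ignored.
final class RoundTripModel {

    // Writer side: join fields with ',' and terminate every record with the
    // caller-supplied record separator, whatever it is.
    static String write(List<List<String>> records, String recordSeparator) {
        StringBuilder out = new StringBuilder();
        for (List<String> record : records) {
            out.append(String.join(",", record)).append(recordSeparator);
        }
        return out.toString();
    }

    // Reader side (assumption): only CR, LF or CRLF end a record,
    // regardless of the separator that was used when writing.
    static List<List<String>> read(String text) {
        List<List<String>> records = new ArrayList<>();
        for (String line : text.split("\r\n|\r|\n")) {
            if (!line.isEmpty()) {
                records.add(Arrays.asList(line.split(",", -1)));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<List<String>> records = List.of(List.of("a", "b"), List.of("c", "d"));
        System.out.println(read(write(records, "\r\n"))); // [[a, b], [c, d]]
        System.out.println(read(write(records, "!")));    // [[a, b!c, d!]]
    }
}
```

With CRLF the records come back unchanged; with `!` the reader sees a single record whose fields straddle the separator, which is the asymmetry in question.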

I don't have a particular view on that, except that allowing only LF and
CRLF seems too restrictive.
We should allow at least CR alone. I don't know whether there are any
other reasonable separators.

Perhaps we could just document the method to warn that using anything
other than CR, LF or CRLF will produce an output file that is not
parseable?
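A sketch of the kind of check (or warning) that could back such documentation, assuming the reader only recognises CR, LF and CRLF as line endings. `SeparatorCheck` and `isParseable` are hypothetical names, not Commons CSV API.

```java
// Hypothetical helper illustrating the "document and warn" idea.
// Class and method names are illustrative only, not Commons CSV API.
final class SeparatorCheck {

    /**
     * Returns true if the record separator is one that a reader which only
     * recognises CR, LF or CRLF as line endings can split records on.
     *
     * <p>Any other separator produces output that such a reader cannot
     * parse back into the same records.
     */
    static boolean isParseable(String recordSeparator) {
        return "\r".equals(recordSeparator)
                || "\n".equals(recordSeparator)
                || "\r\n".equals(recordSeparator);
    }

    public static void main(String[] args) {
        for (String sep : new String[] {"\r\n", "\n", "\r", "!", "|"}) {
            if (!isParseable(sep)) {
                // Warn rather than reject, so existing users are not broken.
                System.err.println("Warning: record separator '" + sep
                        + "' will not be recognised when reading the output back");
            }
        }
    }
}
```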

> Regards,
> Benedikt
>
>
>>
>> > Cheers
>> >
>> > ________________________________
>> > From: Benedikt Ritter <britter@xxxxxxxxxx>
>> > To: Commons Developers List <dev@xxxxxxxxxxxxxxxxxx>;
>> brunodepaulak@xxxxxxxxxxxx
>> > Sent: Thursday, 23 August 2018 7:10 AM
>> > Subject: Re: [CSV] Inconsistent record separator behavior
>> >
>> >
>> >
>> > Hi Bruno,
>> >
>> > On Wed, 22 Aug 2018 at 15:10, Bruno P. Kinoshita
>> > <brunodepaulak@xxxxxxxxxxxx.invalid> wrote:
>> >
>> >> Hi,
>> >>
>> >>
>> >> Will try to look at the code and give a better answer during the
>> >> weekend. But at the risk of asking a silly question, would it mean that
>> >> users are not able to parse a CSV unless each CSV row is separated by LF
>> >> or CRLF?
>> >
>> >
>> > Yes.
>> >
>> >
>> >> I remember getting a CSV from a government website some time ago that
>> >> was formatted in a very strange way, and if I remember well it was a
>> >> small file, but without LF or CRLF. I think it was using | to separate
>> >> the rows, and , for columns.
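For illustration only, a reader for a file like the one described above could split records on a caller-supplied separator instead of on line breaks. This is a hypothetical stdlib-only sketch (`CustomSeparatorReader` is an illustrative name), not how Commons CSV parses, and it ignores quoting and escaping, so a `|` or `,` inside a field would break it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical minimal reader: records end with a custom separator (e.g. '|')
// and fields are split on a delimiter (e.g. ','). No quoting/escaping support.
final class CustomSeparatorReader {

    static List<List<String>> read(String text, String recordSeparator, char delimiter) {
        List<List<String>> records = new ArrayList<>();
        // Pattern.quote makes regex metacharacters such as '|' literal.
        String fieldRegex = Pattern.quote(String.valueOf(delimiter));
        for (String record : text.split(Pattern.quote(recordSeparator))) {
            if (!record.isEmpty()) {
                records.add(Arrays.asList(record.split(fieldRegex, -1)));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(read("name,age|alice,30|bob,25", "|", ','));
        // [[name, age], [alice, 30], [bob, 25]]
    }
}
```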
>> >>
>> >
>> > I didn't know that there are formats that don't use a new line as line
>> > separator.
>> >
>> >
>> >>
>> >>
>> >> A quick search turned up at least one other person with a similar issue:
>> >>
>> >> https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator
>> >>
>> >>
>> >> Not sure if I understood the problem well, but in case it makes sense...
>> >> my suggestion would be to check whether we could change
>> >> CSVPrinter.printComment to accept other characters as line endings.
>> >>
>> >
>> > The inconsistency I'm seeing is that, on the one hand, we accept any
>> > character sequence as a record separator. Comments are, in a way, like
>> > special records to me. But our implementation seems to put them on a new
>> > "line" using the println() method, and println() in turn uses the record
>> > separator to start a new record, so it's not necessarily a new line.
>> > Nevertheless, while processing a comment, we look out for CR and LF and
>> > then call println() again. Maybe I'm just not getting it, but it feels
>> > pretty messed up :-)
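The flow described above can be modelled in a few lines. This is a hypothetical, simplified sketch (`CommentPrintingModel` is an illustrative name), written from the description in this thread rather than from the Commons CSV source: line breaks inside a comment trigger `println()`, and `println()` only appends the record separator.

```java
// Hypothetical, simplified model of the described printing behaviour.
// Not the Commons CSV source; names are illustrative.
final class CommentPrintingModel {

    private final StringBuilder out = new StringBuilder();
    private final String recordSeparator;
    private final char commentMarker;

    CommentPrintingModel(String recordSeparator, char commentMarker) {
        this.recordSeparator = recordSeparator;
        this.commentMarker = commentMarker;
    }

    // "Start a new line": in fact this just appends the record separator.
    void println() {
        out.append(recordSeparator);
    }

    // Each CR, LF or CRLF inside the comment triggers println(), i.e. the
    // record separator, followed by a fresh comment marker.
    void printComment(String comment) {
        out.append(commentMarker).append(' ');
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < comment.length() && comment.charAt(i + 1) == '\n') {
                    i++; // treat CRLF as a single line break
                }
                println();
                out.append(commentMarker).append(' ');
            } else {
                out.append(c);
            }
        }
        println();
    }

    void printRecord(String... fields) {
        out.append(String.join(",", fields));
        println();
    }

    @Override
    public String toString() {
        return out.toString();
    }

    public static void main(String[] args) {
        CommentPrintingModel printer = new CommentPrintingModel("!", '#');
        printer.printComment("first\nsecond");
        printer.printRecord("a", "b");
        System.out.println(printer); // # first!# second!a,b!
    }
}
```

With `!` as the separator the output contains no CR or LF at all, so a reader that only ends lines on CR/LF would presumably see one long line starting with the comment marker.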
>> >
>> > Regards,
>> > Benedikt
>> >
>> >
>> >
>> >>
>> >>
>> >> Thanks!
>> >>
>> >> Bruno
>> >>
>> >>
>> >> ________________________________
>> >> From: Benedikt Ritter <britter@xxxxxxxxxx>
>> >> To: Commons Developers List <dev@xxxxxxxxxxxxxxxxxx>
>> >> Sent: Tuesday, 21 August 2018 7:13 PM
>> >> Subject: [CSV] Inconsistent record separator behavior
>> >>
>> >>
>> >>
>> >> Hi,
>> >>
>> >>
>> >> we have this strange handling of record separator / line endings in CSV:
>> >>
>> >> Users can use whatever character sequence they like as a record
>> >> separator. I could, for example, use the ! character to mark the end of
>> >> a record.
>> >>
>> >> Then we have CSVPrinter.printComment(String). This inserts comments into
>> >> a CSV output. It detects line breaks (CR, LF or CRLF) and calls println()
>> >> on the CSVFormat, which in turn uses the record separator to indicate a
>> >> new record...
>> >>
>> >> So now I'm thinking: Does it make sense to use anything other than LF or
>> >> CRLF as a record separator? Maybe we should deprecate
>> >> CSVFormat.recordSeparator(String) and introduce a LineEnding enum where
>> >> users can choose between LF and CRLF. This way we can make the behavior
>> >> between parsing and printing consistent.
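A rough sketch of what this proposal could look like, assuming a small immutable format object. `FormatSketch` is a hypothetical stand-in for CSVFormat, and `LineEnding` / `withLineEnding` are illustrative names, not existing Commons CSV API; the deprecated String-based setter shows one possible compatibility bridge.

```java
import java.util.Objects;

// Hypothetical LineEnding enum limited to LF and CRLF, as proposed.
enum LineEnding {
    LF("\n"),
    CRLF("\r\n");

    private final String chars;

    LineEnding(String chars) {
        this.chars = chars;
    }

    // The sequence a printer would append after each record.
    String chars() {
        return chars;
    }
}

// Hypothetical stand-in for a format class; not the real CSVFormat.
final class FormatSketch {

    private final LineEnding lineEnding;

    FormatSketch(LineEnding lineEnding) {
        this.lineEnding = Objects.requireNonNull(lineEnding, "lineEnding");
    }

    FormatSketch withLineEnding(LineEnding lineEnding) {
        return new FormatSketch(lineEnding);
    }

    /** @deprecated use {@link #withLineEnding(LineEnding)}; only LF and CRLF are accepted. */
    @Deprecated
    FormatSketch withRecordSeparator(String separator) {
        // Map legacy strings onto the enum; reject anything else.
        for (LineEnding e : LineEnding.values()) {
            if (e.chars().equals(separator)) {
                return withLineEnding(e);
            }
        }
        throw new IllegalArgumentException("Unsupported record separator: " + separator);
    }

    LineEnding lineEnding() {
        return lineEnding;
    }

    public static void main(String[] args) {
        FormatSketch format = new FormatSketch(LineEnding.CRLF).withRecordSeparator("\n");
        System.out.println(format.lineEnding()); // LF
    }
}
```

Rejecting unknown separators in the bridge makes the printer and parser agree by construction, at the cost of breaking callers who currently pass something like `!`; a softer variant could log a warning instead.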
>> >>
>> >>
>> >> Thoughts?
>> >>
>> >> Benedikt
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxx
>> >> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxx
>> >
>> >>
>> >>
>> >
>> >
>>
>>
>>
