Re: [CSV] Inconsistent record separator behavior


Hey sebb,

On Thu, 23 Aug 2018 at 01:23, sebb <sebbaz@xxxxxxxxx> wrote:

> On 23 August 2018 at 00:01, Bruno P. Kinoshita
> <brunodepaulak@xxxxxxxxxxxx.invalid> wrote:
> >
> >>Maybe I'm just not getting it, but it feels pretty messed up :-)
> >
> >
> > Mutual feeling, and +1 for consistency. From what I understood, users
> > should be able to parse these crazy CSVs, but if they tried to re-create
> > them, with comments, then they wouldn't be able to avoid the
> > println/newline (so it wouldn't be parseable later with the same reader).
> >
> >
> > We probably need a ticket for it to aggregate the discussion and maybe a
> > possible solution.
>
> I'm wondering whether we need to be as flexible when *creating* the CSV
> files.
>
> "Be liberal in what you accept, and conservative in what you send" (Jon
> Postel)
>
> In this case send == create, as it might be sent to other less liberal
> readers.
>
> I don't have a problem with the output being less flexible, so long as
> it is sufficiently flexible (which I think it likely is already).
>
> I don't think consistency is necessary - or even desirable - here.
>

Okay, but wouldn't you expect to be able to use a CSVFormat instance to read
a file that you created with it? This is currently not the case.
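
To make the round-trip problem concrete, here is a minimal sketch in plain
Java. It does not use Commons CSV: the class RecordSeparatorRoundTrip and its
print/read methods are hypothetical stand-ins that only model the behavior
described in this thread, assuming a printer that ends comments and records
with the configured record separator, and a reader that only recognizes CR,
LF, or CRLF as the end of a comment or record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecordSeparatorRoundTrip {

    // Simplified printer (assumption): the comment and every record are
    // terminated with the configured record separator, as println() would do.
    static String print(String recordSeparator, String comment, List<String> records) {
        StringBuilder out = new StringBuilder();
        out.append("# ").append(comment).append(recordSeparator);
        for (String record : records) {
            out.append(record).append(recordSeparator);
        }
        return out.toString();
    }

    // Simplified reader (assumption): comments and records end only at
    // CR, LF, or CRLF; the configured record separator is ignored.
    static List<String> read(String text) {
        List<String> records = new ArrayList<>();
        for (String line : text.split("\r\n|\r|\n")) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            records.add(line);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a,b", "c,d");
        // LF as record separator: both records survive the round trip.
        System.out.println(read(print("\n", "generated", records)));
        // "!" as record separator: the reader sees one long comment line.
        System.out.println(read(print("!", "generated", records)));
    }
}
```

With LF as the separator the two records come back unchanged. With ! the
reader never sees a line break, so the leading comment swallows the whole
output and no records are returned.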

Regards,
Benedikt


>
> > Cheers
> >
> > ________________________________
> > From: Benedikt Ritter <britter@xxxxxxxxxx>
> > To: Commons Developers List <dev@xxxxxxxxxxxxxxxxxx>;
> brunodepaulak@xxxxxxxxxxxx
> > Sent: Thursday, 23 August 2018 7:10 AM
> > Subject: Re: [CSV] Inconsistent record separator behavior
> >
> >
> >
> > Hi Bruno,
> >
> > On Wed, 22 Aug 2018 at 15:10, Bruno P. Kinoshita
> > <brunodepaulak@xxxxxxxxxxxx.invalid> wrote:
> >
> >> Hi,
> >>
> >>
> >> Will try to look at the code and give a better answer during the
> >> weekend. But risking a silly question, would it mean that users are not
> >> able to parse a CSV unless each CSV row is separated by LF or CRLF?
> >
> >
> > Yes.
> >
> >
> >> I remember getting a CSV from a government website some time ago that was
> >> formatted in a very strange way, and if I remember well it was a small
> >> file, but without LF or CRLF. I think it was using | to separate the
> >> rows, and , for columns.
> >>
> >
> > I didn't know that there are formats that don't use a new line as line
> > separator.
> >
> >
> >>
> >>
> >> Quick search returned at least another person with similar issue
> >>
> https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator
> >>
> >>
> >> Not sure if I understood the problem well, but in case it makes sense...
> >> my suggestion would be to perhaps confirm if we could change
> >> CSVPrinter.printComment to accept other characters for line ending?
> >>
> >
> > The inconsistency I'm seeing is that we, on the one hand, accept any
> > character sequence as a record separator. Comments are, in a way, like
> > special records to me. But our implementation seems to put them on a new
> > "line" using the println() method. The println() method in turn uses the
> > record separator to start a new record. So it's not necessarily a new line.
> > Nevertheless, while processing a comment, we look out for CR and LF and
> > then we call println() again. Maybe I'm just not getting it, but it feels
> > pretty messed up :-)
> >
> > Regards,
> > Benedikt
> >
> >
> >
> >>
> >>
> >> Thanks!
> >>
> >> Bruno
> >>
> >>
> >> ________________________________
> >> From: Benedikt Ritter <britter@xxxxxxxxxx>
> >> To: Commons Developers List <dev@xxxxxxxxxxxxxxxxxx>
> >> Sent: Tuesday, 21 August 2018 7:13 PM
> >> Subject: [CSV] Inconsistent record separator behavior
> >>
> >>
> >>
> >> Hi,
> >>
> >>
> >> we have this strange handling of record separator / line endings in CSV:
> >>
> >> Users can use whatever character sequence they like as a record
> >> separator. I could for example use the ! character to mark the end of a
> >> record. Then we have CSVPrinter.printComment(String). This inserts
> >> comments into a CSV output. It detects CRLF and calls println() on the
> >> CSVFormat, which in turn uses the record separator to indicate a new
> >> record...
> >>
> >> So now I'm thinking: Does it make sense to use anything else but LF or
> >> CRLF as record separator? Maybe we should deprecate
> >> CSVFormat.recordSeparator(String) and introduce a LineEnding enum where
> >> users can choose between LF and CRLF. This way we can make the behavior
> >> between parsing and printing consistent.
> >>
> >>
> >> Thoughts?
> >>
> >> Benedikt
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@xxxxxxxxxxxxxxxxxx
> >> For additional commands, e-mail: dev-help@xxxxxxxxxxxxxxxxxx
> >
> >>
> >>
> >