Subject: Re: Re: monotone & CVS import



On Wed, Nov 12, 2003 at 10:28:40AM -0500, graydon hoare wrote:
> space character is an unfortunate special case because some data in
> monotone is whitespace delimited (manifest entries and historical
> rename certs). so .. hmm. I think manifests would survive introducing
> space as an allowed character since they have exactly 40 characters of
> hex as their first component, and then 2 spaces, then "rest of line" as
> the path name. can't add '\n', but I don't think many files have that.
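
A minimal sketch of parsing a line with that layout, assuming exactly the
format described above (40 hex characters, two spaces, then the rest of the
line as the path). The name parse_manifest_line is hypothetical and is not
taken from monotone's source.

```python
import re

# Layout assumed from the description above: 40 hex chars, two spaces, path.
MANIFEST_LINE = re.compile(r"^([0-9a-f]{40})  (.*)$")

def parse_manifest_line(line):
    """Split one manifest line into (hash, path).

    Everything after the two-space separator is the path, so embedded
    spaces survive. An embedded newline cannot, because '.' does not
    match '\n'.
    """
    m = MANIFEST_LINE.match(line)
    if m is None:
        raise ValueError("malformed manifest line: %r" % (line,))
    return m.group(1), m.group(2)
```

Since the hash has a fixed width, only the first two spaces act as a
delimiter; any later spaces are simply part of the path.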

sha1sum appears to have interestingly undocumented behaviour here.

$ touch "`echo -e 'file_with_newline\nin'`"
$ touch 'file_with_backlash_n\nin'
$ touch 'normal_file'
$ sha1sum *
\da39a3ee5e6b4b0d3255bfef95601890afd80709  file_with_backlash_n\\nin
\da39a3ee5e6b4b0d3255bfef95601890afd80709  file_with_newline\nin
da39a3ee5e6b4b0d3255bfef95601890afd80709  normal_file

If I'm reading this right, it means that a checksum whose first letter
is "\" enables backslash processing of the filename.

This appears to apply to only a few characters, though:
$ touch "`echo -e 'file_with_non_printing_char\02in'`"
$ touch "`echo -e 'file_with_tab\tin'`"
$ sha1sum *
da39a3ee5e6b4b0d3255bfef95601890afd80709  file_with_non_printing_charin
da39a3ee5e6b4b0d3255bfef95601890afd80709  file_with_tab in

Investigation shows that the \02 and tab are embedded literally.
Spaces are handled the same way. So this backslash processing appears
to be quoting of backslashes and newlines only? I'm too lazy to check
the source right now.
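
A rough sketch of the quoting as observed above, not checked against the
sha1sum source: when the name contains a backslash or a newline, the line
gets a leading "\", backslash becomes "\\" and newline becomes "\n";
everything else (tabs, spaces, control characters) passes through literally.
The function names are hypothetical, and the two-space separator is what GNU
sha1sum prints in text mode.

```python
def format_checksum_line(digest, name):
    """Build one output line, escaping only backslash and newline."""
    if "\\" in name or "\n" in name:
        escaped = name.replace("\\", "\\\\").replace("\n", "\\n")
        # A leading backslash flags the line as carrying an escaped name.
        return "\\" + digest + "  " + escaped
    return digest + "  " + name

def parse_checksum_line(line):
    """Invert format_checksum_line; returns (digest, name)."""
    escaped = line.startswith("\\")
    if escaped:
        line = line[1:]
    # 40 hex characters of SHA-1, then the two-character separator.
    digest, name = line[:40], line[42:]
    if escaped:
        out = []
        i = 0
        while i < len(name):
            if name[i] == "\\" and i + 1 < len(name):
                nxt = name[i + 1]
                out.append("\n" if nxt == "n" else nxt)
                i += 2
            else:
                out.append(name[i])
                i += 1
        name = "".join(out)
    return digest, name
```

The leading "\" is what keeps the two odd filenames above distinguishable:
without it, a reader could not tell an escaped newline from a literal
backslash followed by "n".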

(Are there any weird systems that allow nuls in filenames? I wouldn't
put it past some version of Windows. That would be fun...)

> historical rename certs will break if we add ' ', but I don't know if
> anyone's using them aside from me so far, and I can repair my own just
> by re-issuing the cert. is anyone else using them yet? would '\n' be a
> good separator for structured data inside such certs? or '\0', just to
> be safe? how about a netstring, len:<len bytes...> ? it'd be nice to
> keep it possible to print the cert value to stdout.

How about <len1>:<len bytes>\n<len2>:<len bytes>? That should be easy
to parse for both computers and humans, while getting that nice sexpy
goodness going...
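
A sketch of what encoding and decoding that format could look like, assuming
decimal ASCII lengths, a ":" after each length, and a single "\n" between
fields. The names encode_fields and decode_fields are made up for
illustration; nothing here is monotone's actual cert format.

```python
def encode_fields(fields):
    """Join byte strings as <len>:<bytes>, separated by newlines."""
    return b"\n".join(str(len(f)).encode("ascii") + b":" + f for f in fields)

def decode_fields(data):
    """Split data produced by encode_fields back into a list of byte strings."""
    fields = []
    pos = 0
    while True:
        colon = data.index(b":", pos)      # raises ValueError if missing
        length = int(data[pos:colon])      # raises ValueError if not a number
        start = colon + 1
        end = start + length
        if end > len(data):
            raise ValueError("truncated field")
        fields.append(data[start:end])
        if end == len(data):
            return fields
        if data[end:end + 1] != b"\n":
            raise ValueError("expected newline between fields")
        pos = end + 1

# A rename-style value whose second field contains a newline.
encoded = encode_fields([b"old name", b"new\nname"])
```

Because the decoder trusts the length rather than scanning for a delimiter,
spaces and newlines inside a field are harmless, and the value still prints
readably for ordinary names.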

-- Nathaniel

--
"If you can explain how you do something, then you're very very bad at it."
-- John Hopfield