Subject: libx11: Issues with Data32/_XData32
I've been debugging an issue in gtk2-perl causing it to SIGBUS on
sparc64, and traced it back to what seems to be dodgy code inside
libx11. One of the tests calls gdk_window_set_opacity, which calls
XChangeProperty with a pointer to a guint32, cast to char*, with the
length set to 32 bits as expected. However, this data pointer then gets
cast to a (long *) on line 83 of ChProp.c when calling Data32. Indeed,
the definition of Data32 specifies that data is a (long *) on sparc64,
since LONG64 is defined. This feels very wrong, but in and of itself is
not too bad (I believe it violates strict aliasing).
The problem really comes inside the implementation of _XData32. The
destination buffer, buf, is an (int *), but data is still a (long *).
These are not the same size. The issues with this are as follows:
1. Incrementing data increases the pointer by 8, so it skips over every
other 4-byte int, and reads twice as many bytes as it should.
2. On big-endian systems, (int)*(long *)an_int_pointer will end up
getting the 4 bytes *after* an_int_pointer, which means it gets
completely the wrong data even in the case of len being 4 (bytes).
3. On sparc64, pointers have strict alignment requirements - they must
be size aligned (ie (int *) must be 4-byte aligned, (long *) must be
8-byte aligned). In this case, the data pointer (which came all the
way from gdk_window_set_opacity) is only 4-byte aligned (which is
fine, since it's a pointer to a guint32), but it has been cast to a
(long *) and dereferenced, so it is required to be 8-byte aligned.
This is the cause of the observed crash.
However, I wonder how 1. is not seen currently given the wide use
of X11 on amd64. I can only assume 2. not being seen is because the only
64-bit big-endian architectures are generally used for servers, and
there aren't many of them.
I don't know what the solution to ...
the problem is. There are various
places in the call stack where this could be fixed up. The "correct" fix
seems to be to change Data32 to take an (int *) (or some other data type
of your choosing if you're worried about int's size not being portable)
and fix up the casts at all the call sites, but this is an intrusive
change to a public header, and I worry that there are things out there
relying on the current behaviour.
Alternatively, Data32 could be altered to still take a (long *), but
cast it back to an (int *). This has the advantage of not touching
public headers, but the prototype is then lying about what it's actually
doing, and this still has the problem of breaking behaviour.
The final, ugly alternative, is to alter XChangeProperty so it makes a
copy of data as an array of long, padding or sign-extending each
element, before passing it to Data32.
I can't claim to have spent much time looking through the code, so it's
highly likely I've missed something. Could those with more knowledge
please comment on the above?