Subject: Re: Strange failures on 4.9.6-3 kernel



On 9 Feb 2017, at 23:08, Adhemerval Zanella <adhemerval.zanella@xxxxxxxxxx>
wrote:
> On 09/02/2017 20:14, James Clarke wrote:
>>> On 9 Feb 2017, at 21:31, Adhemerval Zanella <adhemerval.zanella@xxxxxxxxxx>
>>> wrote:
>>>
>>> Hi all,
>>>
>>> While testing glibc on the kindly provided T5 machine in the Debian
>>> environment, I started to see some strange issues on sparc64, where glibc
>>> is failing mostly on static tests.
>>>
>>> The funny thing is that I checked the latest working revision I used to
>>> update the 2.25 release page [1], and the tests that used to pass are now
>>> failing. In fact, I even checked the 2.23 and 2.24 glibc releases, and both
>>> show the same issues as the master branch, so I am almost ruling out a
>>> glibc regression (which was my first idea).
>>>
>>> I noted that the machine kernel was updated (from 4.9.2-2 to 4.9.6-3), but
>>> I am not sure if this is something related to the kernel. I did not record
>>> the gcc revision I used in my initial testing. The static tests are failing
>>> due to a memcpy call that issues bogus instructions:
>>>
>>> (gdb) r
>>> Starting program: /home/azanella/glibc/glibc-git-build/elf/tst-tls1-static
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x0000000000000340 in ?? ()
>>> (gdb) bt
>>> #0 0x0000000000000340 in ?? ()
>>> #1 0x0000000000101fd8 in __libc_setup_tls () at libc-tls.c:180
>>> #2 0x0000000000101950 in __libc_start_main (main=0x4e8, argc=<optimized
>>> out>, argv=0x7feffffef78, init=0x4a8, fini=0x220, rtld_fini=0x0,
>>> stack_end=0x1)
>>> at libc-start.c:189
>>> #3 0x0000000000100704 in _start () at ../sysdeps/sparc/sparc64/start.S:88
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>
>>> (gdb) up
>>> [...]
>>> 0x0000000000101fc8 <+344>: add %l4, %o0, %o0
>>> 0x0000000000101fcc <+348>: mov %i1, %o1
>>> 0x0000000000101fd0 <+352>: call 0x2949c0
>>> 0x0000000000101fd4 <+356>: stx %o0, [ %i4 + 0x20 ]
>>> => 0x0000000000101fd8 <+360>: sethi %hi(0x4800), %g3
>>>
>>> It seems 0x2949c0 is an unknown address, whereas it should be memcpy's.
>>
>> Do you have the .o still for this? I would be interested to see what the
>> relocation was. One thing that has changed within the last week is enabling
>> PIE by default in GCC, though this call is a plain PC-relative one.
>>
>> Regards,
>> James
>>
>
> Yes, objdump shows:
>
> $ objdump -r string/memcpy.o
> string/memcpy.o: file format elf64-sparc
>
> RELOCATION RECORDS FOR [.text]:
> OFFSET TYPE VALUE
> 0000000000000010 R_SPARC_GOT22 __memcpy_niagara4
> 0000000000000014 R_SPARC_GOT10 __memcpy_niagara4
> 0000000000000028 R_SPARC_GOT22 __memcpy_niagara2
> 000000000000002c R_SPARC_GOT10 __memcpy_niagara2
> 0000000000000040 R_SPARC_GOT22 __memcpy_niagara1
> 0000000000000044 R_SPARC_GOT10 __memcpy_niagara1
> 0000000000000058 R_SPARC_GOT22 __memcpy_ultra3
> 000000000000005c R_SPARC_GOT10 __memcpy_ultra3
> 0000000000000068 R_SPARC_GOT22 __memcpy_ultra1
> 000000000000006c R_SPARC_GOT10 __memcpy_ultra1
> 0000000000000088 R_SPARC_GOT22 __mempcpy_niagara4
> 000000000000008c R_SPARC_GOT10 __mempcpy_niagara4
> 00000000000000a0 R_SPARC_GOT22 __mempcpy_niagara2
> 00000000000000a4 R_SPARC_GOT10 __mempcpy_niagara2
> 00000000000000b8 R_SPARC_GOT22 __mempcpy_niagara1
> 00000000000000bc R_SPARC_GOT10 __mempcpy_niagara1
> 00000000000000d0 R_SPARC_GOT22 __mempcpy_ultra3
> 00000000000000d4 R_SPARC_GOT10 __mempcpy_ultra3
> 00000000000000e0 R_SPARC_GOT22 __mempcpy_ultra1
> 00000000000000e4 R_SPARC_GOT10 __mempcpy_ultra1
>
> [debug relocations...]
>
> Using GOT relocations here is expected for PIE. And if I build the
> same object with -fno-pie I do see:
>
> string/memcpy.o: file format elf64-sparc
>
> RELOCATION RECORDS FOR [.text]:
> OFFSET TYPE VALUE
> 0000000000000010 R_SPARC_HI22 __memcpy_niagara4
> 0000000000000014 R_SPARC_LO10 __memcpy_niagara4
> 0000000000000028 R_SPARC_HI22 __memcpy_niagara2
> 000000000000002c R_SPARC_LO10 __memcpy_niagara2
> 0000000000000040 R_SPARC_HI22 __memcpy_niagara1
> 0000000000000044 R_SPARC_LO10 __memcpy_niagara1
> 0000000000000058 R_SPARC_HI22 __memcpy_ultra3
> 000000000000005c R_SPARC_LO10 __memcpy_ultra3
> 0000000000000068 R_SPARC_HI22 __memcpy_ultra1
> 000000000000006c R_SPARC_LO10 __memcpy_ultra1
> 0000000000000088 R_SPARC_HI22 __mempcpy_niagara4
> 000000000000008c R_SPARC_LO10 __mempcpy_niagara4
> 00000000000000a0 R_SPARC_HI22 __mempcpy_niagara2
> 00000000000000a4 R_SPARC_LO10 __mempcpy_niagara2
> 00000000000000b8 R_SPARC_HI22 __mempcpy_niagara1
> 00000000000000bc R_SPARC_LO10 __mempcpy_niagara1
> 00000000000000d0 R_SPARC_HI22 __mempcpy_ultra3
> 00000000000000d4 R_SPARC_LO10 __mempcpy_ultra3
> 00000000000000e0 R_SPARC_HI22 __mempcpy_ultra1
> 00000000000000e4 R_SPARC_LO10 __mempcpy_ultra1
>
> I think no one really tried to build glibc with a default-PIE gcc, so this
> might be a side effect of it. I tried to build with CC='gcc -fno-pie', but
> it failed on sunrpc/cross-rpcgen, again with a segfault due to a bogus jump
> from a possible mis-relocation.
>
> I am rebuilding gcc 6 without default PIE to check if I can rebuild and
> run glibc correctly.

I meant libc-tls.o's supposed call to memcpy in __libc_setup_tls.
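
As a rough illustration of why that call site is the interesting one: a SPARC
`call` instruction carries a 30-bit signed word displacement relative to its
own address, which the linker normally fills in through an R_SPARC_WDISP30
relocation. The sketch below only shows that arithmetic, using the two
addresses from the disassembly quoted above. The helper names `encode_call`
and `call_target` are made up for this example and are not part of any tool.

```python
# Sketch: how a SPARC `call` instruction encodes its PC-relative target.
# Format 1: bits 31..30 = 01, bits 29..0 = signed word displacement (disp30).

MASK64 = (1 << 64) - 1


def encode_call(pc, target):
    """Return the 32-bit instruction word for `call target` placed at pc."""
    byte_disp = target - pc
    if byte_disp % 4:
        raise ValueError("call targets must be 4-byte aligned")
    # Keep the low 30 bits of the (possibly negative) word displacement.
    disp30 = (byte_disp >> 2) & 0x3FFFFFFF
    return 0x40000000 | disp30


def call_target(pc, insn):
    """Return the address that a `call` instruction word at pc transfers to."""
    if insn >> 30 != 0b01:
        raise ValueError("not a call instruction")
    disp30 = insn & 0x3FFFFFFF
    if disp30 & (1 << 29):  # sign-extend the 30-bit field
        disp30 -= 1 << 30
    return (pc + (disp30 << 2)) & MASK64


# Addresses taken from the disassembly quoted above.
pc, target = 0x101fd0, 0x2949c0
word = encode_call(pc, target)
print(hex(word))                   # 0x40064a7c
print(hex(call_target(pc, word)))  # 0x2949c0
```

So the displacement stored at the call site determines where execution ends up,
and seeing which relocation (and which symbol) libc-tls.o records at that
offset would show whether the wrong value comes from the object or from the
link step.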

Regards,
James


