Re: Help with bad errors on 4.6.1
On Fri, Mar 9, 2018, 19:30 Enrico Olivelli <eolivelli@xxxxxxxxx> wrote:
> Thank you Ivan!
> I hope I did not mess up the dump by also capturing the ZK ports. We are not
> using standard ports, and those 3 machines also host the 3-node ZK ensemble
> that supports BK and all the other parts of the application.
> So one explanation would be that something is connecting to the bookie and
> this makes the bookie switch into a corrupted state by double releasing a
I did some experiments and it is easy to reproduce the bookie side error
and the double release with a forged sequence of bytes (just using nc from
But this does not seem to be enough to break the bookie.
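For anyone who wants to repeat this kind of experiment without nc, here is a minimal Python sketch. It assumes the forged bytes are the 12-byte header from the line marked CORRUPT in Ivan's dump quoted below; the exact bytes used in the nc experiment are not stated in this thread. The helper names `hex_to_bytes` and `send_raw` are made up for illustration.

```python
import socket

# Header bytes copied from the line marked CORRUPT in the quoted dump.
# Assumption: any similar forged header would do for the experiment.
CORRUPT_HEADER_HEX = "00:00:00:08:ff:ff:ff:fe:00:00:00:0b"


def hex_to_bytes(hexdump: str) -> bytes:
    """Turn a colon-separated hexdump such as '00:0b' into raw bytes."""
    return bytes.fromhex(hexdump.replace(":", ""))


def send_raw(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Open a TCP connection, send the payload as-is, return the first reply chunk.

    Returns an empty bytes object if nothing arrives before the timeout
    or if the peer closes the connection without replying.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return b""
```

Something like `send_raw("127.0.0.1", 1822, hex_to_bytes(CORRUPT_HEADER_HEX))` would then send the forged header; the host is a placeholder and the port is only borrowed from the bookie list further down, so adjust both and point it at a test bookie only.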
I guess there is some corruption on the client side and the error on the bookie
is only an effect, as Ivan is saying.
My colleagues have left the system running with verbose debug logging over
the coming weekend; hopefully we will get some more stack traces.
> On Fri, Mar 9, 2018, 18:23 Ivan Kelly <ivank@xxxxxxxxxx> wrote:
>> I need to sign off for the day. I've done some analysis of a tcpdump
>> Enrico sent to me out of band (it may contain sensitive info, so best not
>> to post it on a public forum).
>> I've attached a dump of just the first bit of the header. Format is
>> <sequence in dump> <whether a request or response>(<remote port>)
>> <hexdump of payload>
>> There are definitely corrupt packets coming from somewhere. Search for
>> lines with CORRUPT.
>> 0247 - req (049546) - 00:00:00:08:ff:ff:ff:fe:00:00:00:0b CORRUPT
>> It's not clear whether these are originating at a valid client or not.
>> These trigger corrupt responses from the server, which I guess is the
>> double free manifesting itself. Strangely, the corrupt message seems to
>> have a lot of data in common with what seems like an ok message (it's
>> clearer in a fixed-width font).
>> 0248 - resp(049720) -
>> 0249 - resp(049546) - 00:00:00:10:ff:ff:ff:fe:00:00:02:86:00:07:e2:b1:00:00:00:00 CORRUPT
>> There's also some other weird traffic. Correct BK protobuf traffic
>> should be <4 bytes len>:00:03:....
>> There seems to be other traffic which is being accepted on the same
>> port, but looks like ZK traffic.
>> Anyhow, I'll dig more on Monday.
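The check described above (a 4-byte length followed by 00:03) can be applied mechanically to the dump lines. The following is a rough Python sketch, assuming each line follows the `<sequence> - <req|resp>(<remote port>) - <hexdump>` layout given earlier and that 00:03 means the two bytes immediately after the length. The names `parse_line` and `looks_like_bk_protobuf` are hypothetical, and the check is only a heuristic for flagging frames worth a closer look.

```python
import re
from typing import NamedTuple, Optional

# <sequence> - <req|resp>(<remote port>) - <colon-separated hex> [CORRUPT]
LINE_RE = re.compile(
    r"^(\d+) - (req|resp)\s*\((\d+)\) -\s*((?:[0-9a-fA-F]{2}:?)*)"
)


class Frame(NamedTuple):
    seq: int          # sequence number in the dump
    direction: str    # "req" or "resp"
    port: int         # remote port
    payload: bytes    # leading header bytes shown in the dump


def parse_line(line: str) -> Optional[Frame]:
    """Parse one dump line; return None if it does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    seq, direction, port, hexdump = m.groups()
    payload = bytes.fromhex(hexdump.replace(":", ""))
    return Frame(int(seq), direction, int(port), payload)


def looks_like_bk_protobuf(payload: bytes) -> bool:
    """True if the bytes after the 4-byte length field are 00:03."""
    return len(payload) >= 6 and payload[4:6] == b"\x00\x03"
```

Running `parse_line` over every line of the dump and printing the frames for which `looks_like_bk_protobuf` is false should list the lines marked CORRUPT together with the ZK-looking traffic, which makes it easier to see which remote ports they come from.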
>> On Fri, Mar 9, 2018 at 3:27 PM, Ivan Kelly <ivank@xxxxxxxxxx> wrote:
>> > On Fri, Mar 9, 2018 at 3:20 PM, Enrico Olivelli <eolivelli@xxxxxxxxx> wrote:
>> >> Bookies
>> >> 10.168.10.117:1822 -> bad bookie with 4.1.21
>> >> 10.168.10.116:1822 -> bookie with 4.1.12
>> >> 10.168.10.118:1281 -> bookie with 4.1.12
>> >> 10.168.10.117 client machine on which I have 4.1.21 client (different
>> >> process than the bookie one)
>> > Oh. This dump won't have the stream we need then, as that will be on
>> > loopback. Try adding "-i any" to the tcpdump. Sorry, I didn't realize
>> > your clients and servers are colocated.
>> > -Ivan
> -- Enrico Olivelli
-- Enrico Olivelli