
Re: Help with bad errors on 4.6.1

Latest findings, some good news, and some very bad.

Good news:
I was wrong: I did not switch the system back to Java 8 correctly.

The problem is on the bookie side and occurs only if the bookie is on Java 9.

Bad news:
I have a fix, but the fix is to use unpooled ByteBufs in serializeProtobuf:

private static ByteBuf serializeProtobuf(MessageLite msg, ByteBufAllocator allocator) {
        int size = msg.getSerializedSize();
        ByteBuf buf = Unpooled.buffer(size, size);

I will continue to track down the cause. I think it is on the read path
(not sure).

On the client side we have a flag to disable pooled ByteBufs on the channel
allocator. The most trivial fix at the moment is to do the same on the bookie
side, as a hotfix for branch 4.6.
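
To make the flag idea concrete, here is a rough sketch. It is not BookKeeper
or Netty code: BufferAllocator, PooledAllocator, UnpooledAllocator,
chooseAllocator and the sketch.usePooledBuffers property are all hypothetical
names, and java.nio.ByteBuffer stands in for Netty's ByteBuf. In Netty terms
the two choices would correspond to PooledByteBufAllocator and
UnpooledByteBufAllocator, set on the channel through ChannelOption.ALLOCATOR.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class AllocatorFlagSketch {

    /** Hypothetical stand-in for an allocator interface such as Netty's ByteBufAllocator. */
    interface BufferAllocator {
        ByteBuffer allocate(int size);
        void release(ByteBuffer buf);
    }

    /** Always hands out a fresh buffer; released buffers are simply left to the GC. */
    static final class UnpooledAllocator implements BufferAllocator {
        public ByteBuffer allocate(int size) {
            return ByteBuffer.allocate(size);
        }

        public void release(ByteBuffer buf) {
            // Nothing to recycle.
        }
    }

    /** Toy pool: a released buffer is handed out again if it is large enough. */
    static final class PooledAllocator implements BufferAllocator {
        private final Deque<ByteBuffer> free = new ArrayDeque<>();

        public synchronized ByteBuffer allocate(int size) {
            ByteBuffer buf = free.peekFirst();
            if (buf != null && buf.capacity() >= size) {
                free.pollFirst();
                buf.clear();      // reset position, keep the backing memory
                buf.limit(size);
                return buf;
            }
            return ByteBuffer.allocate(size);
        }

        public synchronized void release(ByteBuffer buf) {
            free.addFirst(buf);
        }
    }

    /** The flag: one boolean decides which allocator the server side uses. */
    static BufferAllocator chooseAllocator(boolean usePooledBuffers) {
        return usePooledBuffers ? new PooledAllocator() : new UnpooledAllocator();
    }

    public static void main(String[] args) {
        // Hypothetical property name, used only for this sketch.
        boolean usePooled =
                Boolean.parseBoolean(System.getProperty("sketch.usePooledBuffers", "true"));
        BufferAllocator allocator = chooseAllocator(usePooled);

        ByteBuffer first = allocator.allocate(64);
        allocator.release(first);
        ByteBuffer second = allocator.allocate(64);
        System.out.println("buffer reused: " + (first == second));
    }
}
```

With the flag off, every allocation is a fresh buffer, so memory is never
shared between one released buffer and the next allocation. That costs
allocation throughput, which is why it only makes sense as a stopgap.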

Before jumping to this extreme hotfix I will dig into the issue. Now that I
know the problem occurs ONLY on Java 9 and only on the bookie, it should be
simpler to find a reproducer.

The fact remains that on the other systems I have, and in the test cases,
there is no failure.

Honestly, I have no Java 9 bookies in production, only Java 8 bookies. That
may be why no one has ever reported this problem from production.


2018-03-14 17:27 GMT+01:00 Ivan Kelly <ivank@xxxxxxxxxx>:

> >> > @Ivan
> >> > I wonder if some tests on Jepsen with bookie restarts may find this kind of
> >> > issues, given that it is not a network/SO problem
> >> If jepsen can catch then normal integration test can.
> I attempted a repro for this using the integration test stuff.
> Running for 2-3 hours in a loop, no bug hit. Perhaps I'm not doing
> exactly what you are doing.
> tests/integration/enrico-bug/src/test/java/org/apache/bookkeeper/tests/integration/
> -Ivan