git.net


Re: Nodejs connector high latency


Hi Andy

Thanks for the response.

I have 2 DCs of 5 nodes each. I'm writing against DC2 with LOCAL_QUORUM consistency. nodetool confirmed that all the data lives on DC2, and I did not configure any replication to DC1, so this is the expected behavior.
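For reference, a replication setup like the one described (replicas only in DC2, none in DC1) would look roughly like this. The keyspace name and replication factor are assumptions for illustration, not taken from this mail:

```javascript
// Hypothetical keyspace definition: NetworkTopologyStrategy with replicas
// only in DC2, so DC1 holds no data for this keyspace.
const createKeyspace = `
  CREATE KEYSPACE IF NOT EXISTS app_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': 3}`;

console.log(createKeyspace.includes("'DC2': 3")); // DC2 is the only DC carrying replicas
```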

Regarding Node.js behavior, I validated that Node is not responsible for the delay, since the application is deployed on a single server where the latencies are high (25%-40% higher).

I turned on tracing and found that there is some communication going on across DCs, which introduces this delay.

Here are the trace events for one such session:

activity | source | source_elapsed | thread
READ message received from /DC2-node3 | DC1-node2 | 5 | MessagingService-Incoming-/DC2-node3
Executing single-partition query on roles | DC1-node2 | 179 | ReadStage-2
Acquiring sstable references | DC1-node2 | 212 | ReadStage-2
Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | DC1-node2 | 247 | ReadStage-2
Key cache hit for sstable 4 | DC1-node2 | 273 | ReadStage-2
Key cache hit for sstable 6 | DC1-node2 | 301 | ReadStage-2
Merged data from memtables and 2 sstables | DC1-node2 | 352 | ReadStage-2
Read 1 live rows and 0 tombstone cells | DC1-node2 | 384 | ReadStage-2
Read 1 live rows and 0 tombstone cells | DC1-node2 | 411 | ReadStage-2
Enqueuing response to /DC2-node3 | DC1-node2 | 428 | ReadStage-2
Sending REQUEST_RESPONSE message to /DC2-node3 | DC1-node2 | 475 | MessagingService-Outgoing-/DC2-node3-Small
REQUEST_RESPONSE message received from /DC1-node2 | DC2-node3 | 119313 | MessagingService-Incoming-/DC1-node2
Processing response from /DC1-node2 | DC2-node3 | 119367 | RequestResponseStage-3
reading digest from /DC2-node4 | DC2-node3 | 119489 | Native-Transport-Requests-8
reading digest from /DC2-node1 | DC2-node3 | 119518 | Native-Transport-Requests-8
reading digest from /DC1-node2 | DC2-node3 | 119538 | Native-Transport-Requests-8
Sending READ message to /DC2-node4 | DC2-node3 | 119536 | MessagingService-Outgoing-/DC2-node4-Small
Executing single-partition query on roles | DC2-node3 | 119554 | ReadStage-5
Sending READ message to /DC2-node1 | DC2-node3 | 119562 | MessagingService-Outgoing-/DC2-node1-Small
Acquiring sstable references | DC2-node3 | 119570 | ReadStage-5
Sending READ message to /DC1-node2 | DC2-node3 | 119579 | MessagingService-Outgoing-/DC1-node2-Small
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones | DC2-node3 | 119589 | ReadStage-5
Key cache hit for sstable 1 | DC2-node3 | 119609 | ReadStage-5
Merged data from memtables and 1 sstables | DC2-node3 | 119651 | ReadStage-5
Read 1 live rows and 0 tombstone cells | DC2-node3 | 119671 | ReadStage-5
REQUEST_RESPONSE message received from /DC2-node4 | DC2-node3 | 120682 | MessagingService-Incoming-/DC2-node4
Processing response from /DC2-node4 | DC2-node3 | 120714 | RequestResponseStage-10
REQUEST_RESPONSE message received from /DC2-node1 | DC2-node3 | 120995 | MessagingService-Incoming-/DC2-node1
Processing response from /DC2-node1 | DC2-node3 | 121088 | RequestResponseStage-10
READ message received from /DC2-node3 | DC1-node2 | 5 | MessagingService-Incoming-/DC2-node3
Executing single-partition query on roles | DC1-node2 | 67 | ReadStage-4
Acquiring sstable references | DC1-node2 | 90 | ReadStage-4
Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones | DC1-node2 | 127 | ReadStage-4
Key cache hit for sstable 4 | DC1-node2 | 165 | ReadStage-4
Key cache hit for sstable 6 | DC1-node2 | 201 | ReadStage-4
Merged data from memtables and 2 sstables | DC1-node2 | 267 | ReadStage-4
Read 1 live rows and 0 tombstone cells | DC1-node2 | 301 | ReadStage-4
Read 1 live rows and 0 tombstone cells | DC1-node2 | 318 | ReadStage-4
Enqueuing response to /DC2-node3 | DC1-node2 | 329 | ReadStage-4
Sending REQUEST_RESPONSE message to /DC2-node3 | DC1-node2 | 399 | MessagingService-Outgoing-/DC2-node3-Small
MUTATION message received from /DC2-node3 | DC2-node2 | 2 | MessagingService-Incoming-/DC2-node3
MUTATION message received from /DC2-node3 | DC2-node5 | 2 | MessagingService-Incoming-/DC2-node3
REQUEST_RESPONSE message received from /DC1-node2 | DC2-node3 | 226207 | MessagingService-Incoming-/DC1-node2
Appending to commitlog | DC2-node2 | 42 | MutationStage-9
Processing response from /DC1-node2 | DC2-node3 | 226315 | RequestResponseStage-5
Adding to txn_game memtable | DC2-node2 | 65 | MutationStage-9
Determining replicas for mutation | DC2-node3 | 226427 | Native-Transport-Requests-8
Enqueuing response to /DC2-node3 | DC2-node2 | 91 | MutationStage-9
Appending to commitlog | DC2-node3 | 226454 | Native-Transport-Requests-8
Sending REQUEST_RESPONSE message to /DC2-node3 | DC2-node2 | 137 | MessagingService-Outgoing-/DC2-node3-Small
Adding to txn_game memtable | DC2-node3 | 226475 | Native-Transport-Requests-8
Sending MUTATION message to /DC2-node5 | DC2-node3 | 226569 | MessagingService-Outgoing-/DC2-node5-Small
Sending MUTATION message to /DC2-node2 | DC2-node3 | 226570 | MessagingService-Outgoing-/DC2-node2-Small
REQUEST_RESPONSE message received from /DC2-node2 | DC2-node3 | 3 | MessagingService-Incoming-/DC2-node2
Processing response from /DC2-node2 | DC2-node3 | 40 | RequestResponseStage-7
REQUEST_RESPONSE message received from /DC2-node5 | DC2-node3 | 2 | MessagingService-Incoming-/DC2-node5
Processing response from /DC2-node5 | DC2-node3 | 97 | RequestResponseStage-8


1) Why is DC1 involved at all when performing a LOCAL_QUORUM write to DC2?
2) How should source_elapsed be read? It seems to be cumulative per host, but in some places it resets.
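My working assumption (not confirmed anywhere in this thread) is that source_elapsed is microseconds counted on each source host independently, from the moment that host started handling the request, which would explain the apparent resets. A small sketch over a hand-copied subset of the trace rows above:

```javascript
// Sketch: group trace events by source host and compute per-event deltas.
// Rows are a hand-copied subset of the trace above (activity, source, source_elapsed).
const events = [
  { activity: 'READ message received from /DC2-node3', source: 'DC1-node2', elapsed: 5 },
  { activity: 'Executing single-partition query on roles', source: 'DC1-node2', elapsed: 179 },
  { activity: 'Sending REQUEST_RESPONSE message to /DC2-node3', source: 'DC1-node2', elapsed: 475 },
  { activity: 'REQUEST_RESPONSE message received from /DC1-node2', source: 'DC2-node3', elapsed: 119313 },
];

// source_elapsed counts microseconds on each source host independently,
// so deltas are only meaningful between consecutive events on the same host.
const bySource = {};
for (const e of events) (bySource[e.source] = bySource[e.source] || []).push(e);

for (const [source, list] of Object.entries(bySource)) {
  for (let i = 1; i < list.length; i++) {
    const delta = list[i].elapsed - list[i - 1].elapsed;
    console.log(`${source}: +${delta}us -> ${list[i].activity}`);
  }
}
```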


Thanks
Tarun Chabarwal


On Sun, Nov 4, 2018 at 11:33 PM Andy Tolbert <andrew.tolbert@xxxxxxxxxxxx> wrote:
Hi Tarun,

There are a ton of factors that can impact query performance.

The Cassandra native protocol supports multiple simultaneous requests per connection.  Most drivers by default create only one connection to each C* host in the local data center.  That said, this shouldn't be a problem, particularly if you are only executing 20 concurrent requests; this is something both driver clients and C* handle well.  The driver does do some write batching to reduce the number of system calls, but I'm reasonably confident this is not the issue.
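For reference, the per-host connection pooling described above is configurable on the driver's client options. This is an illustrative sketch, not from the original mail; the contact point is hypothetical, and the numeric keys correspond to the driver's host-distance values (0 = local, 1 = remote):

```javascript
// Illustrative cassandra-driver client options: by default the driver keeps
// one connection per local host and multiplexes many in-flight requests over
// it; coreConnectionsPerHost raises the connection count per host distance.
const clientOptions = {
  contactPoints: ['10.0.0.1'],   // hypothetical
  localDataCenter: 'DC2',
  pooling: {
    // keys are host "distance" values: 0 = local, 1 = remote
    coreConnectionsPerHost: { 0: 2, 1: 1 }
  }
};

console.log(clientOptions.pooling.coreConnectionsPerHost[0]); // 2
```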

It may be worth enabling client logging to see if that sheds any light.  You can also enable tracing on your requests by specifying traceQuery as a query option (example) to see if the delay is caused by C*-side processing.
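A minimal sketch of the query options involved (the surrounding execute/getTrace usage assumes a live cluster and is shown only in comments):

```javascript
// Illustrative query options enabling server-side tracing for one request.
// traceQuery asks Cassandra to record a trace; the trace id comes back on
// result.info.traceId and can be fetched with client.metadata.getTrace.
const queryOptions = { prepare: true, traceQuery: true };

// Usage (requires a live cluster; sketched, not executed here):
//   const result = await client.execute('SELECT * FROM ks.t WHERE id = ?', [id], queryOptions);
//   const trace = await client.metadata.getTrace(result.info.traceId);
```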

Also keep in mind that all user code in Node.js runs on a single thread.  If you have callbacks tied to your responses that do non-trivial work, they can delay subsequent requests from being processed, which may give the impression that some queries are slow.
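The effect described above is easy to reproduce with plain Node.js, no driver involved (an illustrative sketch, not from the original mail):

```javascript
// A callback already queued on the event loop still has to wait for any
// synchronous work to finish first, on Node's single JS thread.
const order = [];
setImmediate(() => order.push('queued callback'));

// Simulate non-trivial work inside a response handler (~50ms busy loop).
const start = Date.now();
while (Date.now() - start < 50) { /* burn CPU */ }
order.push('blocking work done');

setImmediate(() => console.log(order.join(' -> ')));
// prints: blocking work done -> queued callback
```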

Thanks,
Andy

On Sun, Nov 4, 2018 at 8:59 AM Tarun Chabarwal <tarunchabarwal@xxxxxxxxx> wrote:
Hi

I'm using the Cassandra driver provided by DataStax (version 3.5.0) in Node.js. I have a 5-node cluster, and I'm writing to a table with QUORUM consistency.

I observed spikes in write latency: out of ~20 writes, 2-5 take longer (~200ms). I debugged one of the Node processes with strace and found that the higher-latency requests are batched and use the same fd to connect to Cassandra. This may be the multiplexing.

Why does it take that long?
Where should I look to resolve it?

Regards
Tarun Chabarwal