Thanks for the response.
I have 2 DCs of 5 nodes each. I'm writing against DC2 with LOCAL_QUORUM consistency. nodetool confirmed that all space is consumed on DC2, and I did not set any replication for DC1, so this is the expected behavior.
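For reference, the keyspace is defined along these lines, with replication for DC2 only and nothing for DC1 (the keyspace name and replication factor are illustrative; I'm issuing the DDL through the driver):

const cassandra = require('cassandra-driver');
const client = new cassandra.Client({ contactPoints: ['10.0.0.1'] }); // placeholder address

// NetworkTopologyStrategy with replicas in DC2 only; DC1 is omitted,
// so no application data should be replicated there.
const ddl =
  "CREATE KEYSPACE IF NOT EXISTS my_keyspace WITH replication = " +
  "{ 'class': 'NetworkTopologyStrategy', 'DC2': 3 }";
client.execute(ddl, (err) => {
  if (err) throw err;
});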
Regarding the Node.js behavior, I validated that Node is not responsible for the delay, since the application is deployed on a single server where latencies are high (25%-40% higher).
I turned on tracing and found that there is some cross-DC communication going on, which is what introduces this delay.
I've attached the events for one such session:
1) Why is DC1 involved at all when performing a LOCAL_QUORUM write to DC2?
2) How should I read source_elapsed? It appears to be cumulative per host, but in some places it resets.
There are a ton of factors that can impact query performance.
The Cassandra native protocol supports multiple simultaneous requests per connection, and most drivers by default only create one connection to each C* host in the local data center. That being said, it shouldn't be a problem, particularly if you are only executing 20 concurrent requests; that is something both the driver clients and C* handle well. The driver does do some write batching to reduce the number of system calls, but I'm reasonably confident this is not an issue.
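If you want to rule the single connection out, you can raise the number of connections per host through the pooling options. A minimal sketch, assuming the Node.js driver (the contact point is a placeholder):

const cassandra = require('cassandra-driver');
const distance = cassandra.types.distance;

// Open two connections per local host instead of the driver default of one.
const client = new cassandra.Client({
  contactPoints: ['10.0.0.1'], // placeholder address
  pooling: {
    coreConnectionsPerHost: {
      [distance.local]: 2,
      [distance.remote]: 1
    }
  }
});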
It may be worth enabling client logging to see if that sheds any light. You can also enable tracing on your requests by specifying traceQuery as a query option to see if the delay is caused by C*-side processing; a sketch of both follows below.
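Something like this, assuming the Node.js driver; the contact point, table, and values are placeholders, and traceQuery makes the server record the request in system_traces:

const cassandra = require('cassandra-driver');
const client = new cassandra.Client({ contactPoints: ['10.0.0.1'] }); // placeholder address

// Client-side logging: the driver emits 'log' events.
client.on('log', (level, className, message) => {
  console.log('%s | %s | %s', level, className, message);
});

// Server-side tracing: pass traceQuery in the query options.
const query = 'INSERT INTO my_table (id, value) VALUES (?, ?)'; // placeholder table
client.execute(query, [1, 'a'], {
  prepare: true,
  consistency: cassandra.types.consistencies.localQuorum,
  traceQuery: true
}, (err, result) => {
  if (err) throw err;
  // Fetch the trace events recorded for this request.
  client.metadata.getTrace(result.info.traceId, (err, trace) => {
    if (err) throw err;
    console.log(trace);
  });
});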
Also keep in mind that all user code in node.js runs on a single thread. If you have callbacks tied to your responses that do non-trivial work, that can delay subsequent responses from being processed, which may give the impression that some queries are slow.
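A contrived sketch of that effect, reusing the client from the previous example: a callback that burns ~50 ms of CPU blocks the event loop, so any response that arrives in the meantime is handled (and therefore measured) ~50 ms late:

// Synchronous work in a response callback blocks the single JS thread.
client.execute('SELECT now() FROM system.local', (err, result) => {
  if (err) throw err;
  const start = Date.now();
  while (Date.now() - start < 50) {
    // busy-wait: stands in for non-trivial work done directly in the callback
  }
  // Responses that arrived during these 50 ms are only handled now,
  // so their measured latency includes this callback's work.
});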
I'm using the DataStax Cassandra driver (3.5.0) for Node.js. I have a 5-node cluster, and I'm writing to a table with QUORUM consistency.
I observed spikes in write latency: out of ~20 writes, 2-5 take longer (~200 ms). I debugged one of the Node processes with strace and found that the slower requests are batched and use the same fd to connect to Cassandra. This may be the multiplexing.
Why does it take that long?
Where should I look to resolve it?