[osprofiler] Distributed tracing in OpenStack
Oh, yes please! :)
From: Ilya Shakhat [shakhat at gmail.com]
Sent: Thursday, April 11, 2019 2:42 PM
To: openstack-discuss at lists.openstack.org
Subject: [osprofiler] Distributed tracing in OpenStack
Distributed tracing is one of must-have features when one wants to track the full path of request going through different services and APIs. This makes it similar to shared request-id, but with nice visualization at the end . In OpenStack the tracing can be achieved via osprofiler library. The library was introduced 5 years ago, and back then there was no standard approach on how to do tracing and that's why it stays aside from what has become a mainstream. Yet there is no single standard, but the major players are OpenTracing and OpenCensus communities. OpenTracing is represented by Uber's Jaeger which is the default tracer from k8s world.
Issues and limitations to be fixed:
1. Compatibility. While osprofiler library supports many different storage drivers, it has only one way of transferring trace context over the wire. Ideally the library should be compatible with other third-party tracers and allow traces to start in front of OpenStack APIs (e.g. in user apps) and continue after (e.g. in storage systems, or network management tools). 
2. Operation mode. With osprofiler tracing is initiated by user request, while in industrial solutions the tracing can be managed centrally via dynamic sampling policies.
3. In-process trace propagation. Depending on execution model (threaded, async) the ways of storing current trace context differ. OSProfiler supports thread-local model, which recently got broken with new async implementation in openstacksdk . With OpenTracing it is possible to select the appropriate model alongside with tracer configuration.
What's the plan:
Switching to OpenTracing could be a good option to gain compatibility with 3rd-party solutions. The actual change should go to osprofiler library, but indirectly affects all OpenStack projects (should it be a global team goal then?). I'm going to make a PoC of proposed change, so reviews would be highly appreciated.
 e.g. http://logs.openstack.org/15/650915/4/check/tempest-smoke-py3-osprofiler-redis/7c6c14e/osprofiler-traces/trace-3e5cc660-8815-4079-86b9-778af8469d79.html.gz
-------------- next part --------------
An HTML attachment was scrubbed...