Skip synchronized(unsafeTags) on owner-thread tag writes#11082
Skip synchronized(unsafeTags) on owner-thread tag writes#11082
Conversation
Spans are almost always written by a single thread, so the lock on every setTag/setMetric call is uncontended overhead. This adds a volatile tagWriteState check: if the current thread is the span's creating thread (STATE_OWNER), tag writes skip the lock entirely. Non-owner threads and post-finish writes take the lock and sticky-transition to STATE_SHARED. Long-running spans disable the optimization at construction since the writer thread may read tags on unfinished spans. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BenchmarksStartupParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 12 unstable metrics. Startup time reports for petclinicgantt
title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.064 s) : 0, 1063626
Total [baseline] (11.15 s) : 0, 11149749
Agent [candidate] (1.058 s) : 0, 1058169
Total [candidate] (11.112 s) : 0, 11112380
section appsec
Agent [baseline] (1.249 s) : 0, 1248992
Total [baseline] (11.083 s) : 0, 11083030
Agent [candidate] (1.249 s) : 0, 1248808
Total [candidate] (11.108 s) : 0, 11107753
section iast
Agent [baseline] (1.223 s) : 0, 1222869
Total [baseline] (11.32 s) : 0, 11319740
Agent [candidate] (1.225 s) : 0, 1225152
Total [candidate] (11.233 s) : 0, 11233249
section profiling
Agent [baseline] (1.186 s) : 0, 1185593
Total [baseline] (11.094 s) : 0, 11094126
Agent [candidate] (1.194 s) : 0, 1193971
Total [candidate] (11.257 s) : 0, 11257032
gantt
title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.233 ms) : 0, 1233
crashtracking [candidate] (1.237 ms) : 0, 1237
BytebuddyAgent [baseline] (636.694 ms) : 0, 636694
BytebuddyAgent [candidate] (631.769 ms) : 0, 631769
AgentMeter [baseline] (29.688 ms) : 0, 29688
AgentMeter [candidate] (29.376 ms) : 0, 29376
GlobalTracer [baseline] (250.598 ms) : 0, 250598
GlobalTracer [candidate] (248.904 ms) : 0, 248904
AppSec [baseline] (32.14 ms) : 0, 32140
AppSec [candidate] (32.089 ms) : 0, 32089
Debugger [baseline] (60.709 ms) : 0, 60709
Debugger [candidate] (60.174 ms) : 0, 60174
Remote Config [baseline] (594.69 µs) : 0, 595
Remote Config [candidate] (612.52 µs) : 0, 613
Telemetry [baseline] (8.185 ms) : 0, 8185
Telemetry [candidate] (8.047 ms) : 0, 8047
Flare Poller [baseline] (7.339 ms) : 0, 7339
Flare Poller [candidate] (9.841 ms) : 0, 9841
section appsec
crashtracking [baseline] (1.235 ms) : 0, 1235
crashtracking [candidate] (1.226 ms) : 0, 1226
BytebuddyAgent [baseline] (662.661 ms) : 0, 662661
BytebuddyAgent [candidate] (661.537 ms) : 0, 661537
AgentMeter [baseline] (12.138 ms) : 0, 12138
AgentMeter [candidate] (12.087 ms) : 0, 12087
GlobalTracer [baseline] (249.781 ms) : 0, 249781
GlobalTracer [candidate] (249.2 ms) : 0, 249200
AppSec [baseline] (184.261 ms) : 0, 184261
AppSec [candidate] (184.903 ms) : 0, 184903
Debugger [baseline] (65.198 ms) : 0, 65198
Debugger [candidate] (66.196 ms) : 0, 66196
Remote Config [baseline] (601.372 µs) : 0, 601
Remote Config [candidate] (619.056 µs) : 0, 619
Telemetry [baseline] (8.629 ms) : 0, 8629
Telemetry [candidate] (8.497 ms) : 0, 8497
Flare Poller [baseline] (3.534 ms) : 0, 3534
Flare Poller [candidate] (3.553 ms) : 0, 3553
IAST [baseline] (24.596 ms) : 0, 24596
IAST [candidate] (24.604 ms) : 0, 24604
section iast
crashtracking [baseline] (1.216 ms) : 0, 1216
crashtracking [candidate] (1.26 ms) : 0, 1260
BytebuddyAgent [baseline] (799.954 ms) : 0, 799954
BytebuddyAgent [candidate] (801.193 ms) : 0, 801193
AgentMeter [baseline] (11.408 ms) : 0, 11408
AgentMeter [candidate] (11.485 ms) : 0, 11485
GlobalTracer [baseline] (239.19 ms) : 0, 239190
GlobalTracer [candidate] (239.464 ms) : 0, 239464
AppSec [baseline] (32.565 ms) : 0, 32565
AppSec [candidate] (32.11 ms) : 0, 32110
Debugger [baseline] (59.674 ms) : 0, 59674
Debugger [candidate] (62.489 ms) : 0, 62489
Remote Config [baseline] (536.868 µs) : 0, 537
Remote Config [candidate] (533.526 µs) : 0, 534
Telemetry [baseline] (12.698 ms) : 0, 12698
Telemetry [candidate] (11.153 ms) : 0, 11153
Flare Poller [baseline] (3.642 ms) : 0, 3642
Flare Poller [candidate] (3.402 ms) : 0, 3402
IAST [baseline] (25.743 ms) : 0, 25743
IAST [candidate] (25.866 ms) : 0, 25866
section profiling
ProfilingAgent [baseline] (94.002 ms) : 0, 94002
ProfilingAgent [candidate] (94.84 ms) : 0, 94840
crashtracking [baseline] (1.189 ms) : 0, 1189
crashtracking [candidate] (1.177 ms) : 0, 1177
BytebuddyAgent [baseline] (692.346 ms) : 0, 692346
BytebuddyAgent [candidate] (697.321 ms) : 0, 697321
AgentMeter [baseline] (9.083 ms) : 0, 9083
AgentMeter [candidate] (9.187 ms) : 0, 9187
GlobalTracer [baseline] (207.379 ms) : 0, 207379
GlobalTracer [candidate] (208.923 ms) : 0, 208923
AppSec [baseline] (32.455 ms) : 0, 32455
AppSec [candidate] (32.885 ms) : 0, 32885
Debugger [baseline] (65.678 ms) : 0, 65678
Debugger [candidate] (66.095 ms) : 0, 66095
Remote Config [baseline] (577.696 µs) : 0, 578
Remote Config [candidate] (586.359 µs) : 0, 586
Telemetry [baseline] (7.918 ms) : 0, 7918
Telemetry [candidate] (7.955 ms) : 0, 7955
Flare Poller [baseline] (3.548 ms) : 0, 3548
Flare Poller [candidate] (3.637 ms) : 0, 3637
Profiling [baseline] (94.566 ms) : 0, 94566
Profiling [candidate] (95.399 ms) : 0, 95399
Startup time reports for insecure-bankgantt
title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.064 s) : 0, 1063817
Total [baseline] (8.856 s) : 0, 8855987
Agent [candidate] (1.058 s) : 0, 1057690
Total [candidate] (8.828 s) : 0, 8828369
section iast
Agent [baseline] (1.228 s) : 0, 1228326
Total [baseline] (9.629 s) : 0, 9628876
Agent [candidate] (1.226 s) : 0, 1226260
Total [candidate] (9.533 s) : 0, 9533455
gantt
title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.242 ms) : 0, 1242
crashtracking [candidate] (1.221 ms) : 0, 1221
BytebuddyAgent [baseline] (635.894 ms) : 0, 635894
BytebuddyAgent [candidate] (631.889 ms) : 0, 631889
AgentMeter [baseline] (29.653 ms) : 0, 29653
AgentMeter [candidate] (29.351 ms) : 0, 29351
GlobalTracer [baseline] (249.912 ms) : 0, 249912
GlobalTracer [candidate] (248.617 ms) : 0, 248617
AppSec [baseline] (32.278 ms) : 0, 32278
AppSec [candidate] (32.065 ms) : 0, 32065
Debugger [baseline] (59.924 ms) : 0, 59924
Debugger [candidate] (59.253 ms) : 0, 59253
Remote Config [baseline] (599.887 µs) : 0, 600
Remote Config [candidate] (597.658 µs) : 0, 598
Telemetry [baseline] (8.186 ms) : 0, 8186
Telemetry [candidate] (8.112 ms) : 0, 8112
Flare Poller [baseline] (9.84 ms) : 0, 9840
Flare Poller [candidate] (10.413 ms) : 0, 10413
section iast
crashtracking [baseline] (1.221 ms) : 0, 1221
crashtracking [candidate] (1.225 ms) : 0, 1225
BytebuddyAgent [baseline] (803.161 ms) : 0, 803161
BytebuddyAgent [candidate] (802.11 ms) : 0, 802110
AgentMeter [baseline] (11.455 ms) : 0, 11455
AgentMeter [candidate] (11.36 ms) : 0, 11360
GlobalTracer [baseline] (240.165 ms) : 0, 240165
GlobalTracer [candidate] (240.293 ms) : 0, 240293
AppSec [baseline] (29.038 ms) : 0, 29038
AppSec [candidate] (31.078 ms) : 0, 31078
Debugger [baseline] (63.956 ms) : 0, 63956
Debugger [candidate] (59.715 ms) : 0, 59715
Remote Config [baseline] (543.287 µs) : 0, 543
Remote Config [candidate] (524.282 µs) : 0, 524
Telemetry [baseline] (12.585 ms) : 0, 12585
Telemetry [candidate] (12.952 ms) : 0, 12952
Flare Poller [baseline] (3.651 ms) : 0, 3651
Flare Poller [candidate] (3.62 ms) : 0, 3620
IAST [baseline] (26.02 ms) : 0, 26020
IAST [candidate] (26.695 ms) : 0, 26695
LoadParameters
See matching parameters
SummaryFound 1 performance improvements and 1 performance regressions! Performance is the same for 18 metrics, 16 unstable metrics.
Request duration reports for petclinicgantt
title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
dateFormat X
axisFormat %s
section baseline
no_agent (18.852 ms) : 18658, 19046
. : milestone, 18852,
appsec (18.777 ms) : 18591, 18963
. : milestone, 18777,
code_origins (17.84 ms) : 17670, 18010
. : milestone, 17840,
iast (17.957 ms) : 17776, 18138
. : milestone, 17957,
profiling (18.569 ms) : 18388, 18750
. : milestone, 18569,
tracing (17.793 ms) : 17617, 17969
. : milestone, 17793,
section candidate
no_agent (19.119 ms) : 18925, 19314
. : milestone, 19119,
appsec (18.358 ms) : 18169, 18546
. : milestone, 18358,
code_origins (17.802 ms) : 17625, 17978
. : milestone, 17802,
iast (18.972 ms) : 18783, 19162
. : milestone, 18972,
profiling (18.1 ms) : 17922, 18279
. : milestone, 18100,
tracing (17.921 ms) : 17750, 18093
. : milestone, 17921,
Request duration reports for insecure-bankgantt
title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
dateFormat X
axisFormat %s
section baseline
no_agent (1.233 ms) : 1221, 1244
. : milestone, 1233,
iast (3.208 ms) : 3168, 3249
. : milestone, 3208,
iast_FULL (5.962 ms) : 5902, 6022
. : milestone, 5962,
iast_GLOBAL (3.767 ms) : 3707, 3828
. : milestone, 3767,
profiling (2.348 ms) : 2324, 2371
. : milestone, 2348,
tracing (1.876 ms) : 1860, 1893
. : milestone, 1876,
section candidate
no_agent (1.236 ms) : 1224, 1247
. : milestone, 1236,
iast (3.294 ms) : 3244, 3344
. : milestone, 3294,
iast_FULL (5.798 ms) : 5740, 5855
. : milestone, 5798,
iast_GLOBAL (3.605 ms) : 3552, 3659
. : milestone, 3605,
profiling (2.424 ms) : 2397, 2450
. : milestone, 2424,
tracing (1.887 ms) : 1871, 1904
. : milestone, 1887,
DacapoParameters
See matching parameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics. Execution time for biojavagantt
title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
dateFormat X
axisFormat %s
section baseline
no_agent (15.449 s) : 15449000, 15449000
. : milestone, 15449000,
appsec (14.95 s) : 14950000, 14950000
. : milestone, 14950000,
iast (18.497 s) : 18497000, 18497000
. : milestone, 18497000,
iast_GLOBAL (18.185 s) : 18185000, 18185000
. : milestone, 18185000,
profiling (14.914 s) : 14914000, 14914000
. : milestone, 14914000,
tracing (14.939 s) : 14939000, 14939000
. : milestone, 14939000,
section candidate
no_agent (15.551 s) : 15551000, 15551000
. : milestone, 15551000,
appsec (14.914 s) : 14914000, 14914000
. : milestone, 14914000,
iast (18.49 s) : 18490000, 18490000
. : milestone, 18490000,
iast_GLOBAL (18.343 s) : 18343000, 18343000
. : milestone, 18343000,
profiling (14.94 s) : 14940000, 14940000
. : milestone, 14940000,
tracing (14.886 s) : 14886000, 14886000
. : milestone, 14886000,
Execution time for tomcatgantt
title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
dateFormat X
axisFormat %s
section baseline
no_agent (1.497 ms) : 1485, 1509
. : milestone, 1497,
appsec (3.879 ms) : 3656, 4102
. : milestone, 3879,
iast (2.291 ms) : 2222, 2361
. : milestone, 2291,
iast_GLOBAL (2.33 ms) : 2260, 2399
. : milestone, 2330,
profiling (2.1 ms) : 2046, 2155
. : milestone, 2100,
tracing (2.101 ms) : 2047, 2155
. : milestone, 2101,
section candidate
no_agent (1.495 ms) : 1483, 1506
. : milestone, 1495,
appsec (3.794 ms) : 3574, 4013
. : milestone, 3794,
iast (2.285 ms) : 2216, 2354
. : milestone, 2285,
iast_GLOBAL (2.325 ms) : 2255, 2395
. : milestone, 2325,
profiling (2.115 ms) : 2060, 2170
. : milestone, 2115,
tracing (2.095 ms) : 2042, 2149
. : milestone, 2095,
|
Add three targeted concurrency tests that exercise the exact cross-thread tag write pattern the JMH crossThread benchmark was measuring: - crossThreadSustainedNoCrash: 8 threads × 10k setTag on same span - ownerToSharedTransition: owner writes first, then 8 threads join - manySpansCrossThread: 10k short-lived spans tagged from 8 threads All pass, proving the production code handles cross-thread writes without NPE or structural corruption. Fix the crossThread benchmark: change SharedSpan @setup from Level.Invocation to Level.Iteration. With Level.Invocation, 8 threads raced to call setup() concurrently, causing NPE when state.span was transiently null between invocations — a benchmark harness bug, not a production code bug. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/test/java/datadog/trace/core/DDSpanContextConcurrencyTest.java
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Looking at throughput numbers, there's a modest gain that might make this change worth the complexity if we can simplify it a bit.
The two main changes that I think we should try are...
- replacing the two volatiles with a single volatile for the owningThread
- introducing high-order helper functions that hide the locking complexity
Then all the duplicate code can go away, and the accessing code becomes something like...
accessTags(unsafeTags -> {
...
});
We just need to make sure not to introduce a capturing lambda, so we don't incur unneeded allocation.
Alas, creating a higher-order helper function in DDSpanContext proved more annoying than anticipated. I'm back to thinking if we're going to do this change, we should do it in OptimizedTagMap. At least in OptimizedTagMap, most methods are sugar around a few methods that just work with a single TagMap.Entry. |
Per review feedback, move the lock-skipping optimization from DDSpanContext into OptimizedTagMap itself. This keeps the optimization invisible to callers — DDSpanContext no longer needs synchronized blocks around tag operations, and developers adding new tag operations don't need to think about locking. OptimizedTagMap now has a volatile Thread ownerThread field. Core methods (getAndSet, getAndRemove, getEntry, putAll, forEach, copy, etc.) check ownership: owner thread skips the lock, non-owner threads synchronize and permanently transition to shared mode. DDSpanContext changes: removed all 27 synchronized(unsafeTags) blocks, added setOwnerThread(current) in constructor, transitionToShared() delegates to TagMap. Also adds @threads(8) JMH benchmark variants and 5 new concurrency tests (mixed read/write, fuzz, value consistency, finish race, concurrent metrics). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
What Does This Do
Optimizes
DDSpanContexttag writes by skippingsynchronized(unsafeTags)when the writing thread is the span's creating thread (the common case). A volatiletagWriteStatefield tracks whether the span is in owner-only mode or shared mode. Once any non-owner thread accesses tags or the span finishes, it transitions to shared mode permanently and all subsequent accesses take the lock.Motivation
Every
setTag()/setMetric()call acquiressynchronized(unsafeTags)— ~27 lock sites total. Spans are almost always written by a single thread, so the lock is uncontended overhead (~20-30ns per acquire/release × 5-20 tags per span × millions of spans/second). This optimization eliminates that overhead on the fast path by replacing the lock with a volatile read + thread ID comparison.Safety model:
volatile int tagWriteStatetracksSTATE_OWNER(0) vsSTATE_SHARED(1). Once shared, never reverts.STATE_SHAREDfinish()callstransitionToShared()so post-finish writes (decorators/handlers) always take the lockBenchmark Results
JMH benchmarks, 2 forks × 5 warmup + 5 measurement iterations, back-to-back on same machine. Owner-thread benchmarks use
@Threads(1), cross-thread uses@Threads(8).JDK 21 (Zulu 21.0.1) — biased locking removed
fullLifecycle_tenTagssetStringTag_ownerThreadsetIntTag_ownerThreadsetTenTags_ownerThreadsetStringTag_crossThread(8T)JDK 8 (Zulu 8u372) — biased locking enabled
fullLifecycle_tenTagssetStringTag_ownerThreadsetIntTag_ownerThreadsetTenTags_ownerThreadsetStringTag_crossThread(8T)Analysis
On JDK 21 (where biased locking was removed in JDK 15), uncontended
synchronizedis more expensive and the optimization shows clear gains: +12-33% across all owner-thread benchmarks. The full lifecycle benchmark (create + 10 tags + finish) shows +11.8% throughput improvement.On JDK 8 (biased locking enabled), the JVM already optimizes uncontended locks, so gains are modest (+4-12%).
Cross-thread path: no regression on either JDK. The slow path (non-owner threads) takes the lock just like before, with one additional volatile read of
tagWriteState.Additional Notes
synchronized(unsafeTags)is kept on reader paths:processTagsAndBaggage,earlyProcessTags,getTags,toStringDDSpanContextConcurrencyTest(9 JUnit 5 tests including 3 targeted cross-thread stress tests) andSpanTagBenchmark(5 JMH benchmarks)Contributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueNote: Once your PR is ready to merge, add it to the merge queue by commenting
/merge./merge -ccancels the queue request./merge -f --reason "reason"skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.