Skip synchronized(unsafeTags) on owner-thread tag writes by bm1549 · Pull Request #11082 · DataDog/dd-trace-java

bm1549 · 2026-04-10T18:31:23Z

What Does This Do

Optimizes DDSpanContext tag writes by skipping synchronized(unsafeTags) when the writing thread is the span's creating thread (the common case). A volatile tagWriteState field tracks whether the span is in owner-only mode or shared mode. Once any non-owner thread accesses tags or the span finishes, it transitions to shared mode permanently and all subsequent accesses take the lock.

Motivation

Every setTag() / setMetric() call acquires synchronized(unsafeTags) — ~27 lock sites total. Spans are almost always written by a single thread, so the lock is uncontended overhead (~20-30ns per acquire/release × 5-20 tags per span × millions of spans/second). This optimization eliminates that overhead on the fast path by replacing the lock with a volatile read + thread ID comparison.
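To get a feel for the scale, the per-acquire cost can be multiplied out. This is a back-of-the-envelope sketch only: the inputs (25 ns per lock, 10 tags per span, 1,000,000 spans/second) are assumed mid-range values picked from the ranges quoted above, not measurements, and `LockOverheadEstimate` is a hypothetical helper.

```java
final class LockOverheadEstimate {
    /** Returns CPU-seconds spent on lock acquire/release per wall-clock second. */
    static double lockSecondsPerSecond(double nanosPerLock, int tagsPerSpan, long spansPerSecond) {
        double nanos = nanosPerLock * tagsPerSpan * spansPerSecond;
        return nanos / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Assumed inputs: 25 ns per acquire/release, 10 tags per span, 1M spans/second.
        double cost = lockSecondsPerSecond(25.0, 10, 1_000_000L);
        System.out.println(cost + " CPU-seconds per second");
    }
}
```

With those inputs the estimate comes to 0.25 CPU-seconds per second, roughly a quarter of one core spent on uncontended monitor traffic.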

Safety model:

volatile int tagWriteState tracks STATE_OWNER (0) vs STATE_SHARED (1). Once shared, never reverts.
Owner thread: volatile read + threadId check → skip lock
Non-owner thread: take lock + sticky-transition to STATE_SHARED
finish() calls transitionToShared() so post-finish writes (decorators/handlers) always take the lock
Long-running spans disable the optimization at construction since the writer thread may read tags on unfinished spans
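The safety model above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual DDSpanContext code: `OwnerAwareTags`, `getTag` and `isShared` are hypothetical names, a plain `HashMap` stands in for the tag map, and the long-running-span rule is left out. It has the same accepted race window as the real change: a non-owner can flip the state while the owner is in the middle of an unlocked write.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the tag-holding part of a span context. */
final class OwnerAwareTags {
    static final int STATE_OWNER = 0;
    static final int STATE_SHARED = 1;

    private final Map<String, Object> unsafeTags = new HashMap<>();
    // Captured at construction: the creating thread is the owner.
    private final long ownerThreadId = Thread.currentThread().getId();
    private volatile int tagWriteState = STATE_OWNER;

    void setTag(String key, Object value) {
        // Fast path: one volatile read plus a thread-id comparison, no monitor.
        if (tagWriteState == STATE_OWNER && Thread.currentThread().getId() == ownerThreadId) {
            unsafeTags.put(key, value);
            return;
        }
        // Slow path: sticky transition first, then the same lock as before.
        tagWriteState = STATE_SHARED;
        synchronized (unsafeTags) {
            unsafeTags.put(key, value);
        }
    }

    /** Mirrors the finish() step: after this call every write takes the lock. */
    void transitionToShared() {
        tagWriteState = STATE_SHARED;
    }

    Object getTag(String key) {
        synchronized (unsafeTags) { // reader paths keep the lock
            return unsafeTags.get(key);
        }
    }

    boolean isShared() {
        return tagWriteState == STATE_SHARED;
    }
}
```

Nothing in the sketch ever writes `STATE_OWNER` after construction, which is what makes the transition sticky.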

Benchmark Results

JMH benchmarks, 2 forks × 5 warmup + 5 measurement iterations, back-to-back on same machine. Owner-thread benchmarks use @Threads(1), cross-thread uses @Threads(8).

JDK 21 (Zulu 21.0.1) — biased locking removed

Benchmark	Baseline	Optimized	Unit	Change
`fullLifecycle_tenTags`	2.288 ± 0.111	2.559 ± 0.091	ops/µs	+11.8%
`setStringTag_ownerThread`	0.032 ± 0.002	0.037 ± 0.004	ops/ns	+15.6%
`setIntTag_ownerThread`	0.030 ± 0.004	0.038 ± 0.017	ops/ns	+26.7%
`setTenTags_ownerThread`	0.006 ± 0.001	0.008 ± 0.001	ops/ns	+33.3%
`setStringTag_crossThread` (8T)	0.019 ± 0.003	0.017 ± 0.002	ops/ns	~neutral (within error)

JDK 8 (Zulu 8u372) — biased locking enabled

Benchmark	Baseline	Optimized	Unit	Change
`fullLifecycle_tenTags`	2.043 ± 0.071	2.132 ± 0.126	ops/µs	+4.4%
`setStringTag_ownerThread`	0.026 ± 0.002	0.029 ± 0.002	ops/ns	+11.5%
`setIntTag_ownerThread`	0.040 ± 0.016	0.035 ± 0.009	ops/ns	~neutral (within error)
`setTenTags_ownerThread`	0.007 ± 0.001	0.007 ± 0.001	ops/ns	~neutral
`setStringTag_crossThread` (8T)	0.022 ± 0.004	0.022 ± 0.005	ops/ns	~neutral (no regression)

Analysis

On JDK 21, where biased locking is no longer available (it was deprecated and disabled by default in JDK 15, then removed in a later release), uncontended synchronized is more expensive and the optimization shows clear gains: +12-33% across all owner-thread benchmarks. The full lifecycle benchmark (create + 10 tags + finish) shows +11.8% throughput improvement.

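On JDK 8 (biased locking enabled), the JVM already optimizes uncontended locks, so gains are modest: +4.4% on the full lifecycle benchmark and +11.5% on setStringTag_ownerThread, with the other two owner-thread benchmarks neutral.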

Cross-thread path: no regression on either JDK. The slow path (non-owner threads) takes the lock just like before, with one additional volatile read of tagWriteState.
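For anyone re-deriving the Change column, the arithmetic is simple enough to sketch. `BenchmarkMath` is a hypothetical helper, and treating two results as "within error" when their ± intervals overlap is an assumption about how that label was applied, not something JMH reports.

```java
final class BenchmarkMath {
    /** Relative change of optimized vs. baseline, in percent. */
    static double percentChange(double baseline, double optimized) {
        return (optimized - baseline) / baseline * 100.0;
    }

    /** True when [a ± aErr] and [b ± bErr] overlap. */
    static boolean withinError(double a, double aErr, double b, double bErr) {
        return a - aErr <= b + bErr && b - bErr <= a + aErr;
    }

    public static void main(String[] args) {
        // fullLifecycle_tenTags on JDK 21: 2.288 -> 2.559 ops/µs
        System.out.println(Math.round(percentChange(2.288, 2.559) * 10) / 10.0); // 11.8
        // setStringTag_crossThread on JDK 21: 0.019 ± 0.003 vs 0.017 ± 0.002
        System.out.println(withinError(0.019, 0.003, 0.017, 0.002)); // true
    }
}
```

Applying the same overlap check to `setIntTag_ownerThread` on JDK 21 (0.030 ± 0.004 vs 0.038 ± 0.017) also reports overlap, so that row's gain is less certain than its percentage alone suggests.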

Additional Notes

synchronized(unsafeTags) is kept on reader paths: processTagsAndBaggage, earlyProcessTags, getTags, toString
The TOCTOU race window for cross-thread writes is accepted. It is the same best-effort behavior as today; the existing codebase comments say "tags will rarely, if ever, be read and modified concurrently"
Added DDSpanContextConcurrencyTest (9 JUnit 5 tests including 3 targeted cross-thread stress tests) and SpanTagBenchmark (5 JMH benchmarks)

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any other useful labels
Avoid using close, fix, or any linking keywords when referencing an issue
tag: ai generated


Spans are almost always written by a single thread, so the lock on every setTag/setMetric call is uncontended overhead. This adds a volatile tagWriteState check: if the current thread is the span's creating thread (STATE_OWNER), tag writes skip the lock entirely. Non-owner threads and post-finish writes take the lock and sticky-transition to STATE_SHARED. Long-running spans disable the optimization at construction since the writer thread may read tags on unfinished spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pr-commenter · 2026-04-10T19:21:56Z

Benchmarks

⚠️ Warning: Baseline build not found for merge-base commit. Comparing against the latest commit on master instead.

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	brian.marks/thread-owned-tags
git_commit_date	1776110723	1776111607
git_commit_sha	`9f89a0b`	`d992fcd`
release_version	1.62.0-SNAPSHOT~9f89a0b26c	1.62.0-SNAPSHOT~d992fcde3f

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1776113382	1776113382
ci_job_id	1591478952	1591478952
ci_pipeline_id	107470487	107470487
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-1-6rieplgt 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-1-6rieplgt 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module	Agent	Agent
parent	None	None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 59 metrics, 12 unstable metrics.

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.064 s) : 0, 1063626
Total [baseline] (11.15 s) : 0, 11149749
Agent [candidate] (1.058 s) : 0, 1058169
Total [candidate] (11.112 s) : 0, 11112380
section appsec
Agent [baseline] (1.249 s) : 0, 1248992
Total [baseline] (11.083 s) : 0, 11083030
Agent [candidate] (1.249 s) : 0, 1248808
Total [candidate] (11.108 s) : 0, 11107753
section iast
Agent [baseline] (1.223 s) : 0, 1222869
Total [baseline] (11.32 s) : 0, 11319740
Agent [candidate] (1.225 s) : 0, 1225152
Total [candidate] (11.233 s) : 0, 11233249
section profiling
Agent [baseline] (1.186 s) : 0, 1185593
Total [baseline] (11.094 s) : 0, 11094126
Agent [candidate] (1.194 s) : 0, 1193971
Total [candidate] (11.257 s) : 0, 11257032

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.064 s	-
Agent	appsec	1.249 s	185.367 ms (17.4%)
Agent	iast	1.223 s	159.243 ms (15.0%)
Agent	profiling	1.186 s	121.967 ms (11.5%)
Total	tracing	11.15 s	-
Total	appsec	11.083 s	-66.719 ms (-0.6%)
Total	iast	11.32 s	169.991 ms (1.5%)
Total	profiling	11.094 s	-55.623 ms (-0.5%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.058 s	-
Agent	appsec	1.249 s	190.639 ms (18.0%)
Agent	iast	1.225 s	166.983 ms (15.8%)
Agent	profiling	1.194 s	135.802 ms (12.8%)
Total	tracing	11.112 s	-
Total	appsec	11.108 s	-4.626 ms (-0.0%)
Total	iast	11.233 s	120.869 ms (1.1%)
Total	profiling	11.257 s	144.652 ms (1.3%)

gantt
    title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.233 ms) : 0, 1233
crashtracking [candidate] (1.237 ms) : 0, 1237
BytebuddyAgent [baseline] (636.694 ms) : 0, 636694
BytebuddyAgent [candidate] (631.769 ms) : 0, 631769
AgentMeter [baseline] (29.688 ms) : 0, 29688
AgentMeter [candidate] (29.376 ms) : 0, 29376
GlobalTracer [baseline] (250.598 ms) : 0, 250598
GlobalTracer [candidate] (248.904 ms) : 0, 248904
AppSec [baseline] (32.14 ms) : 0, 32140
AppSec [candidate] (32.089 ms) : 0, 32089
Debugger [baseline] (60.709 ms) : 0, 60709
Debugger [candidate] (60.174 ms) : 0, 60174
Remote Config [baseline] (594.69 µs) : 0, 595
Remote Config [candidate] (612.52 µs) : 0, 613
Telemetry [baseline] (8.185 ms) : 0, 8185
Telemetry [candidate] (8.047 ms) : 0, 8047
Flare Poller [baseline] (7.339 ms) : 0, 7339
Flare Poller [candidate] (9.841 ms) : 0, 9841
section appsec
crashtracking [baseline] (1.235 ms) : 0, 1235
crashtracking [candidate] (1.226 ms) : 0, 1226
BytebuddyAgent [baseline] (662.661 ms) : 0, 662661
BytebuddyAgent [candidate] (661.537 ms) : 0, 661537
AgentMeter [baseline] (12.138 ms) : 0, 12138
AgentMeter [candidate] (12.087 ms) : 0, 12087
GlobalTracer [baseline] (249.781 ms) : 0, 249781
GlobalTracer [candidate] (249.2 ms) : 0, 249200
AppSec [baseline] (184.261 ms) : 0, 184261
AppSec [candidate] (184.903 ms) : 0, 184903
Debugger [baseline] (65.198 ms) : 0, 65198
Debugger [candidate] (66.196 ms) : 0, 66196
Remote Config [baseline] (601.372 µs) : 0, 601
Remote Config [candidate] (619.056 µs) : 0, 619
Telemetry [baseline] (8.629 ms) : 0, 8629
Telemetry [candidate] (8.497 ms) : 0, 8497
Flare Poller [baseline] (3.534 ms) : 0, 3534
Flare Poller [candidate] (3.553 ms) : 0, 3553
IAST [baseline] (24.596 ms) : 0, 24596
IAST [candidate] (24.604 ms) : 0, 24604
section iast
crashtracking [baseline] (1.216 ms) : 0, 1216
crashtracking [candidate] (1.26 ms) : 0, 1260
BytebuddyAgent [baseline] (799.954 ms) : 0, 799954
BytebuddyAgent [candidate] (801.193 ms) : 0, 801193
AgentMeter [baseline] (11.408 ms) : 0, 11408
AgentMeter [candidate] (11.485 ms) : 0, 11485
GlobalTracer [baseline] (239.19 ms) : 0, 239190
GlobalTracer [candidate] (239.464 ms) : 0, 239464
AppSec [baseline] (32.565 ms) : 0, 32565
AppSec [candidate] (32.11 ms) : 0, 32110
Debugger [baseline] (59.674 ms) : 0, 59674
Debugger [candidate] (62.489 ms) : 0, 62489
Remote Config [baseline] (536.868 µs) : 0, 537
Remote Config [candidate] (533.526 µs) : 0, 534
Telemetry [baseline] (12.698 ms) : 0, 12698
Telemetry [candidate] (11.153 ms) : 0, 11153
Flare Poller [baseline] (3.642 ms) : 0, 3642
Flare Poller [candidate] (3.402 ms) : 0, 3402
IAST [baseline] (25.743 ms) : 0, 25743
IAST [candidate] (25.866 ms) : 0, 25866
section profiling
ProfilingAgent [baseline] (94.002 ms) : 0, 94002
ProfilingAgent [candidate] (94.84 ms) : 0, 94840
crashtracking [baseline] (1.189 ms) : 0, 1189
crashtracking [candidate] (1.177 ms) : 0, 1177
BytebuddyAgent [baseline] (692.346 ms) : 0, 692346
BytebuddyAgent [candidate] (697.321 ms) : 0, 697321
AgentMeter [baseline] (9.083 ms) : 0, 9083
AgentMeter [candidate] (9.187 ms) : 0, 9187
GlobalTracer [baseline] (207.379 ms) : 0, 207379
GlobalTracer [candidate] (208.923 ms) : 0, 208923
AppSec [baseline] (32.455 ms) : 0, 32455
AppSec [candidate] (32.885 ms) : 0, 32885
Debugger [baseline] (65.678 ms) : 0, 65678
Debugger [candidate] (66.095 ms) : 0, 66095
Remote Config [baseline] (577.696 µs) : 0, 578
Remote Config [candidate] (586.359 µs) : 0, 586
Telemetry [baseline] (7.918 ms) : 0, 7918
Telemetry [candidate] (7.955 ms) : 0, 7955
Flare Poller [baseline] (3.548 ms) : 0, 3548
Flare Poller [candidate] (3.637 ms) : 0, 3637
Profiling [baseline] (94.566 ms) : 0, 94566
Profiling [candidate] (95.399 ms) : 0, 95399

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.064 s) : 0, 1063817
Total [baseline] (8.856 s) : 0, 8855987
Agent [candidate] (1.058 s) : 0, 1057690
Total [candidate] (8.828 s) : 0, 8828369
section iast
Agent [baseline] (1.228 s) : 0, 1228326
Total [baseline] (9.629 s) : 0, 9628876
Agent [candidate] (1.226 s) : 0, 1226260
Total [candidate] (9.533 s) : 0, 9533455

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.064 s	-
Agent	iast	1.228 s	164.508 ms (15.5%)
Total	tracing	8.856 s	-
Total	iast	9.629 s	772.889 ms (8.7%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.058 s	-
Agent	iast	1.226 s	168.57 ms (15.9%)
Total	tracing	8.828 s	-
Total	iast	9.533 s	705.085 ms (8.0%)

gantt
    title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.242 ms) : 0, 1242
crashtracking [candidate] (1.221 ms) : 0, 1221
BytebuddyAgent [baseline] (635.894 ms) : 0, 635894
BytebuddyAgent [candidate] (631.889 ms) : 0, 631889
AgentMeter [baseline] (29.653 ms) : 0, 29653
AgentMeter [candidate] (29.351 ms) : 0, 29351
GlobalTracer [baseline] (249.912 ms) : 0, 249912
GlobalTracer [candidate] (248.617 ms) : 0, 248617
AppSec [baseline] (32.278 ms) : 0, 32278
AppSec [candidate] (32.065 ms) : 0, 32065
Debugger [baseline] (59.924 ms) : 0, 59924
Debugger [candidate] (59.253 ms) : 0, 59253
Remote Config [baseline] (599.887 µs) : 0, 600
Remote Config [candidate] (597.658 µs) : 0, 598
Telemetry [baseline] (8.186 ms) : 0, 8186
Telemetry [candidate] (8.112 ms) : 0, 8112
Flare Poller [baseline] (9.84 ms) : 0, 9840
Flare Poller [candidate] (10.413 ms) : 0, 10413
section iast
crashtracking [baseline] (1.221 ms) : 0, 1221
crashtracking [candidate] (1.225 ms) : 0, 1225
BytebuddyAgent [baseline] (803.161 ms) : 0, 803161
BytebuddyAgent [candidate] (802.11 ms) : 0, 802110
AgentMeter [baseline] (11.455 ms) : 0, 11455
AgentMeter [candidate] (11.36 ms) : 0, 11360
GlobalTracer [baseline] (240.165 ms) : 0, 240165
GlobalTracer [candidate] (240.293 ms) : 0, 240293
AppSec [baseline] (29.038 ms) : 0, 29038
AppSec [candidate] (31.078 ms) : 0, 31078
Debugger [baseline] (63.956 ms) : 0, 63956
Debugger [candidate] (59.715 ms) : 0, 59715
Remote Config [baseline] (543.287 µs) : 0, 543
Remote Config [candidate] (524.282 µs) : 0, 524
Telemetry [baseline] (12.585 ms) : 0, 12585
Telemetry [candidate] (12.952 ms) : 0, 12952
Flare Poller [baseline] (3.651 ms) : 0, 3651
Flare Poller [candidate] (3.62 ms) : 0, 3620
IAST [baseline] (26.02 ms) : 0, 26020
IAST [candidate] (26.695 ms) : 0, 26695

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	brian.marks/thread-owned-tags
git_commit_date	1776110723	1776111607
git_commit_sha	`9f89a0b`	`d992fcd`
release_version	1.62.0-SNAPSHOT~9f89a0b26c	1.62.0-SNAPSHOT~d992fcde3f

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1776113855	1776113855
ci_job_id	1591478953	1591478953
ci_pipeline_id	107470487	107470487
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-4-ihuncgo1 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-4-ihuncgo1 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 1 performance improvements and 1 performance regressions! Performance is the same for 18 metrics, 16 unstable metrics.

scenario	Δ mean agg_http_req_duration_p50	Δ mean agg_http_req_duration_p95	Δ mean throughput	candidate mean agg_http_req_duration_p50	candidate mean agg_http_req_duration_p95	candidate mean throughput	baseline mean agg_http_req_duration_p50	baseline mean agg_http_req_duration_p95	baseline mean throughput
scenario:load:petclinic:iast:high_load	worse [+0.796ms; +1.621ms] or [+4.526%; +9.218%]	same [-13.037µs; +1595.382µs] or [-0.045%; +5.478%]	unstable [-39.012op/s; +11.074op/s] or [-15.010%; +4.261%]	18.793ms	29.915ms	245.938op/s	17.585ms	29.124ms	259.906op/s
scenario:load:petclinic:appsec:high_load	better [-799.929µs; -387.507µs] or [-4.333%; -2.099%]	same [-716.972µs; +417.648µs] or [-2.401%; +1.399%]	unstable [-18.949op/s; +30.136op/s] or [-7.622%; +12.123%]	17.869ms	29.713ms	254.188op/s	18.463ms	29.863ms	248.594op/s

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
    dateFormat X
    axisFormat %s
section baseline
no_agent (18.852 ms) : 18658, 19046
.   : milestone, 18852,
appsec (18.777 ms) : 18591, 18963
.   : milestone, 18777,
code_origins (17.84 ms) : 17670, 18010
.   : milestone, 17840,
iast (17.957 ms) : 17776, 18138
.   : milestone, 17957,
profiling (18.569 ms) : 18388, 18750
.   : milestone, 18569,
tracing (17.793 ms) : 17617, 17969
.   : milestone, 17793,
section candidate
no_agent (19.119 ms) : 18925, 19314
.   : milestone, 19119,
appsec (18.358 ms) : 18169, 18546
.   : milestone, 18358,
code_origins (17.802 ms) : 17625, 17978
.   : milestone, 17802,
iast (18.972 ms) : 18783, 19162
.   : milestone, 18972,
profiling (18.1 ms) : 17922, 18279
.   : milestone, 18100,
tracing (17.921 ms) : 17750, 18093
.   : milestone, 17921,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	18.852 ms [18.658 ms, 19.046 ms]	-
appsec	18.777 ms [18.591 ms, 18.963 ms]	-74.33 µs (-0.4%)
code_origins	17.84 ms [17.67 ms, 18.01 ms]	-1.012 ms (-5.4%)
iast	17.957 ms [17.776 ms, 18.138 ms]	-894.782 µs (-4.7%)
profiling	18.569 ms [18.388 ms, 18.75 ms]	-282.566 µs (-1.5%)
tracing	17.793 ms [17.617 ms, 17.969 ms]	-1.059 ms (-5.6%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	19.119 ms [18.925 ms, 19.314 ms]	-
appsec	18.358 ms [18.169 ms, 18.546 ms]	-761.714 µs (-4.0%)
code_origins	17.802 ms [17.625 ms, 17.978 ms]	-1.318 ms (-6.9%)
iast	18.972 ms [18.783 ms, 19.162 ms]	-146.99 µs (-0.8%)
profiling	18.1 ms [17.922 ms, 18.279 ms]	-1.019 ms (-5.3%)
tracing	17.921 ms [17.75 ms, 18.093 ms]	-1.198 ms (-6.3%)

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.233 ms) : 1221, 1244
.   : milestone, 1233,
iast (3.208 ms) : 3168, 3249
.   : milestone, 3208,
iast_FULL (5.962 ms) : 5902, 6022
.   : milestone, 5962,
iast_GLOBAL (3.767 ms) : 3707, 3828
.   : milestone, 3767,
profiling (2.348 ms) : 2324, 2371
.   : milestone, 2348,
tracing (1.876 ms) : 1860, 1893
.   : milestone, 1876,
section candidate
no_agent (1.236 ms) : 1224, 1247
.   : milestone, 1236,
iast (3.294 ms) : 3244, 3344
.   : milestone, 3294,
iast_FULL (5.798 ms) : 5740, 5855
.   : milestone, 5798,
iast_GLOBAL (3.605 ms) : 3552, 3659
.   : milestone, 3605,
profiling (2.424 ms) : 2397, 2450
.   : milestone, 2424,
tracing (1.887 ms) : 1871, 1904
.   : milestone, 1887,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.233 ms [1.221 ms, 1.244 ms]	-
iast	3.208 ms [3.168 ms, 3.249 ms]	1.976 ms (160.3%)
iast_FULL	5.962 ms [5.902 ms, 6.022 ms]	4.729 ms (383.6%)
iast_GLOBAL	3.767 ms [3.707 ms, 3.828 ms]	2.535 ms (205.6%)
profiling	2.348 ms [2.324 ms, 2.371 ms]	1.115 ms (90.4%)
tracing	1.876 ms [1.86 ms, 1.893 ms]	643.586 µs (52.2%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	1.236 ms [1.224 ms, 1.247 ms]	-
iast	3.294 ms [3.244 ms, 3.344 ms]	2.058 ms (166.6%)
iast_FULL	5.798 ms [5.74 ms, 5.855 ms]	4.562 ms (369.2%)
iast_GLOBAL	3.605 ms [3.552 ms, 3.659 ms]	2.37 ms (191.8%)
profiling	2.424 ms [2.397 ms, 2.45 ms]	1.188 ms (96.2%)
tracing	1.887 ms [1.871 ms, 1.904 ms]	651.616 µs (52.7%)

Dacapo

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	brian.marks/thread-owned-tags
git_commit_date	1776110723	1776111607
git_commit_sha	`9f89a0b`	`d992fcd`
release_version	1.62.0-SNAPSHOT~9f89a0b26c	1.62.0-SNAPSHOT~d992fcde3f

See matching parameters

	Baseline	Candidate
application	biojava	biojava
ci_job_date	1776113572	1776113572
ci_job_id	1591478954	1591478954
ci_pipeline_id	107470487	107470487
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-5-x1ij87wc 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-5-x1ij87wc 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for biojava

gantt
    title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.449 s) : 15449000, 15449000
.   : milestone, 15449000,
appsec (14.95 s) : 14950000, 14950000
.   : milestone, 14950000,
iast (18.497 s) : 18497000, 18497000
.   : milestone, 18497000,
iast_GLOBAL (18.185 s) : 18185000, 18185000
.   : milestone, 18185000,
profiling (14.914 s) : 14914000, 14914000
.   : milestone, 14914000,
tracing (14.939 s) : 14939000, 14939000
.   : milestone, 14939000,
section candidate
no_agent (15.551 s) : 15551000, 15551000
.   : milestone, 15551000,
appsec (14.914 s) : 14914000, 14914000
.   : milestone, 14914000,
iast (18.49 s) : 18490000, 18490000
.   : milestone, 18490000,
iast_GLOBAL (18.343 s) : 18343000, 18343000
.   : milestone, 18343000,
profiling (14.94 s) : 14940000, 14940000
.   : milestone, 14940000,
tracing (14.886 s) : 14886000, 14886000
.   : milestone, 14886000,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.449 s [15.449 s, 15.449 s]	-
appsec	14.95 s [14.95 s, 14.95 s]	-499.0 ms (-3.2%)
iast	18.497 s [18.497 s, 18.497 s]	3.048 s (19.7%)
iast_GLOBAL	18.185 s [18.185 s, 18.185 s]	2.736 s (17.7%)
profiling	14.914 s [14.914 s, 14.914 s]	-535.0 ms (-3.5%)
tracing	14.939 s [14.939 s, 14.939 s]	-510.0 ms (-3.3%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	15.551 s [15.551 s, 15.551 s]	-
appsec	14.914 s [14.914 s, 14.914 s]	-637.0 ms (-4.1%)
iast	18.49 s [18.49 s, 18.49 s]	2.939 s (18.9%)
iast_GLOBAL	18.343 s [18.343 s, 18.343 s]	2.792 s (18.0%)
profiling	14.94 s [14.94 s, 14.94 s]	-611.0 ms (-3.9%)
tracing	14.886 s [14.886 s, 14.886 s]	-665.0 ms (-4.3%)

Execution time for tomcat

gantt
    title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~d992fcde3f, baseline=1.62.0-SNAPSHOT~9f89a0b26c
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.497 ms) : 1485, 1509
.   : milestone, 1497,
appsec (3.879 ms) : 3656, 4102
.   : milestone, 3879,
iast (2.291 ms) : 2222, 2361
.   : milestone, 2291,
iast_GLOBAL (2.33 ms) : 2260, 2399
.   : milestone, 2330,
profiling (2.1 ms) : 2046, 2155
.   : milestone, 2100,
tracing (2.101 ms) : 2047, 2155
.   : milestone, 2101,
section candidate
no_agent (1.495 ms) : 1483, 1506
.   : milestone, 1495,
appsec (3.794 ms) : 3574, 4013
.   : milestone, 3794,
iast (2.285 ms) : 2216, 2354
.   : milestone, 2285,
iast_GLOBAL (2.325 ms) : 2255, 2395
.   : milestone, 2325,
profiling (2.115 ms) : 2060, 2170
.   : milestone, 2115,
tracing (2.095 ms) : 2042, 2149
.   : milestone, 2095,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.497 ms [1.485 ms, 1.509 ms]	-
appsec	3.879 ms [3.656 ms, 4.102 ms]	2.382 ms (159.1%)
iast	2.291 ms [2.222 ms, 2.361 ms]	794.201 µs (53.0%)
iast_GLOBAL	2.33 ms [2.26 ms, 2.399 ms]	832.383 µs (55.6%)
profiling	2.1 ms [2.046 ms, 2.155 ms]	603.312 µs (40.3%)
tracing	2.101 ms [2.047 ms, 2.155 ms]	603.798 µs (40.3%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.495 ms [1.483 ms, 1.506 ms]	-
appsec	3.794 ms [3.574 ms, 4.013 ms]	2.299 ms (153.8%)
iast	2.285 ms [2.216 ms, 2.354 ms]	790.26 µs (52.9%)
iast_GLOBAL	2.325 ms [2.255 ms, 2.395 ms]	830.077 µs (55.5%)
profiling	2.115 ms [2.06 ms, 2.17 ms]	620.397 µs (41.5%)
tracing	2.095 ms [2.042 ms, 2.149 ms]	600.427 µs (40.2%)

Add three targeted concurrency tests that exercise the exact cross-thread tag write pattern the JMH crossThread benchmark was measuring:

- crossThreadSustainedNoCrash: 8 threads × 10k setTag on same span
- ownerToSharedTransition: owner writes first, then 8 threads join
- manySpansCrossThread: 10k short-lived spans tagged from 8 threads

All pass, proving the production code handles cross-thread writes without NPE or structural corruption.

Fix the crossThread benchmark: change SharedSpan @Setup from Level.Invocation to Level.Iteration. With Level.Invocation, 8 threads raced to call setup() concurrently, causing NPE when state.span was transiently null between invocations. That was a benchmark harness bug, not a production code bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
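For readers who have not opened DDSpanContextConcurrencyTest, the sustained cross-thread pattern (8 threads × 10k writes against one target) has roughly the following shape. This is a sketch under assumptions, not the test from the PR: `CrossThreadStressSketch` is a hypothetical class, and a `HashMap` guarded by `synchronized` stands in for a span's tags.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class CrossThreadStressSketch {
    /** Hammers one shared, lock-guarded map from several threads; returns the final entry count. */
    static int run(int threads, int writesPerThread) throws InterruptedException {
        Map<String, Object> tags = new HashMap<>(); // stand-in for a span's tag map
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                try {
                    start.await(); // release all writers at once to maximise overlap
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < writesPerThread; i++) {
                    synchronized (tags) {
                        tags.put("t" + id + ".k" + i, i); // distinct key per write
                    }
                }
            });
        }
        start.countDown();
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("writers did not finish in time");
        }
        synchronized (tags) {
            return tags.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(8, 10_000)); // one entry per distinct key written
    }
}
```

Because every write uses a distinct key, a final count below threads × writesPerThread would indicate lost updates or structural corruption.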

dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java

dd-trace-core/src/test/java/datadog/trace/core/DDSpanContextConcurrencyTest.java

dd-trace-core/src/jmh/java/datadog/trace/core/SpanTagBenchmark.java

dd-trace-core/src/main/java/datadog/trace/core/DDSpanContext.java

dougqh

Looking at throughput numbers, there's a modest gain that might make this change worth the complexity if we can simplify it a bit.

The two main changes that I think we should try are...

replacing the two volatiles with a single volatile for the owningThread
introducing higher-order helper functions that hide the locking complexity

Then all the duplicate code can go away, and the accessing code becomes something like...
accessTags(unsafeTags -> {
...
});
We just need to make sure not to introduce a capturing lambda, so we don't incur unneeded allocation.
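One way to keep such a helper allocation-free is to pass the varying values as extra arguments instead of closing over them. The sketch below is only an illustration of that idea: `TagAccessSketch`, `TagAction` and the three-argument `accessTags` signature are hypothetical and are not taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

final class TagAccessSketch {
    /** Hypothetical callback shape: the map and both arguments are passed in, so nothing is captured. */
    interface TagAction<A, B> {
        void apply(Map<String, Object> unsafeTags, A first, B second);
    }

    private final Map<String, Object> unsafeTags = new HashMap<>();

    /** Single place that owns the locking decision; callers never write synchronized themselves. */
    <A, B> void accessTags(TagAction<A, B> action, A first, B second) {
        synchronized (unsafeTags) { // an owner-thread check could skip this lock
            action.apply(unsafeTags, first, second);
        }
    }

    // Non-capturing: refers only to its own parameters, so it can live in a shared constant.
    private static final TagAction<String, Object> PUT = (tags, key, value) -> tags.put(key, value);

    void setTag(String key, Object value) {
        accessTags(PUT, key, value);
    }

    Object getTag(String key) {
        synchronized (unsafeTags) {
            return unsafeTags.get(key);
        }
    }
}
```

By contrast, `accessTags(tags -> tags.put(key, value))` would capture `key` and `value`, which generally means a new lambda instance per call unless the JIT manages to eliminate it.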

dougqh · 2026-04-13T19:49:44Z

Alas, creating a higher-order helper function in DDSpanContext proved more annoying than anticipated.
Or at least, it is harder than I expected to do so while also avoiding variable capture.

I'm back to thinking if we're going to do this change, we should do it in OptimizedTagMap. At least in OptimizedTagMap, most methods are sugar around a few methods that just work with a single TagMap.Entry.

Per review feedback, move the lock-skipping optimization from DDSpanContext into OptimizedTagMap itself. This keeps the optimization invisible to callers: DDSpanContext no longer needs synchronized blocks around tag operations, and developers adding new tag operations don't need to think about locking.

OptimizedTagMap now has a volatile Thread ownerThread field. Core methods (getAndSet, getAndRemove, getEntry, putAll, forEach, copy, etc.) check ownership: owner thread skips the lock, non-owner threads synchronize and permanently transition to shared mode.

DDSpanContext changes: removed all 27 synchronized(unsafeTags) blocks, added setOwnerThread(current) in constructor, transitionToShared() delegates to TagMap.

Also adds @Threads(8) JMH benchmark variants and 5 new concurrency tests (mixed read/write, fuzz, value consistency, finish race, concurrent metrics).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
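The single-field ownership check described in this commit can be pictured roughly as follows. This is a hypothetical miniature, not the OptimizedTagMap implementation: `OwnerAwareMap`, its `get` and `isShared` methods, and the choice of `null` to mean shared mode are all assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical miniature of an owner-aware map. */
final class OwnerAwareMap {
    private final Map<String, Object> entries = new HashMap<>();
    // Non-null: only this thread may skip the lock. Null: shared mode, everyone locks.
    private volatile Thread ownerThread;

    /** Intended to be called once, by the creating thread, at construction time. */
    void setOwnerThread(Thread owner) {
        ownerThread = owner;
    }

    void transitionToShared() {
        ownerThread = null;
    }

    Object getAndSet(String key, Object value) {
        if (ownerThread == Thread.currentThread()) {
            return entries.put(key, value); // owner fast path, no monitor
        }
        ownerThread = null; // sticky as long as setOwnerThread is not called again
        synchronized (this) {
            return entries.put(key, value);
        }
    }

    Object get(String key) {
        if (ownerThread == Thread.currentThread()) {
            return entries.get(key);
        }
        ownerThread = null;
        synchronized (this) {
            return entries.get(key);
        }
    }

    boolean isShared() {
        return ownerThread == null;
    }
}
```

Keeping the check inside the map means a caller only has to register the owner once; every later operation decides for itself whether the lock is needed.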

bm1549 added the type: enhancement, comp: core, and tag: ai generated labels Apr 10, 2026

bm1549 changed the title ~~Skip synchronized(unsafeTags) on owner-thread tag writes~~ WIP - DO NOT REVIEW: Skip synchronized(unsafeTags) on owner-thread tag writes Apr 10, 2026

bm1549 marked this pull request as ready for review April 10, 2026 19:05

bm1549 requested a review from a team as a code owner April 10, 2026 19:05

bm1549 requested a review from mhlidd April 10, 2026 19:05

bm1549 marked this pull request as draft April 10, 2026 19:23

bm1549 changed the title ~~WIP - DO NOT REVIEW: Skip synchronized(unsafeTags) on owner-thread tag writes~~ Skip synchronized(unsafeTags) on owner-thread tag writes Apr 10, 2026