Skip to content

Skip synchronized(unsafeTags) on owner-thread tag writes#11082

Draft
bm1549 wants to merge 2 commits intomasterfrom
brian.marks/thread-owned-tags
Draft

Skip synchronized(unsafeTags) on owner-thread tag writes#11082
bm1549 wants to merge 2 commits intomasterfrom
brian.marks/thread-owned-tags

Conversation

@bm1549
Copy link
Copy Markdown
Contributor

@bm1549 bm1549 commented Apr 10, 2026

What Does This Do

Optimizes DDSpanContext tag writes by skipping synchronized(unsafeTags) when the writing thread is the span's creating thread (the common case). A volatile tagWriteState field tracks whether the span is in owner-only mode or shared mode. Once any non-owner thread accesses tags or the span finishes, it transitions to shared mode permanently and all subsequent accesses take the lock.

Motivation

Every setTag() / setMetric() call acquires synchronized(unsafeTags) — ~27 lock sites total. Spans are almost always written by a single thread, so the lock is uncontended overhead (~20-30ns per acquire/release × 5-20 tags per span × millions of spans/second). This optimization eliminates that overhead on the fast path by replacing the lock with a volatile read + thread ID comparison.

Safety model:

  • volatile int tagWriteState tracks STATE_OWNER (0) vs STATE_SHARED (1). Once shared, never reverts.
  • Owner thread: volatile read + threadId check → skip lock
  • Non-owner thread: take lock + sticky-transition to STATE_SHARED
  • finish() calls transitionToShared() so post-finish writes (decorators/handlers) always take the lock
  • Long-running spans disable the optimization at construction since the writer thread may read tags on unfinished spans

Benchmark Results

JMH benchmarks, 2 forks × 5 warmup + 5 measurement iterations, back-to-back on same machine. Owner-thread benchmarks use @Threads(1), cross-thread uses @Threads(8).

JDK 21 (Zulu 21.0.1) — biased locking removed

Benchmark Baseline Optimized Unit Change
fullLifecycle_tenTags 2.288 ± 0.111 2.559 ± 0.091 ops/µs +11.8%
setStringTag_ownerThread 0.032 ± 0.002 0.037 ± 0.004 ops/ns +15.6%
setIntTag_ownerThread 0.030 ± 0.004 0.038 ± 0.017 ops/ns +26.7%
setTenTags_ownerThread 0.006 ± 0.001 0.008 ± 0.001 ops/ns +33.3%
setStringTag_crossThread (8T) 0.019 ± 0.003 0.017 ± 0.002 ops/ns ~neutral (within error)

JDK 8 (Zulu 8u372) — biased locking enabled

Benchmark Baseline Optimized Unit Change
fullLifecycle_tenTags 2.043 ± 0.071 2.132 ± 0.126 ops/µs +4.4%
setStringTag_ownerThread 0.026 ± 0.002 0.029 ± 0.002 ops/ns +11.5%
setIntTag_ownerThread 0.040 ± 0.016 0.035 ± 0.009 ops/ns ~neutral (within error)
setTenTags_ownerThread 0.007 ± 0.001 0.007 ± 0.001 ops/ns ~neutral
setStringTag_crossThread (8T) 0.022 ± 0.004 0.022 ± 0.005 ops/ns ~neutral (no regression)

Analysis

On JDK 21 (where biased locking was removed in JDK 15), uncontended synchronized is more expensive and the optimization shows clear gains: +12-33% across all owner-thread benchmarks. The full lifecycle benchmark (create + 10 tags + finish) shows +11.8% throughput improvement.

On JDK 8 (biased locking enabled), the JVM already optimizes uncontended locks, so gains are modest (+4-12%).

Cross-thread path: no regression on either JDK. The slow path (non-owner threads) takes the lock just like before, with one additional volatile read of tagWriteState.

Additional Notes

  • synchronized(unsafeTags) is kept on reader paths: processTagsAndBaggage, earlyProcessTags, getTags, toString
  • The TOCTOU race window for cross-thread writes is accepted — same best-effort as today, the existing codebase comments say "tags will rarely, if ever, be read and modified concurrently"
  • Added DDSpanContextConcurrencyTest (9 JUnit 5 tests including 3 targeted cross-thread stress tests) and SpanTagBenchmark (5 JMH benchmarks)

Contributor Checklist

Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.

Spans are almost always written by a single thread, so the lock on every
setTag/setMetric call is uncontended overhead. This adds a volatile
tagWriteState check: if the current thread is the span's creating thread
(STATE_OWNER), tag writes skip the lock entirely. Non-owner threads and
post-finish writes take the lock and sticky-transition to STATE_SHARED.

Long-running spans disable the optimization at construction since the
writer thread may read tags on unfinished spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bm1549 bm1549 added type: enhancement Enhancements and improvements comp: core Tracer core tag: ai generated Largely based on code generated by an AI or LLM labels Apr 10, 2026
@bm1549 bm1549 changed the title Skip synchronized(unsafeTags) on owner-thread tag writes WIP - DO NOT REVIEW: Skip synchronized(unsafeTags) on owner-thread tag writes Apr 10, 2026
@bm1549 bm1549 marked this pull request as ready for review April 10, 2026 19:05
@bm1549 bm1549 requested a review from a team as a code owner April 10, 2026 19:05
@bm1549 bm1549 requested a review from mhlidd April 10, 2026 19:05
@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Apr 10, 2026

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master brian.marks/thread-owned-tags
git_commit_date 1775834061 1775857722
git_commit_sha 5ab378f d664ab2
release_version 1.62.0-SNAPSHOT~5ab378f780 1.62.0-SNAPSHOT~d664ab2db5
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1775859540 1775859540
ci_job_id 1587287876 1587287876
ci_pipeline_id 107208604 107208604
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-ii911gvi 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-ii911gvi 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module Agent Agent
parent None None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 58 metrics, 13 unstable metrics.

Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.62.0-SNAPSHOT~d664ab2db5, baseline=1.62.0-SNAPSHOT~5ab378f780

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.06 s) : 0, 1059848
Total [baseline] (8.839 s) : 0, 8839209
Agent [candidate] (1.074 s) : 0, 1073851
Total [candidate] (8.886 s) : 0, 8885932
section iast
Agent [baseline] (1.223 s) : 0, 1223212
Total [baseline] (9.586 s) : 0, 9585743
Agent [candidate] (1.233 s) : 0, 1232543
Total [candidate] (9.575 s) : 0, 9574588
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.06 s -
Agent iast 1.223 s 163.364 ms (15.4%)
Total tracing 8.839 s -
Total iast 9.586 s 746.534 ms (8.4%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.074 s -
Agent iast 1.233 s 158.692 ms (14.8%)
Total tracing 8.886 s -
Total iast 9.575 s 688.656 ms (7.7%)
gantt
    title insecure-bank - break down per module: candidate=1.62.0-SNAPSHOT~d664ab2db5, baseline=1.62.0-SNAPSHOT~5ab378f780

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.256 ms) : 0, 1256
crashtracking [candidate] (1.243 ms) : 0, 1243
BytebuddyAgent [baseline] (636.08 ms) : 0, 636080
BytebuddyAgent [candidate] (642.295 ms) : 0, 642295
AgentMeter [baseline] (29.748 ms) : 0, 29748
AgentMeter [candidate] (29.797 ms) : 0, 29797
GlobalTracer [baseline] (249.211 ms) : 0, 249211
GlobalTracer [candidate] (253.92 ms) : 0, 253920
AppSec [baseline] (31.95 ms) : 0, 31950
AppSec [candidate] (32.769 ms) : 0, 32769
Debugger [baseline] (59.284 ms) : 0, 59284
Debugger [candidate] (60.663 ms) : 0, 60663
Remote Config [baseline] (596.164 µs) : 0, 596
Remote Config [candidate] (623.001 µs) : 0, 623
Telemetry [baseline] (8.08 ms) : 0, 8080
Telemetry [candidate] (8.321 ms) : 0, 8321
Flare Poller [baseline] (7.392 ms) : 0, 7392
Flare Poller [candidate] (7.446 ms) : 0, 7446
section iast
crashtracking [baseline] (1.238 ms) : 0, 1238
crashtracking [candidate] (1.252 ms) : 0, 1252
BytebuddyAgent [baseline] (800.362 ms) : 0, 800362
BytebuddyAgent [candidate] (806.847 ms) : 0, 806847
AgentMeter [baseline] (11.344 ms) : 0, 11344
AgentMeter [candidate] (11.435 ms) : 0, 11435
GlobalTracer [baseline] (239.127 ms) : 0, 239127
GlobalTracer [candidate] (241.228 ms) : 0, 241228
AppSec [baseline] (31.811 ms) : 0, 31811
AppSec [candidate] (30.336 ms) : 0, 30336
Debugger [baseline] (58.642 ms) : 0, 58642
Debugger [candidate] (61.841 ms) : 0, 61841
Remote Config [baseline] (1.15 ms) : 0, 1150
Remote Config [candidate] (533.293 µs) : 0, 533
Telemetry [baseline] (13.635 ms) : 0, 13635
Telemetry [candidate] (12.972 ms) : 0, 12972
Flare Poller [baseline] (3.502 ms) : 0, 3502
Flare Poller [candidate] (3.485 ms) : 0, 3485
IAST [baseline] (25.892 ms) : 0, 25892
IAST [candidate] (25.875 ms) : 0, 25875
Loading
Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.62.0-SNAPSHOT~d664ab2db5, baseline=1.62.0-SNAPSHOT~5ab378f780

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.06 s) : 0, 1059893
Total [baseline] (11.074 s) : 0, 11073783
Agent [candidate] (1.063 s) : 0, 1063158
Total [candidate] (11.144 s) : 0, 11143810
section appsec
Agent [baseline] (1.251 s) : 0, 1250721
Total [baseline] (11.118 s) : 0, 11118258
Agent [candidate] (1.257 s) : 0, 1256515
Total [candidate] (11.117 s) : 0, 11116930
section iast
Agent [baseline] (1.225 s) : 0, 1225099
Total [baseline] (10.552 s) : 0, 10552399
Agent [candidate] (1.246 s) : 0, 1246267
Total [candidate] (11.331 s) : 0, 11330949
section profiling
Agent [baseline] (1.188 s) : 0, 1187723
Total [baseline] (11.222 s) : 0, 11222371
Agent [candidate] (1.206 s) : 0, 1205913
Total [candidate] (11.209 s) : 0, 11209296
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.06 s -
Agent appsec 1.251 s 190.828 ms (18.0%)
Agent iast 1.225 s 165.205 ms (15.6%)
Agent profiling 1.188 s 127.829 ms (12.1%)
Total tracing 11.074 s -
Total appsec 11.118 s 44.475 ms (0.4%)
Total iast 10.552 s -521.384 ms (-4.7%)
Total profiling 11.222 s 148.588 ms (1.3%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.063 s -
Agent appsec 1.257 s 193.357 ms (18.2%)
Agent iast 1.246 s 183.109 ms (17.2%)
Agent profiling 1.206 s 142.756 ms (13.4%)
Total tracing 11.144 s -
Total appsec 11.117 s -26.881 ms (-0.2%)
Total iast 11.331 s 187.139 ms (1.7%)
Total profiling 11.209 s 65.486 ms (0.6%)
gantt
    title petclinic - break down per module: candidate=1.62.0-SNAPSHOT~d664ab2db5, baseline=1.62.0-SNAPSHOT~5ab378f780

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.234 ms) : 0, 1234
crashtracking [candidate] (1.245 ms) : 0, 1245
BytebuddyAgent [baseline] (634.079 ms) : 0, 634079
BytebuddyAgent [candidate] (635.92 ms) : 0, 635920
AgentMeter [baseline] (29.532 ms) : 0, 29532
AgentMeter [candidate] (29.54 ms) : 0, 29540
GlobalTracer [baseline] (249.946 ms) : 0, 249946
GlobalTracer [candidate] (250.928 ms) : 0, 250928
AppSec [baseline] (32.104 ms) : 0, 32104
AppSec [candidate] (32.12 ms) : 0, 32120
Debugger [baseline] (60.432 ms) : 0, 60432
Debugger [candidate] (60.621 ms) : 0, 60621
Remote Config [baseline] (601.648 µs) : 0, 602
Remote Config [candidate] (608.878 µs) : 0, 609
Telemetry [baseline] (8.15 ms) : 0, 8150
Telemetry [candidate] (8.162 ms) : 0, 8162
Flare Poller [baseline] (7.516 ms) : 0, 7516
Flare Poller [candidate] (7.533 ms) : 0, 7533
section appsec
crashtracking [baseline] (1.224 ms) : 0, 1224
crashtracking [candidate] (1.255 ms) : 0, 1255
BytebuddyAgent [baseline] (663.323 ms) : 0, 663323
BytebuddyAgent [candidate] (665.984 ms) : 0, 665984
AgentMeter [baseline] (12.105 ms) : 0, 12105
AgentMeter [candidate] (12.107 ms) : 0, 12107
GlobalTracer [baseline] (249.837 ms) : 0, 249837
GlobalTracer [candidate] (250.849 ms) : 0, 250849
IAST [baseline] (24.526 ms) : 0, 24526
IAST [candidate] (24.792 ms) : 0, 24792
AppSec [baseline] (184.685 ms) : 0, 184685
AppSec [candidate] (185.503 ms) : 0, 185503
Debugger [baseline] (65.841 ms) : 0, 65841
Debugger [candidate] (66.382 ms) : 0, 66382
Remote Config [baseline] (616.178 µs) : 0, 616
Remote Config [candidate] (605.866 µs) : 0, 606
Telemetry [baseline] (8.593 ms) : 0, 8593
Telemetry [candidate] (8.665 ms) : 0, 8665
Flare Poller [baseline] (3.571 ms) : 0, 3571
Flare Poller [candidate] (3.594 ms) : 0, 3594
section iast
crashtracking [baseline] (1.226 ms) : 0, 1226
crashtracking [candidate] (1.256 ms) : 0, 1256
BytebuddyAgent [baseline] (800.481 ms) : 0, 800481
BytebuddyAgent [candidate] (816.191 ms) : 0, 816191
AgentMeter [baseline] (11.399 ms) : 0, 11399
AgentMeter [candidate] (11.663 ms) : 0, 11663
GlobalTracer [baseline] (239.483 ms) : 0, 239483
GlobalTracer [candidate] (243.538 ms) : 0, 243538
IAST [baseline] (25.857 ms) : 0, 25857
IAST [candidate] (26.162 ms) : 0, 26162
AppSec [baseline] (29.043 ms) : 0, 29043
AppSec [candidate] (32.101 ms) : 0, 32101
Debugger [baseline] (63.822 ms) : 0, 63822
Debugger [candidate] (61.43 ms) : 0, 61430
Remote Config [baseline] (1.197 ms) : 0, 1197
Remote Config [candidate] (533.065 µs) : 0, 533
Telemetry [baseline] (12.679 ms) : 0, 12679
Telemetry [candidate] (13.116 ms) : 0, 13116
Flare Poller [baseline] (3.584 ms) : 0, 3584
Flare Poller [candidate] (3.484 ms) : 0, 3484
section profiling
crashtracking [baseline] (1.196 ms) : 0, 1196
crashtracking [candidate] (1.232 ms) : 0, 1232
BytebuddyAgent [baseline] (692.325 ms) : 0, 692325
BytebuddyAgent [candidate] (704.741 ms) : 0, 704741
AgentMeter [baseline] (9.128 ms) : 0, 9128
AgentMeter [candidate] (9.302 ms) : 0, 9302
GlobalTracer [baseline] (207.897 ms) : 0, 207897
GlobalTracer [candidate] (210.259 ms) : 0, 210259
AppSec [baseline] (32.605 ms) : 0, 32605
AppSec [candidate] (33.124 ms) : 0, 33124
Debugger [baseline] (66.028 ms) : 0, 66028
Debugger [candidate] (66.832 ms) : 0, 66832
Remote Config [baseline] (570.605 µs) : 0, 571
Remote Config [candidate] (588.57 µs) : 0, 589
Telemetry [baseline] (7.822 ms) : 0, 7822
Telemetry [candidate] (7.974 ms) : 0, 7974
Flare Poller [baseline] (3.562 ms) : 0, 3562
Flare Poller [candidate] (3.571 ms) : 0, 3571
ProfilingAgent [baseline] (95.25 ms) : 0, 95250
ProfilingAgent [candidate] (95.56 ms) : 0, 95560
Profiling [baseline] (95.828 ms) : 0, 95828
Profiling [candidate] (96.135 ms) : 0, 96135
Loading

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master brian.marks/thread-owned-tags
git_commit_date 1775834061 1775857722
git_commit_sha 5ab378f d664ab2
release_version 1.62.0-SNAPSHOT~5ab378f780 1.62.0-SNAPSHOT~d664ab2db5
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1775860181 1775860181
ci_job_id 1587287877 1587287877
ci_pipeline_id 107208604 107208604
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-0-hp15gq3q 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-0-hp15gq3q 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 1 performance regressions! Performance is the same for 20 metrics, 15 unstable metrics.

scenario Δ mean agg_http_req_duration_p50 Δ mean agg_http_req_duration_p95 Δ mean throughput candidate mean agg_http_req_duration_p50 candidate mean agg_http_req_duration_p95 candidate mean throughput baseline mean agg_http_req_duration_p50 baseline mean agg_http_req_duration_p95 baseline mean throughput
scenario:load:petclinic:iast:high_load worse
[+347.923µs; +706.338µs] or [+2.018%; +4.098%]
same
[-65.515µs; +1131.133µs] or [-0.229%; +3.960%]
unstable
[-33.380op/s; +23.068op/s] or [-12.677%; +8.761%]
17.765ms 29.099ms 258.156op/s 17.238ms 28.566ms 263.312op/s
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~d664ab2db5, baseline=1.62.0-SNAPSHOT~5ab378f780
    dateFormat X
    axisFormat %s
section baseline
no_agent (19.251 ms) : 19054, 19448
.   : milestone, 19251,
appsec (18.682 ms) : 18493, 18872
.   : milestone, 18682,
code_origins (18.426 ms) : 18243, 18608
.   : milestone, 18426,
iast (17.721 ms) : 17545, 17897
.   : milestone, 17721,
profiling (19.649 ms) : 19448, 19851
.   : milestone, 19649,
tracing (18.015 ms) : 17838, 18191
.   : milestone, 18015,
section candidate
no_agent (19.144 ms) : 18948, 19340
.   : milestone, 19144,
appsec (18.649 ms) : 18460, 18837
.   : milestone, 18649,
code_origins (18.188 ms) : 18009, 18368
.   : milestone, 18188,
iast (18.079 ms) : 17897, 18261
.   : milestone, 18079,
profiling (19.298 ms) : 19105, 19490
.   : milestone, 19298,
tracing (18.076 ms) : 17896, 18256
.   : milestone, 18076,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 19.251 ms [19.054 ms, 19.448 ms] -
appsec 18.682 ms [18.493 ms, 18.872 ms] -568.459 µs (-3.0%)
code_origins 18.426 ms [18.243 ms, 18.608 ms] -825.136 µs (-4.3%)
iast 17.721 ms [17.545 ms, 17.897 ms] -1.529 ms (-7.9%)
profiling 19.649 ms [19.448 ms, 19.851 ms] 398.282 µs (2.1%)
tracing 18.015 ms [17.838 ms, 18.191 ms] -1.236 ms (-6.4%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 19.144 ms [18.948 ms, 19.34 ms] -
appsec 18.649 ms [18.46 ms, 18.837 ms] -495.591 µs (-2.6%)
code_origins 18.188 ms [18.009 ms, 18.368 ms] -956.01 µs (-5.0%)
iast 18.079 ms [17.897 ms, 18.261 ms] -1.065 ms (-5.6%)
profiling 19.298 ms [19.105 ms, 19.49 ms] 153.274 µs (0.8%)
tracing 18.076 ms [17.896 ms, 18.256 ms] -1.068 ms (-5.6%)
Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.62.0-SNAPSHOT~d664ab2db5, baseline=1.62.0-SNAPSHOT~5ab378f780
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.243 ms) : 1231, 1255
.   : milestone, 1243,
iast (3.289 ms) : 3247, 3332
.   : milestone, 3289,
iast_FULL (6.228 ms) : 6164, 6293
.   : milestone, 6228,
iast_GLOBAL (3.574 ms) : 3520, 3627
.   : milestone, 3574,
profiling (2.459 ms) : 2434, 2483
.   : milestone, 2459,
tracing (1.934 ms) : 1918, 1950
.   : milestone, 1934,
section candidate
no_agent (1.228 ms) : 1216, 1240
.   : milestone, 1228,
iast (3.208 ms) : 3167, 3249
.   : milestone, 3208,
iast_FULL (5.991 ms) : 5930, 6051
.   : milestone, 5991,
iast_GLOBAL (3.611 ms) : 3552, 3670
.   : milestone, 3611,
profiling (2.379 ms) : 2354, 2405
.   : milestone, 2379,
tracing (1.878 ms) : 1862, 1894
.   : milestone, 1878,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.243 ms [1.231 ms, 1.255 ms] -
iast 3.289 ms [3.247 ms, 3.332 ms] 2.046 ms (164.6%)
iast_FULL 6.228 ms [6.164 ms, 6.293 ms] 4.985 ms (401.0%)
iast_GLOBAL 3.574 ms [3.52 ms, 3.627 ms] 2.33 ms (187.5%)
profiling 2.459 ms [2.434 ms, 2.483 ms] 1.216 ms (97.8%)
tracing 1.934 ms [1.918 ms, 1.95 ms] 690.794 µs (55.6%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.228 ms [1.216 ms, 1.24 ms] -
iast 3.208 ms [3.167 ms, 3.249 ms] 1.98 ms (161.3%)
iast_FULL 5.991 ms [5.93 ms, 6.051 ms] 4.763 ms (387.9%)
iast_GLOBAL 3.611 ms [3.552 ms, 3.67 ms] 2.383 ms (194.1%)
profiling 2.379 ms [2.354 ms, 2.405 ms] 1.151 ms (93.8%)
tracing 1.878 ms [1.862 ms, 1.894 ms] 650.558 µs (53.0%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master brian.marks/thread-owned-tags
git_commit_date 1775834061 1775857722
git_commit_sha 5ab378f d664ab2
release_version 1.62.0-SNAPSHOT~5ab378f780 1.62.0-SNAPSHOT~d664ab2db5
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1775859911 1775859911
ci_job_id 1587287878 1587287878
ci_pipeline_id 107208604 107208604
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version Linux runner-zfyrx7zua-project-304-concurrent-1-7decuatt 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Linux runner-zfyrx7zua-project-304-concurrent-1-7decuatt 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~d664ab2db5, baseline=1.62.0-SNAPSHOT~5ab378f780
    dateFormat X
    axisFormat %s
section baseline
no_agent (15.317 s) : 15317000, 15317000
.   : milestone, 15317000,
appsec (14.968 s) : 14968000, 14968000
.   : milestone, 14968000,
iast (17.984 s) : 17984000, 17984000
.   : milestone, 17984000,
iast_GLOBAL (18.113 s) : 18113000, 18113000
.   : milestone, 18113000,
profiling (15.227 s) : 15227000, 15227000
.   : milestone, 15227000,
tracing (15.062 s) : 15062000, 15062000
.   : milestone, 15062000,
section candidate
no_agent (15.514 s) : 15514000, 15514000
.   : milestone, 15514000,
appsec (14.874 s) : 14874000, 14874000
.   : milestone, 14874000,
iast (18.093 s) : 18093000, 18093000
.   : milestone, 18093000,
iast_GLOBAL (17.97 s) : 17970000, 17970000
.   : milestone, 17970000,
profiling (14.798 s) : 14798000, 14798000
.   : milestone, 14798000,
tracing (14.987 s) : 14987000, 14987000
.   : milestone, 14987000,
Loading
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.317 s [15.317 s, 15.317 s] -
appsec 14.968 s [14.968 s, 14.968 s] -349.0 ms (-2.3%)
iast 17.984 s [17.984 s, 17.984 s] 2.667 s (17.4%)
iast_GLOBAL 18.113 s [18.113 s, 18.113 s] 2.796 s (18.3%)
profiling 15.227 s [15.227 s, 15.227 s] -90.0 ms (-0.6%)
tracing 15.062 s [15.062 s, 15.062 s] -255.0 ms (-1.7%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.514 s [15.514 s, 15.514 s] -
appsec 14.874 s [14.874 s, 14.874 s] -640.0 ms (-4.1%)
iast 18.093 s [18.093 s, 18.093 s] 2.579 s (16.6%)
iast_GLOBAL 17.97 s [17.97 s, 17.97 s] 2.456 s (15.8%)
profiling 14.798 s [14.798 s, 14.798 s] -716.0 ms (-4.6%)
tracing 14.987 s [14.987 s, 14.987 s] -527.0 ms (-3.4%)
Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.62.0-SNAPSHOT~d664ab2db5, baseline=1.62.0-SNAPSHOT~5ab378f780
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.489 ms) : 1477, 1500
.   : milestone, 1489,
appsec (3.71 ms) : 3498, 3923
.   : milestone, 3710,
iast (2.283 ms) : 2214, 2352
.   : milestone, 2283,
iast_GLOBAL (2.328 ms) : 2258, 2398
.   : milestone, 2328,
profiling (2.103 ms) : 2048, 2158
.   : milestone, 2103,
tracing (2.096 ms) : 2042, 2150
.   : milestone, 2096,
section candidate
no_agent (1.49 ms) : 1479, 1502
.   : milestone, 1490,
appsec (3.85 ms) : 3628, 4072
.   : milestone, 3850,
iast (2.278 ms) : 2209, 2348
.   : milestone, 2278,
iast_GLOBAL (2.332 ms) : 2262, 2402
.   : milestone, 2332,
profiling (2.099 ms) : 2044, 2154
.   : milestone, 2099,
tracing (2.081 ms) : 2027, 2135
.   : milestone, 2081,
Loading
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.489 ms [1.477 ms, 1.5 ms] -
appsec 3.71 ms [3.498 ms, 3.923 ms] 2.222 ms (149.2%)
iast 2.283 ms [2.214 ms, 2.352 ms] 794.186 µs (53.4%)
iast_GLOBAL 2.328 ms [2.258 ms, 2.398 ms] 839.36 µs (56.4%)
profiling 2.103 ms [2.048 ms, 2.158 ms] 614.33 µs (41.3%)
tracing 2.096 ms [2.042 ms, 2.15 ms] 607.327 µs (40.8%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.49 ms [1.479 ms, 1.502 ms] -
appsec 3.85 ms [3.628 ms, 4.072 ms] 2.36 ms (158.4%)
iast 2.278 ms [2.209 ms, 2.348 ms] 788.051 µs (52.9%)
iast_GLOBAL 2.332 ms [2.262 ms, 2.402 ms] 842.052 µs (56.5%)
profiling 2.099 ms [2.044 ms, 2.154 ms] 608.738 µs (40.8%)
tracing 2.081 ms [2.027 ms, 2.135 ms] 590.491 µs (39.6%)

@bm1549 bm1549 marked this pull request as draft April 10, 2026 19:23
@bm1549 bm1549 changed the title WIP - DO NOT REVIEW: Skip synchronized(unsafeTags) on owner-thread tag writes Skip synchronized(unsafeTags) on owner-thread tag writes Apr 10, 2026
Add three targeted concurrency tests that exercise the exact cross-thread
tag write pattern the JMH crossThread benchmark was measuring:
- crossThreadSustainedNoCrash: 8 threads × 10k setTag on same span
- ownerToSharedTransition: owner writes first, then 8 threads join
- manySpansCrossThread: 10k short-lived spans tagged from 8 threads

All pass, proving the production code handles cross-thread writes without
NPE or structural corruption.

Fix the crossThread benchmark: change SharedSpan @setup from
Level.Invocation to Level.Iteration. With Level.Invocation, 8 threads
raced to call setup() concurrently, causing NPE when state.span was
transiently null between invocations — a benchmark harness bug, not a
production code bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

// Disable thread-owned tag optimization for long-running spans because the writer thread
// can call processTagsAndBaggage on unfinished spans, creating concurrent access.
if (traceCollector.longRunningSpansEnabled()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that this is going to be hard to maintain.
I can easily see someone forgetting to transition the state.


public void setSpanSamplingPriority(double rate, int limit) {
synchronized (unsafeTags) {
if (isOwnerThread()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this, we need to encapsulate this better.
I think we can make a "accessTags(lambda)" that encapsulates whether or not to use synchronization.

unsafeTags.putAll(map);
}
} else {
synchronized (unsafeTags) {
Copy link
Copy Markdown
Contributor

@dougqh dougqh Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my preference would to be push this logic into OptimizedTagMap -- along with adding a lot more tests to verify the multi-thread safety.

I don't want most developers having to interact when adding a new feature to the core.

The tricky part is what to do about the LegacyTagMap, but given that OptimizedTagMap has been on-by-default for over a month now, maybe we can drop the LegacyTagMap.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there's a half measure where we create helper routines in DDSpanContext instead, that would be easier change to make.

import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

public class DDSpanContextConcurrencyTest {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd like to see something more comprehensive here.
TagMap has multi-threaded tests and fuzzing tests, we can build on that to make something more thorough.

@Warmup(iterations = 5)
@Measurement(iterations = 5)
@Fork(2)
public class SpanTagBenchmark {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like to see this run with @Threads(8).

The point of using multiple isn't just for thread safety. It is to measure the macro impact of locking, allocation, etc on multiple threads working concurrently.

A benchmark can perform well in a single threaded situation, but poorly in a multiple thread situation. Given that customers are frequently running multi-thread web servers, checking the multi-threaded case is important.

// we transition to STATE_SHARED and all subsequent accesses take the lock.
private static final int STATE_OWNER = 0;
private static final int STATE_SHARED = 1;
private volatile int tagWriteState = STATE_OWNER;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that we can simplify this a bit.
We can introduce an ownerThread variable that is initially set to the creating Thread, but switched to null once when the state transition occurs.

}

private boolean isOwnerThread() {
return tagWriteState == STATE_OWNER && Thread.currentThread().getId() == threadId;
Copy link
Copy Markdown
Contributor

@dougqh dougqh Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking the getId check isn't safe because threadId can be overridden by a child class

A slightly option would be ThreadUtils which has a helper to get the new ID that cannot be overridden by a child class

But better yet, I think we could just compare the thread via a reference check
That works well with my suggestion to just store the thread and transition it to null, then the ownership check simply becomes...

Thread ownerThread = this.ownerThread;
if ( ownerThread != null && ownerThread == Thread.currentThread() )

NOTE: I'm keep the null check to avoid calling Thread.currentThread.
Although, that may not be better given Thread.currentThread is probably an intrinsic in most JVMs.

return unsafeTags.getObject(key);
} else {
synchronized (unsafeTags) {
tagWriteState = STATE_SHARED;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The location of the write of tagWriteState isn't great.
After the initial state transition, setting again will dirty the cache line unnecessarily.

Again, I think it makes more sense to create a helper method that encapsulate the ownership, state transition, and locking.

tagGet will be slightly annoying to handle because the return value, but fortunately, we don't have many places that are returning primitives (yet).

Copy link
Copy Markdown
Contributor

@dougqh dougqh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at throughput numbers, there's a modest gain that might make this change worth the complexity if we can simplify it a bit.

The two main changes that I think we should try are...

  • replacing the two volatiles with a single volatile for the owningThread
  • introducing high-order helper functions that hide the locking complexity

Then all the duplicate code can go away, and the accessing code becomes something like...
accessTags(unsafeTags -> {
...
});
We just need to make sure not to introduce a capturing lambda, so we don't incur unneeded allocation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: core Tracer core tag: ai generated Largely based on code generated by an AI or LLM type: enhancement Enhancements and improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants