Move client-side stats computation off the span-finish thread#11117

Draft
dougqh wants to merge 1 commit into master from dougqh/stats-off-foreground-thread

dougqh commented Apr 15, 2026

Summary

  • Moves expensive MetricKey construction, ConcurrentHashMap operations, Batch management, and health metrics off the span-finish thread to the existing background Aggregator thread
  • Introduces lightweight SpanStatsData / TraceStatsData DTOs that flow through the MPSC inbox queue
  • Downgrades pending and keys from ConcurrentHashMap to plain HashMap (now single-threaded)
  • Includes SpanFinishWithStatsBenchmark JMH benchmark
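For readers who have not looked at the diff, the hand-off described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `StatsOffloadSketch`, `SpanStats`, `Aggregator`, and `runDemo` are hypothetical names, a `LinkedBlockingQueue` stands in for the MPSC inbox queue, and a string key stands in for MetricKey.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class StatsOffloadSketch {

  /** Hypothetical stand-in for SpanStatsData: raw fields only, no key built yet. */
  static final class SpanStats {
    final String service;
    final String operation;
    final long durationNanos;
    final boolean error;

    SpanStats(String service, String operation, long durationNanos, boolean error) {
      this.service = service;
      this.operation = operation;
      this.durationNanos = durationNanos;
      this.error = error;
    }
  }

  /** Sentinel that tells the aggregator thread to exit once it reaches it. */
  private static final SpanStats STOP = new SpanStats("", "", 0L, false);

  /** Hypothetical aggregator: the inbox is the only structure shared across threads. */
  static final class Aggregator implements Runnable {
    // Assumption: LinkedBlockingQueue stands in for the MPSC inbox queue.
    private final BlockingQueue<SpanStats> inbox = new LinkedBlockingQueue<>();
    // Plain HashMap is safe here because only the aggregator thread touches it.
    private final Map<String, long[]> pending = new HashMap<>();

    /** Foreground (span-finish) path: enqueue the DTO and return. */
    void publish(SpanStats stats) {
      inbox.add(stats);
    }

    void stop() {
      inbox.add(STOP);
    }

    @Override
    public void run() {
      try {
        for (SpanStats s = inbox.take(); s != STOP; s = inbox.take()) {
          aggregate(s);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }

    /** Background path: key construction and map updates happen here. */
    private void aggregate(SpanStats s) {
      String key = s.service + "/" + s.operation; // stand-in for MetricKey
      long[] counters = pending.computeIfAbsent(key, k -> new long[3]);
      counters[0]++;                   // hits
      if (s.error) counters[1]++;      // errors
      counters[2] += s.durationNanos;  // total duration
    }
  }

  /** Publishes from several threads, then returns the aggregated counters. */
  static Map<String, long[]> runDemo(int threads, int spansPerThread) {
    Aggregator aggregator = new Aggregator();
    Thread worker = new Thread(aggregator, "aggregator");
    worker.start();
    Thread[] producers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      producers[t] = new Thread(() -> {
        for (int i = 0; i < spansPerThread; i++) {
          aggregator.publish(new SpanStats("web", "GET", 100L, i % 4 == 0));
        }
      });
      producers[t].start();
    }
    try {
      for (Thread p : producers) p.join();
      aggregator.stop();
      worker.join(); // join() makes the worker's map writes visible to the caller
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return aggregator.pending;
  }
}
```

In this sketch the foreground cost is one small allocation and one queue insert per span; everything that touches the map runs on the single aggregator thread, which is why the map needs no synchronization.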

Motivation: ConflatingMetricsAggregator.publish() consumed ~17% of foreground CPU in a 16-thread span creation stress test — 12% from ConcurrentHashMap.get() for MetricKey lookups, 3% from TraceHealthMetrics.onClientStatTraceComputed() LongAdder increments, and 2% from additional LongAdder.add() calls. All of this ran synchronously on the thread that called span.finish().

Benchmark results

Benchmark                        Score            Units
publishSmallTrace (4 spans)      0.159 ± 0.006    us/op
publishMediumTrace (16 spans)    0.544 ± 0.007    us/op
publishLargeTrace (64 spans)     2.040 ± 0.014    us/op
publishConcurrent (8 threads)    1.851 ± 0.069    ops/us
OLD baseline (64 spans)          2.860 ± 0.013    us/op

64-span foreground cost: 2.86us → 2.04us (~29% reduction)
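
As a check on that figure, the relative reduction follows from the two 64-span scores in the benchmark results (OLD baseline versus publishLargeTrace):

$$\frac{2.86 - 2.04}{2.86} = \frac{0.82}{2.86} \approx 0.287 \approx 29\%$$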

Test plan

  • All *ConflatingMetric* tests pass
  • All *Aggregator* tests pass
  • Run full CI suite
  • Verify with span creation stress test profiling

🤖 Generated with Claude Code

ConflatingMetricsAggregator.publish() was consuming ~17% of foreground
CPU (ConcurrentHashMap 12%, TraceHealthMetrics 3%, LongAdder 2%) by
running MetricKey construction, ConcurrentHashMap lookups, and Batch
management synchronously on the span-finish thread.

This change extracts lightweight SpanStatsData DTOs on the foreground
thread and defers all expensive work (MetricKey construction, map
lookups, health metrics) to the existing background Aggregator thread
via the MPSC inbox queue. The pending/keys maps are downgraded from
ConcurrentHashMap to plain HashMap since they are now single-threaded.
Benchmark shows 64-span trace foreground cost reduced from 2.86us to
2.04us (~29% reduction).

tag: no release note
tag: ai generated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>