Skip to content

fix: increment reconnection attempt counter on clean stream disconnect#2421

Open
manjunathgujjar wants to merge 2 commits intomodelcontextprotocol:mainfrom
manjunathgujjar:fix/streamable-http-reconnect-counter
Open

fix: increment reconnection attempt counter on clean stream disconnect#2421
manjunathgujjar wants to merge 2 commits intomodelcontextprotocol:mainfrom
manjunathgujjar:fix/streamable-http-reconnect-counter

Conversation

@manjunathgujjar
Copy link
Copy Markdown

Problem

_handle_reconnection() in streamable_http.py resets the attempt counter to 0 when a reconnection succeeds at the HTTP level but the stream closes without delivering a complete response (line 494 before this fix). This makes MAX_RECONNECTION_ATTEMPTS ineffective: only the exception path (network error) incremented the counter. A server that repeatedly accepts connections but drops them before sending a final response causes the client to retry forever.

This was reported in #2393 where production jobs hung for hours in a reconnection loop.

Root Cause

# Before: clean-close recursion resets attempt to 0
await self._handle_reconnection(ctx, reconnect_last_event_id, reconnect_retry_ms, 0)

# Exception path correctly increments:
await self._handle_reconnection(ctx, last_event_id, retry_interval_ms, attempt + 1)

The asymmetry means MAX_RECONNECTION_ATTEMPTS only guards against consecutive HTTP-level failures, not against total reconnection attempts.

Fix

Pass attempt + 1 on clean-close recursion, matching the exception path. MAX_RECONNECTION_ATTEMPTS is also raised from 2 to 5 to give more headroom for legitimate transient disconnects while still bounding the retry loop.

The # pragma: no cover on the MAX guard is also removed since the new test now exercises that branch.

Test

Added test_reconnection_attempt_counter_increments_on_clean_disconnect to tests/shared/test_streamable_http.py. The test spies on _handle_reconnection to record the attempt values seen, patches MAX_RECONNECTION_ATTEMPTS=2, then calls a tool that closes the SSE stream multiple times without completing. With the fix, the spy records [0, 1, 2] — the counter increments until the limit is hit. Without the fix it would record [0, 0, 0, ...] indefinitely.

_handle_reconnection() was resetting the attempt counter to 0 whenever
a reconnection succeeded at the HTTP level but the stream closed without
delivering a complete response. This made MAX_RECONNECTION_ATTEMPTS
ineffective: a server that repeatedly accepts then drops SSE connections
would loop forever, hanging the caller indefinitely.

The fix passes attempt+1 on clean-close recursion, matching the behaviour
of the exception path. MAX_RECONNECTION_ATTEMPTS now bounds total attempts
regardless of whether individual connection attempts succeed at the HTTP
level. MAX_RECONNECTION_ATTEMPTS is also raised from 2 to 5 to give
legitimate transient disconnects more headroom.

Also removes the pragma: no cover on the MAX_RECONNECTION_ATTEMPTS guard
now that the new test exercises it.

Fixes modelcontextprotocol#2393
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant