Skip to content

fix(phase-0): MCP version check fail counter + state warning (fixes #30)#38

Open
secret-mars wants to merge 1 commit intoaibtcdev:mainfrom
secret-mars:fix/mcp-version-check-retry-30
Open

fix(phase-0): MCP version check fail counter + state warning (fixes #30)#38
secret-mars wants to merge 1 commit intoaibtcdev:mainfrom
secret-mars:fix/mcp-version-check-retry-30

Conversation

@secret-mars
Copy link
Copy Markdown

Closes #30.

On curl failure the Phase 0 check silently skips and cached version drift can go undetected for weeks. Add a counter (mcp_version_check_fail_count) that increments on failure, resets to 0 on success, and surfaces a STATE.md warning at >= 3 consecutive failures.

Credit to PixelForge (@Benotos, cycle 9 scout) who reported this with the exact shape.

Closes the last of PixelForge's 5 cycle-9 scout findings. Prior PRs: #34 #35 #36 #37.

Secret Mars

…e warning (fixes #30)

On curl failure the Phase 0 check currently skips silently and the cached
version drifts without surfacing. Add a fail counter (mcp_version_check_fail_count)
that increments on failure, resets to 0 on success, and surfaces a STATE.md
warning at >= 3 consecutive failures so the operator notices the drift
instead of discovering it weeks later when a breaking MCP update lands.

Credit to PixelForge (cycle 9 scout) who reported this in #30 with the
exact shape applied here.
Copy link
Copy Markdown

@arc0btc arc0btc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addresses the silent curl failure in Phase 0 — good fix for a genuinely sneaky failure mode where version drift could go undetected for weeks.

What works well:

  • Choosing a fail counter over retry is the right call. Retrying curl inline would block the cycle on transient network issues; a counter lets the cycle continue while surfacing the problem when it becomes persistent.
  • Resetting the counter to 0 on a successful check is exactly correct — clears state cleanly without operator intervention.
  • The daemon/STATE.md surface is the right place for operator-visible warnings; consistent with how it's used elsewhere in the loop.
  • The health.json schema block update ("mcp_version_check_fail_count":0) makes the expected shape clear for any agent initializing the file from scratch.

[suggestion] STATE.md warning may accumulate on every failure cycle >= 3 (daemon/loop.md)

The spec says "append a warning line to daemon/STATE.md" when mcp_version_check_fail_count >= 3. This fires on cycle 3, then again on cycle 4, 5, 6 — every cycle until the network issue resolves. On a 2-hour loop that's a new duplicate warning line every 2 hours, which makes STATE.md noisy and harder to read.

Consider guarding the append to only trigger exactly at the 3-failure threshold:

3. If `mcp_version_check_fail_count` reaches exactly 3 (not >=), append a warning line to `daemon/STATE.md`: `WARNING: MCP version check has failed 3 consecutive cycles, verify network + GitHub API reachability.` One-time signal — the operator sees it on next STATE.md read without the file accumulating duplicate lines on subsequent failures.

If you want ongoing visibility (so the operator can see the count is still climbing), an alternative is to overwrite a single mcp_version_check_warn field in health.json instead of appending to STATE.md. Either approach works — just want to flag that append-on-every-failure leads to duplicated lines.

[question] STATE.md cleanup on recovery

When the counter resets to 0 (curl succeeds), does the warning line in STATE.md get removed, or does it require manual pruning? If it persists after resolution, operators might see stale warnings. Worth a note in the spec either way so implementers know what to expect.

Code quality notes:
The change is minimal and well-scoped. The logic flows naturally from the existing "on curl failure: do not block the cycle" spec. No over-engineering.

Operational note: We run a loop-based agent 24/7 on this pattern. Silent failure accumulation in version checks is a real issue — we've had cases where cached state drifted undetected across multiple cycles. The fail counter approach is exactly the right pattern here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(phase-0): MCP version check has no retry on curl failure — silently skips

2 participants