docs: surface aggregate_survey() and fix stale references by igerber · Pull Request #288 · igerber/diff-diff

igerber · 2026-04-10T20:51:09Z

Summary

aggregate_survey() shipped in v3.0.1 but multiple roadmap and strategy docs still called it survey_aggregate() (the proposed name) and marked it as future work. Practitioner-facing docs (README, getting started, decision tree, choosing estimator, survey-roadmap) made no mention of the helper at all, leaving anyone with BRFSS/ACS/CPS microdata unable to discover the preprocessing step.

Staleness fixes (wrong name + wrong status):

ROADMAP.md: B3d row marked shipped, deleted "Future: Survey Aggregation Helper" section
docs/business-strategy.md: corrected wrong-name references at lines 332 and 376; line 376 strikethrough preserves the historical name

Discoverability additions:

docs/survey-roadmap.md: new "v3.0.1: Survey Aggregation Helper" entry in the canonical survey doc
README.md: new bullet in "For Data Scientists" plus a feature bullet
docs/index.rst: new :func: link in Quick Links; also adds the previously orphaned tutorials/16_survey_did and tutorials/16_wooldridge_etwfe to the Advanced Methods toctree
docs/choosing_estimator.rst: new .. note:: in Survey Design Support pointing at the preprocessing step
docs/practitioner_getting_started.rst: new microdata paragraph plus a runnable code example in "What If You Have Survey Data?"
docs/practitioner_decision_tree.rst: new microdata paragraph in the Survey Data section
docs/doc-deps.yaml: prep.py entry now lists all 5 affected docs, matching the prep_dgp.py pattern

TODO.md catalog updates from sphinx warning audit:

Bumped duplicate-object-description count from ~376 to ~1583
New Medium entry for the 3 failed-import warnings (DiDResults.ci, MultiPeriodDiDResults.att, CallawaySantAnnaResults.aggregate)
New Low entry for the EDiDBootstrapResults cross-reference ambiguity
New Low entry for tracked autosummary stub staleness

Methodology references (required if estimator / math changes)

Method name(s): N/A - no methodology or estimator changes
Paper / source link(s): N/A
Any intentional deviations from the source (and why): None

Validation

Tests added/updated: No test changes (pure docs PR)
Backtest / simulation / notebook evidence (if applicable): N/A
sphinx-build exit 0; total warnings dropped from 1723 → 1721 (the 2 orphan-tutorial warnings are eliminated)
Zero new sphinx warnings introduced; zero warnings on any edited content
grep survey_aggregate returns no remaining stale function references in user docs (the only matches are an intentional historical strikethrough in business-strategy.md and a test method name test_survey_aggregate_and_summary in tests/test_wooldridge.py that refers to the WooldridgeDiD.aggregate() method, not the helper)

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

aggregate_survey() shipped in v3.0.1 but multiple roadmap and strategy docs still called it survey_aggregate() (the proposed name) and marked it as future work. Practitioner-facing docs (README, getting started, decision tree, choosing estimator, survey-roadmap) made no mention of the helper at all, leaving anyone with BRFSS/ACS/CPS microdata unable to discover the preprocessing step. Staleness fixes (wrong name + wrong status): - ROADMAP.md: B3d row marked shipped, deleted "Future: Survey Aggregation Helper" section - docs/business-strategy.md: corrected wrong-name references in lines 332 and 376; line 376 strikethrough preserves the historical name Discoverability additions: - docs/survey-roadmap.md: new "v3.0.1: Survey Aggregation Helper" entry in the canonical survey doc - README.md: new bullet in "For Data Scientists" plus a feature bullet - docs/index.rst: new :func: link in Quick Links; also adds the previously orphaned tutorials/16_survey_did and tutorials/16_wooldridge_etwfe to the Advanced Methods toctree - docs/choosing_estimator.rst: new .. note:: in Survey Design Support pointing at the preprocessing step - docs/practitioner_getting_started.rst: new microdata paragraph plus a runnable code example in "What If You Have Survey Data?" - docs/practitioner_decision_tree.rst: new microdata paragraph in the Survey Data section Documentation dependency tracking: - docs/doc-deps.yaml: prep.py entry now lists all 5 affected docs, matching the prep_dgp.py pattern TODO.md catalog updates from sphinx warning audit: - Bumped duplicate-object-description count from ~376 to ~1583 - New Medium entry for the 3 failed-import warnings (DiDResults.ci, MultiPeriodDiDResults.att, CallawaySantAnnaResults.aggregate) - New Low entry for the EDiDBootstrapResults cross-reference ambiguity - New Low entry for tracked autosummary stub staleness Validation: - sphinx-build exit 0; warnings dropped from 1723 to 1721 (the 2 orphan tutorial warnings are gone) - Zero new warnings introduced; zero warnings on edited content - grep survey_aggregate returns no remaining stale function references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-10T20:57:04Z

Overall Assessment

⚠️ Needs changes

Executive Summary

P1 The new docs overstate aggregate_survey() compatibility: the helper returns a second-stage SurveyDesign with inverse-variance aweights, but several survey-aware estimators in the support matrix are explicitly pweight-only, so “any estimator” is not accurate.
The underlying methodology wording about domain estimation, SRS fallback, and replicate-weight variance is otherwise consistent with the Methodology Registry and diff_diff.prep.
The roadmap/business-strategy/doc-deps updates are internally consistent, and the newly referenced tutorial/doc files exist.
The TODO.md additions are appropriate deferred-doc-debt tracking and are non-blocking.
I could not independently rerun Sphinx in this sandbox because there is no writable temp/build directory, so build verification here is static-only.

Methodology

P1 Impact: The PR adds multiple universal-compatibility claims for aggregate_survey() even though the helper’s second-stage design is explicitly weight_type="aweight" and the Methodology Registry defines its output weights as inverse-variance aweights, not probability weights. The new wording says users can use it “before fitting any estimator” / “fit any estimator” / with “any of the estimators in the matrix below” in README.md:L82-L85, docs/practitioner_getting_started.rst:L296-L322, and docs/choosing_estimator.rst:L606-L613. But aggregate_survey() returns SurveyDesign(..., weight_type="aweight") in diff_diff/prep.py:L1476-L1479 and diff_diff/prep.py:L1732-L1736, matching docs/methodology/REGISTRY.md:L2313-L2320. The same support matrix immediately below marks many estimators as pweight only in docs/choosing_estimator.rst:L639-L710, and representative implementations reject aweight for survey support in diff_diff/staggered.py:L1476-L1481, diff_diff/stacked_did.py:L249-L260, and diff_diff/wooldridge.py:L92-L97. Affected methods: aggregate_survey() output semantics plus the pweight-only survey estimators (CallawaySantAnna, TripleDifference, StaggeredTripleDifference, StackedDiD, ImputationDiD, TwoStageDiD, SyntheticDiD, TROP, WooldridgeDiD). Concrete fix: replace the universal wording with an aweight-specific caveat, e.g. “ready for estimators whose survey support accepts aweight / is marked Full in the matrix,” and explicitly state that pweight-only estimators are not compatible with the returned stage2 design.
No additional methodology findings. The new description in docs/survey-roadmap.md:L115-L124 is consistent with the domain-estimation and variance description in docs/methodology/REGISTRY.md:L2309-L2324 and diff_diff/prep.py:L1321-L1415.

Code Quality

No findings in the changed files.

Performance

No findings in the changed files.

Maintainability

No findings. The docs/doc-deps.yaml additions are sensible and point to existing docs in docs/doc-deps.yaml:L548-L560.

Tech Debt

P3 Impact: The new TODO.md entries appropriately track Sphinx warning debt rather than hiding it, which is acceptable under the project’s deferred-work policy. See TODO.md:L91-L96. Concrete fix: none required for this PR.

Security

No findings. Nothing in the diff suggests secrets or PII exposure.

Documentation/Tests

No additional findings beyond the compatibility overclaim above. Static inspection suggests the new toctree targets and linked docs exist, but I could not run sphinx-build in this environment because there is no writable temp/build directory.

Assumption

Assuming this PR is not intended to broaden estimator support beyond the current aweight semantics of aggregate_survey(). If broader compatibility is intended, that would require a real methodology/code change, not just doc wording.

Path to Approval

Replace the “any estimator” language in README.md:L82-L85, docs/practitioner_getting_started.rst:L296-L322, and docs/choosing_estimator.rst:L606-L613 with wording that reflects the returned aweight second-stage design.
In docs/choosing_estimator.rst:L606-L613, explicitly tie aggregate_survey() compatibility to the estimators marked Full in the matrix, or add a one-line warning that the pweight only rows in docs/choosing_estimator.rst:L639-L710 are not compatible with stage2.
Keep the practitioner example aligned with the existing API example in docs/api/prep.rst:L268-L295: show a compatible estimator such as DifferenceInDifferences rather than implying universal estimator support.

Address P1 review feedback on PR #288: aggregate_survey() returns a SurveyDesign with weight_type="aweight", but pweight-only estimators (CS, ImputationDiD, TwoStageDiD, StackedDiD, TripleDifference, StaggeredTripleDifference, SyntheticDiD, TROP, WooldridgeDiD) raise ValueError on aweight. The previous wording told practitioners they could fit "any estimator" with the returned stage2 design, which would crash at fit time. - README.md: explicit list of compatible estimators (Full in matrix) - docs/choosing_estimator.rst: rewrite the .. note:: with the same list and an explicit warning about pweight-only estimators - docs/practitioner_getting_started.rst: rewrite prose + switch the worked example from DifferenceInDifferences to SunAbraham, the only modern staggered estimator marked Full - docs/practitioner_decision_tree.rst: add the same caveat for consistency (not flagged by reviewer but same gap) - TODO.md: new Methodology/Correctness entry tracking the broader product gap (modern staggered methods need a manual SurveyDesign reconstruction; consider opt-in pweight mode for aggregate_survey) Verified: sphinx-build exit 0, 1721 warnings (same as before, no new warnings introduced); SunAbraham reference resolves; cross-references to :ref:\`survey-design-support\` resolve. Reviewer: #288 (comment) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-04-11T00:18:22Z

/ai-review

github-actions · 2026-04-11T00:24:58Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c45bbddf74a2d8dbe0ce78f6647e385a39929ddb

Overall Assessment

✅ Looks good

The prior P1 from the last review is resolved. The edited user-facing docs now correctly say that aggregate_survey() returns an aweight second-stage design and limit direct compatibility to estimators marked Full, which matches the implementation and the methodology registry entry for survey aggregation in diff_diff/prep.py:L1476-L1479, docs/methodology/REGISTRY.md:L2319-L2320, README.md:L85, docs/practitioner_getting_started.rst:L296-L307, docs/practitioner_decision_tree.rst:L236-L242, and docs/choosing_estimator.rst:L613-L618.

Executive Summary

The previous blocking issue is fixed: the new README / practitioner / estimator-guide wording no longer overclaims that aggregate_survey() works with any estimator.
Methodology-wise, the new aggregate_survey() prose is consistent with the registry and code on domain estimation, inverse-variance precision weights, and weight_type="aweight".
P3 [Newly identified]: docs/survey-roadmap.md:L26 and docs/survey-roadmap.md:L34-L36 still label several pweight-only estimators as Full; the incompatibility is now tracked in TODO.md:L63, so this is informational rather than blocking.
P3: docs/doc-deps.yaml improved coverage for diff_diff/prep.py, but it still omits README.md even though this PR updates README for aggregate_survey().
The stale-name cleanup is otherwise consistent, and the newly added toctree targets in docs/index.rst:L103-L104 exist.

Methodology

No blocking findings. The core compatibility fix is now accurate and aligned with the returned second-stage SurveyDesign(weight_type="aweight") in diff_diff/prep.py:L1476-L1479 and the registry’s inverse-variance aweight definition in docs/methodology/REGISTRY.md:L2319-L2320.
Severity: P3 [Newly identified]. Impact: docs/survey-roadmap.md:L26 and docs/survey-roadmap.md:L34-L36 still advertise TripleDifference, ImputationDiD, TwoStageDiD, and CallawaySantAnna as Full survey support, but the current support matrix and runtime guards treat them as pweight-only in docs/choosing_estimator.rst:L646-L678, diff_diff/triple_diff.py:L587-L591, diff_diff/imputation.py:L214-L218, diff_diff/imputation.py:L280-L284, diff_diff/two_stage.py:L210-L214, and diff_diff/staggered.py:L1476-L1480. Because the underlying limitation is explicitly tracked in TODO.md:L63, I would treat this as informational. Concrete fix: update those roadmap table cells to pweight only, or add an explicit weight-type caveat.

Code Quality

No findings in the changed files.

Performance

No findings in the changed files.

Maintainability

Severity: P3. Impact: docs/doc-deps.yaml:L548-L560 still does not map diff_diff/prep.py to README.md, even though this PR updates README for aggregate_survey(). That weakens future drift detection compared with other source entries that already track README dependencies, such as docs/doc-deps.yaml:L92-L94. Concrete fix: add README.md as a user_guide dependency under diff_diff/prep.py.

Tech Debt

Severity: P3. Impact: the new TODO entries appropriately track the aggregate_survey() / aweight compatibility limitation and current Sphinx warning debt in TODO.md:L63 and TODO.md:L92-L97. This is proper deferred-work tracking and is non-blocking. Concrete fix: none required for this PR.

Security

No findings.

Documentation/Tests

No blocking findings. The new tutorial targets referenced in docs/index.rst:L103-L104 exist as docs/tutorials/16_survey_did.ipynb and docs/tutorials/16_wooldridge_etwfe.ipynb. Concrete fix: none required. I did not rerun sphinx-build in this read-only sandbox, so this portion of the review is static-only.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: surface aggregate_survey() and fix stale references#288

docs: surface aggregate_survey() and fix stale references#288
igerber wants to merge 2 commits intomainfrom
docs/aggregate-survey-discoverability

igerber commented Apr 10, 2026

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

igerber commented Apr 11, 2026

Uh oh!

github-actions bot commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Apr 10, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

igerber commented Apr 11, 2026

Uh oh!

github-actions bot commented Apr 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant