Add ContinuousDiD estimator (Callaway, Goodman-Bacon & Sant'Anna 2024) #177
Implement Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD estimator with B-spline dose-response curves, ACRT derivatives, staggered adoption support, and multiplier bootstrap inference. Validated against R contdid v0.1.0 across 6 benchmarks. Also extract shared bootstrap utilities to bootstrap_utils.py and fix plt.show() blocking in test suite via non-interactive backend.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
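The non-interactive backend fix can be sketched as follows. This is a minimal illustration, assuming the variable is set at the top of `tests/conftest.py` before matplotlib is first imported; the PR description only states that `MPLBACKEND=Agg` was added there, so the exact placement is an assumption.

```python
import os

# Assumed placement: top of tests/conftest.py, before any matplotlib import.
# matplotlib consults MPLBACKEND when selecting its backend; Agg renders
# off-screen, so plt.show() does not open a window that blocks the test run.
os.environ["MPLBACKEND"] = "Agg"

backend = os.environ["MPLBACKEND"]
```

Setting the variable in `conftest.py` keeps the change local to the test suite, so interactive use of the library is unaffected.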

Overall assessment: ⛔ Blocker
Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Suggested next steps

…lidation
- Fix not_yet_treated control group to exclude cohort g from its own control set (matches staggered.py behavior)
- Replace inline t_stat/p_value computation in DoseResponseCurve.to_dataframe() with safe_inference() loop per project convention
- Add validation rejecting negative doses among treated units
- Fix test_inf_first_treat_normalization CI failure (cast to float before inf)
- Add test for not_yet_treated control group correctness and negative dose
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment: ⛔ Blocker
Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Tests not run (not requested).

- Fix analytical SE: use sqrt(sum(IF^2)) instead of sqrt(mean(IF^2)) to match CallawaySantAnna's influence function convention (P0)
- Add discrete dose detection warning for integer-valued treatments (P1)
- Guard empty post-treatment cells: warn and return NaN instead of 0.0 (P1)
- Validate control_group and base_period params in __init__ and set_params (P2)
- Add 7 new tests: SE parity, discrete dose, anticipation event study, empty post_gt, and parameter validation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
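The sum-versus-mean distinction in the SE fix depends on how the influence-function values are scaled. The sketch below assumes the stored values already carry the 1/n factor (an assumption for illustration; the commit only says the change matches CallawaySantAnna's convention). Under that assumption, `sqrt(sum(IF^2))` reproduces the usual `sd / sqrt(n)` standard error, while `sqrt(mean(IF^2))` is too small by a factor of `sqrt(n)`. The helper names are hypothetical.

```python
import math

def se_from_sum(scaled_if):
    """SE as sqrt(sum(IF^2)); assumes each entry already includes 1/n."""
    return math.sqrt(sum(v * v for v in scaled_if))

def se_from_mean(scaled_if):
    """The replaced variant: sqrt(mean(IF^2)) on the same scaled values."""
    return math.sqrt(sum(v * v for v in scaled_if) / len(scaled_if))

# Hypothetical unscaled influence values psi_i for n = 4 units.
psi = [2.0, -2.0, 2.0, -2.0]
n = len(psi)
scaled = [p / n for p in psi]          # assumed storage convention: psi_i / n

se_sum = se_from_sum(scaled)           # matches sqrt(mean(psi^2) / n)
se_mean = se_from_mean(scaled)         # smaller by a factor of sqrt(n)
reference = math.sqrt(sum(p * p for p in psi) / n / n)
```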

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive summary:
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

…l event-study SEs
- Use percentile CI/p-value from bootstrap (not normal approx) for overall ATT/ACRT and event-study effects, matching CallawaySantAnna convention
- Add P(D=0)>0 warning when control_group='not_yet_treated' has no never-treated units (Remark 3.1 in Callaway et al.)
- Compute IF-based analytical SEs for event-study bins when n_bootstrap=0 (previously yielded NaN)
- Add tests for all three fixes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

- Upgrade P(D=0)=0 warning to ValueError for not_yet_treated (P1)
- Strengthen balanced-panel check to verify identical time sets (P1)
- Add aggregate parameter validation at fit() entry (P3)
- Replace hardcoded /tmp paths with tempfile in R benchmarks (P3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
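The strengthened balanced-panel check can be illustrated with a small sketch. It assumes the weaker check compared only the number of periods per unit, which is a guess at the earlier behavior; the function names and the `(unit, period)` row layout are hypothetical. Two units can have the same number of observations and still be observed in different periods, which only a comparison of the time sets themselves detects.

```python
from collections import defaultdict

def periods_by_unit(rows):
    """Group (unit, period) rows into a set of observed periods per unit."""
    grouped = defaultdict(set)
    for unit, period in rows:
        grouped[unit].add(period)
    return grouped

def same_period_counts(rows):
    """Weaker check (assumed): every unit has the same number of periods."""
    counts = {len(p) for p in periods_by_unit(rows).values()}
    return len(counts) <= 1

def identical_time_sets(rows):
    """Stronger check: every unit is observed in exactly the same periods."""
    sets = list(periods_by_unit(rows).values())
    return all(s == sets[0] for s in sets)

# Unit "b" has three periods like unit "a", but period 3 is swapped for 4.
rows = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("b", 4)]
passes_count_check = same_period_counts(rows)
passes_set_check = identical_time_sets(rows)
```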

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

Store acrt_glob in _bootstrap_info so the bootstrap distribution is centered at the point estimate instead of 0. Add test that verifies bootstrap ACRT CI brackets the estimate rather than zero.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
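Why the centering matters can be seen in a small multiplier-bootstrap sketch. It assumes Rademacher weights, influence values that already include the 1/n scaling, and a simple nearest-rank percentile interval; none of these details are confirmed by the commit, and the function names are hypothetical. Each draw is the point estimate plus a weighted sum of influence values, so omitting the estimate yields a distribution around zero whose interval cannot bracket a nonzero estimate.

```python
import random

def multiplier_draws(center, scaled_if, n_draws, seed=0):
    """Draws of center + sum(w_i * IF_i) with Rademacher weights w_i."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        perturbation = sum(rng.choice((-1.0, 1.0)) * v for v in scaled_if)
        draws.append(center + perturbation)
    return draws

def percentile_ci(draws, alpha=0.05):
    """Nearest-rank percentile interval (a simplification for illustration)."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    return ordered[int(alpha / 2 * last)], ordered[int((1 - alpha / 2) * last)]

acrt_hat = 2.0                                     # hypothetical point estimate
scaled_if = [0.05, -0.03, 0.04, -0.06, 0.02, -0.01]

ci_centered = percentile_ci(multiplier_draws(acrt_hat, scaled_if, 999))
ci_uncentered = percentile_ci(multiplier_draws(0.0, scaled_if, 999))
```

The test added in the commit checks this property on the estimator itself: the bootstrap ACRT interval should contain the estimate, not zero.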

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

…e, drop unused vcov
- Document B-spline boundary knot deviation in REGISTRY.md and continuous-did.md
- Add base_period, anticipation, n_bootstrap, bootstrap_weights, seed, rank_deficient_action fields to ContinuousDiDResults with passthrough from fit()
- Switch per-cell OLS to return_vcov=False to skip unused covariance computation
- Add test_results_contain_init_params verifying param roundtrip
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

…weights in docs, add "webb" to docstring
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment: ⛔ Blocker
Executive summary:
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

When n_valid=1 (ddof=1 → NaN) or all-identical samples (SE=0), compute_effect_bootstrap_stats now returns NaN for all inference fields instead of mixing finite CI/p-value with NaN SE. Adds regression tests and updates REGISTRY.md edge-case documentation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive summary:
Methodology
Documentation/Tests
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

…ter tests
Address PR #177 review round 9:
- P2: DoseResponseCurve now stores bootstrap p-values per grid point and uses them in to_dataframe() instead of recomputing normal-approx t-stats/p-values, eliminating mixed inference regimes (bootstrap CIs + normal p-values).
- P1: Add event-study tests for control_group="not_yet_treated", base_period="universal", and not_yet_treated with bootstrap inference.
- Add test verifying bootstrap dose-response p-value consistency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

Zero out NaN coefficients from dropped rank-deficient columns before prediction (ATT(d), ACRT(d), acrt_glob), producing correct constant ATT and zero ACRT when all treated doses are identical. Original NaN-bearing beta_hat preserved in bootstrap_info for diagnostics. Extend test_all_same_dose to assert dose-response behavior: ATT(d) = overall_att everywhere and ACRT(d) = 0 everywhere.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
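The effect of zeroing dropped coefficients can be sketched on a toy basis. The example assumes an intercept-plus-polynomial basis as a stand-in for the B-spline basis, and assumes dropped columns are marked with NaN in the coefficient vector; the names `zero_dropped` and `predict` are hypothetical. With only the intercept identified, the zeroed vector gives a constant ATT(d) and a zero derivative, while the raw vector propagates NaN.

```python
import math

def zero_dropped(beta_hat):
    """Replace NaN entries (dropped columns) with 0.0 for prediction only."""
    return [0.0 if math.isnan(b) else b for b in beta_hat]

def predict(design_row, beta):
    """Dot product of one design row with the coefficient vector."""
    return sum(x * b for x, b in zip(design_row, beta))

nan = float("nan")
beta_hat = [1.5, nan, nan]          # intercept kept, dose columns dropped
beta_pred = zero_dropped(beta_hat)  # beta_hat itself stays available for diagnostics

# Toy basis [1, d, d^2] and its derivative [0, 1, 2d] at doses 0.2 and 0.8.
basis_rows = [[1.0, 0.2, 0.04], [1.0, 0.8, 0.64]]
deriv_rows = [[0.0, 1.0, 0.4], [0.0, 1.0, 1.6]]

att = [predict(row, beta_pred) for row in basis_rows]    # constant level
acrt = [predict(row, beta_pred) for row in deriv_rows]   # zero slope
naive_att = [predict(row, beta_hat) for row in basis_rows]
```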

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive summary:
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

…sed param
Use beta_pred (NaN→0 for dropped columns) instead of beta_hat in bootstrap path to prevent NaN from propagating into ATT(d)/ACRT(d) distributions. Remove unused treatment_groups parameter from _aggregate_event_study. Add bootstrap regression test to test_all_same_dose.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

Use compute_effect_bootstrap_stats from bootstrap_utils for both event-study and overall ATT bootstrap paths, adding non-finite filtering, SE<=0 guard, and <50% valid sample check. Remove unused _compute_percentile_ci and _compute_bootstrap_pvalue methods.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
If you want, I can also point out the minimal diff needed to update the two remaining bootstrap modules.

Add noise to outcomes in bootstrap regression test so residuals are non-zero across platforms. Identical outcomes produce zero residuals, giving zero-variance bootstrap distributions and SE=0 → NaN on macOS.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

Replace random noise approach with naturally heterogeneous outcomes (different y_pre per unit) while keeping identical dose. This exercises the rank-deficient bootstrap path with real sampling variance rather than papering over zero-residual data.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

Return all-NaN inference when the point estimate is NaN/Inf, preventing finite SE/CI/p-value from a valid bootstrap distribution when the estimate itself is undefined. Adds parametrized regression tests and updates REGISTRY.md bootstrap notes for both CallawaySantAnna and SunAbraham sections.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
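The guards described across these bootstrap commits (non-finite point estimate, non-finite draws, too few valid draws, and a non-positive SE) can be sketched together. This is an illustrative stand-in, not the project's `compute_effect_bootstrap_stats`: the function name, the return shape, the treatment of the under-50% case as all-NaN, and the percentile p-value convention are all assumptions.

```python
import math
import statistics

NAN = float("nan")

def bootstrap_stats_sketch(estimate, draws, alpha=0.05):
    """Hypothetical helper returning SE, percentile CI, and p-value, or all NaN."""
    all_nan = {"se": NAN, "ci": (NAN, NAN), "p_value": NAN}

    # Guard 1: an undefined point estimate makes every inference field undefined.
    if not math.isfinite(estimate):
        return all_nan

    # Guard 2: drop non-finite draws; need at least 2 and at least half valid.
    valid = sorted(d for d in draws if math.isfinite(d))
    if len(valid) < 2 or len(valid) < 0.5 * len(draws):
        return all_nan

    # Guard 3: the sample SD (ddof=1) must be finite and strictly positive.
    se = statistics.stdev(valid)
    if not math.isfinite(se) or se <= 0:
        return all_nan

    last = len(valid) - 1
    ci = (valid[int(alpha / 2 * last)], valid[int((1 - alpha / 2) * last)])
    # Assumed convention: twice the smaller share of draws on either side of 0.
    share_le_zero = sum(d <= 0 for d in valid) / len(valid)
    p_value = min(1.0, 2 * min(share_le_zero, 1 - share_le_zero))
    return {"se": se, "ci": ci, "p_value": p_value}

nan_estimate = bootstrap_stats_sketch(NAN, [0.9, 1.0, 1.1])
single_draw = bootstrap_stats_sketch(1.0, [1.0])
constant_draws = bootstrap_stats_sketch(1.0, [1.0, 1.0, 1.0])
mostly_invalid = bootstrap_stats_sketch(1.0, [1.0, NAN, NAN, NAN, 2.0])
regular = bootstrap_stats_sketch(1.2, [0.5, 1.0, 1.5, 2.0])
```

Returning one all-NaN record from every guard keeps the fields consistent, which is the property these commits aim for: no finite CI or p-value alongside an undefined SE or estimate.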

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment: ⛔ Blocker
Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

The not-yet-treated control group in both ContinuousDiD and CallawaySantAnna used `G > t` instead of `G > t + anticipation`, incorrectly including cohorts in the anticipation window as controls. The corrected condition matches R's `did::compute.att_gt()` logic.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
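The corrected condition can be illustrated with a short sketch. It assumes never-treated units are encoded with an infinite first-treatment period and that cohort g is excluded from its own control set, as an earlier commit in this PR describes; the function name and the list-based layout are hypothetical.

```python
import math

def not_yet_treated_controls(first_treat, g, t, anticipation=0):
    """Indices of units usable as controls for cohort g at period t."""
    controls = []
    for i, g_i in enumerate(first_treat):
        never_treated = math.isinf(g_i)            # assumed encoding
        not_yet_treated = g_i > t + anticipation   # corrected condition
        if (never_treated or not_yet_treated) and g_i != g:
            controls.append(i)
    return controls

# Four units: cohorts 3, 4, 5 and one never-treated unit.
first_treat = [3, 4, 5, math.inf]

# With G > t only (anticipation ignored), cohort 4 serves as a control at
# t = 3 even though it falls inside a one-period anticipation window.
old_rule = not_yet_treated_controls(first_treat, g=3, t=3, anticipation=0)
new_rule = not_yet_treated_controls(first_treat, g=3, t=3, anticipation=1)
```

With one period of anticipation, cohort 4 may already respond at t = 3, so keeping it in the control set would contaminate the comparison.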

/ai-review

🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment: Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests

Summary
- Add `ContinuousDiD` estimator implementing Callaway, Goodman-Bacon & Sant'Anna (2024) "Difference-in-Differences with a Continuous Treatment" (NBER WP 32117)
- Extract shared bootstrap utilities from `staggered_bootstrap.py` into `bootstrap_utils.py`
- Fix `plt.show()` blocking the test suite by setting a non-interactive matplotlib backend in `conftest.py`

New files
- `diff_diff/continuous_did.py`: Main `ContinuousDiD` estimator class
- `diff_diff/continuous_did_results.py`: `ContinuousDiDResults` and `DoseResponseCurve` dataclasses
- `diff_diff/continuous_did_bspline.py`: B-spline basis construction, evaluation, derivatives
- `diff_diff/bootstrap_utils.py`: Shared bootstrap weight generation, CI, p-value helpers
- `docs/methodology/continuous-did.md`: Full methodology documentation
- `tests/test_continuous_did.py`: 46 unit/integration tests
- `tests/test_methodology_continuous_did.py`: 15 equation verification + 6 R benchmark tests

Methodology references (required if estimator / math changes)
- … `contdid` v0.1.0 (https://bcallaway11.github.io/contdid/)
- (… `splines2::bSpline(dvals)` uses `range(dvals)` instead of `range(dose)`)
- … `control_group` to overall ATT computation (R's contdid v0.1.0 always uses `notyettreated` internally regardless of user setting)

Validation
- `tests/test_continuous_did.py`: 46 tests covering init, data validation, fit, results, B-splines, dose grid, aggregation, bootstrap, edge cases
- `tests/test_methodology_continuous_did.py`: 15 hand-calculable equation verification tests + 6 R `contdid` benchmarks (ATT(d), ACRT(d), overall ATT/ACRT all within <1-2% relative tolerance)
- `tests/conftest.py`: Added `MPLBACKEND=Agg` to prevent `plt.show()` blocking

Security / privacy
Generated with Claude Code