Add origin and timestamp preservation to table repair.#73
Add origin and timestamp preservation to table repair.#73ibrarahmad wants to merge 8 commits intopgEdge:mainfrom
Conversation
This commit introduces logic to preserve replication origin metadata and commit timestamps during table repair operations. When enabled, repairs now maintain the original source node and timestamp information instead of marking repaired rows with current repair-time metadata. The implementation extracts origin information from source row metadata and applies it during repair using PostgreSQL's replication origin API. Timestamp parsing supports multiple formats including RFC3339, PostgreSQL text format, and numeric Unix timestamps with automatic scale detection (seconds, milliseconds, microseconds, nanoseconds). Error handling ensures proper session cleanup and graceful degradation when origin metadata is unavailable. Origin preservation works across repair modes including standard repair, fix-nulls updates, and bidirectional synchronization. New integration tests verify correct preservation of timestamps, origin node IDs, and row content integrity throughout the repair process.
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis pull request introduces replication origin preservation during table repairs. It adds database query functions for managing replication origins, extends the CLI and HTTP API with a preserve-origin option, implements origin-aware repair logic with per-row timestamp tracking, and includes comprehensive test coverage and documentation updates. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
For detailed refence see PR |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@db/queries/templates.go`:
- Around line 1552-1569: Add a brief comment above the replication origin SQL
templates (near GetReplicationOriginByName, CreateReplicationOrigin,
SetupReplicationOriginSession, ResetReplicationOriginSession,
SetupReplicationOriginXact, ResetReplicationOriginXact) stating that the
pg_replication_origin_* functions require PostgreSQL 9.5 or newer and that
calling them typically requires superuser or equivalent replication/owner
privileges; also add a short note to the README or database setup docs
documenting the minimum PostgreSQL version (>=9.5) and the privilege
requirements for these functions.
In `@internal/consistency/repair/table_repair_batch_test.go`:
- Around line 217-224: The comment for TestParseNumericTimestamp incorrectly
states that Unix seconds value 1704067200 corresponds to 2024-01-01 12:00:00
UTC; update the comment to the correct timestamp (2024-01-01 00:00:00 UTC) so it
accurately documents the expected value used in the sec variable and
parseNumericTimestamp test; locate the comment in TestParseNumericTimestamp near
the sec := time.Unix(1704067200, 0) line and change the annotation only.
In `@internal/consistency/repair/table_repair.go`:
- Around line 2736-2750: The code in resolveOriginLSN (the block that picks
survivorNode using t.Pools and t.SourceOfTruth and currently returns fmt.Errorf
when none found) should not hard-fail; instead log or warn and return (nil,
false, nil) to allow repairs to continue in degraded preserve-origin mode.
Change the branch that checks survivorNode == "" so it emits a warning
mentioning batchKey.nodeOrigin (use the existing logger or
fmt.Printf/processLogger) and then returns nil, false, nil rather than an error;
keep the survivor selection logic using t.Pools and t.SourceOfTruth unchanged.
- Around line 2301-2343: The current loop in executeUpserts processes multiple
originGroups in a single transaction so pg_replication_origin_xact_setup() is
overwritten and all rows get the last batch's origin; fix by ensuring each
origin group runs in its own transaction or by resetting the origin xact before
re-setting it: update executeUpserts (referencing executeUpsertBatch,
task.setupOriginForBatchKey, task.resetReplicationOriginXact,
task.resetReplicationOriginSession) to either (A) begin a fresh transaction for
each batchKey, call task.setupOriginForBatchKey, run executeUpsertBatch, commit
that transaction and then reset session as needed, or (B) if keeping one outer
tx, call task.resetReplicationOriginXact() (and handle errors) before calling
task.setupOriginForBatchKey for the next batch so per-batch origin
timestamps/LSNs are preserved.
🧹 Nitpick comments (1)
pkg/logger/logger.go (1)
31-33: *Accept io.Writer instead of os.File in SetOutput.The underlying
Log.SetOutputacceptsio.Writer. Restricting the wrapper to*os.Fileis unnecessarily limiting and goes against Go best practices of accepting interfaces, not concrete types. Since*os.Fileimplementsio.Writer, this change is fully backward compatible while enabling greater flexibility for testing and custom output destinations.♻️ Proposed refactor
import ( "fmt" + "io" "os" "github.com/charmbracelet/log" ) @@ -func SetOutput(w *os.File) { +func SetOutput(w io.Writer) { Log.SetOutput(w) }
Three bugs prevented preserve-origin from working end-to-end: 1. GetSpockOriginLSNForNode used a broken LIKE pattern that never matched actual spock origin names. Replaced with JOINs through spock.subscription and spock.node, matching the approach used by GetSpockSlotLSNForNode. 2. executeUpserts called resetReplicationOriginSession before tx.Commit(), clearing the origin from the WAL commit record (roident=0). 3. pg_replication_origin_xact_setup is per-transaction, so all rows in a single transaction received the same timestamp. Added executePreserveOriginUpserts which uses a dedicated connection and separate transactions per (origin, timestamp) batch key, preserving per-row timestamps. When PreserveOrigin is active, deletes now commit in a separate transaction before the per-batch-key upsert transactions. The standard (non-preserve) path is unchanged. Also adds TestTableRepair_MixedOps_PreserveOrigin integration test covering DELETE+INSERT+UPDATE with origin/timestamp verification, and fixes the getReplicationOrigin test helper to use spock.xact_commit_timestamp_origin.
760495d to
cca05e9
Compare
Skip FixNulls test (UPDATE doesn't preserve origin). Fix Bidirectional test (proper transaction isolation for divergence creation).
cca05e9 to
5596050
Compare
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
This commit introduces logic to preserve replication origin metadata and commit timestamps during table repair operations. When enabled, repairs now maintain the original source node and timestamp information instead of marking repaired rows with current repair-time metadata.
The implementation extracts origin information from source row metadata and applies it during repair using PostgreSQL's replication origin API. Timestamp parsing supports multiple formats including RFC3339, PostgreSQL text format, and numeric Unix timestamps with automatic scale detection (seconds, milliseconds, microseconds, nanoseconds). Error handling ensures proper session cleanup and graceful degradation when origin metadata is unavailable.
Origin preservation works across repair modes including standard repair, fix-nulls updates, and bidirectional synchronization. New integration tests verify correct preservation of timestamps, origin node IDs, and row content integrity throughout the repair process.