Use chunked file reading to avoid loading entire files into memory #37

Merged: JanTvrdik merged 4 commits into main from chunked-file-reading on Feb 23, 2026

JanTvrdik (Member) commented on Feb 18, 2026

Summary

  • Added PatternIterator — a reusable, chunk-aware regex matching iterator over Iterator<string> streams. It accumulates chunks, yields match arrays, supports mid-iteration pattern changes via setPattern(), and handles chunk-boundary safety (waits for more data when a match reaches the end of the buffer).
  • Added BaseMultiQueryParser — an abstract base class implementing IMultiQueryParser with parseFile() (chunked 64 KiB reading via fopen/fread), parseString(), and an abstract parseStringStream() method.
  • Refactored all three parsers (MySqlMultiQueryParser, PostgreSqlMultiQueryParser, SqlServerMultiQueryParser) to extend BaseMultiQueryParser and use PatternIterator instead of BufferedFileParseTrait. No changes to regex patterns or public API.
  • Zero-length regex matches (e.g. \z end-of-content markers) are handled safely — they break out of both loops to prevent infinite iteration and ensure consistent behavior regardless of chunking.
  • Added comprehensive PatternIteratorTest (33 tests) covering empty/trivial inputs, single/multi-chunk matching, capture groups, chunk-boundary safety, generator-based streams, single-byte chunks, pattern mutation, error handling, zero-length matches, SQL-like patterns, and comparison with the MySQL query pattern.
  • Added randomized chunking test to MySqlMultiQueryParserTest — runs 100 iterations with random chunk sizes (1–256 bytes) against mysql.sql to verify results are identical regardless of chunking.
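The chunk-boundary rule described above (hold back a match that reaches the end of the buffer, and stop on zero-length matches) can be sketched roughly as follows. This is a minimal illustration only, not the PatternIterator added in this PR: the function name `matchChunks`, its signature, and the choice to yield a final zero-length match once are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch (not the PR's PatternIterator): yields the matched groups
 * of $pattern over a stream of string chunks.
 *
 * @param iterable<string> $chunks
 * @return Generator<int, array<int|string, string>>
 */
function matchChunks(iterable $chunks, string $pattern): Generator
{
    $buffer = '';
    $offset = 0;

    foreach ($chunks as $chunk) {
        $buffer .= $chunk;

        while (preg_match($pattern, $buffer, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
            $text = $m[0][0];
            $end = $m[0][1] + strlen($text);

            // A match touching the end of the buffer might grow once more data
            // arrives, and a zero-length match would never advance: wait instead.
            if ($text === '' || $end === strlen($buffer)) {
                break;
            }

            yield array_map(static fn(array $group): string => $group[0], $m);
            $offset = $end;
        }

        // Drop consumed input so memory tracks the unmatched tail, not the whole stream.
        // Assumption: the pattern does not rely on lookbehind into consumed text.
        $buffer = substr($buffer, $offset);
        $offset = 0;
    }

    // No more data will arrive, so matches that reach the end of the buffer are final.
    while (preg_match($pattern, $buffer, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        yield array_map(static fn(array $group): string => $group[0], $m);

        if ($m[0][0] === '') {
            break; // zero-length match: yield once, then stop to avoid looping forever
        }
        $offset = $m[0][1] + strlen($m[0][0]);
    }
}
```

With a simple pattern such as `~([^;]+);~`, feeding the input as one chunk or as single bytes should produce the same sequence of matches, which is the property the randomized chunking test checks for the real parsers.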

Test plan

  • All tests pass (42 tests)
  • PHPStan reports no errors (composer phpstan)
  • Verify with a large SQL file (100+ MB) that memory stays bounded
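For the randomized chunking check mentioned in the summary, the input can be split with a helper along these lines. This is a sketch under assumptions: `randomChunks` is a hypothetical name, not a helper from this PR, and the 1–256 byte range is taken from the summary. The resulting chunks would then be fed to a parser's stream-parsing method and compared with the result of parsing the whole string; that wiring is omitted because the parser signatures are not shown here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: splits $input into consecutive chunks of random size.
 * Assumes 1 <= $min <= $max.
 *
 * @return list<string>
 */
function randomChunks(string $input, int $min = 1, int $max = 256): array
{
    $chunks = [];
    $length = strlen($input);
    $position = 0;

    while ($position < $length) {
        $size = random_int($min, $max);
        // substr() truncates the last chunk if fewer than $size bytes remain.
        $chunks[] = substr($input, $position, $size);
        $position += $size;
    }

    return $chunks;
}
```

Concatenating the chunks always reproduces the input, so any difference in parser output between runs points to a chunk-boundary bug rather than to the splitter.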

Copilot AI review requested due to automatic review settings February 18, 2026 15:06
@JanTvrdik JanTvrdik force-pushed the chunked-file-reading branch from a7322e5 to 04aafe6 Compare February 18, 2026 15:15
Replace file_get_contents() with buffered fopen()/fread() via a shared
BufferedFileParseTrait. Memory usage is now proportional to the largest
single query rather than the entire file size, which matters for large
SQL files (100+ MB). The parsers already use generators for output, so
this completes the streaming pipeline on the input side.
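As an illustration of the input side, a buffered fopen()/fread() reader can be written as a generator. This is a minimal sketch, not the code from this commit: the function name `readChunks` and the exception messages are assumptions, and the 64 KiB default mirrors the chunk size stated in the PR description.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch: lazily yields the contents of a file in fixed-size chunks.
 *
 * @return Generator<int, string>
 */
function readChunks(string $path, int $chunkSize = 65536): Generator
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Unable to open file '$path'.");
    }

    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Unable to read file '$path'.");
            }
            if ($chunk !== '') {
                yield $chunk; // only one chunk is held in memory at a time
            }
        }
    } finally {
        // Runs even if the consumer stops iterating early and the generator is destroyed.
        fclose($handle);
    }
}
```

Because the reader is a generator, it composes with generator-based parser output: nothing is read until the consumer asks for the next query.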
@JanTvrdik JanTvrdik force-pushed the chunked-file-reading branch from 04aafe6 to d768638 Compare February 18, 2026 15:19
Copilot AI left a comment

Pull request overview

This PR refactors the multi-query parser to use chunked file reading instead of loading entire files into memory, aiming to reduce memory usage for large SQL files. The implementation introduces a shared BufferedFileParseTrait that reads files in 64 KiB chunks using fopen()/fread() and refactors all three database-specific parsers (MySQL, PostgreSQL, SQL Server) to use this trait. A safety mechanism is included to prevent false \z regex anchor matches at chunk boundaries.

Changes:

  • Introduced BufferedFileParseTrait with chunked file reading logic (64 KiB chunks)
  • Refactored MySQL, PostgreSQL, and SQL Server parsers to use the new trait with callback-based pattern processing
  • Added safety check to handle \z anchor edge cases at chunk boundaries

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File | Description
src/BufferedFileParseTrait.php | New trait implementing chunked file reading with buffering, pattern matching, and memory management logic
src/MySqlMultiQueryParser.php | Refactored to use BufferedFileParseTrait with callback handling for dynamic delimiter changes
src/PostgreSqlMultiQueryParser.php | Refactored to use BufferedFileParseTrait with static callback for query extraction
src/SqlServerMultiQueryParser.php | Refactored to use BufferedFileParseTrait with static callback for query extraction

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.


@JanTvrdik JanTvrdik requested a review from hrach February 23, 2026 19:59
@JanTvrdik JanTvrdik merged commit fc2cf19 into main Feb 23, 2026
12 checks passed
@JanTvrdik JanTvrdik deleted the chunked-file-reading branch February 23, 2026 20:45