Use chunked file reading to avoid loading entire files into memory #37
Force-pushed from a7322e5 to 04aafe6.
Replace file_get_contents() with buffered fopen()/fread() via a shared BufferedFileParseTrait. Memory usage is now proportional to the largest single query rather than the entire file size, which matters for large SQL files (100+ MB). The parsers already use generators for output, so this completes the streaming pipeline on the input side.
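As a rough illustration of the input-side change described above, buffered reading with `fopen()`/`fread()` can be wrapped in a generator. This is a minimal sketch, not the PR's actual trait: the function name `readChunks` and its parameters are hypothetical, and the 64 KiB default simply mirrors the chunk size mentioned in the review overview.

```php
<?php

/**
 * Yield the contents of a file in fixed-size chunks instead of loading it whole.
 * Hypothetical helper for illustration only; not taken from the PR.
 *
 * @return \Generator<int, string>
 */
function readChunks(string $path, int $chunkSize = 65536): \Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open file: $path");
    }

    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new \RuntimeException("Cannot read file: $path");
            }
            // fread() may return an empty string right at end-of-file; skip it.
            if ($chunk !== '') {
                yield $chunk;
            }
        }
    } finally {
        // Runs even if the consumer stops iterating early.
        fclose($handle);
    }
}
```

Only one chunk (plus whatever the consumer buffers) is held in memory at a time, which is what makes memory usage depend on the largest query rather than on the file size.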
Force-pushed from 04aafe6 to d768638.
Pull request overview
This PR refactors the multi-query parser to use chunked file reading instead of loading entire files into memory, aiming to reduce memory usage for large SQL files. The implementation introduces a shared BufferedFileParseTrait that reads files in 64 KiB chunks using fopen()/fread() and refactors all three database-specific parsers (MySQL, PostgreSQL, SQL Server) to use this trait. A safety mechanism is included to prevent false \z regex anchor matches at chunk boundaries.
Changes:
- Introduced `BufferedFileParseTrait` with chunked file reading logic (64 KiB chunks)
- Refactored MySQL, PostgreSQL, and SQL Server parsers to use the new trait with callback-based pattern processing
- Added safety check to handle `\z` anchor edge cases at chunk boundaries
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/BufferedFileParseTrait.php | New trait implementing chunked file reading with buffering, pattern matching, and memory management logic |
| src/MySqlMultiQueryParser.php | Refactored to use BufferedFileParseTrait with callback handling for dynamic delimiter changes |
| src/PostgreSqlMultiQueryParser.php | Refactored to use BufferedFileParseTrait with static callback for query extraction |
| src/SqlServerMultiQueryParser.php | Refactored to use BufferedFileParseTrait with static callback for query extraction |
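The `\z` issue mentioned in the overview arises because `\z` matches at the end of the current buffer, which is not necessarily the end of the file. One way to guard against that, sketched here under simplifying assumptions, is to defer any match that reaches the end of the buffer until more data arrives or the stream is exhausted. The function name `splitStatements` and the toy pattern (plain `;` delimiters, no quoting or comments) are hypothetical and far simpler than the real parser patterns.

```php
<?php

/**
 * Split a stream of chunks into ';'-terminated statements.
 * Illustrative sketch only; the pattern ignores strings and comments.
 *
 * @param iterable<string> $chunks
 * @return \Generator<int, string>
 */
function splitStatements(iterable $chunks): \Generator
{
    // A statement ends at ';' or at the end of the input (\z).
    $pattern = '~\G\s*(?<query>[^;]*?)\s*(?:;|\z)~';
    $buffer = '';

    $drain = static function (bool $final) use (&$buffer, $pattern): \Generator {
        $offset = 0;
        while (preg_match($pattern, $buffer, $m, 0, $offset) === 1) {
            $end = $offset + strlen($m[0]);
            // A match touching the end of the buffer may be a false \z match:
            // wait for more data unless the stream is finished.
            if (!$final && $end === strlen($buffer)) {
                break;
            }
            if ($m['query'] !== '') {
                yield $m['query'];
            }
            // A zero-length match would never advance; stop to avoid looping forever.
            if ($end === $offset) {
                break;
            }
            $offset = $end;
        }
        // Keep only the unconsumed tail for the next round.
        $buffer = substr($buffer, $offset);
    };

    foreach ($chunks as $chunk) {
        $buffer .= $chunk;
        yield from $drain(false);
    }
    yield from $drain(true);
}
```

Without the end-of-buffer check, a chunk ending in the middle of a statement (for example `SELECT 1; SEL`) would yield the fragment `SEL` as if it were a complete query. Because `yield from` reuses keys, collect results with `iterator_to_array($gen, false)`.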
Force-pushed from 4123b75 to ee415d7.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.
Summary
- `PatternIterator`: a reusable, chunk-aware regex matching iterator over `Iterator<string>` streams. It accumulates chunks, yields match arrays, supports mid-iteration pattern changes via `setPattern()`, and handles chunk-boundary safety (waits for more data when a match reaches the end of the buffer).
- `BaseMultiQueryParser`: an abstract base class implementing `IMultiQueryParser` with `parseFile()` (chunked 64 KiB reading via `fopen`/`fread`), `parseString()`, and an abstract `parseStringStream()` method.
- Refactored the three parsers (`MySqlMultiQueryParser`, `PostgreSqlMultiQueryParser`, `SqlServerMultiQueryParser`) to extend `BaseMultiQueryParser` and use `PatternIterator` instead of `BufferedFileParseTrait`. No changes to regex patterns or public API.
- (… `\z` end-of-content markers) are handled safely: they break out of both loops to prevent infinite iteration and ensure consistent behavior regardless of chunking.
- `PatternIteratorTest` (33 tests) covering empty/trivial inputs, single/multi-chunk matching, capture groups, chunk-boundary safety, generator-based streams, single-byte chunks, pattern mutation, error handling, zero-length matches, SQL-like patterns, and comparison with the MySQL query pattern.
- `MySqlMultiQueryParserTest`: runs 100 iterations with random chunk sizes (1–256 bytes) against `mysql.sql` to verify results are identical regardless of chunking.

Test plan
composer phpstan)
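
The chunk-size fuzzing described in the summary (100 iterations, random chunk sizes of 1–256 bytes) can be sketched roughly as follows. This is not the PR's test: `randomChunks`, `toyParse`, and `isChunkingInvariant` are hypothetical names, and `toyParse` is a deliberately trivial stand-in for a real streaming parser.

```php
<?php

/**
 * Split $input into pieces of random length between $min and $max bytes.
 *
 * @return \Generator<int, string>
 */
function randomChunks(string $input, int $min = 1, int $max = 256): \Generator
{
    $pos = 0;
    $len = strlen($input);
    while ($pos < $len) {
        $size = random_int($min, $max);
        yield substr($input, $pos, $size);
        $pos += $size;
    }
}

/**
 * Toy stand-in for a streaming parser: yields trimmed ';'-terminated statements.
 *
 * @param iterable<string> $chunks
 * @return \Generator<int, string>
 */
function toyParse(iterable $chunks): \Generator
{
    $buffer = '';
    foreach ($chunks as $chunk) {
        $buffer .= $chunk;
        while (($pos = strpos($buffer, ';')) !== false) {
            $stmt = trim(substr($buffer, 0, $pos));
            if ($stmt !== '') {
                yield $stmt;
            }
            $buffer = substr($buffer, $pos + 1);
        }
    }
    // Whatever remains after the last ';' is the final statement, if any.
    $rest = trim($buffer);
    if ($rest !== '') {
        yield $rest;
    }
}

/**
 * Check that parsing gives the same result for the whole input and for
 * randomly chunked input.
 */
function isChunkingInvariant(callable $parse, string $input, int $iterations = 100): bool
{
    $expected = iterator_to_array($parse([$input]), false);
    for ($i = 0; $i < $iterations; $i++) {
        $actual = iterator_to_array($parse(randomChunks($input)), false);
        if ($actual !== $expected) {
            return false;
        }
    }
    return true;
}
```

A call such as `isChunkingInvariant('toyParse', $sql)` returns `true` when every random chunking produces the same statements as the single-chunk parse. Random sizes exercise boundaries that fixed-size chunks might never hit.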