Use chunked file reading to avoid loading entire files into memory by JanTvrdik · Pull Request #37 · nextras/multi-query-parser

JanTvrdik · 2026-02-18T15:06:30Z

Summary

Added PatternIterator — a reusable, chunk-aware regex matching iterator over Iterator<string> streams. It accumulates chunks, yields match arrays, supports mid-iteration pattern changes via setPattern(), and handles chunk-boundary safety (waits for more data when a match reaches the end of the buffer).
Added BaseMultiQueryParser — an abstract base class implementing IMultiQueryParser with parseFile() (chunked 64 KiB reading via fopen/fread), parseString(), and an abstract parseStringStream() method.
Refactored all three parsers (MySqlMultiQueryParser, PostgreSqlMultiQueryParser, SqlServerMultiQueryParser) to extend BaseMultiQueryParser and use PatternIterator instead of BufferedFileParseTrait. No changes to regex patterns or public API.
Zero-length regex matches (e.g. \z end-of-content markers) are handled safely — they break out of both loops to prevent infinite iteration and ensure consistent behavior regardless of chunking.
Added comprehensive PatternIteratorTest (33 tests) covering empty/trivial inputs, single/multi-chunk matching, capture groups, chunk-boundary safety, generator-based streams, single-byte chunks, pattern mutation, error handling, zero-length matches, SQL-like patterns, and comparison with the MySQL query pattern.
Added randomized chunking test to MySqlMultiQueryParserTest — runs 100 iterations with random chunk sizes (1–256 bytes) against mysql.sql to verify results are identical regardless of chunking.
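The chunk-boundary rule described for PatternIterator can be illustrated with a minimal sketch. It is not the PR's implementation: `takeMatches()` and `iterateMatches()` are hypothetical names, the pattern is fixed rather than changeable mid-iteration, and the only safety rule applied is "hold back a match that ends exactly at the end of the buffer while more chunks may follow".

```php
<?php

/**
 * Hypothetical helper: collects every match that is safe to emit from $buffer.
 *
 * @return array{0: list<array<int|string, string>>, 1: string} [matches, unconsumed rest]
 */
function takeMatches(string $buffer, string $pattern, bool $final): array
{
    $matches = [];
    $offset = 0;
    while (preg_match($pattern, $buffer, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        [$text, $start] = $m[0];
        $end = $start + strlen($text);
        if ($text === '') {
            break; // zero-length match: stop instead of looping forever
        }
        if (!$final && $end === strlen($buffer)) {
            break; // match touches the buffer end: the next chunk might extend it
        }
        // Drop the offsets and keep only the matched text of each group.
        $matches[] = array_map(static fn (array $group): string => $group[0], $m);
        $offset = $end;
    }
    return [$matches, substr($buffer, $offset)];
}

/**
 * Hypothetical stand-in for a chunk-aware match iterator.
 *
 * @param iterable<string> $chunks
 * @return Generator<int, array<int|string, string>>
 */
function iterateMatches(iterable $chunks, string $pattern): Generator
{
    $buffer = '';
    foreach ($chunks as $chunk) {
        [$matches, $buffer] = takeMatches($buffer . $chunk, $pattern, false);
        foreach ($matches as $match) {
            yield $match;
        }
    }
    // The stream is exhausted, so matches touching the end are now final.
    [$matches] = takeMatches($buffer, $pattern, true);
    foreach ($matches as $match) {
        yield $match;
    }
}

// Statements split at arbitrary positions still come out whole.
foreach (iterateMatches(['SELECT 1; SEL', 'ECT 2', '; SELECT 3;'], '~\s*([^;]+);~') as $match) {
    echo $match[1], "\n"; // SELECT 1, SELECT 2, SELECT 3
}
```

Only the unconsumed tail of the buffer is carried between chunks, which is why memory tracks the largest pending match rather than the whole input. The same function can be fed `str_split($sql, 1)` or randomly sized pieces to check that results do not depend on chunking, in the spirit of the randomized test listed above.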

Test plan

All tests pass (42 tests)
PHPStan reports no errors (composer phpstan)
Verify with a large SQL file (100+ MB) that memory stays bounded

Replace file_get_contents() with buffered fopen()/fread() via a shared BufferedFileParseTrait. Memory usage is now proportional to the largest single query rather than the entire file size, which matters for large SQL files (100+ MB). The parsers already use generators for output, so this completes the streaming pipeline on the input side.
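The buffered fopen()/fread() reading described above can be sketched as a generator that yields fixed-size chunks. This is an illustration under assumptions, not the code from the PR: `readChunks()` is a hypothetical name, and the 64 KiB default simply mirrors the chunk size mentioned in the description.

```php
<?php

/**
 * Hypothetical helper: lazily yields a file's contents in chunks of at most
 * $chunkSize bytes, so the whole file is never held in memory at once.
 *
 * @param int $chunkSize must be greater than zero
 * @return Generator<int, string>
 */
function readChunks(string $path, int $chunkSize = 65536): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open file: $path");
    }

    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Cannot read file: $path");
            }
            if ($chunk !== '') {
                yield $chunk;
            }
        }
    } finally {
        // Runs on normal completion and when the consumer stops iterating early.
        fclose($handle);
    }
}
```

Because the function is a generator, the file is opened only when iteration starts, and each chunk can be garbage-collected once the consumer has moved past it.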

Copilot

Pull request overview

This PR refactors the multi-query parser to use chunked file reading instead of loading entire files into memory, aiming to reduce memory usage for large SQL files. The implementation introduces a shared BufferedFileParseTrait that reads files in 64 KiB chunks using fopen()/fread() and refactors all three database-specific parsers (MySQL, PostgreSQL, SQL Server) to use this trait. A safety mechanism is included to prevent false \z regex anchor matches at chunk boundaries.

Changes:

Introduced BufferedFileParseTrait with chunked file reading logic (64 KiB chunks)
Refactored MySQL, PostgreSQL, and SQL Server parsers to use the new trait with callback-based pattern processing
Added safety check to handle \z anchor edge cases at chunk boundaries
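To see why a `\z` anchor needs a safety check at chunk boundaries, consider a pattern that accepts either a semicolon or the end of input as a terminator. On a partial buffer, `\z` matches at the end of the buffer, which is not the end of the file. The sketch below is illustrative only: `firstSafeMatch()` is a hypothetical helper and the pattern is a simplified stand-in, not one of the parsers' real patterns.

```php
<?php

/**
 * Hypothetical helper: returns the first match at or after $offset, or null
 * when nothing matches or the match cannot be trusted yet.
 *
 * @return array<int|string, string>|null
 */
function firstSafeMatch(string $pattern, string $buffer, int $offset, bool $moreDataExpected): ?array
{
    if (preg_match($pattern, $buffer, $m, PREG_OFFSET_CAPTURE, $offset) !== 1) {
        return null;
    }

    [$text, $start] = $m[0];
    if ($moreDataExpected && $start + strlen($text) === strlen($buffer)) {
        // The match ends where the buffer ends, so \z may have matched a chunk
        // boundary rather than the real end of input. Wait for more data.
        return null;
    }

    return array_map(static fn (array $group): string => $group[0], $m);
}

$pattern = '~\s*(.+?)(?:;|\z)~s';
$buffer = 'SELECT 1; SELEC'; // first 15 bytes of 'SELECT 1; SELECT 22;'

var_dump(firstSafeMatch($pattern, $buffer, 0, true)[1]); // string(8) "SELECT 1"
var_dump(firstSafeMatch($pattern, $buffer, 9, true));    // NULL: "SELEC" is held back
var_dump(firstSafeMatch($pattern, $buffer, 9, false)[1]); // string(5) "SELEC"
```

Without the check, the truncated `SELEC` would be emitted as a complete query whenever a chunk happened to end in the middle of a statement.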

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
src/BufferedFileParseTrait.php	New trait implementing chunked file reading with buffering, pattern matching, and memory management logic
src/MySqlMultiQueryParser.php	Refactored to use BufferedFileParseTrait with callback handling for dynamic delimiter changes
src/PostgreSqlMultiQueryParser.php	Refactored to use BufferedFileParseTrait with static callback for query extraction
src/SqlServerMultiQueryParser.php	Refactored to use BufferedFileParseTrait with static callback for query extraction

src/BufferedFileParseTrait.php

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.

src/PatternIterator.php

tests/cases/MySqlMultiQueryParserTest.phpt

phpstan.neon

src/PatternIterator.php

src/BaseMultiQueryParser.php

src/IMultiQueryParser.php

Copilot AI review requested due to automatic review settings February 18, 2026 15:06

Copilot started reviewing on behalf of JanTvrdik February 18, 2026 15:12

JanTvrdik force-pushed the chunked-file-reading branch from a7322e5 to 04aafe6 February 18, 2026 15:15

JanTvrdik force-pushed the chunked-file-reading branch from 04aafe6 to d768638 February 18, 2026 15:19

Copilot AI reviewed Feb 18, 2026

src/BufferedFileParseTrait.php (outdated)

hrach reviewed Feb 21, 2026

src/BufferedFileParseTrait.php (outdated)

reimplement chunked parsing

ee415d7

JanTvrdik force-pushed the chunked-file-reading branch from 4123b75 to ee415d7 February 23, 2026 19:23

JanTvrdik requested a review from Copilot February 23, 2026 19:51

Copilot started reviewing on behalf of JanTvrdik February 23, 2026 19:51

drop useless test

ac838fe

Copilot AI reviewed Feb 23, 2026

JanTvrdik requested a review from hrach February 23, 2026 19:59

address copilot review

e6525b9

JanTvrdik merged commit fc2cf19 into main Feb 23, 2026
12 checks passed

JanTvrdik deleted the chunked-file-reading branch February 23, 2026 20:45

JanTvrdik merged 4 commits into main from chunked-file-reading