streaming create_final_documents by dayesouza · Pull Request #2243 · microsoft/graphrag

dayesouza · 2026-02-23T13:36:42Z

Streaming Performance Improvement (with .txt):

run time: -60.70%
peak memory: -31.38%
memory delta: -11.58%

This pull request refactors the create_final_documents workflow to use streaming table reads and writes instead of loading entire dataframes into memory. The main goal is to make the workflow more efficient and scalable by processing data in a streaming fashion. The changes also simplify the logic by removing the dependency on pandas and the DataReader class.

Refactor to streaming table processing:

Rewrote the run_workflow function in create_final_documents.py to use asynchronous context managers for opening tables and removed pandas DataFrame operations in favor of streaming row-by-row processing.
Implemented a new create_final_documents async function that builds a mapping from text units to documents and enriches each document row with its associated text unit IDs as it streams through the data, writing results directly to the output table.
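
As a rough illustration of the streaming pattern described above, here is a minimal sketch, not the code from this PR. `InMemoryTable` is a hypothetical stand-in for the table abstraction, and the column names `id`, `document_id`, and `text_unit_ids` are assumptions made for the example. It first streams the text units to build a document-to-text-unit mapping, then streams the documents and writes each enriched row straight to the output.

```python
import asyncio
from collections import defaultdict
from collections.abc import AsyncIterator, Iterable
from typing import Any

Row = dict[str, Any]


class InMemoryTable:
    """Hypothetical stand-in for a streaming table (not the graphrag API)."""

    def __init__(self, rows: Iterable[Row] = ()) -> None:
        self.rows: list[Row] = list(rows)

    async def __aenter__(self) -> "InMemoryTable":
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        return None

    async def __aiter__(self) -> AsyncIterator[Row]:
        # Yield one row at a time, as a real streaming reader would.
        for row in self.rows:
            yield row

    async def write(self, row: Row) -> None:
        self.rows.append(row)


async def create_final_documents(
    documents: InMemoryTable,
    text_units: InMemoryTable,
    output: InMemoryTable,
) -> int:
    """Enrich each document row with its text unit IDs; return rows written."""
    # Pass 1: keep only the ID mapping in memory, not the full text unit rows.
    doc_to_units: dict[str, list[str]] = defaultdict(list)
    async for unit in text_units:
        doc_to_units[unit["document_id"]].append(unit["id"])

    # Pass 2: stream documents and write each enriched row immediately.
    written = 0
    async for doc in documents:
        unit_ids = doc_to_units.get(doc["id"], [])
        await output.write({**doc, "text_unit_ids": unit_ids})
        written += 1
    return written


async def run_example() -> list[Row]:
    docs = [{"id": "d1", "title": "a.txt"}, {"id": "d2", "title": "b.txt"}]
    units = [
        {"id": "t1", "document_id": "d1"},
        {"id": "t2", "document_id": "d1"},
        {"id": "t3", "document_id": "d2"},
    ]
    async with (
        InMemoryTable(docs) as documents,
        InMemoryTable(units) as text_units,
        InMemoryTable() as output,
    ):
        await create_final_documents(documents, text_units, output)
        return output.rows


if __name__ == "__main__":
    print(asyncio.run(run_example()))
```

In this sketch only the ID mapping is held in memory, so peak usage depends on the number of text units rather than on the size of the document rows.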

Dependency and import changes:

Removed the import and usage of pandas and DataReader, and added imports for the new row transformer and table abstractions.

dayesouza added 2 commits February 23, 2026 10:30

streaming create_final_documents

0d39b0e

add semversioner

4e380e9

dayesouza requested a review from a team as a code owner February 23, 2026 13:36

natoverse approved these changes Feb 23, 2026

Merge branch 'main' into create_final_documents

da4a0a8

dayesouza merged commit 1cedb79 into main Feb 23, 2026
18 checks passed

dayesouza deleted the create_final_documents branch February 23, 2026 17:16

dayesouza merged 3 commits into main from create_final_documents
