-
Notifications
You must be signed in to change notification settings - Fork 697
Description
Hey!
I reached out over email to @kszucs to ask about the status of using e-graphs in Ibis for query optimization or translation between different backends.
E-graphs are a data structure that can be used to hold multiple equivalent versions of an expression or AST. Similar to a rewrite system, you would add rules that match on one expression and return an equivalent one. But instead of replacing it, you would simply add it to the e-graph. This can let you avoid some issues of phase ordering. Then, you "extract" out the lowest cost expression based on some heuristics that is equivalent to your input expression.
@kszucs gave me a rough overview of where things were at, that there were some experiments to use egg and also a custom Python e-graph implementation. However, there there were some issues representing relational algebra operations involving lists/dicts, that informed the switch to using a traditional match/replace term rewriting system.
- https://github.com/ibis-project/ibis/blob/main/ibis/expr/rewrites.py
- https://github.com/ibis-project/ibis/blob/main/ibis/backends/sql/rewrites.py
This works well for some things, but it is hard to write rewrites that use additional information. There was a general interest in supporting some kind of general IR for relational algebra in e-graphs that could be re-used across projects like DataFusion or databases, like Apache Calcite.
I am a first year PhD student at UW currently and maintain the Python bindings for egglog as well as help generally in its development. Egglog is a language/library implemented in Rust that supports using e-graphs to reason over your custom domains. It allow rules that match on multiple expressions and new equivalences based on those. Egglog itself is built on a database that stores all expressions added to the e-graph and and allows efficient executions of rules, potentially even in parallel.
I am interested in exploring how egglog in particular might work with Ibis (and vice versa, how the features needed by Ibis to use egglog might help drive its development). I am mentoring another student here @marko-subotic on egglog development who also might be interested in helping explore this.
If this is of interest to other maintainers, I imagine it might be helpful to start with a proof of concept that attempts to address two things:
- Roadblocks that were encountered when previously trying to adopt e-graphs, around vectors and maps
- Issues with the current term replacement system, either things that can't be done or that are just cumbersome to do.
If we could work on something that resolves these, then we could start exploring options for potential adoption.
I would be curious to know if the current maintainers have any feedback on this approach, any concerns or suggestions. If there is interest in helping here, then I think the first thing to do would be to get more concrete detail on those two issues above, what are some example of things that failed in the last e-graph adoption approach and example of things that are hard with the current system.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status