Conversation
Just ran some to get an idea: parsing all the Spark TPC-DS queries, serializing all TPC-DS queries, and deserializing them. Keep in mind these timings are for all 95 queries, not individual ones.
@CircArgs Hmm, actually I tried this with one of our use cases -- specifically some metrics that depend on several layers of transforms, each of which has fairly nested subqueries. For those metrics, I serialized the compiled query AST and then deserialized it, but deserialization takes a long time (more than five minutes per metric). Maybe there's something else going on, but parsing and recompiling is faster. I started a change that would save this serialized AST on a node revision, but I'll hold off until we get to the bottom of the perf issues. I was having similar issues in #699, where deserialization turned out to be slower than just recompiling.
@shangyian I can hold off on this for now then. I wonder if there's a big distinction between serializing/deserializing non-compiled queries vs. compiled queries. My timings were all for non-compiled queries.
|
@CircArgs Maybe it's because some compiled queries, if they pull together many layers of transforms, can be huge. But I would still expect this to be faster than actually having to compile the queries. 🤔 On that thought, we also need to make the built queries more efficient/readable by removing all the columns that aren't used.
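One way to get to the bottom of the perf discussion above is to time deserialization in isolation, separate from parsing and compilation. A minimal sketch of that measurement pattern, using a deeply nested dict as a stand-in for a compiled AST (the real compiled query trees are not shown here):

```python
import json
import timeit

def nested(depth: int) -> dict:
    """Build a deeply nested dict as a stand-in for a compiled AST.

    Hypothetical shape: each level wraps the previous one in a
    "select" node, mimicking layers of nested subqueries.
    """
    node = {"kind": "column", "name": "x"}
    for _ in range(depth):
        node = {"kind": "select", "projection": [node], "from": {"kind": "table"}}
    return node

# Serialize once, then time only the deserialization step.
payload = json.dumps(nested(200))
secs = timeit.timeit(lambda: json.loads(payload), number=100)
print(f"100 deserializations: {secs:.3f}s")
```

Running the same harness against both a non-compiled and a compiled serialized query would show whether the slowdown scales with nesting depth or with total payload size.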
Summary
Serializes and deserializes ASTs, maintaining all information even after compilation, in a flat form that is JSON-serializable.
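The idea of flattening an AST into a JSON-serializable form can be sketched as follows. This is a minimal illustration, not the project's actual node classes: each node becomes one dict in a flat list, with child links stored as integer indices so the nested structure survives a round trip.

```python
import json
from dataclasses import dataclass, field

# Hypothetical minimal AST node; the real AST classes are richer.
@dataclass
class Node:
    kind: str                       # e.g. "select", "column", "table"
    value: str = ""
    children: list["Node"] = field(default_factory=list)

def flatten(root: Node) -> str:
    """Serialize an AST into a flat, JSON-serializable list of records."""
    records: list[dict] = []

    def visit(node: Node) -> int:
        idx = len(records)
        records.append({"kind": node.kind, "value": node.value, "children": []})
        # Children are appended after this record, so their indices are larger.
        records[idx]["children"] = [visit(child) for child in node.children]
        return idx

    visit(root)
    return json.dumps(records)

def unflatten(payload: str) -> Node:
    """Rebuild the AST from the flat JSON list (root is record 0)."""
    records = json.loads(payload)

    def build(idx: int) -> Node:
        rec = records[idx]
        return Node(rec["kind"], rec["value"], [build(c) for c in rec["children"]])

    return build(0)
```

A round trip preserves the tree, e.g. `unflatten(flatten(ast)) == ast` for a dataclass-based AST, since dataclasses compare by field values.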
Test Plan
unit tests
`make check` passes; `make test` shows 100% unit test coverage