Hello I have been working on upgrading my team’s Apollo code kotlinlang #apollo-kotlin

Tyler Wong

11/28/2023, 9:06 AM

Hello! I have been working on upgrading my team’s Apollo codegen and have unfortunately hit some roadblocks. During the validation steps when calling

GQLDocument.validateAsExecutable

and again when building IR, our build is severely impacted by heavy fragment re-use which results in very long validation times. These validation calls also recursively call

validate

on every selection set, so it seems some entities are being validated multiple times. More in 🧵
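
To illustrate how heavy fragment re-use can lead to the same entities being validated many times, here is a minimal sketch in plain Kotlin. It is not Apollo Kotlin's validation code: `Fragment`, `countVisits` and `buildChain` are hypothetical names, and the fragment graph is made up. It only counts how often a naive recursive walk re-enters each fragment when every fragment spreads the next one twice.

```kotlin
// Hypothetical model, not Apollo Kotlin's AST: a fragment that spreads other fragments by name.
data class Fragment(val name: String, val spreads: List<String>)

// Naive walk: re-enters a fragment every time it is spread, counting visits per fragment.
fun countVisits(
    name: String,
    fragments: Map<String, Fragment>,
    visits: MutableMap<String, Int>,
) {
    visits[name] = (visits[name] ?: 0) + 1
    for (child in fragments.getValue(name).spreads) {
        countVisits(child, fragments, visits)
    }
}

// Builds F0..F<depth>, where each fragment spreads the next one twice
// (for example under two different fields) and the last one spreads nothing.
fun buildChain(depth: Int): Map<String, Fragment> =
    (0..depth).associate { i ->
        val spreads = if (i < depth) listOf("F${i + 1}", "F${i + 1}") else emptyList()
        "F$i" to Fragment("F$i", spreads)
    }

fun main() {
    val fragments = buildChain(depth = 10)
    val visits = mutableMapOf<String, Int>()
    countVisits("F0", fragments, visits)
    // prints "distinct fragments: 11, total visits: 2047"
    println("distinct fragments: ${fragments.size}, total visits: ${visits.values.sum()}")
}
```

With only 11 distinct fragments the walk performs 2047 visits, because the count doubles at every level. Whether the real validator behaves this way on a given document is an assumption that would need profiling to confirm.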

Tyler Wong

11/28/2023, 9:06 AM

I’ve forked the repo at version v3.8.2 and have retrofitted

CodegenTest

to run over our source files. Are there changes in v4.0.0 that could have improved anything? Java codegen finishes and builds in about 8 minutes on v2.3.1 (super old I know 😅), but on v3.8.2, fragment validation on the first pass takes about 41 minutes and operation validation on the first pass takes about 29 minutes. Some of the fragments and operations are validated once more when building IR, so validation takes close to 2 hours. After getting through document validation and building IR, we ended up with an OOM error while writing file infos after a grand total of 1 hour and 58 minutes.

Tyler Wong

11/28/2023, 9:07 AM

Can the validation calls be memoized? I’m not really familiar with document validation or whether every single call to

validate

is necessary to maintain the integrity of the document. Could we optimize our fragment/operation definitions to help too? I’m currently working on getting an obfuscated version of our source. I also saw this issue and was surprised to not find anything about the validation steps taking a long time. For context, we have over 5k type definitions in our schema, including over 1k fragments defined and over 200 operations.

CodegenTest

was altered to generate Kotlin models and use operation-based codegen when testing our build. Sorry for the super long read, but hoping we can get somewhere!

mbonnin

11/28/2023, 9:22 AM

Yikes! 2h is not great. Codegen performance hasn't really been a problem so far. Usually build time is dominated by the Kotlin compiler, so we haven't spent a lot of time improving codegen performance, but it looks like we should.

mbonnin

11/28/2023, 9:24 AM

One expensive part of validation is

fieldsCanMerge

, we have an issue to speed things up there but haven't got to it yet. It's hard to memoize things because every fragment needs to be validated in the context of its operation.
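
One way to picture that constraint: a cache keyed only by fragment name would be unsafe if the result depends on where the fragment is used, so the key would have to include the context as well. The following is a minimal sketch under that assumption. `Context`, `CacheKey`, `MemoizedValidator` and the injected `validateFragment` function are all hypothetical and do not come from Apollo Kotlin.

```kotlin
// Hypothetical stand-ins; Apollo Kotlin's real validation works on its own AST and schema types.
data class Context(val operationName: String, val parentType: String)
data class CacheKey(val fragmentName: String, val context: Context)

class MemoizedValidator(
    // The expensive check, injected so the sketch stays self-contained.
    private val validateFragment: (String, Context) -> List<String>,
) {
    private val cache = HashMap<CacheKey, List<String>>()

    // Number of times the expensive check actually ran.
    var misses = 0
        private set

    fun validate(fragmentName: String, context: Context): List<String> =
        cache.getOrPut(CacheKey(fragmentName, context)) {
            misses++
            validateFragment(fragmentName, context)
        }
}

fun main() {
    val validator = MemoizedValidator { name, ctx ->
        // Pretend the result depends on the context the fragment is spread in.
        if (ctx.parentType == "Query") emptyList()
        else listOf("$name cannot be spread on ${ctx.parentType}")
    }
    val a = Context("GetUser", "Query")
    val b = Context("GetUser", "User")
    validator.validate("UserFields", a)
    validator.validate("UserFields", a) // same key, served from the cache
    validator.validate("UserFields", b) // different context, validated again
    println("misses: ${validator.misses}") // prints "misses: 2"
}
```

A cache like this only pays off when the same fragment is spread repeatedly in an identical context. If every operation yields a distinct context, most lookups miss, which is one reading of why memoization is hard here.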

mbonnin

11/28/2023, 9:27 AM

But it might very well be something completely different. 2h sounds way too much TBH so the code must hit a busyloop somewhere or something like this. If you can share your schema/operations, I'm pretty sure we can get to the bottom of this. There is graphql-anonymizer to anonymize your schema + queries. Or if you don't want to share publicly, feel free to share at martin@apollographql.com, that works too

mbonnin

11/29/2023, 11:05 PM

Did you get any chance to look into it? I'm quite curious where the bottleneck might be now 🙂

Tyler Wong

11/29/2023, 11:14 PM

Ah yes! I’ve been working on obfuscating our biggest query and just got it properly obfuscated and building again. This one takes about 20 minutes alone if it’s put through

CodegenTest

. Will upload here shortly!

Tyler Wong

11/29/2023, 11:21 PM

Here are the scalars we use.

scalarMapping = mapOf(
    "ykvogxdjcr" to ScalarInfo("java.lang.String"),
    "dgsalnbfnk" to ScalarInfo("java.lang.String"),
    "owblnrurns" to ScalarInfo("java.lang.String"),
    "fxvpqiduxs" to ScalarInfo("java.lang.String"),
    "ebjtupyrji" to ScalarInfo("java.lang.String"),
    "vtxouwljjr" to ScalarInfo("java.lang.String"),
    "inpciuewhq" to ScalarInfo("java.lang.String"),
    "fkukbftrbi" to ScalarInfo("java.lang.String"),
    "buoguxadgo" to ScalarInfo("java.lang.String"),
    "dujbacbaoz" to ScalarInfo("java.lang.String"),
),

I’m still stepping through the code to see what the actual bottleneck is for this example, although it seems to be

selectionSet.validate

and not

fieldsInSetCanMerge

. Not 100% sure yet though.

schema.json bigQuery.graphql

mbonnin

11/29/2023, 11:44 PM

Nice!

fieldsInSetCanMerge

sounds like a good candidate!

mbonnin

11/29/2023, 11:54 PM

Calling just validate is ok-ish, takes 11s on my M2 laptop (test here)

mbonnin

11/30/2023, 12:00 AM

Sanity check: are you using

responseBased

codegen by any chance?

mbonnin

11/30/2023, 12:07 AM

I need to call it a day, will look into more details tomorrow. Let me know if you find anything!

Tyler Wong

11/30/2023, 12:19 AM

Oh ok, so I just ran the test on my machine and got pretty fast results, as you did. It seems like the latest 4.x version might’ve fixed something, because changing the version back to 3.8.2 in the test causes validation to take a long time again. Wondering if the

detectCycles

addition in 4.x is short-circuiting our validation chain? Sounds good! Thanks for taking a look. Will do 👍

Tyler Wong

11/30/2023, 12:25 AM

Oh and also, I was using operationBased codegen. responseBased didn’t finish (or I gave up) when I tried it on the obfuscated example.

mbonnin

11/30/2023, 12:30 AM

Makes sense. I started a profile run with IJ, see if we can get a flame graph, will post that tomorrow (if the test finishes 😅 )

mbonnin

11/30/2023, 12:34 AM

PS: I don't think detectCycle would help. It's doing additional checks so most likely slowing things down if anything

Tyler Wong

11/30/2023, 1:21 AM

So it looks like 4.x is much faster! Really curious to know what changed between 3.8.2 and the latest beta.

mbonnin

11/30/2023, 8:58 AM

Flame graph doesn't tell much, just spends a lot of time traversing the GraphQL tree. Maybe

possibleTypes

but looks more like v3 was traversing more than needed.

Tyler Wong

11/30/2023, 7:10 PM

Interesting…well looks like we’ll go straight to v4! How long did that run take? Also curious to know what kinds of improvements made it into v4 that would have fixed this.

mbonnin

11/30/2023, 7:11 PM

~11min IIRC (but I cut this run short because I didn't want to wait 😄 )

mbonnin

11/30/2023, 7:11 PM

Also very curious about what changed but not much time to dive into this and since we have a solution I'm tempted to not look 😄

Tyler Wong

11/30/2023, 7:12 PM

Haha agreed. Thanks for all the help! Much appreciated.
