# ksp
e
I am seeing pretty bad performance in KSP compared to KAPT that is fairly concerning (version 1.5.31-1.0.0). I am just about done migrating Epoxy to support KSP via XProcessing (https://github.com/airbnb/epoxy/pull/1244) and took some time to benchmark ksp vs kapt via both gradle build scans and manual timing print statements in the processor. All of our processors now support both ksp and kapt, and I can switch between modes to compare them, but numbers are much worse on KSP for clean builds.

Here are processor timings we manually logged for a clean build of one of our larger modules:
- Paris: 1100ms kapt, 3600ms ksp (3.27x slower)
- Epoxy: 675ms kapt, 4800ms ksp (7.1x slower)
- another in-house processor: 375ms kapt, 1800ms ksp (4.8x slower)

Even with kapt stub generation taken into account, KSP does worse when looking at gradle tasks.

ksp mode:
- :kspDebugKotlin 18.992s

kapt mode:
- :kaptDebugKotlin 5.652s
- :kaptGenerateStubsDebugKotlin 3.814s

I haven’t done any in-depth profiling yet to know where this comes from - not sure if it would mainly be xprocessing or KSP itself. Room uses xprocessing though and claimed a 2x speed increase, so I’m curious if that is still correct or if maybe there was a regression? I believe we actually used to see better performance with KSP a few months ago.
not relevant to your case, but one mistake i made was relying on gradle profiler's ABI change, which actually didn't change the ABI in a way that would invalidate KAPT.
also, you should definitely count KAPT stub generation time in your comparisons, since that is what makes KAPT clean compilation slow
e
thanks for the scripts, I’d love to get a flamegraph to dig into where the time is going. I’m doing clean builds with a command in a loop like this:
./gradlew module:clean module:assembleDebug --no-build-cache
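For reference, a minimal sketch of such a timing loop (the module name and run count are placeholders, not from the actual setup):

```shell
# time_run CMD...: run a command and print the elapsed wall-clock seconds.
time_run() {
  local start end
  start=$(date +%s)
  "$@" > /dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# The benchmark loop itself ("module" is a placeholder module path):
# for i in 1 2 3 4 5; do
#   echo "run $i: $(time_run ./gradlew module:clean module:assembleDebug --no-build-cache)s"
# done
```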
even with stub generation taken into account, ksp is twice as slow. but I believe the KSP team claimed that even without stub generation KSP should be faster, because it can do more granular type resolution on demand
y
Not sure about that, because the kotlin type system is more complex than Java's
The first script i linked will give you a flame graph, but to invalidate the ABI for KAPT you will need my custom gradle profiler build. I might dig that up
The interesting case i found there was that gradle profiler invalidates the ABI by adding a public method to a file, which did invalidate KSP but didn't invalidate KAPT, since KAPT turns files into Java stubs and the original annotated file stays unmodified
e
shouldn’t matter for clean build profiling though?
y
nope
let me see if i can find the results
just ran that ksp-kapt comparison script.
```
totals:
kapt : 53759 ms
ksp : 33144 ms
taskTotals:
kaptWithKaptDebugAndroidTestKotlin : 22754 ms
kaptClasspath_kaptWithKaptDebugKotlin : 0 ms
kaptClasspath_kaptDebugKotlin : 1 ms
kspWithKspDebugAndroidTestKotlin : 32804 ms
kaptAndroidTestWithKapt : 329 ms
kaptGenerateStubsWithKaptDebugAndroidTestKotlin : 30438 ms
kaptAndroidTestDebug : 0 ms
kapt : 1 ms
kaptAndroidTest : 0 ms
kspAndroidTestWithKsp : 319 ms
kaptGenerateStubsWithKaptDebugKotlin : 80 ms
kaptWithKaptDebugKotlin : 41 ms
kaptGenerateStubsDebugKotlin : 50 ms
kspWithKspDebugAndroidTestKotlinProcessorClasspath : 21 ms
kaptDebugKotlin : 37 ms
kaptWithKapt : 0 ms
kaptDebug : 0 ms
kaptWithKaptDebug : 0 ms
kaptAndroidTestWithKaptDebug : 0 ms
kaptClasspath_kaptWithKaptDebugAndroidTestKotlin : 28 ms
```
e
thanks for checking, good to see Room is still faster. I’m trying to get a flame graph with the gradle profiler. it’s also interesting to see that without the stubs task, the direct comparison of kapt vs ksp is roughly 50% different, so that helps me set expectations for my processors
I got a jprofiler flamegraph working (happy to share it if anyone is interested in the data). xprocessing’s `getDeclaredMethods` takes a significant amount of time, in large part I think because it forces type resolution for this:
```kotlin
// if it receives or returns inline, drop it.
// we can re-enable these once room generates kotlin code
it.parameters.any {
    it.type.resolve().isInline()
} || it.returnType?.resolve()?.isInline() == true
```
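To illustrate why forced resolution hurts here, a hypothetical sketch (made-up classes, not the real XProcessing/KSP API): `resolve()` stands in for KSP's expensive `KSTypeReference.resolve()`, and a counter shows how many resolutions each strategy actually pays for.

```kotlin
// Counter observing how often the "expensive" resolution runs.
var resolveCount = 0

class TypeRef(private val name: String) {
    // lazy {} caches the result, so each reference resolves at most once.
    private val resolved: String by lazy {
        resolveCount++
        name
    }
    fun resolve(): String = resolved
}

class Method(val simpleName: String, val returnType: TypeRef)

// Eager: resolves every method's return type, even ones the caller drops later.
fun eagerFilter(methods: List<Method>): List<Method> =
    methods.filter { !it.returnType.resolve().endsWith("Inline") }

// Deferred: a cheap name filter first, so unselected methods never resolve.
fun deferredFilter(methods: List<Method>, wanted: Set<String>): List<Method> =
    methods.filter { it.simpleName in wanted }
        .filter { !it.returnType.resolve().endsWith("Inline") }
```

With three methods, the eager path resolves all three types, while the deferred path only resolves the ones a name filter keeps, which is the kind of saving a processor that filters most members would see.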
similarly, `syntheticGetterSetterMethods` checks types for inline functions. I suppose anywhere that xprocessing currently resolves a type should defer it as much as possible - that’s really wasteful for my specific use case.

I’m not sure how flexible you want to make the xprocessing apis - it might be easier to have an escape hatch to expose the ksp resolver so we can do efficient things when we know we’re in ksp (right now I access the resolver reflectively, but it would be nice to have a real api).

similarly, maybe access to modifiers should be deferred more. for example, getting all methods forces creation of modifiers, when we could probably defer knowing whether it’s a suspend function:
```kotlin
return if (declaration.modifiers.contains(Modifier.SUSPEND)) {
    KspSuspendMethodElement(env, containing, declaration)
} else {
    KspNormalMethodElement(env, containing, declaration)
}
```
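One possible shape of that deferral, as a hypothetical sketch (made-up classes, not XProcessing's real ones): instead of reading modifiers eagerly to pick a subclass, a single element type could compute "is suspend" lazily on first use.

```kotlin
// Counter observing how often the modifier set is materialized.
var modifierReads = 0

class Declaration(private val suspend: Boolean) {
    val modifiers: Set<String>
        get() {
            modifierReads++ // stands in for KSP materializing the modifier set
            return if (suspend) setOf("SUSPEND") else emptySet()
        }
}

class MethodElement(declaration: Declaration) {
    // Deferred and cached: modifiers are only read if someone actually asks.
    val isSuspend: Boolean by lazy { "SUSPEND" in declaration.modifiers }
}
```

Constructing the element then costs nothing; the modifier read happens at most once, and only for elements whose suspend-ness is actually queried.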
Also, accessing rawType seems not to defer creation of the type name:
```kotlin
constructor(original: KspType) : this(
    ksType = original.ksType.starProjection().makeNotNullable(),
    typeName = original.typeName.rawTypeName()
)
```
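The fix the message is suggesting could look roughly like this hypothetical sketch (names made up, not XProcessing's real classes): passing an already-computed value into a constructor forces the work eagerly, while taking a `() -> T` and wrapping it in `lazy {}` defers building the raw type name until something reads it.

```kotlin
// Counter observing how often the "expensive" type-name computation runs.
var typeNameBuilds = 0

fun buildRawTypeName(): String {
    typeNameBuilds++ // stands in for the expensive rawTypeName() computation
    return "com.example.Foo"
}

// Eager: the caller computes typeName before the constructor even runs.
class EagerRawType(val typeName: String)

// Deferred: the provider only runs on first access, and the result is cached.
class LazyRawType(typeNameProvider: () -> String) {
    val typeName: String by lazy(typeNameProvider)
}
```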
and that is a big chunk of my flame graph. I think addressing those two things should help me a lot actually.

In KSP itself there is a lot of time spent in `getSymbolsWithAnnotation -> resolveToUnderlying`, which I suppose is to be expected, but it would be great if some optimizations could be done eventually
j
Thanks for doing the performance evaluation, we will address performance issues as part of our plan. Speaking of `getSymbolsWithAnnotation`, the overhead comes from resolving the annotation type. We do have some optimizations in place before calling `resolveToUnderlying`, and we will look into whether we can add more, but my guess is it is most likely marginal, since a type resolution is unavoidable unless we can improve performance of type resolution itself.
e
thanks, yes it seems type resolution is mostly needed. I think the largest room for optimization is in xprocessing
t
re: `getSymbolsWithAnnotation -> resolveToUnderlying`: Looks like the optimization (check simple name before resolution) for Java sources is failing. This is a huge performance hit because it basically resolves all annotations in Java sources. We'll fix it in KSP 1.0.1. https://github.com/google/ksp/issues/707
🎉 1
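The "check simple name before resolution" idea can be sketched as follows (hypothetical stand-ins, not KSP's real internals): annotations whose simple name cannot possibly match are skipped without ever paying for resolution, which is exactly what the failing Java-source path was not doing.

```kotlin
// Counter observing how many annotations get fully resolved.
var resolutions = 0

data class AnnotationRef(val simpleName: String, val qualifiedName: String)

fun resolveQualifiedName(ref: AnnotationRef): String {
    resolutions++ // stands in for expensive type resolution in real KSP
    return ref.qualifiedName
}

// Only annotations whose simple name already matches are resolved at all.
fun symbolsWithAnnotation(all: List<AnnotationRef>, target: String): List<AnnotationRef> {
    val targetSimple = target.substringAfterLast('.')
    return all
        .filter { it.simpleName == targetSimple } // cheap pre-check
        .filter { resolveQualifiedName(it) == target } // expensive confirmation
}
```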
y
@elihart can you file them as bugs? also, does your code never need those resolutions? Flame graphs might sometimes be misleading if you eventually read those values.
e
does your code never need those resolutions?
yes, in many cases we either only need names, or are filtering the methods/fields so it is wasteful to resolve all of them
I’ll file a bug
y
i guess that lazy resolution wasn't a problem for room because most things we read (pojos, daos) we need all of. that being said, it will probably help room as well: entities are sometimes not value objects, so we might be reading more than we need (possibly insignificant though). might be significant for dagger as well
👍 1
@elihart, do you have any of your profiles merged into any of those projects using xprocessing? (wondering if there is an easy way for me to take a look)
profiler scripts*
e
I ran this on a large module in our airbnb android project. But I did it with the standard gradle profiler with jprofiler -
gradle-profiler --profile jprofiler --jprofiler-config sampling-all --scenario-file profiling.scenarios
With a `profiling.scenarios` file like:
```
default-scenarios = ["profiling"]

profiling {
    title = "Profile ksp"
    tasks = ["kspDebugKotlin"]
    cleanup-tasks = ["clean"]
    gradle-args = ["--no-build-cache", "-Dkotlin.compiler.execution.strategy=in-process"]
    daemon = none
}
```
I can send you a jprofiler file if you like though
y
👍 i wanted to run them myself.
actually, if you can share the profiler result with yboyar@google that would be nice. i made some small changes, but not resolving them does not seem to help much. maybe it is a different path
also greatly struggling w/ picking a benchmarking library for this, so suggestions are welcome there. I need an API similar to androidx.benchmark that will allow me to stop/start measurements. JMH does not seem to provide that (not very familiar with it though)
ok so i did a simple tight loop of getting raw types, and making it lazy does seem to help based on an async profiler output. don't have proper benchmarks yet so it does feel a bit uncomfortable 🙂
but when i try to measure actual time spent, it does not seem to be getting better, which does not make much sense. need to dig deeper, or better figure out how to set up jmh to run some of these properly
e
thanks for starting to look at this so quickly. I don’t have a benchmark library to recommend, but I will email you my jprofiler file
y
https://android-review.googlesource.com/c/platform/frameworks/support/+/1880590/1 this is how i'm measuring it. to be honest, relying on a profiler is not great, but that is the best i have for now. https://android-review.googlesource.com/c/platform/frameworks/support/+/1880595/2
👀 1
🎉 1
e
great start, thanks!