Are `screenshot tests` worth it?
# compose
u
Are `screenshot tests` worth it?
c
Depends... I feel like some teams I've been on have gotten really insane about screenshot testing, and all it does is cause issues every day; we stopped shipping and cared more about the screenshot tests. Recency bias... but yeah. On my small projects I plan to incorporate it in just a few ways, in a much smaller fashion, instead of trying to screenshot test everything under the sun.
u
like what, just the "core" design system?
a
We have hundreds of those tests and have hardly had any issues. Drift sometimes happens, but Paparazzi does a good job of accounting for it
It’s tough to tell how many bugs it stops from happening because it gates changes. But it has caught visual bugs when snapshots are changed and someone inspects them
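For reference, an individual-component Paparazzi test of this kind is only a few lines. A minimal sketch, assuming a Compose design-system component named PrimaryButton (the component, device, and test names are illustrative, not from this thread):

```kotlin
import app.cash.paparazzi.DeviceConfig
import app.cash.paparazzi.Paparazzi
import org.junit.Rule
import org.junit.Test

class PrimaryButtonSnapshotTest {
    // Paparazzi renders the composable on the JVM, no device or emulator needed
    @get:Rule
    val paparazzi = Paparazzi(deviceConfig = DeviceConfig.PIXEL_5)

    @Test
    fun `primary button - default state`() {
        // Verifies against (or records, via the record task) the checked-in golden image.
        // PrimaryButton is a hypothetical design-system composable.
        paparazzi.snapshot {
            PrimaryButton(text = "Continue", onClick = {})
        }
    }
}
```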
u
When do you run them? Along with the rest of the unit tests? Or do you somehow split the test suite?
c
I think my one team just went overboard. We had hundreds of them and something would always break and cause pain.
u
> It’s tough to tell how many bugs it stops from happening because it gates changes. But it has caught visual bugs when snapshots are changed and someone inspects them
Not sure if I understand this bit - snapshots = the screenshots Paparazzi takes? Someone inspects them? Why? Isn't that the tool's job?
a
Yeah, we run them on every PR using categorized JUnit tests and split them from the regular ones, since they take longer to run if done together with regular unit tests
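One common way to do that kind of split (an assumption about the setup, not necessarily this team's exact wiring) is a JUnit 4 category marker on the screenshot test classes:

```kotlin
import app.cash.paparazzi.Paparazzi
import org.junit.Rule
import org.junit.Test
import org.junit.experimental.categories.Category

// Hypothetical marker interface used only for test categorization
interface ScreenshotTests

@Category(ScreenshotTests::class)
class BadgeSnapshotTest {
    @get:Rule
    val paparazzi = Paparazzi()

    @Test
    fun badge() {
        // Badge is a hypothetical design-system composable
        paparazzi.snapshot { Badge(count = 3) }
    }
}
```

The regular unit-test tasks can then exclude that category (e.g. `useJUnit { excludeCategories("com.example.ScreenshotTests") }` in Gradle, where the package name is made up) while a dedicated CI job includes it; the exact Gradle wiring varies by project.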
Ours hardly break because they only cover individual components or composite ones
So they break if our design system updates, or if a dev changes the inner business logic of that composable without expecting a visual change; if they expect the change, they just check in the updated snapshots
They’re Paparazzi screenshots / snapshots, yes
u
Yes, the speed kind of bothers me, hence I'm contemplating whether they're worth it, since I wanted to use them as you do, for the core design system components - and to be frank there's not a lot of traffic in that module once it's done
a
Yeah, it depends. Paparazzi is much faster than Roborazzi. We use parameterized tests and run hundreds if not thousands of them, and they finish in less than 10 minutes. Not bad! But it could get long
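Parameterization is what lets the count balloon while the test code stays small. A sketch with the stock JUnit 4 Parameterized runner, fanning one composable out across device configs (the Chip composable is a made-up example):

```kotlin
import app.cash.paparazzi.DeviceConfig
import app.cash.paparazzi.Paparazzi
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith
import org.junit.runners.Parameterized

@RunWith(Parameterized::class)
class ChipSnapshotTest(private val deviceConfig: DeviceConfig) {

    companion object {
        // Each entry becomes its own test instance and its own golden image
        @JvmStatic
        @Parameterized.Parameters(name = "{0}")
        fun devices() = listOf(DeviceConfig.PIXEL_5, DeviceConfig.NEXUS_5)
    }

    @get:Rule
    val paparazzi = Paparazzi(deviceConfig = deviceConfig)

    @Test
    fun chip() {
        // Chip is a hypothetical design-system composable
        paparazzi.snapshot { Chip(label = "Filter") }
    }
}
```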
Typically engineers just run the specific test class locally and leave the full suite to CI
c
IMO, just follow this recent video to enable screenshot testing

https://www.youtube.com/watch?v=Y9GWnwi9D0I

and see if it's useful for you/your team.
e
> since they take longer to run if done together with regular unit tests
Is that documented anywhere, or just something you found in your projects?
a
Something we found in our projects. We have them in 10-20 modules and they run parameterized, so there are a lot of them
👍 1
So we skip them in the regular JUnit run and run them only in special CI builds
m
They are useful, but I feel like the approach of failing builds when you deviate from the "golden" snapshots is not as helpful as you'd think. Personally, I'd rather have a golden set, and have the CI process regenerate them and show you the differences so that you can visualize the changes as part of the code review process and approve or not approve the PR based on that.
a
That’s exactly how they work, you just need to check the files in; otherwise it has no idea what the golden images are. CI can be set up to display the diffs Paparazzi outputs so you can review and approve the changes. Not sure how that differs here
u
Any chance you're using GitLab? (I'm looking at how to surface it in a PR)
a
No, a Buildkite trigger that sends the results back to GitHub PRs
u
how? a comment?
👍 1
m
@agrosner It depends on how you've implemented it. I suspect a lot of folks use it as a gate to get through. They run the verifyPaparazzi/verifyRoborazzi task (depending on which library you're using) as part of their CI and they fail pull requests that don't pass this task. This presents issues because that means every visual change requires a re-record of snapshots. And that in and of itself can cause issues if you don't have a way to do that on your PR branch through the CI environment. Both libraries are highly sensitive to the environment you're in (JDK, OS, etc.) and tend to have what humans would call a false positive, but from a machine standpoint would be a visual difference worthy of a failure.
a
Paparazzi has had very few issues on a massive monorepo with 400 modules. And yes, every visual change requires a re-recording. We isolate our components for snapshots at the lowest common denominator, so mostly intentional changes are caught
A visual change should always be reviewed. Paparazzi has a built-in buffer to account for drift; occasionally someone might run into it and have to re-record, but it's not the norm
You re-record locally for the current module if the tests fail and you expected the visual changes
c
My biggest issue with Paparazzi is that updates to AGP end up causing issues with Paparazzi, or we can't update AGP because of Paparazzi. Idk if that's just a me problem tho.
m
Yeah, I haven't found the right amount of drift to allow, and I'm curious what people are using. If you go too high you could miss real changes; too low and you get environment-specific things like anti-aliasing triggering failures. I'm trying to standardize us around a set of canned values:
```kotlin
enum class SnapshotSensitivity(
    val changeThreshold: Float
) {
    High(0.0001f),
    Medium(0.001f),
    Low(0.01f)
}
```
I figure 1% ought to be the lowest sensitivity, but like I said I'm curious what others are using.
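If those canned values get wired into Paparazzi, the knob they would map to is the rule's maxPercentDifference parameter. A sketch, under the assumption that the enum's thresholds are fractions (so Low = 0.01 = 1%) while Paparazzi's parameter is expressed in percent:

```kotlin
import app.cash.paparazzi.Paparazzi

// Hypothetical helper: converts the fractional threshold from the enum above
// into Paparazzi's percent-based tolerance (0.01f -> 1.0). Double-check the
// unit convention against the Paparazzi version in use.
fun paparazziWith(sensitivity: SnapshotSensitivity): Paparazzi =
    Paparazzi(maxPercentDifference = sensitivity.changeThreshold * 100.0)

// Usage in a test class:
// @get:Rule val paparazzi = paparazziWith(SnapshotSensitivity.Medium)
```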
a
Honestly, using the Paparazzi defaults hasn't been much of an issue. But each project is unique
We have 20+ engineers all contributing
m
Yeah, we're much bigger than that, I don't know the exact count, but big. I'd guess somewhere around 100 at this point
👍 1
u
100 engineers in a single apk? damn