Are `screenshot tests` worth it?
# compose
u
Are `screenshot tests` worth it?
c
Depends... I feel like some teams I've been on have gotten really insane about screenshot testing, and all it does is cause issues every day; we stopped shipping and cared more about the screenshot tests. Recency bias... but yeah. On my small projects I plan to incorporate it in just a few ways, in a much smaller fashion, instead of trying to screenshot test everything under the sun.
u
like what, just the "core" design system?
a
We have hundreds of those tests and have hardly had any issues. Drift sometimes happens, but Paparazzi does a good job of accounting for it
It’s tough to tell how many bugs it stops from happening because it gates changes. But it has caught visual bugs when snapshots are changed and someone inspects them
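For reference, an individual-component Paparazzi test of this kind is only a few lines. A minimal sketch, assuming a Compose design-system component named PrimaryButton (the component, device, and test names are illustrative, not from this thread):

```kotlin
import app.cash.paparazzi.DeviceConfig
import app.cash.paparazzi.Paparazzi
import org.junit.Rule
import org.junit.Test

class PrimaryButtonSnapshotTest {
    // Paparazzi renders the composable on the JVM, no device or emulator needed
    @get:Rule
    val paparazzi = Paparazzi(deviceConfig = DeviceConfig.PIXEL_5)

    @Test
    fun `primary button - default state`() {
        // Verifies against (or records, via the record task) the checked-in golden image.
        // PrimaryButton is a hypothetical design-system composable.
        paparazzi.snapshot {
            PrimaryButton(text = "Continue", onClick = {})
        }
    }
}
```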
u
When do you run them? Along with the rest of the unit tests? Or do you somehow split the test suite?
c
I think my one team just went overboard. We had hundreds of them and something would always break and cause pain.
u
> It’s tough to tell how many bugs it stops from happening because it gates changes. But it has caught visual bugs when snapshots are changed and someone inspects them
Not sure if I understand this bit - snapshots = the screenshots Paparazzi takes? Someone inspects them? Why? Isn't that the tool's job?
a
Yeah, we run them on every PR using categorized JUnit tests and split them from the regular ones, since they take longer to run if done together with regular unit tests
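One common way to do that kind of split (an assumption about the setup, not necessarily this team's exact wiring) is a JUnit 4 category marker on the screenshot test classes:

```kotlin
import app.cash.paparazzi.Paparazzi
import org.junit.Rule
import org.junit.Test
import org.junit.experimental.categories.Category

// Hypothetical marker interface used only for test categorization
interface ScreenshotTests

@Category(ScreenshotTests::class)
class BadgeSnapshotTest {
    @get:Rule
    val paparazzi = Paparazzi()

    @Test
    fun badge() {
        // Badge is a hypothetical design-system composable
        paparazzi.snapshot { Badge(count = 3) }
    }
}
```

The regular unit-test tasks can then exclude that category (e.g. `useJUnit { excludeCategories("com.example.ScreenshotTests") }` in Gradle, where the package name is made up) while a dedicated CI job includes it; the exact Gradle wiring varies by project.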
Ours hardly break because they only cover individual components or composite ones
So they break if our design system updates, or if a dev changes the inner business logic of that composable without expecting a visual change; if they expect the change, they just check in the updated snapshots
They’re Paparazzi screenshots / snapshots, yes
u
Yes, the speed kind of bothers me, hence I'm contemplating whether they're worth it, since I wanted to use them as you do, for the core design system components - and to be frank there's not a lot of traffic in that module once it's done
a
Yeah, it depends. Paparazzi is much faster than Roborazzi. We use parameterized tests and run hundreds if not thousands of them, and they finish in less than 10 minutes. Not bad! But it could get long
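Parameterization is what lets the count balloon while the test code stays small. A sketch with the stock JUnit 4 Parameterized runner, fanning one composable out across device configs (the Chip composable is a made-up example):

```kotlin
import app.cash.paparazzi.DeviceConfig
import app.cash.paparazzi.Paparazzi
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith
import org.junit.runners.Parameterized

@RunWith(Parameterized::class)
class ChipSnapshotTest(private val deviceConfig: DeviceConfig) {

    companion object {
        // Each entry becomes its own test instance and its own golden image
        @JvmStatic
        @Parameterized.Parameters(name = "{0}")
        fun devices() = listOf(DeviceConfig.PIXEL_5, DeviceConfig.NEXUS_5)
    }

    @get:Rule
    val paparazzi = Paparazzi(deviceConfig = deviceConfig)

    @Test
    fun chip() {
        // Chip is a hypothetical design-system composable
        paparazzi.snapshot { Chip(label = "Filter") }
    }
}
```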
Typically engineers just run the specific test class locally and leave the full suite to CI
c
IMO, just follow this recent video to enable screenshot testing

https://www.youtube.com/watch?v=Y9GWnwi9D0I

and see if it's useful for you/your team.
e
> since they take longer to run if done together with regular unit tests
Is that documented anywhere, or just something you found in your projects?
a
Something we found in our projects. We have them in 10-20 modules and they run parameterized, so there are a lot of them
👍 1
So we skip them in the regular JUnit run and run them only in special CI builds
m
They are useful, but I feel like the approach of failing builds when you deviate from the "golden" snapshots is not as helpful as you'd think. Personally, I'd rather have a golden set, and have the CI process regenerate them and show you the differences so that you can visualize the changes as part of the code review process and approve or not approve the PR based on that.
a
That’s exactly how they work, you just need to check the files in; otherwise it has no idea what the golden images are. CI can be set up to display the diffs Paparazzi outputs so you can review and approve the changes. Not sure how that differs here
u
Any chance you're using GitLab? (I'm looking at how to surface it in a PR)
a
No, a Buildkite trigger that sends the results back to GitHub PRs
u
how? a comment?
👍 1
m
@agrosner It depends on how you've implemented it. I suspect a lot of folks use it as a gate to get through. They run the verifyPaparazzi/verifyRoborazzi task (depending on which library you're using) as part of their CI and they fail pull requests that don't pass this task. This presents issues because that means every visual change requires a re-record of snapshots. And that in and of itself can cause issues if you don't have a way to do that on your PR branch through the CI environment. Both libraries are highly sensitive to the environment you're in (JDK, OS, etc.) and tend to have what humans would call a false positive, but from a machine standpoint would be a visual difference worthy of a failure.
a
Paparazzi has had very few issues on a massive monorepo with 400 modules. And yes, every visual change requires a re-recording. We isolate our components for snapshots at the lowest common denominator, so mostly intentional changes are caught
A visual change should always be reviewed. Paparazzi has a built-in buffer to account for drift; occasionally someone might run into it and have to re-record, but it's not the norm
You re-record locally for the current module if the tests fail and you expected the visual changes
c
My biggest issue with Paparazzi is that updates to AGP end up causing issues with Paparazzi, or we can't update AGP because of Paparazzi. Idk if that's just a me problem tho.
m
Yeah, I haven't found the right amount of drift to allow, and I'm curious what people are using. If you go too high you could miss real changes; too low and you get environment-specific things like anti-aliasing triggering failures. I'm trying to standardize us around a set of canned values:
```kotlin
enum class SnapshotSensitivity(
    val changeThreshold: Float
) {
    High(0.0001f),
    Medium(0.001f),
    Low(0.01f)
}
```
I figure 1% ought to be the lowest sensitivity, but like I said I'm curious what others are using.
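If those canned values get wired into Paparazzi, the knob they would map to is the rule's maxPercentDifference parameter. A sketch, under the assumption that the enum's thresholds are fractions (so Low = 0.01 = 1%) while Paparazzi's parameter is expressed in percent:

```kotlin
import app.cash.paparazzi.Paparazzi

// Hypothetical helper: converts the fractional threshold from the enum above
// into Paparazzi's percent-based tolerance (0.01f -> 1.0). Double-check the
// unit convention against the Paparazzi version in use.
fun paparazziWith(sensitivity: SnapshotSensitivity): Paparazzi =
    Paparazzi(maxPercentDifference = sensitivity.changeThreshold * 100.0)

// Usage in a test class:
// @get:Rule val paparazzi = paparazziWith(SnapshotSensitivity.Medium)
```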
a
Honestly, using the Paparazzi defaults hasn't been much of an issue. But each project is unique
We have 20+ engineers all contributing
m
Yeah, we're much bigger than that, I don't know the exact count, but big. I'd guess somewhere around 100 at this point
👍 1
u
100 engineers in a single apk? damn