I am assuming you are doing this on Android, so are you using TensorFlow Lite? How many fps are you getting, or trying to get? Are you using YOLO or some similar approach to draw the bounding boxes?
Off the top of my head, I would see how far you can reduce the frame rate before it becomes an issue. You may also want to save the raw data to a video so you can send it to a server that runs a better model.
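To make the frame rate idea concrete, here is a rough sketch of a throttle that only lets a frame through to the model if enough time has passed since the last one it accepted. `FrameThrottler`, `shouldProcess`, and `targetFps` are names I made up for illustration. The class is plain Java and does not touch any camera or TensorFlow Lite API, so you would have to call it yourself from wherever your frames arrive and pass in the frame timestamp.

```java
// Hypothetical helper (not part of Android or TensorFlow Lite):
// decides which incoming camera frames are worth running inference on.
final class FrameThrottler {
    private final long minIntervalNanos; // minimum gap between processed frames
    private long lastAcceptedNanos;      // timestamp of the last processed frame
    private boolean hasAccepted = false; // false until the first frame is seen

    FrameThrottler(double targetFps) {
        if (targetFps <= 0) {
            throw new IllegalArgumentException("targetFps must be positive");
        }
        this.minIntervalNanos = (long) (1_000_000_000L / targetFps);
    }

    // Returns true if the frame with this timestamp should go to the model,
    // false if it should be skipped to stay at or below the target rate.
    boolean shouldProcess(long frameTimestampNanos) {
        if (!hasAccepted
                || frameTimestampNanos - lastAcceptedNanos >= minIntervalNanos) {
            lastAcceptedNanos = frameTimestampNanos;
            hasAccepted = true;
            return true;
        }
        return false;
    }
}
```

You could start with a `targetFps` near what the camera delivers and lower it step by step until the bounding boxes start to lag noticeably. Skipped frames can still be written to the video you upload to the server.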
You can also use a simpler model on the phone.