I have an algorithm that is trivially parallelizable (think of searching for a best path) and extremely CPU-intensive. It would actually benefit from running only n branches at a time on n CPU threads: once a solution is found, other branches can be culled eagerly, so fully exploring one branch sooner is better than exploring more than n branches at once.

Is there a dispatcher that holds back new coroutines until running ones finish, so that only n run at a time on n CPU threads? Or is this too high-level for dispatchers, meaning I have to write my own orchestration logic?
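
To make the desired behaviour concrete, here is a minimal sketch of the hand-rolled orchestration I would like to avoid writing, assuming Kotlin with kotlinx.coroutines. It gates `launch` behind a `Semaphore` with n permits, so a new branch starts only when a running one has finished. The names `exploreBranch` and `bestOf` are hypothetical placeholders, and the branch work is a trivial stand-in for the real search.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Semaphore
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for the CPU-heavy exploration of one branch.
// Returns the cost of the best path found in that branch.
fun exploreBranch(branch: Int): Int = branch * branch

// Explores all branches, never running more than n of them at the same time,
// and returns the lowest cost found (Int.MAX_VALUE if there are no branches).
suspend fun bestOf(branches: List<Int>, n: Int): Int {
    val permits = Semaphore(n)
    val best = AtomicInteger(Int.MAX_VALUE)
    coroutineScope {
        for (branch in branches) {
            // Suspends the loop until one of the n slots is free,
            // so no new coroutine is even created before that.
            permits.acquire()
            launch(Dispatchers.Default) {
                try {
                    val cost = exploreBranch(branch)
                    best.updateAndGet { current -> minOf(current, cost) }
                } finally {
                    permits.release()
                }
            }
        }
    } // coroutineScope returns only after every launched branch has completed
    return best.get()
}

fun main() {
    val n = Runtime.getRuntime().availableProcessors()
    val result = runBlocking { bestOf(listOf(5, 3, 8, 2), n) }
    println(result)
}
```

Acquiring the permit before `launch`, rather than inside the coroutine, is what delays the creation of new branches; a real version would also consult `best` before exploring a branch in order to cull it.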