I'm doing random access over up to MAXLONG bytes, but I can only map a window of at most MAXINT bytes per IO access. Mapping is not a cheap operation, and there is no parent state that guards the IO. The bigger the setup and teardown cost of mapping the IO window the current task needs, on whichever thread it happens to land on, the more threads I need to make up for the lost IO time. It works great in single-threaded code, but one core isn't enough to make the workload IO-bound. So for the move to multithreaded code, I'm keeping the current mapping in a ThreadLocal. I haven't found the documentation that tells me "here is how to make coroutines stop scrambling the ThreadLocals they execute under."
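A minimal sketch of the per-thread window cache described above, assuming the mapping is Java NIO's FileChannel.map (its size argument is capped at Integer.MAX_VALUE, which matches the MAXINT window limit). WindowedReader, read and remapCount are hypothetical names for illustration, not a library API. Each thread keeps its last mapped window in a ThreadLocal and only remaps when a read falls outside that window; the counter shows how many map calls the access pattern costs.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not a library API): random byte reads over a file that may be
// larger than Integer.MAX_VALUE bytes, caching at most one mapped window per thread.
final class WindowedReader implements AutoCloseable {
    // One mapped window: file offset `base` plus a buffer covering [base, base + capacity).
    private static final class Window {
        final long base;
        final MappedByteBuffer buf;
        Window(long base, MappedByteBuffer buf) { this.base = base; this.buf = buf; }
        boolean covers(long pos) { return pos >= base && pos - base < buf.capacity(); }
    }

    private final FileChannel channel;
    private final long fileSize;
    private final int windowSize;                                    // FileChannel.map caps a mapping at Integer.MAX_VALUE bytes
    private final AtomicLong remaps = new AtomicLong();              // map calls made by all threads
    private final ThreadLocal<Window> current = new ThreadLocal<>(); // last window mapped by each thread

    WindowedReader(Path file, int windowSize) throws IOException {
        if (windowSize <= 0) throw new IllegalArgumentException("windowSize must be positive");
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.fileSize = channel.size();
        this.windowSize = windowSize;
    }

    byte read(long pos) throws IOException {
        if (pos < 0 || pos >= fileSize) throw new IndexOutOfBoundsException("pos " + pos);
        Window w = current.get();
        if (w == null || !w.covers(pos)) {
            long base = (pos / windowSize) * windowSize;    // window start aligned to windowSize
            long len = Math.min(windowSize, fileSize - base);
            w = new Window(base, channel.map(FileChannel.MapMode.READ_ONLY, base, len));
            current.set(w);          // the old buffer is released only when it is garbage collected
            remaps.incrementAndGet();
        }
        return w.buf.get((int) (pos - w.base));             // absolute get: no shared position state
    }

    long remapCount() { return remaps.get(); }

    @Override
    public void close() throws IOException { channel.close(); }
}
```

The JDK has no public unmap call for a MappedByteBuffer, so a replaced window stays mapped until the buffer is garbage collected. A task that resumes on a different thread sees that thread's cached window rather than its own, which is where unintended remaps come from.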
So if I just use Thread directly, it seems like I'll have a much easier time predicting which mapping is in the ThreadLocal, and I can minimize the unintended remapping calls.
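A minimal sketch of the plain-Thread approach, again assuming Java NIO's FileChannel.map. PartitionedScan and sumBytes are hypothetical names, and summing bytes stands in for whatever the real per-window work is. Each dedicated platform thread owns one contiguous slice of the file, so the only mapping it ever holds is the one for its own current window. With that ownership the mapping can live in a plain local variable instead of a ThreadLocal, and the number of map calls depends only on the slice and window sizes, not on scheduling.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example: sum every byte of a file using nThreads dedicated platform threads.
final class PartitionedScan {
    static long sumBytes(Path file, int nThreads, int windowSize, AtomicLong mapCalls)
            throws IOException, InterruptedException {
        if (nThreads <= 0 || windowSize <= 0) throw new IllegalArgumentException("sizes must be positive");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long slice = (size + nThreads - 1) / nThreads;       // ceiling division
            AtomicLong total = new AtomicLong();
            AtomicReference<IOException> failure = new AtomicReference<>();
            List<Thread> threads = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                long start = Math.min(size, t * slice);          // this thread owns [start, end)
                long end = Math.min(size, start + slice);
                Thread th = new Thread(() -> {
                    long sum = 0;
                    try {
                        // Walk the slice window by window; the current mapping is a local
                        // variable, so no other task can swap it out between reads.
                        for (long base = start; base < end; base += windowSize) {
                            long len = Math.min(windowSize, end - base);
                            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, base, len);
                            mapCalls.incrementAndGet();
                            for (int i = 0; i < len; i++) sum += buf.get(i) & 0xFF;
                        }
                    } catch (IOException e) {
                        failure.compareAndSet(null, e);          // keep the first error
                        return;
                    }
                    total.addAndGet(sum);
                });
                threads.add(th);
                th.start();
            }
            for (Thread th : threads) th.join();
            if (failure.get() != null) throw failure.get();
            return total.get();
        }
    }
}
```

With 4 threads and a 1000-byte window over a 10,000-byte file, each thread maps exactly three windows (1000, 1000 and 500 bytes), for 12 map calls in total no matter how the threads are scheduled.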