I've also become nervous recently that implementing high-performance numerical libraries in a user-friendly OO language will require some technique for eliminating virtual function call overhead beyond just relying on the JIT. I have no evidence for this yet (I haven't done the fine-grained benchmarking required to separate out the delay sources, which can be quite tricky when benchmarking across JIT-ed languages), but it is something Chris Lattner called out specifically when talking about using the JVM/Java for TensorFlow ("it is possible that our approaches could work for this class of languages, but such a system would either force model developers to use very low-abstraction APIs (e.g. all code must be in a final class) or the system would rely on heuristic-based static analysis techniques that work in some cases but not others."). Koma is actually close to having a "final-only" class for , except for the fact that I want to support multiple backends at runtime. I'm not sure how useful that feature actually is to anyone, though. If I said "only import one backend at a time" I could probably rely on an  around the underlying memory (e.g. around the pointer on K/Native to the matrix memory), or at least a final one with whatever it needs inside. Of course, before doing so I'd have to produce some metric showing it's worth the pain. It's probably drowned out by other sources of slowness for now.
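To make the "final-only" idea concrete, here is a minimal sketch in Java (hypothetical names, not Koma's actual API): a `final` wrapper class around raw matrix storage. Because the class is `final`, calls on it are statically bound, so the JIT can inline `get`/`set` down to plain array accesses without any devirtualization guard.

```java
// Hypothetical sketch, not Koma's actual API: a final wrapper around
// the raw matrix storage. Because the class is final, no call site on
// it can ever be polymorphic, so the JIT inlines unconditionally.
final class DenseMatrix {
    private final double[] data;
    private final int rows, cols;

    DenseMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    // Statically bound; after inlining this is a plain array access.
    double get(int r, int c) { return data[r * cols + c]; }

    void set(int r, int c, double v) { data[r * cols + c] = v; }
}

public class FinalWrapperDemo {
    public static void main(String[] args) {
        DenseMatrix m = new DenseMatrix(2, 2);
        m.set(0, 0, 3.0);
        m.set(1, 1, 4.0);
        System.out.println(m.get(0, 0) + m.get(1, 1)); // 7.0
    }
}
```

The tension described above is exactly this: supporting multiple backends behind one type means `DenseMatrix` can't be final (or must delegate through an interface), which reintroduces the dispatch the final class was meant to eliminate.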
3 years ago
I've done some performance studies and did not see any significant overhead from virtual calls on the JVM; in most cases they are simply inlined. But maybe that is because in  all data structures are in fact final, and all operations are moved to a context that can be swapped out.
3 years ago
The JVM is extremely efficient at inlining even virtual calls, provided there is a single implementation at run time.
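A small sketch of that monomorphic-call scenario (hypothetical names, assuming HotSpot-style class hierarchy analysis): when only one implementation of an interface is loaded, the JIT can treat the interface call as if the class were final and inline it; loading a second implementation forces it to deoptimize and fall back to a guarded or truly virtual dispatch.

```java
// Sketch of the monomorphic case the comments describe: one interface,
// exactly one implementation loaded at run time. HotSpot can then
// devirtualize and inline apply() at the call site in sum().
interface Backend {
    double apply(double x);
}

// The sole implementation loaded at run time (hypothetical name).
final class CpuBackend implements Backend {
    public double apply(double x) { return x * 2.0; }
}

public class MonomorphicDemo {
    static double sum(Backend b, double[] xs) {
        double acc = 0.0;
        // Monomorphic call site: only CpuBackend.apply is ever called,
        // so the JIT can inline it as if Backend were a final class.
        for (double x : xs) acc += b.apply(x);
        return acc;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 2.0, 3.0};
        System.out.println(sum(new CpuBackend(), xs)); // 12.0
    }
}
```

Whether this actually happens on a given workload can be checked with diagnostic flags such as `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining`, rather than guessed from end-to-end timings.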