# ai
f
Is there a KMP library for running LLM inference (locally)?
s
Not KMP, but there is a pure Java library based on Project Panama and the Vector APIs. It's pretty neat - https://github.com/tjake/Jlama
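Just to illustrate the SIMD piece it leans on, here's a rough dot-product sketch with the JDK Vector API - illustrative only, not Jlama's own code:
```kotlin
import jdk.incubator.vector.FloatVector
import jdk.incubator.vector.VectorOperators

// Illustrative SIMD dot product using the JDK Vector API (the kind of primitive
// Jlama builds on). Run with --add-modules jdk.incubator.vector.
fun dot(a: FloatArray, b: FloatArray): Float {
    val species = FloatVector.SPECIES_PREFERRED
    val upper = species.loopBound(a.size)
    var sum = 0f
    var i = 0
    while (i < upper) {
        val va = FloatVector.fromArray(species, a, i)
        val vb = FloatVector.fromArray(species, b, i)
        sum += va.mul(vb).reduceLanes(VectorOperators.ADD)
        i += species.length()
    }
    // Scalar tail for the remaining elements
    while (i < a.size) { sum += a[i] * b[i]; i++ }
    return sum
}
```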
p
Not quite pure, after all there's some native code in there. But yeah, it's a good library. There's also nirmato-ollama, a client for Ollama.
👍 1
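Not nirmato-ollama's own API, but here's a rough sketch of the local Ollama HTTP endpoint a client like that wraps - the model name and prompt are placeholders, and it assumes Ollama is running on its default port:
```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.engine.cio.CIO
import io.ktor.client.request.post
import io.ktor.client.request.setBody
import io.ktor.client.statement.bodyAsText
import io.ktor.http.ContentType
import io.ktor.http.contentType
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val client = HttpClient(CIO)
    // Ollama's default local endpoint; "llama3.2" is just a placeholder model name
    val response = client.post("http://localhost:11434/api/generate") {
        contentType(ContentType.Application.Json)
        setBody("""{"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": false}""")
    }
    // With "stream": false, Ollama returns a single JSON object with a "response" field
    println(response.bodyAsText())
    client.close()
}
```
Note this is out-of-process inference: the Ollama server does the work and the app just talks HTTP to it.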
f
Thanks guys. But Jlama only does CPU inference, and I'm looking for in-process inference. I think I'll make a KMP library starting with java-llama.cpp (not to be confused with Jlama).
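Roughly the shape I have in mind - a minimal expect/actual sketch where every name (LlmSession, generate) is made up, and the JVM actual just marks where a java-llama.cpp binding would plug in:
```kotlin
// commonMain: a platform-neutral API the library could expose.
// All names here are hypothetical, not java-llama.cpp's API.
expect class LlmSession(modelPath: String) {
    fun generate(prompt: String, maxTokens: Int = 256): String
    fun close()
}

// jvmMain / androidMain: the actual would delegate to java-llama.cpp under the hood.
actual class LlmSession actual constructor(private val modelPath: String) {
    actual fun generate(prompt: String, maxTokens: Int): String {
        TODO("Load $modelPath with java-llama.cpp and run generation here")
    }
    actual fun close() {
        // Free native llama.cpp resources here
    }
}
```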
p
Interesting, what targets do you need? Btw, there's also KInference for running ONNX models, and here's an example with GPT-2. But I'm not sure if it will be supported in the future; there haven't been any commits recently.
f
Currently desktop, Android, web,
so for web a WebGPU backend is important.
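Roughly, the target list in the library's build.gradle.kts might look like this - just a sketch, assuming the current KMP plugin DSL:
```kotlin
// build.gradle.kts with the Kotlin Multiplatform plugin applied
kotlin {
    jvm()            // desktop
    androidTarget()  // Android (needs the Android Gradle plugin applied too)
    wasmJs {         // web; a WebGPU-backed implementation would sit behind this target
        browser()
    }
}
```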
s
> CPU inference
Yeah, I think WebGPU support is WIP - https://github.com/tjake/Jlama/pull/150 - but yeah, this won't be multiplatform.
f
The state of LLM inference outside of desktop is sad: web libraries either lack GPU inference or freeze the browser, and there's no Android library as far as I can see. Actually, there might be something.
Yeah, so apparently java-llama.cpp doesn't use the GPU version of llama.cpp.