kotlinlang

Hi, what do you recommend for streaming an audio response?

Koog doesn't yet have Google's `Gemini 2.5 Flash Preview TTS` model, but I figured out how to instantiate an `LLModel` manually by providing the model id and capabilities.
For the capabilities though, there's no audio capability. The only multimodal one is `Vision`.

Aside from this, how would you handle the strategy definition part? In the docs, I could only see an example for streaming structured data, which is very useful, but didn't help me much with this.

Hi
Right now, only text input and output are supported. Support for media content, especially images, is planned for later. You can <https://github.com/JetBrains/koog/issues|create a related issue> to make it easier to track.

I didn’t quite understand your question about the strategy. What’s your use case?

Thanks, I created an issue :+1:

For the strategy, I was referencing your code <https://docs.koog.ai/streaming-api/#working-with-a-stream-of-structured-data:~:text=%7D-,4.%20Use%20the%20parser%20in%20your%20agent%20strategy,-val%20agentStrategy%20%3D|example here> for writing a node that streams data. I was just wondering how we could stream audio output following this example.

> For the strategy, I was referencing your code <https://docs.koog.ai/streaming-api/#working-with-a-stream-of-structured-data:~:text=%7D-,4.%20Use%20the%20parser%20in%20your%20agent%20strategy,-val%20agentStrategy%20%3D|example here> for writing a node that streams data. I was just wondering how we could stream audio output following this example.
If I understand your request correctly, you will need a JVM (Java or Kotlin) library that can work with audio streams. From the LLM you will simply receive a `ByteArray`, which you can then handle in the usual way.
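
To illustrate the "handle it in the usual way" part, here is a minimal sketch that uses only `javax.sound.sampled` from the JDK. It assumes the audio arrives as chunks of raw PCM at 24 kHz, 16-bit, mono, little-endian (an assumption, so check the model's documentation for the real format) and that you already have those chunks as a `Sequence<ByteArray>`. `collectPcm` and `pcmToWav` are hypothetical helper names, not Koog APIs.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import javax.sound.sampled.AudioFileFormat
import javax.sound.sampled.AudioFormat
import javax.sound.sampled.AudioInputStream
import javax.sound.sampled.AudioSystem

// Assumed output format: 24 kHz, 16-bit, mono, signed, little-endian PCM.
val assumedFormat = AudioFormat(24_000f, 16, 1, true, false)

// Concatenates the received chunks into one PCM buffer.
fun collectPcm(chunks: Sequence<ByteArray>): ByteArray {
    val buffer = ByteArrayOutputStream()
    chunks.forEach { buffer.write(it) }
    return buffer.toByteArray()
}

// Wraps raw PCM bytes in a WAV container so any player can open the result.
fun pcmToWav(pcm: ByteArray, format: AudioFormat = assumedFormat): ByteArray {
    val frameCount = (pcm.size / format.frameSize).toLong()
    val out = ByteArrayOutputStream()
    AudioInputStream(ByteArrayInputStream(pcm), format, frameCount).use { audio ->
        AudioSystem.write(audio, AudioFileFormat.Type.WAVE, out)
    }
    return out.toByteArray()
}
```

For example, `java.io.File("reply.wav").writeBytes(pcmToWav(collectPcm(chunks)))` would save the result to disk. For live playback you could instead write each chunk to a `SourceDataLine` opened with the same `AudioFormat`.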

But as I mentioned, media data is currently not supported, only text.