Elyes Mansour
05/24/2025, 10:07 PM
Gemini 2.5 Flash Preview TTS model, but I figured out how to just instantiate an LLModel manually, providing the model id and capabilities.
For the capabilities though, there's no audio capability. The only multimodal one is Vision.
Aside from this, how would you handle the strategy definition part? In the docs, I could only find an example for streaming structured data, which is very useful but didn't help me much with this.
Pavel Gorgulov
05/26/2025, 7:25 AM
Elyes Mansour
05/26/2025, 9:12 AM
For the strategy, I was referencing your code example here for writing a node that streams data. I was just wondering how we could stream audio output following this example.
Pavel Gorgulov
05/26/2025, 4:42 PM
If I understand your request correctly, you will need a JVM (Java or Kotlin) library that can work with audio streams. From the LLM you will simply receive a byte array, which you can then handle in the usual way. But as I mentioned, media data is currently not supported, only text.
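A minimal sketch of the "handle the byte array in the usual way" step, using only the JDK's `javax.sound.sampled` package. It assumes the bytes are raw 24 kHz, 16-bit, mono, little-endian PCM; that format is an assumption for illustration, so check what the model actually returns. The `PcmToWav` class and `toWav` helper are hypothetical names, and since media output is not supported by the framework yet, where the byte array comes from is left open.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class PcmToWav {
    // ASSUMPTION: 24 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat ASSUMED_FORMAT =
            new AudioFormat(24_000f, 16, 1, true, false);

    // Wraps raw PCM bytes in a WAV container so that ordinary audio tools can read them.
    static byte[] toWav(byte[] pcm, AudioFormat format) throws IOException {
        // The WAV writer needs a known length when writing to an OutputStream,
        // so compute the frame count from the byte count.
        long frames = pcm.length / format.getFrameSize();
        try (AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(pcm), format, frames)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes received from the model: one second of silence.
        byte[] pcm = new byte[48_000];
        byte[] wav = toWav(pcm, ASSUMED_FORMAT);
        System.out.println("WAV bytes: " + wav.length);
    }
}
```

For playback instead of saving, the same `AudioInputStream` could be fed into a `SourceDataLine` obtained from `AudioSystem`, which also allows writing chunks as they arrive.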