# koog-agentic-framework
m
Hello! I've been trying to get Koog to extract structured data from a PDF, but AFAICT the PDF never actually gets sent as part of the context. Replicating the prompts in the Gemini UI, for instance, works without issues, but doing the same with Koog typically results in all-null fields or sometimes entirely hallucinated values, hence my suspicion that the attachment is missing. I suppose there could be a number of explanations for this, but I'm hoping someone here could help me narrow it down, either by (1) telling me that I've misunderstood or misused some part of the framework (snippet below), (2) telling me it's a known issue, or (3) pointing me to information about debugging issues like this in Koog (I haven't seen documentation specifically on debugging failures, but if I've missed the docs, just let me know). This is basically the main part:
val promptExecutor = simpleGoogleAIExecutor(geminiApiKey)
val exampleStructure =
    JsonStructuredData.createJsonStructure<Example>(
        schemaFormat = JsonSchemaGenerator.SchemaFormat.JsonSchema,
        schemaType = JsonStructuredData.JsonSchemaType.FULL,
    )

val structuredResponse =
    promptExecutor.executeStructured(
        prompt =
            prompt("example") {
                system(SystemPrompts.EXAMPLE_SYSTEM_PROMPT)
                user {
                    text("Please extract information from the provided document.")
                    attachments {
                        Attachment.File(
                            content = AttachmentContent.Binary.Bytes(document),
                            format = "pdf",
                            mimeType = "application/pdf",
                        )
                    }
                }
            },
        structure = exampleStructure,
        mainModel = GoogleModels.Gemini2_5Flash,
        retries = 5,
    )
d
I don't have access to Gemini but, looking at your code, it seems your prompt doesn't correctly use the attachments DSL. It should read:
prompt("example") {
    system(SystemPrompts.EXAMPLE_SYSTEM_PROMPT)
    user {
        text("Please extract information from the provided document.")
        attachments {
            attachment(
                Attachment.File(
                    content = AttachmentContent.Binary.Bytes(document),
                    format = "pdf",
                    mimeType = "application/pdf",
                )
            )
        }
    }
}
m
Ah right, I attempted this using the binaryFile function a while ago, and in that case (and with the code as I've updated it now), the framework throws from here:
require(model.capabilities.contains(LLMCapability.Document)) {
    "Model ${model.id} does not support documents"
}
Gemini does support document uploads; is this something that is configurable consumer-side?
a
Since models are just instances of LLModel, you can create and pass your own model based on the built-in one, either by copy-pasting the source and adjusting the parameters as you need, or by using .copy if you don’t want to redefine all the parameters. In your case, it might look something like this:
val MyGemini2_5Flash = with(Gemini2_5Flash) {
    copy(
        capabilities = capabilities + LLMCapability.Document
    )
}
❤️ 2
d
Looks like an oversight. As a workaround, you can .copy the model and add the appropriate capability.
👌 1
❤️ 1
m
Thanks to you both; this worked like a charm. I'll draft a PR to update this.
❤️ 2
🙏 1
e
hey @micah, i need something similar. can you show me how you fixed it? i got a similar response: "Model gpt-4o does not support files"
m
Andrey's response above:
val MyGemini2_5Flash = with(Gemini2_5Flash) {
    copy(
        capabilities = capabilities + LLMCapability.Document
    )
}
then when invoking execution:
mainModel = MyGemini2_5Flash
I did put up a PR that adds this capability to a bunch of the popular models (the ones where it was relatively simple to determine whether they support PDFs), so I expect it to work without this workaround once the next release is out.
e
niceee! thanks. i'll try that
👍 1