# ktor
d
Hi all, I'm trying to find the best practice for uploading large files via `multipart/form-data` with `HttpClient` without running out of memory. I've summarized all the different methods I could find as of Ktor `3.2.3` in the first message in the thread. The official documentation primarily highlights the first method (using a `ByteArray`), which seems unsuitable for large files due to its high memory consumption. From the streaming options I've listed, which is considered the most reliable and efficient? Also, would it be helpful to file a documentation issue to add examples for this use case? Any insights would be greatly appreciated. Thanks!
Code sample:
import io.ktor.client.HttpClient
import io.ktor.client.request.forms.ChannelProvider
import io.ktor.client.request.forms.InputProvider
import io.ktor.client.request.forms.append
import io.ktor.client.request.forms.formData
import io.ktor.client.request.forms.submitFormWithBinaryData
import io.ktor.client.statement.bodyAsText
import io.ktor.http.HttpHeaders
import io.ktor.http.escapeIfNeeded
import io.ktor.http.headers
import io.ktor.http.isSuccess
import io.ktor.util.cio.readChannel
import io.ktor.utils.io.ByteReadChannel
import kotlinx.coroutines.runBlocking
import kotlinx.io.asSource
import kotlinx.io.buffered
import java.io.File
import kotlin.use

private const val FILE_KEY = "file"

fun main() = runBlocking {
    val client = HttpClient()
    val url = "http://example.com/api/v1/multiPartUploadPath" // Some external API.
    val uploadFilePath = "/path/to/file/upload" // Some huge (500+ MB) artifact to upload.

    // Explicit part size calculation is omitted for the sake of simplicity of the examples.
    val response = client.submitFormWithBinaryData(
        url = url,
        formData =  formData {
            // 1. With ByteArray, from the Ktor documentation https://ktor.io/docs/client-requests.html#upload_file
            // Loads everything entirely into ByteArray, then copies it into Buffer, can be slow and lead to OOMs.
            // Internals: https://github.com/ktorio/ktor/blob/3.2.3/ktor-client/ktor-client-core/common/src/io/ktor/client/request/forms/formDsl.kt#L50
            append(
                key = FILE_KEY,
                // Can be done in KMP with Kotlinx IO like this:
                // value = SystemFileSystem.source(Path(uploadFilePath)).buffered().readByteArray(),
                value = File(uploadFilePath).readBytes(),
                headers = headers {
                    append(HttpHeaders.ContentDisposition, "filename=${uploadFilePath.escapeIfNeeded()}")
                }
            )


            // 2. With InputProvider of Source
            // Loads content as needed, but possibly blocks under the hood when new bytes are requested.
            // Internals: https://github.com/ktorio/ktor/blob/3.2.3/ktor-client/ktor-client-core/common/src/io/ktor/client/request/forms/formDsl.kt#L63
            append(
                key = FILE_KEY,
                // Can be done in KMP with Kotlinx IO like this:
                // value = InputProvider { SystemFileSystem.source(Path(uploadFilePath)).buffered() },
                value = InputProvider { File(uploadFilePath).inputStream().asSource().buffered() },
                headers = headers {
                    append(HttpHeaders.ContentDisposition, "filename=${uploadFilePath.escapeIfNeeded()}")
                }
            )
            // Or the same (InputProvider of Source is built inside .appendInput(...))
            appendInput(
                key = FILE_KEY,
                headers = headers {
                    append(HttpHeaders.ContentDisposition, "filename=${uploadFilePath.escapeIfNeeded()}")
                }
            ) {
                // Can be done in KMP with Kotlinx IO like this:
                // SystemFileSystem.source(Path(uploadFilePath)).buffered()
                File(uploadFilePath).inputStream().asSource().buffered()
            }


            // 3. With Source directly
            // Loads content as needed, but possibly blocks under the hood when new bytes are requested.
            // Internals: https://github.com/ktorio/ktor/blob/3.2.3/ktor-client/ktor-client-core/common/src/io/ktor/client/request/forms/formDsl.kt#L56
            append(
                key = FILE_KEY,
                // Can be done in KMP with Kotlinx IO like this:
                // value = SystemFileSystem.source(Path(uploadFilePath)).buffered(),
                value = File(uploadFilePath).inputStream().asSource().buffered(),
                headers = headers {
                    append(HttpHeaders.ContentDisposition, "filename=${uploadFilePath.escapeIfNeeded()}")
                }
            )


            // 4. With ChannelProvider of ByteReadChannel of Source
            // Loads content as needed, but possibly blocks under the hood when new bytes are requested.
            // Internals: https://github.com/ktorio/ktor/blob/3.2.3/ktor-client/ktor-client-core/common/src/io/ktor/client/request/forms/formDsl.kt#L70
            append(
                key = FILE_KEY,
                // Can be done in KMP with Kotlinx IO like this:
                // ChannelProvider { ByteReadChannel(SystemFileSystem.source(Path(uploadFilePath)).buffered()) }
                value = ChannelProvider { ByteReadChannel(File(uploadFilePath).inputStream().asSource().buffered()) },
                headers = headers {
                    append(HttpHeaders.ContentDisposition, "filename=${uploadFilePath.escapeIfNeeded()}")
                }
            )


            // 5. With ChannelProvider of File.readChannel(...): ByteReadChannel
            // Loads content as needed, but possibly blocks under the hood when new bytes are requested.
            // Internals: https://github.com/ktorio/ktor/blob/3.2.3/ktor-client/ktor-client-core/common/src/io/ktor/client/request/forms/formDsl.kt#L70
            append(
                key = FILE_KEY,
                // Cannot be done in KMP, as there's no alternative to readChannel(...) for kotlinx.io.files.Path
                value = ChannelProvider { File(uploadFilePath).readChannel() },
                headers = headers {
                    append(HttpHeaders.ContentDisposition, "filename=${uploadFilePath.escapeIfNeeded()}")
                }
            )


            // 6. With convenient .append(...) of `Sink.() -> Unit` builder
            // Loads everything entirely into Buffer under the hood, can be slow and lead to OOMs.
            // Internals: https://github.com/ktorio/ktor/blob/3.2.3/ktor-client/ktor-client-core/common/src/io/ktor/client/request/forms/formDsl.kt#L231
            append(
                key = FILE_KEY,
                filename = uploadFilePath, // `Content-Disposition: filename="..."` is calculated from this conveniently.
            ) {
                // Can be done in KMP with Kotlinx IO like this:
                // SystemFileSystem.source(Path(uploadFilePath)).use { transferFrom(it) }
                File(uploadFilePath).inputStream().asSource().use { transferFrom(it) }
            }
        },
    )

    check(response.status.isSuccess())

    println(response.bodyAsText())
}
a
I would go with `value = SystemFileSystem.source(Path(uploadFilePath)).buffered()` because this API can be used in KMP, and the source returned from `buffered` buffers reads from the original source, so the file will be read in chunks.
d
That would be number 3? I spent some time benchmarking after posting the initial question, and this is the slowest one by far.
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/100M" "http://0.0.0.0:4040/uploadFileByMultiPart1"` | 167.3 ± 5.2 | 159.0 | 178.4 | 1.15 ± 0.04 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/100M" "http://0.0.0.0:4040/uploadFileByMultiPart2"` | 148.4 ± 7.6 | 139.5 | 165.4 | 1.02 ± 0.06 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/100M" "http://0.0.0.0:4040/uploadFileByMultiPart3"` | 1254.9 ± 33.9 | 1215.8 | 1341.0 | 8.59 ± 0.31 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/100M" "http://0.0.0.0:4040/uploadFileByMultiPart4"` | 146.0 ± 3.4 | 140.5 | 156.0 | 1.00 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/100M" "http://0.0.0.0:4040/uploadFileByMultiPart5"` | 146.1 ± 5.8 | 140.0 | 165.8 | 1.00 ± 0.05 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/100M" "http://0.0.0.0:4040/uploadFileByMultiPart6"` | 174.5 ± 5.8 | 164.5 | 187.0 | 1.20 ± 0.05 |
a
Yes, number 3. Does your benchmarking code use `SystemFileSystem.source(Path(uploadFilePath)).buffered()` or `File(uploadFilePath).inputStream().asSource().buffered()`?
d
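I have just tried both head to head, and the results are the same. Also, the time for number 3 grows much faster than linearly with payload size: a 10× larger file (1000M instead of 100M) takes about 166× longer, roughly 208 s instead of 1.25 s: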
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/1000M" "http://0.0.0.0:4040/uploadFileByMultiPart1"` | 1.621 ± 0.025 | 1.577 | 1.663 | 1.22 ± 0.02 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/1000M" "http://0.0.0.0:4040/uploadFileByMultiPart2"` | 1.325 ± 0.015 | 1.307 | 1.350 | 1.00 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/1000M" "http://0.0.0.0:4040/uploadFileByMultiPart3"` | 208.400 ± 19.233 | 194.784 | 257.638 | 157.25 ± 14.62 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/1000M" "http://0.0.0.0:4040/uploadFileByMultiPart4"` | 1.429 ± 0.025 | 1.387 | 1.476 | 1.08 ± 0.02 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/1000M" "http://0.0.0.0:4040/uploadFileByMultiPart5"` | 1.620 ± 0.359 | 1.417 | 2.536 | 1.22 ± 0.27 |
| `curl -X POST -H "UPLOAD_FILE_PATH: /home/nevack/work/test_data/1000M" "http://0.0.0.0:4040/uploadFileByMultiPart6"` | 1.956 ± 0.062 | 1.846 | 2.065 | 1.48 ± 0.05 |
I do not have proof, as I have not profiled the code yet, but my hypothesis is that `{ value.peek() }`, which is called on the passed `Source`, makes an inefficient copy of the original `Source`. `RealSource::peek` creates `PeekSource(this).buffered()`: https://github.com/Kotlin/kotlinx-io/blob/0.7.0/core/common/src/RealSource.kt#L145
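One rough way to check this hypothesis without a profiler is to drain the same in-memory payload once directly and once through `peek()`, and compare timings. This is only a sketch: `drain` and `payload` are hypothetical helpers, the payload sizes are arbitrary, and an in-memory `Buffer` stands in for the file, so the absolute numbers will not match the upload benchmark.

```kotlin
import kotlinx.io.Buffer
import kotlinx.io.RawSource
import kotlinx.io.Source
import kotlinx.io.buffered
import kotlin.time.measureTime

// Hypothetical helper: reads the source to the end in 8 KiB steps and returns the byte count.
fun drain(source: Source): Long {
    val chunk = ByteArray(8 * 1024)
    var total = 0L
    while (true) {
        val read = source.readAtMostTo(chunk)
        if (read == -1) break // Source is exhausted.
        total += read
    }
    return total
}

// Hypothetical helper: an in-memory payload of `mib` MiB behind a buffered Source.
fun payload(mib: Int): Source {
    val block = ByteArray(1024 * 1024)
    val buffer = Buffer()
    repeat(mib) { buffer.write(block) }
    // Buffer is both a source and a sink, so pick the RawSource overload of buffered() explicitly.
    val raw: RawSource = buffer
    return raw.buffered()
}

fun main() {
    // Arbitrary sizes; raise them (heap permitting) if no difference is visible.
    for (mib in listOf(32, 64, 128)) {
        val direct = payload(mib)
        val peeked = payload(mib).peek()
        val directTime = measureTime { drain(direct) }
        val peekTime = measureTime { drain(peeked) }
        println("$mib MiB: direct=$directTime, peek=$peekTime")
    }
}
```

If the `peek` timings grow much faster than the direct ones as the size doubles, that would support the hypothesis; if both scale the same way, the slowdown is more likely elsewhere in the upload path.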
a
Then number 5 (`ChannelProvider { File(uploadFilePath).readChannel() }`). Which platforms do you need to support?
d
I build CLIs with Kotlin/Native and backend with Kotlin/JVM. I lean toward 2 or 4, as they are similar performance-wise and can be used in `common` sources. Returning to the original question: would it be helpful to file a documentation issue to add examples for multipart uploads of large files?
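For reference, option 4 written purely against common APIs might look like the sketch below. It also passes an explicit part size, which the original sample omitted. `streamedFilePart` is a hypothetical helper name, and using only the file name (rather than the full path) in `Content-Disposition` is an assumption of this sketch, not something the sample does.

```kotlin
import io.ktor.client.request.forms.ChannelProvider
import io.ktor.client.request.forms.formData
import io.ktor.http.HttpHeaders
import io.ktor.http.content.PartData
import io.ktor.http.escapeIfNeeded
import io.ktor.http.headers
import io.ktor.utils.io.ByteReadChannel
import kotlinx.io.buffered
import kotlinx.io.files.Path
import kotlinx.io.files.SystemFileSystem

// Hypothetical helper: builds one streamed file part (option 4) usable from common sources.
fun streamedFilePart(key: String, filePath: String): List<PartData> {
    val path = Path(filePath)
    // File size if the file system can report it; null leaves the part size unknown.
    val size: Long? = SystemFileSystem.metadataOrNull(path)?.size
    return formData {
        append(
            key = key,
            // The lambda opens the file lazily, when the part is actually written.
            value = ChannelProvider(size) {
                ByteReadChannel(SystemFileSystem.source(path).buffered())
            },
            headers = headers {
                append(HttpHeaders.ContentDisposition, "filename=${path.name.escapeIfNeeded()}")
            },
        )
    }
}
```

The result can be passed as `formData = streamedFilePart(FILE_KEY, uploadFilePath)` to `submitFormWithBinaryData`. Swapping the value for `InputProvider(size) { SystemFileSystem.source(path).buffered() }` should give the option 2 variant.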
a
It would be greatly appreciated if you did so.
d
Before filing the issue, I took a look at the Ktor documentation repo. To my surprise, I found this PR: https://github.com/ktorio/ktor-documentation/pull/659 (I have no access to the linked issue, https://youtrack.jetbrains.com/issue/KTOR-7365 ). It seems this was already addressed recently.
a
Yes, but it doesn't show all the different methods of adding a file part and their respective pros and cons.