#datascience

holgerbrandl

10/13/2023, 11:41 AM
Hi, can we read/write zipped csvs with kotlin-dataframe?
zaleslaw

10/14/2023, 7:07 AM
I suppose the answer is "no", but it would not be hard to add support if you give some use cases (how the zip is structured, how many CSV files are in one zip), or you could help by pointing to the equivalent R/Python/Rust dataframe APIs.
👍 1
Nikita Klimenko [JB]

10/14/2023, 12:31 PM
About reading: readCSV actually tries to read files ending with .zip and .gz, but apparently this feature wasn't tested and fails for ZIP 🙂 So this is a bug, and I'll create an issue. But you can still use the lower-level methods that work with InputStream / OutputStream:
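import java.io.File
import java.util.zip.ZipInputStream
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readCSV

// Reading: open the archive, move to its first entry, and parse that entry as CSV.
val df1 = ZipInputStream(File("data.csv.zip").inputStream(), Charsets.UTF_8).use { zip ->
    zip.nextEntry // positions the stream at the start of the first entry
    DataFrame.readCSV(zip)
}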
import java.io.FileOutputStream
import java.io.OutputStreamWriter
import java.util.zip.ZipEntry
import java.util.zip.ZipOutputStream
import org.jetbrains.kotlinx.dataframe.io.writeCSV

// Writing: create the archive, add a single "data.csv" entry, and write the frame into it.
ZipOutputStream(FileOutputStream("data.csv.zip")).use { zip ->
    zip.putNextEntry(ZipEntry("data.csv"))
    // Closing the writer also finishes the entry and closes the archive and the file stream.
    OutputStreamWriter(zip, Charsets.UTF_8).use { writer ->
        df1.writeCSV(writer)
    }
}
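
For the "how many csv files in one zip" case raised above, here is a rough sketch of walking every .csv entry of an archive. It uses only java.util.zip, and `readCsvEntries` and `sampleZip` are hypothetical names made up for this example, not part of kotlin-dataframe. The parser is passed in as a lambda, so nothing here assumes a particular dataframe API.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.InputStream
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

// Hypothetical helper (not part of kotlin-dataframe): applies `parse` to every
// .csv entry of a zip stream and returns the results keyed by entry name.
fun <T> readCsvEntries(zipped: InputStream, parse: (InputStream) -> T): Map<String, T> {
    val result = LinkedHashMap<String, T>()
    ZipInputStream(zipped, Charsets.UTF_8).use { zip ->
        while (true) {
            val entry = zip.nextEntry ?: break
            if (!entry.isDirectory && entry.name.endsWith(".csv")) {
                // Copy the entry into memory so that a parser which closes its
                // input cannot close the surrounding archive stream.
                result[entry.name] = parse(ByteArrayInputStream(zip.readBytes()))
            }
        }
    }
    return result
}

// Builds a small two-entry archive in memory, for demonstration only.
fun sampleZip(): ByteArray {
    val buffer = ByteArrayOutputStream()
    ZipOutputStream(buffer).use { zip ->
        zip.putNextEntry(ZipEntry("a.csv"))
        zip.write("x,y\n1,2\n".toByteArray())
        zip.closeEntry()
        zip.putNextEntry(ZipEntry("b.csv"))
        zip.write("x,y\n3,4\n".toByteArray())
        zip.closeEntry()
    }
    return buffer.toByteArray()
}

fun main() {
    // Here the "parser" just decodes each entry to text.
    val texts = readCsvEntries(ByteArrayInputStream(sampleZip())) { it.readBytes().decodeToString() }
    println(texts.keys)
}
```

With kotlin-dataframe on the classpath, the lambda could presumably be `{ DataFrame.readCSV(it) }`, relying on the same InputStream overload as in the snippet above. Buffering each entry in memory keeps the sketch simple but would not suit very large files.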