I have a list of lists of strings, basically just ...
# datascience
j
I have a list of lists of strings, basically just an Excel sheet, of roughly 60-90MB size in RAM. This data needs to be exported as Excel, and I am thinking about tinkering a little more with Kotlin Dataframe to do it. What would you say: use Dataframe or just plain old Apache POI?
e
Dataframe will give you a better UX imho
💯 1
plus all of Kotlin's niceties
j
Gotcha
Do you have an idea, what's the CPU and Mem overhead, if any, for data of this size?
The process runs in a serverless worker with about 1.5GB Mem.
e
not sure really, maybe someone else can chime in
or run jconsole while running it and find out 🙂
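As an alternative to attaching jconsole, the peak heap usage can also be read from inside the process with the JDK's management beans. A minimal sketch, assuming a helper named `measurePeakHeapMb` (a hypothetical name, not part of any library). Summing the per-pool peaks gives an upper bound rather than an exact figure, since the pools can peak at different moments:

```kotlin
import java.lang.management.ManagementFactory
import java.lang.management.MemoryType

// Hypothetical helper: runs `block` and returns its result together with an
// upper-bound estimate of the peak heap usage (in MB) observed while it ran.
fun <T> measurePeakHeapMb(block: () -> T): Pair<T, Long> {
    // Only heap pools are of interest here (e.g. eden, survivor, old gen).
    val heapPools = ManagementFactory.getMemoryPoolMXBeans()
        .filter { it.type == MemoryType.HEAP }

    // Reset the recorded peaks so earlier allocations are not counted.
    heapPools.forEach { it.resetPeakUsage() }

    val result = block()

    // Sum of per-pool peaks: an upper bound on the true combined peak.
    val peakBytes = heapPools.sumOf { it.peakUsage.used }
    return result to peakBytes / (1024 * 1024)
}
```

Wrapping the export call, for example `val (_, peakMb) = measurePeakHeapMb { /* build and write the workbook */ }`, gives a rough number to compare against the 1.5GB available to the worker.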
j
Yeah. I think curiosity wins this round. I have already added the necessary lines to my build.gradle.kts 😄 Will get back with my findings on mem.
🙌 1
Ok, so my Matrix is about 70k lines with 32 cols. Cols are short, like not longer than 32 chars. I took a profiler to look at the excel workbook creation from a dataframe. I think I might take a shot at just using POI.
Might just be my bad programming, and the jvm just hogging everything it can get on my workstation. But somehow this seems like a little much for the parameters above
j
@Jens hmm 20GB indeed looks a bit excessive. How large is the excel file you end up with?
j
8.9 MB
👀 1
j
Might be something worth investigating on our end. Maybe we can't help it and it's because of our excel integration, but maybe we can optimize our integration in terms of memory :) if you have the time, could you create an issue on our GitHub? If possible, with a reproducible example. That way we can investigate :)
j
It's a feature that is already overdue, so I have to get going a little. What I will do first is re-implement it with vanilla Apache POI, which might also shed some more light on the problem. I will share my findings. When I'm done, I'll try to provide a small example.
n
Hello! There's a class that could help reduce peak memory consumption: https://poi.apache.org/apidocs/dev/org/apache/poi/xssf/streaming/SXSSFWorkbook.html. It's a streaming writing implementation.
val df = dataFrameOf('a'..'z').fill(1_000_000) { it }
val f = File("test.xlsx")
val xssfWorkbook = XSSFWorkbook()
val workbook = SXSSFWorkbook(xssfWorkbook, SXSSFWorkbook.DEFAULT_WINDOW_SIZE)
df.writeExcel(workbook)
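f.outputStream().use { workbook.write(it) } // close the file stream once the workbook is written
workbook.close() // release the streaming workbook's resources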
This code peaks at 1.3 GB of memory, while using XSSFWorkbook directly (holding the whole sheet in memory) takes a whole 15 GB. It's from
implementation("org.apache.poi:poi-ooxml:5.4.0")
👍 2
j
Cool, thx for the tip. Will check it out
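For the original data shape (a list of lists of strings), the streaming tip above can also be applied with plain Apache POI, without going through a dataframe. A rough sketch, assuming `org.apache.poi:poi-ooxml` is on the classpath; the helper name `writeRowsStreaming`, the sheet name `"data"`, and the default window size of 100 rows are illustrative choices, not something taken from the thread:

```kotlin
import org.apache.poi.xssf.streaming.SXSSFWorkbook
import java.io.File

// Hypothetical helper: streams `rows` into a single-sheet .xlsx file.
// Only about `windowSize` rows are kept in memory at a time; older rows are
// flushed to a temporary file by SXSSFWorkbook.
fun writeRowsStreaming(rows: List<List<String>>, target: File, windowSize: Int = 100) {
    val workbook = SXSSFWorkbook(windowSize)
    try {
        val sheet = workbook.createSheet("data")
        rows.forEachIndexed { rowIndex, values ->
            // Rows must be created in increasing order when streaming.
            val row = sheet.createRow(rowIndex)
            values.forEachIndexed { colIndex, value ->
                row.createCell(colIndex).setCellValue(value)
            }
        }
        // Close the file stream as soon as the workbook has been written.
        target.outputStream().use { workbook.write(it) }
    } finally {
        // Depending on the POI version, an explicit dispose() call may also be
        // needed to delete the temporary files.
        workbook.close()
    }
}
```

Calling `writeRowsStreaming(matrix, File("export.xlsx"))` with the 70k x 32 matrix would be the intended use. Because flushed rows can no longer be modified, this approach fits a write-once export like the one described here.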