# spring
c
I have a scenario in which multiple users of the application will simultaneously upload Excel sheets (small to huge) to a Spring Boot backend. After each upload, I have to validate the sheet (I use POI for that, which works fine) and then store all of the sheet's data in a database. What libraries or architecture should I choose? Is Spring Data Flow the correct choice? Or is uploading the files to an Amazon S3 bucket, queuing them in Amazon SQS after upload, and processing them in Spring MVC sufficient for concurrent file uploads, processing, and storing in a DB without table inconsistency?
p
For Excel sheet -> DB, I think Spring Batch could be an option.
c
Yep. What about processing multiple files simultaneously and storing them in a DB? Do I need to store them if I use Spring Batch, or can Spring Batch handle multiple files simultaneously? And I think it will need more CPU and memory resources, right?
a
I have tried both approaches and have currently settled on the second (i.e. Spring Multipart File -> S3 -> message (with S3 URL) to SQS -> multiple replicas of SQS readers to process from SQS). The system scales very well, as the resources are fairly distributed (between upload and processing) and S3 can store both the raw original (well, it goes to Glacier eventually) and the "cooked" (parsed) product.
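To make the shape of that flow concrete, here is a minimal sketch of the "store the raw file, enqueue only its key, let several readers drain the queue" pattern. It deliberately uses in-memory stand-ins rather than the AWS SDK: `objectStore` plays the role of the S3 bucket, `queue` the SQS queue, and `processed` the DB table. The class and method names are hypothetical, and recording the byte count is only a placeholder for the POI validation and DB insert.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch only: in-memory stand-ins, not the real S3/SQS clients.
final class UploadPipelineSketch {
    // Stand-in for an S3 bucket: object key -> raw file bytes.
    private final Map<String, byte[]> objectStore = new ConcurrentHashMap<>();
    // Stand-in for an SQS queue: each message carries only the object key.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Stand-in for the DB table: object key -> number of bytes "stored".
    private final Map<String, Integer> processed = new ConcurrentHashMap<>();

    // Upload path: persist the raw file first, then enqueue a pointer to it.
    void upload(String key, byte[] content) {
        objectStore.put(key, content);
        queue.add(key);
    }

    // Processing path: several workers drain the queue until it is empty.
    Map<String, Integer> processAll(int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                String key;
                while ((key = queue.poll()) != null) {
                    byte[] raw = objectStore.get(key);
                    // Placeholder for "validate with POI, then insert rows".
                    processed.put(key, raw.length);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return processed;
    }
}
```

Because each message is polled by exactly one worker, no two workers process the same file here; the upload endpoint only does the cheap "store and enqueue" step, and the heavy parsing happens in the workers.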
c
Nice! And what about DB table data consistency if you are storing concurrently?
a
In our use case it's never the "same" file, i.e. each user uploads sequentially. There are sometimes hundreds uploading concurrently (End of Month processing), but we do not have a "concurrency" issue like that. In any case, SQS would line the documents up "single-file" anyway, which should help with the concurrency/collision issue (you could coordinate multiple readers/processors using a FencedLock in Hazelcast if absolutely necessary).
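As a rough illustration of the "coordinate the processors with a lock" idea, the sketch below serializes work per key (for example, per target table or per user) with a plain `ReentrantLock`. This is an assumption-laden stand-in: it only coordinates threads inside one JVM, whereas coordinating separate reader replicas would need a distributed lock such as the FencedLock mentioned above. `KeyedLockSketch` and `withLock` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Sketch only: single-JVM stand-in for a distributed lock.
final class KeyedLockSketch {
    // One lock per key, created lazily on first use.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs `work` while holding the lock for `key`, so that two tasks
    // using the same key never run their critical sections at once.
    <T> T withLock(String key, Supplier<T> work) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            return work.get();
        } finally {
            lock.unlock(); // always release, even if `work` throws
        }
    }
}
```

Work on different keys still runs in parallel; only writers that share a key queue up behind each other.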
t
hard to answer this definitively without more context about the application, but if you want a cluster-aware job scheduler that plays nicely with spring, check out https://github.com/kagkarlsson/db-scheduler
one advantage of keeping it in-process is that it's much easier to integration test than the usual rube-goldberg machine of SQS/Lambda triggers
otoh, uploading huge files to your application server may not be great from a CPU/RAM/disk perspective, and offloading that compute to a separate microservice may be simpler overall