# opensource
b
Hi all, my company is working on a project to build a simple web application from scratch with the fewest possible dependencies (if you don't count the testing dependencies). Would love to hear your thoughts. https://github.com/7ep/r3z
j
I had a look, and as with any non-trivial codebase, there are a lot of design decisions there that could be argued with. It would help if the goal of this project were clearer. Is this an educational exercise in building an application server? Or is this meant as a proposed basis for a production-ready product? If so, what would be the supported use cases and the desired performance characteristics? Without this kind of background information I feel it would be very difficult to discuss the finer points of the design.
b
Hi yes, it's an educational exercise
j
Ah nice, seems like a good way to get a feel for the fundamentals of an app server. Just a few general remarks then. First, I'm not sure why you limit the threadpool to one thread per core when you are using a thread-per-request model and blocking I/O. That caps your concurrency at the number of cores, which seems rather restrictive. Second, I worry about the thread-safety of the PureMemoryDatabase. There's no locking mechanism to keep the internal mutable collections consistent, and the writes to disk aren't serialized either, which leads to a very real possibility of data loss. Finally, the password hashing algorithm used is very weak. Even in an educational system you'd really want to show how it should be done, because people might copy this thinking it is best practice. The safe choice would be bcrypt, see https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
b
Hmm I admit that bcrypt seems like an interesting choice, but I cannot imagine calling SHA-256 weak, could you? (I'm doing some more reading on that. Thanks!) One mild hangup I have with bcrypt is that the password cannot be more than 72 bytes, but that can be handled. Well, I'm convinced. Onwards to Bcrypt!
I am definitely not a multi-thread guru, but the design was based on the fact that at the end of the day, you only get as much parallelization as cores, or is that not right?
The thread-safety of the PureMemoryDatabase is an interesting point. While it is true that nothing has been done to prevent corrupted state, it is also true, per my analysis, that it takes on average half a millisecond to complete a transaction. To corrupt the state would require two people to cause the same action on the same millisecond. We're still doing some testing of this, but it draws from the hypothesis that all the accrual of baggage to create safety just slows things down to prevent an incredibly slim possibility of risk. That's how the theory goes, we'll see.
j
Yeah, SHA256 is definitely too weak for the purpose of password hashing. It is designed to be fast, so dictionary attacks against it are viable: any decent GPU can calculate billions of hashes per second. NIST specifically recommends against using plain message digests for password hashing. This is one of those areas where you should just follow the experts or become one 🙂 (on that note, I'm not an expert on hashing, but I do teach a secure coding course and try to keep up with NIST recommendations in particular.) Also, you can still accept passwords of arbitrary length with bcrypt: the algorithm itself only uses the first 72 bytes of its input, but pre-hashing the password before handing it to bcrypt is a common way around that.
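To make the "slow on purpose" idea concrete, here is a minimal sketch. Note the swap: bcrypt is not part of the JDK, so this uses PBKDF2 instead, which ships with the JVM and is also a deliberately slow, salted key derivation function. All names here (`newSalt`, `hashPassword`, `verifyPassword`) are hypothetical and not taken from r3z, and the default iteration count is an assumption that should be checked against the OWASP cheat sheet linked above.

```kotlin
import java.security.MessageDigest
import java.security.SecureRandom
import javax.crypto.SecretKeyFactory
import javax.crypto.spec.PBEKeySpec

// Assumed work factor; check current guidance before relying on it.
const val DEFAULT_ITERATIONS = 600_000
const val KEY_LENGTH_BITS = 256

// A fresh random salt per password means identical passwords hash differently.
fun newSalt(): ByteArray = ByteArray(16).also { SecureRandom().nextBytes(it) }

// Derives a 32-byte hash from the password and salt using PBKDF2 with HMAC-SHA256.
fun hashPassword(
    password: CharArray,
    salt: ByteArray,
    iterations: Int = DEFAULT_ITERATIONS
): ByteArray {
    val spec = PBEKeySpec(password, salt, iterations, KEY_LENGTH_BITS)
    try {
        return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
            .generateSecret(spec)
            .encoded
    } finally {
        spec.clearPassword() // wipe the copy of the password held by the spec
    }
}

// Re-derives the hash and compares it with MessageDigest.isEqual
// instead of a hand-written byte loop.
fun verifyPassword(
    password: CharArray,
    salt: ByteArray,
    expected: ByteArray,
    iterations: Int = DEFAULT_ITERATIONS
): Boolean = MessageDigest.isEqual(hashPassword(password, salt, iterations), expected)
```

You would store the salt and iteration count next to the hash, so that `verifyPassword` can repeat the same derivation at login time.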
And you're right, you can't have more parallelization than you have cores, but you can have more concurrency. Application servers like Tomcat use up to 200 threads for processing out of the box. They won't run in parallel, but they won't have to - once a thread gets blocked on IO, another can be scheduled on the CPU
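A rough way to see this for yourself, as a sketch only: `Thread.sleep` stands in for a request that is blocked on I/O, and `timeBlockingTasks` is a hypothetical helper, not something from r3z or Tomcat.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Runs `tasks` simulated blocking requests on a fixed pool of `poolSize`
// threads and returns the elapsed wall-clock time in milliseconds.
fun timeBlockingTasks(poolSize: Int, tasks: Int, blockMillis: Long): Long {
    val pool = Executors.newFixedThreadPool(poolSize)
    try {
        // Each task only waits, the way a thread blocked on I/O would.
        val work = List(tasks) { Callable { Thread.sleep(blockMillis) } }
        val start = System.nanoTime()
        pool.invokeAll(work) // returns once every task has completed
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start)
    } finally {
        pool.shutdown()
    }
}
```

Comparing `timeBlockingTasks(1, 8, 50)` with `timeBlockingTasks(8, 8, 50)` should show the single-thread pool taking roughly eight times as long, and that holds regardless of how many cores the machine has, because the waiting threads use almost no CPU.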
And, sure, you can ignore thread safety of the PMD, but at your own peril. As long as this is used by one developer at a time you'll be good. But Murphy's law guarantees that some day two requests will come in at just the right time, and you will run into concurrency issues then. Whether that is or isn't a problem is up to you, but it would definitely be a huge issue for any production-ready software
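For a sense of how little code a basic guard takes, here is a minimal sketch. `GuardedStore` is a hypothetical stand-in, not the project's PureMemoryDatabase, and a single coarse lock is only one of several possible approaches.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.thread
import kotlin.concurrent.withLock

// Hypothetical in-memory store: one lock guards every access
// to the mutable collection.
class GuardedStore {
    private val lock = ReentrantLock()
    private val entries = mutableListOf<String>()

    fun add(entry: String) {
        lock.withLock { entries.add(entry) }
    }

    fun size(): Int = lock.withLock { entries.size }
}

// Starts `threads` writers that each add `perThread` entries,
// then waits for all of them to finish.
fun hammer(store: GuardedStore, threads: Int, perThread: Int) {
    val workers = List(threads) { t ->
        thread { repeat(perThread) { i -> store.add("t$t-$i") } }
    }
    workers.forEach { it.join() }
}
```

With the lock in place, `hammer(store, 8, 1000)` always leaves exactly 8000 entries. Without it, a plain `mutableListOf` may lose writes or throw under the same load. Doing the disk write inside the same lock would be one way to serialize persistence too, at the cost of holding the lock for longer.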
✅ 2
b
Good calls on all points, much appreciated!!
c
Since you've started with Kotlin on the JVM, I would have considered trying a reactive model using coroutines.
b
I had considered using coroutines but my initial analysis didn’t seem to suggest that the benefits would outweigh the complexity
c
It definitely isn't simple. It is an interesting learning exercise. Reactive programming is important for scalability and reducing resource utilisation. Unless you are building software for a financial services company with deep pockets it is going to be important to maximize resource utilisation.
b
Our default stance is high skepticism of any technology unless it is obviously beneficial. The degree to which we aren't using technologies is part of the research. I cannot imagine a situation where we will include patterns or technologies just to play with them.
For example, do we need a well-known web server or database? It's not clear, let's try building it in the simplest way and see how far we get and how successful we are without it.