Data-science friends, what do you think of this? (kotlinlang #datascience)

jmfayard

09/26/2019, 11:22 AM

Data-science friends, what do you think of this? https://kotlinfrompython.com/2017/10/29/when-will-your-project-grow-up-and-require-typesafe-code/

altavir

09/26/2019, 12:05 PM

I think I read the article some time ago (it is rather old). And I mostly agree that Python is good for prototyping. The problem is that by the time you understand that your prototype has grown beyond Python's niche, it is too late to invest in rewriting it in a statically typed language. The advantage of Kotlin is that you can switch to it sooner (because of its very flexible syntax).

jmfayard

09/26/2019, 12:57 PM

@altavir is there a better/more recent article you could recommend?

altavir

09/26/2019, 1:04 PM

You can see in #science, we were discussing some posts, but in general, I do not think it is something you can write in an article. It is just a feeling you start to get after some time developing a scientific application. I will give a talk about it at KotlinConf this year. Guys from my laboratory are migrating from Python to Kotlin. At the beginning there were some complaints about the "complexity" of Kotlin code compared to Python, but in the end, the results are astonishing. The code is much more readable, maintainable and faster. For small tasks like "get data -> make plot" Python is better and I still do it in Python even when I do not like it, but for something larger, we write it in Kotlin from the beginning.

💯 2

jmfayard

09/26/2019, 1:05 PM

I would be happy to see your talk at KotlinConf, see you there

Amir Gur

09/26/2019, 7:18 PM

@altavir - to your point:

For small tasks like "get data -> make plot" python is better and I still do it in Python even when I do not like it

What's missing for you to make Kotlin the first choice even for smaller tasks like that? And next we need to ask what's missing for everyone and ensure python-first minded ppl are happy too.

altavir

09/26/2019, 7:43 PM

Most importantly - simple visualization tools. We are slowly working on plotly.kt and I hope that there will be other tools as well to remedy that. The second thing is quick and dirty scripting. In Kotlin you need to create a project for anything and it is not always nice. Notebook mode is available via beakerx, but the scripting framework is still not able to handle it properly (they are working on it).

Amir Gur

09/26/2019, 7:50 PM

Sounds good, thanks. Are those scripting framework improvements part of the existing Kotlin-REPL?

altavir

09/26/2019, 7:56 PM

The old console REPL will be deprecated if I understand correctly. Current scripting is based on KEEP-77 (https://github.com/Kotlin/KEEP/blob/master/proposals/scripting-support.md) and is really promising, but for now it lacks some critical things for notebook mode.

jimn

09/28/2019, 4:00 PM

syntax freedom is orthogonal to all of the programmer benefits kotlin is focussing on, but really it looks like (c#)++ is all we're going to get from the kotlin language anytime soon.

jimn

09/28/2019, 4:06 PM

the ES shell (https://stuff.mit.edu/afs/sipb/user/yandros/doc/es-usenix-winter93.html) is embodied partially in bash with basically no restrictions on the ability to define an immediate mode scripting dialect on top of the basic building blocks which are by themselves adequate.

jimn

09/28/2019, 4:07 PM

i prefer bash to python personally, but kotlin is not as generally handy or comfortable for simple glue code.

jimn

09/28/2019, 4:08 PM

in fact, given the jdk's serialization headaches and the lagging native progress, it's really only for building shiny trophy examples of how to better express java than java.

altavir

09/28/2019, 4:08 PM

Well, @cedric would argue with you because he created a kotlin-based shell. Anyway, we are talking about development, not the glue code.

altavir

09/28/2019, 4:09 PM

There is no problem with serialization in the JDK and especially not in Kotlin. We can discuss it in a separate thread.

jimn

09/28/2019, 4:11 PM

no one will step forward and give a reference to working hazelcast kotlin serialization on kotlinlang slack, i'll leave it at that.

Amir Gur

09/30/2019, 9:59 PM

@jimn - let's try to collaborate on your serialization Q. I noticed your ticket for that on the hazelcast project too. Yes, I am positive that serialization strategies as you describe can be defined for known and custom cases. As @altavir is saying, the default serialization on jvm/kotlin does work. Are you trying to improve performance and thus thinking of protobuf or custom serialization? Some alternatives to consider are Thrift, Fast Buffers. I don't know if it's just the setup we had when I last used protobuf but it was fairly cumbersome to maintain/extend/support. If what you need is a working protobuf-kotlin example, maybe we can kotlinize this simple java example: https://github.com/hazelcast/hazelcast-code-samples/tree/master/serialization/protobuf-serializer It also sounds like folks working on geode, ignite, or hazelcast may have an interest in supporting and adding Kotlin examples, clarifications and improvements, since Kotlin is one of the fastest growing languages right now.
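
To make the "default serialization on jvm/kotlin does work" point concrete, here is a minimal sketch that round-trips a map of Pair values through plain JDK serialization, with no Hazelcast involved. The `Reading` class and the `toBytes`/`fromBytes` helpers are hypothetical names made up for illustration; the sketch only shows that `kotlin.Pair` and a data class marked `java.io.Serializable` survive `ObjectOutputStream`/`ObjectInputStream`, and it says nothing about how Hazelcast itself treats these types.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

// Hypothetical data class for illustration; it opts in to JDK serialization explicitly.
data class Reading(val sensor: String, val value: Double) : Serializable

// Write any Serializable value to a byte array using plain JDK serialization.
fun toBytes(value: Serializable): ByteArray {
    val buffer = ByteArrayOutputStream()
    ObjectOutputStream(buffer).use { it.writeObject(value) }
    return buffer.toByteArray()
}

// Read a value back; the cast is unchecked because generics are erased on the JVM.
@Suppress("UNCHECKED_CAST")
fun <T> fromBytes(bytes: ByteArray): T =
    ObjectInputStream(ByteArrayInputStream(bytes)).use { it.readObject() as T }

fun main() {
    // kotlin.Pair implements Serializable on the JVM, and so does java.util.HashMap.
    val original: HashMap<String, Pair<String, Reading>> =
        hashMapOf("a" to ("first" to Reading("s1", 1.5)))
    val copy: HashMap<String, Pair<String, Reading>> = fromBytes(toBytes(original))
    println(copy == original)
}
```

A data class that does not declare `Serializable` would fail in `writeObject` with a `NotSerializableException`, which is one place where a Kotlin class and a Java serialization contract can disagree.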

altavir

10/01/2019, 4:08 AM

You should also remember kotlinx.serialization. It can do protobuf out of the box. I do not have time to read about hazelcast, but you can use it anywhere you can use custom serialization.

jimn

10/01/2019, 6:20 AM

val myMap: Map<String, Pair<String, String>> = hz.getMap() is enough to break hazelcast distributed objects and remote callables

jimn

10/01/2019, 6:24 AM

explicit protobuf tooling is also kind of a sacrifice of enjoyable kotlin. im not really up to date on the specifics of kotlin serdes+tmfm but it seems pretty exciting. I think fixing hazelcast for kotlin is a great target for hazelcast inc. to keep current. the crossroads described by the 3-4 different models hazelcast belongs in more begs for serialization and coroutines to become externalized and distributed first class network smalltalk object descriptions, or whatever the hell works (toward first class kotlin)

jimn

10/01/2019, 6:29 AM

the use case of running callable<Pair<String,myKotlinDataClass>> is usually enough to cause kotlin rewrites to "working" serialization. the deficit is in hazelcast serializable java using java as a contract for remote execution and object description, and kotlin being only partially inside the box

jimn

10/01/2019, 6:34 AM

in https://github.com/0xCopy/prautobeans i have done the protobuff academia to allow protobuf definitions to shape snugly fit binary serialization packets as javabeans that don't choke GWT (a concept called autobeans)

jimn

10/01/2019, 6:35 AM

rather than leverage protobuffers i stuck with off-heap capable constructs in NIO, protobuff uses heap objects.

jimn

10/01/2019, 6:36 AM

in jvm-land you can define object proxies cheaply and that's really all you need to satisfy the proto IDL

jimn

10/01/2019, 6:36 AM

it's been a few more than a few more years since i had the energy to change the world on this scale.

jimn

10/01/2019, 6:39 AM

i honestly prefer hazelcast's Serialization interface solution, except that kotlin colors outside the lines and hazelcast forces you to write non-beneficial kotlin wrappers around distributed java objects. i had forgotten how much work is involved in maven generators with proto until now

Amir Gur

10/02/2019, 5:57 AM

@jimn - Regarding:

callable<Pair<String,myKotlinDataClass>> is usually enough to cause kotlin rewrites to "working" serialization

Would that still be an issue if using kotlinx.serialization? If yes, can you please help clarify which callable you refer to, or better yet, give the lightest project that can reproduce the issue? That will help others look into it and try to help.

➕ 1

jimn

10/02/2019, 8:07 AM

DbEntity.kt

jimn

10/02/2019, 8:09 AM

a kotlin class like this one arises from something to take the hand-off from jdbc, generify it, and then slowly descend the spiral of despair with hazelcast enablement incrementally introducing such things as "turds" to the object model to be transient or explicit serialization workarounds

jimn

10/02/2019, 8:12 AM

rather than "just" settling for kotlin with a hack like this, I'd love to mercilessly refactor in Arrow-kt expressions as well, where time permits. my dayjob affords me a few minutes at a time to add commits to this open source where im specifically not violating NDA's, and this is pre-existing of this engagement, again, a stable work from a long time ago getting new use.

jimn

10/02/2019, 8:15 AM

having a hazelcast serialization strategy that for instance kills destructuring syntax or forces "turd" get() methods is only enjoyable once or twice, but remains a code-smell

jimn

10/03/2019, 9:18 AM

a while ago i gave a look at borrowing the jvm compiler and building a compositional ISAM struct or otherwise, a wire format from declaratives. https://github.com/jnorthrup/1xam/blob/master/README.md#1xam i got as far as applying this generator to the subl-lisp object hierarchy of the CYC expert system in a related project on github (bitecode). the focus of this, again, was off-heap constructs first, and bounded chunks from which to assemble bigger assemblies for what would be called streaming architectures today.

Amir Gur

10/03/2019, 4:27 PM

@jimn - I cloned and ran your codebase with the data class DbEntity example on jdk 12. But:
- It is still trying to read jdbc data, and I am not yet clear whether that depends on your specific db and schema or is something I should expect to run smoothly on a standard in-memory db out of the box. Best to give an example with no JDBC, if possible.
- I am trying to understand the issue. I got that the "turd" business seemed superfluous to you. Is there anything else? I have seen, incorporated and supported numerous data frameworks on the JVM before, including data-class managed schema w/ full auto-evolution (w/ JPA, hibernate). Data classes and dao layers on the app side are supposed to be fairly minimal.
Unfortunately I am on another project, as are many others here, and will try to respond when I can, but availability may vary, so apologies for any inconsistent attention. Plz feel free to mark me or talk in person to get attention.

Amir Gur

10/03/2019, 4:29 PM

I am not yet sure how your "wire format from declaratives" fits in, and I am too short on time now; I may check later.

Amir Gur

10/03/2019, 5:20 PM

@jimn does Apache Ignite suffer the same issues you see w/ Hazelcast?

jimn

10/04/2019, 3:33 AM

Having box format binary serialization is not new, and encourages performance, but for whatever reason academia and predicate calculus rely exclusively on string parsers

jimn

10/04/2019, 6:19 AM

i only put this codebase down in august, but the details were vague until i checked back to look at the code again today. so the codebase i have is a utility to convert formats, sometimes to couchdb, sometimes to pandas formats, generally from a direct sql source. the more recent additions have been all-kotlin, which is not always as simple as ctl-alt-K, so the original simplest-possible java code snippets are still relevant in ways that were awkward for porting to kotlin directly. most recently, i have been moving hundreds of gigs of relational data to .. couchdb as a parking spot to come back to later. this was convenient for unrelated projects not once but three times in a row, temporary or expensive access to a database with too much legacy and only a few meaningful data sets sometimes among thousands of tables; so the capability of doing complete copies and subsequent delta modifications from sql to couchdb for an operational migration earlier this year as ReiterateDb https://github.com/jnorthrup/jdbc2json/blob/19a2119d7c6eeedf795c3195924a35370216ee66/src/main/java/com/fnreport/ReiterateDb.kt#L22 and more recently two research/analytics jobs on monster legacy data stores which warranted some expansion of SqlExecToJson to JdbcToCouchDbBulk https://github.com/jnorthrup/jdbc2json/blob/844f9af8e1c50aa7fc42a3f216b0fb6560a155f2/src/main/java/com/fnreport/JdbcToCouchDbBulk.kt#L16 built to handle sick primary keys and store data-only rows with column-labelled couchdb views. I'm not really biased toward couchdb all told, but i use pouchdb synchronization for some production web code and couchdb uses as little as curl as a database driver, which is code-agnostic.
so things were great, everything was falling into place, and now i have half a terabyte of relational data in couchdb and projects are spinning up rapidly here; time for one more metadata addressable catalog of my ongoing soup to move the jdbc metadata to hazelcast and start to scavenge vm's around the company where i can squirrel some memory grid state while i figure out how to make this useful for others. This use case is where kotlin serialization is a hard show-stopper - instead of running the metadata extraction and throwing it in hazelcast, i think i just checked in the jdbc url to a hazelcast Map and called it quits for now. there are a few kotlin hacks like QueryToFlat and QueryToFeather which interface directly ahead of python, pandas, and tensorflow here. generally speaking, pandas is a glorified groupby tool for numpy arrays. if you can take a step back and leverage sql for group-by you may be able to bypass python entirely and make a quick line to google bigquery and get the simplest possible linear regression before going full-keras

jimn

10/04/2019, 6:26 AM

the packaging methodology of building an uber-jar was feasible in the java 6 days, with like 8 years between vm changes - but shade and assemblies and maven have become brittle so the code creates a lib/*.jar structure and the bash scripts use a wildcard classpath; this way no collisions of manifests or multiple versions. having a repl, and having a distributed mesh of dataset metadata, and having a reasonably concise grammar for REPL interrogation by shell script seems like the next reasonable pragmatic step. it may not be as sexy as a whole math library, but i don't have coworkers who are technical and i need to make repeatable scripts with steps that can be repeated from an email alone

jimn

10/04/2019, 6:29 AM

this is far from complete and i have no real toy examples to share, but the datasets of 100-200 megs sometimes take an hour or more for regular python groupby fixtures because of RAM and swapping. Upgrading from 32 gigs to 128 on the server may improve some of these datasets but we're starting to see features numbering in the 10's of thousands and even 128GB is not enough RAM for pandas to synthesize in RAM alone

jimn

10/04/2019, 6:31 AM

divide and conquer with hazelcast would not be unthinkable once the jdbc sources and couchdb extractions are floating around in n=3 copies of network grid storage between idle machines here

jimn

10/04/2019, 6:34 AM

i don't have an end-to-end jvm based keras adaptation that i can simply port my stuff to kotlin; and im not the guy who invented deep learning so im still coming up to speed on the problem solving methodology steps, most notably it seems like sample code is a ticking time-bomb because the problems lack representative hardware inconveniences of useful datasets

jimn

10/06/2019, 4:08 AM

@Amir Gur @altavir basically Kotlin is missing a no-bullshit dht to approximate a p2p kvstore on top of the plain old map interface. Hazelcast succeeds here for the jvm, but forces Kotlin to leave all of its features at the door and use a Java interface. Fixing Kotlin serialization may be less useful than just writing a portable Kotlin p2p bus for serialized objects.
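
As a rough illustration of what "a kvstore on top of a plain map interface" could look like from the caller's side, here is a toy, in-process sketch. Everything in it is an assumption made for illustration: `ToyDht`, `wipeNode`, the modulo-based key placement and the default of three replicas (borrowed from the n=3 copies mentioned earlier in the thread) are hypothetical, each "node" is just a local map, and there is no networking, serialization, failure detection or rebalancing.

```kotlin
// Toy stand-in for a distributed hash table: every "node" is a local in-memory map.
// All names are hypothetical; this is not how Hazelcast or any real DHT is implemented.
class ToyDht<K : Any, V : Any>(nodeCount: Int, private val replicas: Int = 3) {
    init {
        require(nodeCount >= 1) { "need at least one node" }
        require(replicas in 1..nodeCount) { "replicas must be between 1 and nodeCount" }
    }

    private val nodes: List<MutableMap<K, V>> = List(nodeCount) { mutableMapOf<K, V>() }

    // Pick the consecutive nodes that own a key, starting from its hash bucket.
    private fun owners(key: K): List<MutableMap<K, V>> {
        val first = Math.floorMod(key.hashCode(), nodes.size)
        return List(replicas) { i -> nodes[(first + i) % nodes.size] }
    }

    // Write the value to every owner so that a copy survives the loss of one node.
    operator fun set(key: K, value: V) = owners(key).forEach { it[key] = value }

    // Read from the first owner that still holds the key.
    operator fun get(key: K): V? = owners(key).firstNotNullOfOrNull { it[key] }

    // Simulate one node losing its memory.
    fun wipeNode(index: Int) = nodes[index].clear()
}

fun main() {
    val dht = ToyDht<String, Pair<String, String>>(nodeCount = 5)
    dht["jdbc"] = "url" to "placeholder"
    dht.wipeNode(Math.floorMod("jdbc".hashCode(), 5)) // lose the key's primary owner
    println(dht["jdbc"]) // still readable from a replica
}
```

The indexing operators keep call sites looking like ordinary Kotlin map access, and values such as Pair stay plain Kotlin objects. A real implementation would have to replace the local maps with remote peers and choose a wire format for the values.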

Amir Gur

10/06/2019, 10:35 PM

Hi @jimn, thanks for the explanations. Sounds like there are plenty of things we need to develop or improve for data science on kotlin. Is a portable kotlin p2p bus for serialization the top prio you are aware of? Do we have it defined as a ticket? (@altavir, process wise, do we as a community just put those needs on tickets, prioritize and implement, or is there anything else?) (Also, in case we want to check Ignite, here is a sample kotlin project that looks useful: https://github.com/wojciech-zurek/kotlin-spring-boot-apache-ignite-example)

jimn

10/07/2019, 3:52 AM

@Amir Gur re: ignite:= has a comprehensive stack documented. that looks handy, i don't see an implicit remote kotlin invocation potential with this particular example, there's a data class with no kotlin objects (Pair). im pretty sure hazelcast goes that far as well. remote hazelcast execution via Callable is also possible, using Runnable, and also avoiding kotlin std artifacts. UInt is out of the question. re: ticket, kotlin mpp barely has kernel handle objects defined let alone mature objects. an oversimplified DHT (not an object bus, just a blob store) from way back is https://en.wikipedia.org/wiki/Venti with some cross platform implementation already; plan9 and inferno both have commercial backed open source arrivals for linux and windows incarnations. an inferno::Limbo target for kotlin would be very quaint.
