A modern programming language that makes developers happier.

kotlinlang

Just did a 1.0-SNAPSHOT release of some NLP tools I've thrown together :tada:
<https://github.com/londogard/londogard-nlp-toolkit>
:thread: for details

:heavy_check_mark: WordEmbeddings (`WordEmbeddings` & `LightWordEmbeddings`)
:heavy_check_mark: Stopwords
:heavy_check_mark: WordFrequencies
:heavy_check_mark: Tokenizer (`CharTokenizer` & `SimpleTokenizer`)
:heavy_check_mark: Stemmer
:heavy_check_mark: Basic Trie
:heavy_check_mark: Sentence Embeddings (`AvgSentenceEmbeddings` & `USifEmbeddings`)
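
For anyone wondering what an averaged sentence embedding boils down to, here is a minimal plain-Kotlin sketch. It is not the toolkit's API: `averageEmbedding`, the toy vectors, and the `dim` parameter are all made up for illustration, and the real `AvgSentenceEmbeddings` may work differently.

```kotlin
// Hypothetical helper, NOT the toolkit's API:
// average the vectors of all tokens that have an embedding.
fun averageEmbedding(
    tokens: List<String>,
    vectors: Map<String, FloatArray>,
    dim: Int
): FloatArray {
    val sum = FloatArray(dim)
    var known = 0
    for (token in tokens) {
        val vec = vectors[token] ?: continue // skip out-of-vocabulary tokens
        for (i in 0 until dim) sum[i] += vec[i]
        known++
    }
    // Leave the zero vector untouched when no token was known.
    if (known > 0) for (i in 0 until dim) sum[i] /= known
    return sum
}

fun main() {
    // Toy 2-dimensional vectors, invented for this example.
    val vectors = mapOf(
        "hello" to floatArrayOf(1f, 0f),
        "world" to floatArrayOf(0f, 1f)
    )
    val sentence = "hello big world".split(" ")
    println(averageEmbedding(sentence, vectors, dim = 2).toList()) // [0.5, 0.5]
}
```

Unknown tokens ("big" here) are simply skipped, which is one common choice; a sentence with no known tokens comes back as the zero vector.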

At the top of TODOs:
• SubWordTokenization (think SentencePiece, BPE, WordPiece & Unigram)
• Vectorization methods (TF-IDF, BM25, BagOfWords & so on)
• Classifiers (leaning towards pulling in another library for this, e.g. smile or something similar)
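
To make the vectorization item concrete, here is a rough sketch of TF-IDF in plain Kotlin. It assumes the simplest textbook variant (term frequency times the unsmoothed log of inverse document frequency); `tfIdf` is a hypothetical name, and nothing here reflects how the toolkit will implement it.

```kotlin
import kotlin.math.ln

// Hypothetical sketch, NOT part of the toolkit:
// TF-IDF weight of every term in every (already tokenized) document.
fun tfIdf(docs: List<List<String>>): List<Map<String, Double>> {
    // Document frequency: in how many documents does each term occur?
    val df = docs.flatMap { it.distinct() }.groupingBy { it }.eachCount()
    return docs.map { doc ->
        doc.groupingBy { it }.eachCount().mapValues { (term, count) ->
            val tf = count.toDouble() / doc.size            // relative term frequency
            val idf = ln(docs.size.toDouble() / df.getValue(term)) // unsmoothed IDF
            tf * idf
        }
    }
}

fun main() {
    val docs = listOf(
        "kotlin is fun".split(" "),
        "kotlin is fast".split(" ")
    )
    // Terms shared by every document get weight 0.0; distinctive terms get a positive weight.
    println(tfIdf(docs)[0])
}
```

With unsmoothed IDF, a term that appears in every document scores exactly zero; real implementations often add smoothing to avoid that.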

Usage examples in :kotlin: notebooks are very welcome :)

Will try to get that up. I've added usage examples to the README for a lot of the tools.

<@UQ8NE6A86> added a `README.ipynb` which contains interactive examples.
Removed the examples from `README.md` so that nothing falls out of sync.

<https://github.com/londogard/londogard-nlp-toolkit/blob/main/README.ipynb>

Also added support for:
`SentencePieceTokenizer` (including simple download for 275 languages with ~7 different vocab sizes to choose from)
`BpeEmbeddings`, which are byte-pair-encoded (BPE) subword embeddings. They have been shown to be very effective for their size (11 MB performs approximately the same as 6 GB of fastText embeddings)

And now there's a helper to instantiate either `WordEmbeddings` or `LightWordEmbeddings` through `LanguageSupport`. It downloads the embeddings from `fastText`, which means 175 languages are supported from the get-go!

Cool! Could you please PR a descriptor for your library? A link to it will be included in <https://github.com/Kotlin/kotlin-jupyter#list-of-supported-libraries|this list>
Example: <https://github.com/Kotlin/kotlin-jupyter/blob/master/libraries/kmath.json>

Will do when I find the time, for sure! Hopefully this weekend :slightly_smiling_face:

<@UQ8NE6A86> <https://github.com/Kotlin/kotlin-jupyter/pull/186>
:partying_face: