Wrote a Markov chain sampler. It seems to work wel...
# datascience
b
Wrote a Markov chain sampler. It seems to work well, but it's kinda slow, any ideas how to speed it up? https://github.com/breandan/markovian/blob/master/src/main/kotlin/edu/mcgill/markovian/mcmc/MarkovChain.kt
j
if you're lucky enough to have idea universal you can profile with a flame graph to report the hot spots.
a
There could be several places since the algorithm is a little bit obscure. My first guess is the
keys
property, it is recalculated on call and contains a number of rather heavy operations like sorting and distinct (by the way, they are done separately, so the time additionally increases). I recommend to use a TreeSet instead.
b
Yeah, good call the
keys
were the main problem, now it’s much faster
a
By the way, it would be interesting to hear a feedback about kmath Chain (https://github.com/mipt-npm/kmath/blob/dev/kmath-coroutines/src/commonMain/kotlin/kscience/kmath/chains/Chain.kt, including MarkovChain) and Sampler (https://github.com/mipt-npm/kmath/blob/dev/kmath-stat/src/commonMain/kotlin/kscience/kmath/stat/Distribution.kt#L8) API. I took rather unorthodox approach by using all random numbers as cold flow providers instead of single-pick. It seems to work nice so far, but it requires a slight change of paradigm. It is not simple to just take a simple rundom number from the distribution, you need to take a chain and work with it.
b
Ah very nice, I like how you’ve defined this. I wonder if there is a way to define
Distribution
using a measurable space so you get a proper σ algebra
a
For that I need to understand how to use it, because I am teaching statistics and I never found where sigma-algebra is used outside of abstract mathematics. There is a Sampler algebra used to generate combined samples: https://github.com/mipt-npm/kmath/blob/dev/kmath-stat/src/commonMain/kotlin/kscience/kmath/stat/SamplerAlgebra.kt, but it is a different thing.
b
Yeah, I’m not sure either, maybe @dievsky would know more about that
р
You cannot define conditional expectations without the notion of a sigma algebra
An applied problem where this crops up is for example the kalman filter. Most packages I have seen have a very awkward API for conditional expectations.
a
It would be interesting to discuss later. In MST we have a way to define Boolean expressions inside numeric expressions in a quite straight-forward way (not yet implemented, but it could be done any time). If it has its usages, we can pursue this. Kalman filter is used broadly in particle track reconstruction.
р
Yes definitely - and providing functionality for state space models in Kmath is in my Todo list