# datascience
o
I’m an absolute zero in statistics, but I need to report a couple of values (mean, error, confidence interval, etc.) from a sample dataset (DoubleArray). It is all in Kotlin Multiplatform (actually a benchmarking suite), so I can’t use any JVM library. I would also like to avoid dependencies altogether if possible. Can someone point me to minimal common Kotlin code I can shamelessly steal (Apache 2)? Or help me develop some. I’ve got this from somewhere (I copied it several times from project to project, so I don’t remember where it came from originally): https://github.com/orangy/gradle-benchmarks/blob/master/runtime/nativeMain/src/org/jetbrains/gradle/benchmarks/NativeBenchmarksStatistics.kt Looking at some resources, I see TDistribution, inverseCumulativeProbability and such, but unfortunately I have no idea what they are. The goal is to generate JMH-like JSON (sample in a thread)
"primaryMetric": {
			"score": 3467631.2300505415,
			"scoreError": 98563.33258873208,
			"scoreConfidence": [
				3369067.897461809,
				3566194.5626392737
			],
			"scorePercentiles": {
				"0.0": 3333994.173460993,
				"50.0": 3474993.5017186473,
				"90.0": 3532339.4656069754,
				"95.0": 3532453.3325443356,
				"99.0": 3532453.3325443356,
				"99.9": 3532453.3325443356,
				"99.99": 3532453.3325443356,
				"99.999": 3532453.3325443356,
				"99.9999": 3532453.3325443356,
				"100.0": 3532453.3325443356
			},
			"scoreUnit": "ops/s",
			"rawData": [
				[
					3524193.1540743182,
					3531314.663170731,
					3532453.3325443356,
					3520395.64538141,
					3474431.2865468427
				],
				[
					3440045.3067861893,
					3333994.173460993,
					3445535.4557112623,
					3475555.716890452,
					3398393.5659388755
				]
			]
		},
I need to go from rawData to all the other numbers.
rawData is 2 runs of 5 iterations each.
a
The general library for statistics is https://github.com/thomasnield/kotlin-statistics. It is not multiplatform yet. I plan to add this functionality to multiplatform kmath. Your case needs only a few lines of code, so I think I can just write them here.
You don't need distributions etc for simple means and confidence intervals.
Arithmetic mean is just average
fun DoubleArray.mean() = sum()/size
Dispersion is defined like this:
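import kotlin.math.pow

fun DoubleArray.dispersion(): Double {
  val mean = mean()
  // average squared deviation from the mean (divides by size, i.e. population variance)
  return sumOf { (it - mean).pow(2) } / size
}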
It could be optimized for large data if needed.
The standard deviation (aka error) is the square root of the dispersion.
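A minimal sketch of the mean, variance, and standard deviation described above, using only the Kotlin standard library. Unlike the dispersion snippet, it divides by `size - 1` (the sample variance); that is an assumption, chosen because it is the usual estimator for small samples. The names `sampleVariance` and `standardDeviation` are illustrative, not taken from any library.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Arithmetic mean of the samples.
fun DoubleArray.mean(): Double = sum() / size

// Sample variance: sum of squared deviations from the mean, divided by (n - 1).
// Assumption: Bessel's correction is wanted; divide by size instead
// to get the population variance.
fun DoubleArray.sampleVariance(): Double {
    require(size > 1) { "need at least two samples" }
    val m = mean()
    return sumOf { (it - m).pow(2) } / (size - 1)
}

// Standard deviation: square root of the variance.
fun DoubleArray.standardDeviation(): Double = sqrt(sampleVariance())
```

For the 2 runs of 5 iterations in the sample, the two inner arrays would first be concatenated into a single `DoubleArray` of 10 values.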
I am not sure you need a confidence interval for this task. You can probably just use the interval of ±2 standard deviations around the mean. For a normal distribution it has about 95% coverage.
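For reference, the `scoreError` in the sample JSON appears to be the half-width of a 99.9% Student's t confidence interval for the mean, t × s / √n, where s is the sample standard deviation (dividing by n − 1). The sketch below takes the critical value as a parameter rather than computing the inverse t-distribution, which avoids any dependency. The value 4.781 is the two-sided 99.9% table value for 9 degrees of freedom (n = 10) and would have to be looked up again for other sample sizes. The function names and the `Pair` return type are illustrative.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Half-width of a Student's t confidence interval for the mean: t * s / sqrt(n).
// Assumption: tCritical comes from a t-table lookup done by the caller.
fun DoubleArray.meanError(tCritical: Double): Double {
    require(size > 1) { "need at least two samples" }
    val mean = sum() / size
    val s = sqrt(sumOf { (it - mean).pow(2) } / (size - 1))
    return tCritical * s / sqrt(size.toDouble())
}

// Interval [mean - error, mean + error].
fun DoubleArray.confidenceInterval(tCritical: Double): Pair<Double, Double> {
    val mean = sum() / size
    val error = meanError(tCritical)
    return (mean - error) to (mean + error)
}

fun main() {
    // rawData from the sample JSON, both runs concatenated.
    val samples = doubleArrayOf(
        3524193.1540743182, 3531314.663170731, 3532453.3325443356,
        3520395.64538141, 3474431.2865468427,
        3440045.3067861893, 3333994.173460993, 3445535.4557112623,
        3475555.716890452, 3398393.5659388755
    )
    val t999df9 = 4.781 // two-sided 99.9%, 9 degrees of freedom (table value)
    println(samples.meanError(t999df9))
    println(samples.confidenceInterval(t999df9))
}
```

With the rounded table value the result lands within a couple of units of the reported `scoreError` rather than matching it digit for digit.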
Ah, I see, you want percentiles. That is a bit more complicated. For that you need to first calculate the cumulative sum of your data and then find the appropriate percentile. I will write the code later if you still need it. Of course, percentiles do not make much sense on such small samples.
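For a raw sample like this one, a percentile can also be read directly off the sorted values. The sketch below is one possible estimator, not necessarily the one JMH uses: it interpolates linearly at position p·(n + 1)/100 in the sorted array and clamps to the minimum and maximum. That rule is an assumption, picked because it reproduces the `scorePercentiles` values in the sample JSON. The name `percentile` is illustrative.

```kotlin
import kotlin.math.floor

// Percentile by linear interpolation at 1-based position p * (n + 1) / 100
// in the sorted sample, clamped to the smallest and largest value.
// Assumption: this interpolation rule is the one wanted; other conventions exist.
fun DoubleArray.percentile(p: Double): Double {
    require(isNotEmpty()) { "sample is empty" }
    require(p in 0.0..100.0) { "p must be within 0..100" }
    val sorted = sortedArray()
    val n = sorted.size
    val pos = p * (n + 1) / 100.0
    if (pos < 1.0) return sorted.first()
    if (pos >= n) return sorted.last()
    val lower = floor(pos).toInt()   // 1-based index of the lower neighbour
    val fraction = pos - lower       // distance towards the upper neighbour
    return sorted[lower - 1] + fraction * (sorted[lower] - sorted[lower - 1])
}
```

With only 10 samples, every percentile from 95.0 upwards collapses to the maximum, which is what the sample JSON shows.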
o
@altavir thanks! But did you check what I’ve linked? I think everything you mentioned is there: stdDev, percentiles, etc. Do I have everything I need already?
a
Yeah, missed that. It seems that you can use it as is.
The quantile does not make any sense for such small samples, but it seems to be correct. I was thinking about something more complicated, like a distribution quantile.
o
Looks like I figured it all out (some fixes re CI are coming)