I m absolute zero in statistics but I need to report couple kotlinlang #datascience

orangy

01/21/2019, 11:17 PM

I’m an absolute zero in statistics, but I need to report a couple of values (mean, error, confidence interval, etc.) from a sample dataset (DoubleArray). It is all in Kotlin Multiplatform (actually a benchmarking suite), so I can’t use any JVM library. I would also like to avoid dependencies altogether if possible. Can someone point me to minimal common Kotlin code I can steal shamelessly (Apache 2)? Or help me develop it. I’ve got this from somewhere (I copied it several times from project to project, so I don’t remember where it came from originally): https://github.com/orangy/gradle-benchmarks/blob/master/runtime/nativeMain/src/org/jetbrains/gradle/benchmarks/NativeBenchmarksStatistics.kt Looking at some resources, I see TDistribution, inverseCumulativeProbability and such, but unfortunately I have no idea what they are. The target goal is to generate JMH-like JSON (sample in the thread).

orangy

01/21/2019, 11:17 PM

"primaryMetric": {
			"score": 3467631.2300505415,
			"scoreError": 98563.33258873208,
			"scoreConfidence": [
				3369067.897461809,
				3566194.5626392737
			],
			"scorePercentiles": {
				"0.0": 3333994.173460993,
				"50.0": 3474993.5017186473,
				"90.0": 3532339.4656069754,
				"95.0": 3532453.3325443356,
				"99.0": 3532453.3325443356,
				"99.9": 3532453.3325443356,
				"99.99": 3532453.3325443356,
				"99.999": 3532453.3325443356,
				"99.9999": 3532453.3325443356,
				"100.0": 3532453.3325443356
			},
			"scoreUnit": "ops/s",
			"rawData": [
				[
					3524193.1540743182,
					3531314.663170731,
					3532453.3325443356,
					3520395.64538141,
					3474431.2865468427
				],
				[
					3440045.3067861893,
					3333994.173460993,
					3445535.4557112623,
					3475555.716890452,
					3398393.5659388755
				]
			]
		},

orangy

01/21/2019, 11:17 PM

I need to go from rawData to all other numbers

orangy

01/21/2019, 11:18 PM

rawData is 2 runs of 5 iterations each
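A minimal sketch of the first step, assuming the runs are simply pooled into one flat sample before any statistic is computed (the helper name `flattenRuns` is hypothetical, not taken from the linked code). With the rawData above, the pooled mean should agree with the "score" field of the sample JSON.

```kotlin
// rawData exactly as in the sample JSON: 2 runs of 5 iterations each.
val rawData: Array<DoubleArray> = arrayOf(
    doubleArrayOf(
        3524193.1540743182, 3531314.663170731, 3532453.3325443356,
        3520395.64538141, 3474431.2865468427
    ),
    doubleArrayOf(
        3440045.3067861893, 3333994.173460993, 3445535.4557112623,
        3475555.716890452, 3398393.5659388755
    )
)

// Hypothetical helper: pool every iteration of every run into one flat sample.
fun flattenRuns(runs: Array<DoubleArray>): DoubleArray =
    runs.flatMap { it.asIterable() }.toDoubleArray()

fun main() {
    val samples = flattenRuns(rawData)
    println("n = ${samples.size}")
    // average() is part of the common Kotlin stdlib, so this stays multiplatform.
    println("score = ${samples.average()}")
}
```

Only common stdlib calls are used here, so nothing in this sketch ties it to the JVM.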

altavir

01/22/2019, 6:06 AM

The general library for statistics is https://github.com/thomasnield/kotlin-statistics. It is not multiplatform yet. I plan to add this functionality to multiplatform kmath. Your case needs only a few lines of code, so I think I can just write them here.

altavir

01/22/2019, 6:07 AM

You don't need distributions etc for simple means and confidence intervals.

altavir

01/22/2019, 6:11 AM

Arithmetic mean is just average

fun DoubleArray.mean() = sum()/size

altavir

01/22/2019, 6:14 AM

Dispersion is defined like this:

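// Population dispersion (variance): mean squared deviation from the mean.
// An explicit return type is required because the body is a block.
fun DoubleArray.dispersion(): Double {
  val mean = mean()
  return sumOf { (it - mean) * (it - mean) } / size
}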

altavir

01/22/2019, 6:15 AM

It could be optimized for large data if needed.

altavir

01/22/2019, 6:16 AM

The standard deviation (aka error) is the square root of the dispersion.
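A hedged sketch of the same idea for a small sample: dividing by `size` gives the population dispersion, while estimates from a sample usually divide by `n - 1` (Bessel's correction). The extension names below are illustrative, and `sumOf` assumes Kotlin 1.4 or newer.

```kotlin
import kotlin.math.sqrt

fun DoubleArray.mean(): Double = sum() / size

// Unbiased sample variance: squared deviations divided by (n - 1).
fun DoubleArray.sampleVariance(): Double {
    require(size > 1) { "need at least two samples" }
    val m = mean()
    return sumOf { (it - m) * (it - m) } / (size - 1)
}

// Sample standard deviation: square root of the sample variance.
fun DoubleArray.sampleStdDev(): Double = sqrt(sampleVariance())

fun main() {
    val data = doubleArrayOf(1.0, 2.0, 3.0, 4.0, 5.0)
    println(data.mean())           // mean of 1..5
    println(data.sampleVariance()) // 10 / 4
    println(data.sampleStdDev())
}
```

For ten iterations the difference between the two divisors is about 5% in the standard deviation, so it is worth choosing deliberately.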

altavir

01/22/2019, 6:18 AM

I am not sure you need a confidence interval for this task. Probably you can just use the ±2 standard deviation interval around the mean. For a normal distribution it has about 95% coverage.
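For comparison, the numbers in the sample JSON look consistent with a Student t interval on the mean at 99.9% confidence: error = t * s / sqrt(n), with s the sample standard deviation (n - 1 divisor). That is an inference from the sample values, not something stated in this thread. The sketch below avoids implementing the t distribution by taking the critical value as a parameter; 4.781 is the standard table value for 9 degrees of freedom at a two-sided 99.9% level. `Score` and `score` are hypothetical names.

```kotlin
import kotlin.math.sqrt

// Hypothetical holder for the JMH-like "score", "scoreError", "scoreConfidence".
class Score(val mean: Double, val error: Double) {
    val confidence: Pair<Double, Double>
        get() = Pair(mean - error, mean + error)
}

// tCritical must match the confidence level and samples.size - 1 degrees of freedom.
fun score(samples: DoubleArray, tCritical: Double): Score {
    require(samples.size > 1) { "need at least two samples" }
    val n = samples.size
    val mean = samples.sum() / n
    // Sample variance with the (n - 1) divisor.
    val variance = samples.fold(0.0) { acc, x -> acc + (x - mean) * (x - mean) } / (n - 1)
    // Half-width of the interval: t * standard error of the mean.
    val error = tCritical * sqrt(variance / n)
    return Score(mean, error)
}

fun main() {
    val samples = doubleArrayOf(
        3524193.1540743182, 3531314.663170731, 3532453.3325443356,
        3520395.64538141, 3474431.2865468427, 3440045.3067861893,
        3333994.173460993, 3445535.4557112623, 3475555.716890452,
        3398393.5659388755
    )
    val result = score(samples, tCritical = 4.781) // df = 9, 99.9% two-sided (table value)
    println("score = ${result.mean}")
    println("scoreError = ${result.error}")
    println("scoreConfidence = ${result.confidence}")
}
```

Because the table value is rounded, the error comes out close to, but not bit-for-bit equal to, the "scoreError" in the sample. A fixed sample size per benchmark would allow a small lookup table of critical values instead of a full inverse CDF.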

altavir

01/22/2019, 6:54 AM

Ah, I see, you want percentiles. It is a bit more complicated. For that you need to first calculate the cumulative sum of your data and then find the appropriate percentile. I will write the code later if you still need it. Of course, percentiles do not make any sense on such small samples.
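For a raw sample (as opposed to a distribution), a simpler route is to sort the values and interpolate between neighbours. The sketch below assumes the rank rule pos = p/100 * (n + 1), clamped to the minimum and maximum. That rule reproduces the "scorePercentiles" in the sample JSON, but it is a guess at the convention, not a confirmed description of JMH. The function name is hypothetical.

```kotlin
// Percentile of a sample by linear interpolation between order statistics.
// p is in percent, 0.0..100.0.
fun percentile(samples: DoubleArray, p: Double): Double {
    require(samples.isNotEmpty()) { "empty sample" }
    require(p in 0.0..100.0) { "p must be within 0..100" }
    val sorted = samples.sortedArray()
    val n = sorted.size
    val pos = p / 100.0 * (n + 1)       // 1-based fractional rank
    if (pos < 1.0) return sorted[0]     // below the first value: minimum
    if (pos >= n) return sorted[n - 1]  // at or past the last value: maximum
    val lowerRank = pos.toInt()         // floor, since pos >= 1 here
    val fraction = pos - lowerRank
    val lower = sorted[lowerRank - 1]
    val upper = sorted[lowerRank]
    return lower + fraction * (upper - lower)
}

fun main() {
    val samples = doubleArrayOf(
        3524193.1540743182, 3531314.663170731, 3532453.3325443356,
        3520395.64538141, 3474431.2865468427, 3440045.3067861893,
        3333994.173460993, 3445535.4557112623, 3475555.716890452,
        3398393.5659388755
    )
    for (p in listOf(0.0, 50.0, 90.0, 95.0, 100.0)) {
        println("$p -> ${percentile(samples, p)}")
    }
}
```

With only ten values, everything from the 95th percentile upward collapses to the maximum, which is why the sample JSON repeats the same number for 95.0 through 100.0.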

orangy

01/22/2019, 7:40 AM

@altavir thanks! But did you check what I’ve linked? I think it has everything you mentioned: stdDev, percentiles, etc. Do I have everything needed already?

orangy

01/22/2019, 7:41 AM

Here is the usage: https://github.com/orangy/gradle-benchmarks/blob/master/runtime/nativeMain/src/org/jetbrains/gradle/benchmarks/NativeBenchmarkSuite.kt#L41-L60

altavir

01/22/2019, 7:42 AM

Yeah, missed that. It seems that you can use it as is.

altavir

01/22/2019, 7:47 AM

The quantile does not make any sense for such small samples, but it seems to be correct. I was thinking about a more complicated thing like a distribution quantile.

orangy

01/22/2019, 9:09 AM

Looks like I figured it all out (some fixes re CI are coming)
