I came across a paper discussing an experiment and tried to kotlinlang #datascience

eenriquelopez

04/08/2025, 2:13 PM

I came across a paper discussing an experiment and tried to reproduce it. Here’s a brief summary:
• Portfolio A: In a bull market, grows by 20%; in a bear market, drops by 20%.
• Portfolio B: In a bull market, grows by 25%; in a bear market, drops by 35%.
• Bull market probability: 75%.
According to the paper, both portfolios should have a one year expected return of 10%. However, the paper claims that Portfolio A wins over Portfolio B around 90% of the time. After running a Monte Carlo simulation (code attached), I found that Portfolio A outperforms Portfolio B around 66% of the time.
Question: Am I doing something wrong in my simulation, or is the assumption in the original paper incorrect?

import kotlin.random.Random

// Simulation parameters
val years = 30
val simulations = 10000
val initialInvestment = 1.0

// Market probabilities (bear probability is the remaining 25%)
val bullProb = 0.75 // 75% for Bull markets

// Portfolio returns
val portfolioA = mapOf("bull" to 1.20, "bear" to 0.80)
val portfolioB = mapOf("bull" to 1.25, "bear" to 0.65)

// Function to simulate one portfolio run and return the accumulated return for each year
fun simulatePortfolioAccumulatedReturns(returns: Map<String, Double>, rng: Random): List<Double> {
    var value = initialInvestment
    val accumulatedReturns = mutableListOf<Double>()
    
    repeat(years) {
        val isBull = rng.nextDouble() < bullProb
        val market = if (isBull) "bull" else "bear"
        value *= returns[market]!!

        // Calculate accumulated return for the current year
        val accumulatedReturn = (value - initialInvestment) / initialInvestment * 100
        accumulatedReturns.add(accumulatedReturn)
    }
    return accumulatedReturns
}

// Running simulations and storing accumulated returns for each year (for each portfolio)
val rng = Random(System.currentTimeMillis())

val accumulatedResults = (1..simulations).map {
    val accumulatedReturnsA = simulatePortfolioAccumulatedReturns(portfolioA, rng)
    val accumulatedReturnsB = simulatePortfolioAccumulatedReturns(portfolioB, rng)
    
    mapOf("Simulation" to it, "PortfolioA" to accumulatedReturnsA, "PortfolioB" to accumulatedReturnsB)
}

// Count the number of simulations where Portfolio A outperforms Portfolio B and vice versa
var portfolioAOutperformsB = 0
var portfolioBOutperformsA = 0
accumulatedResults.forEach { result ->
    val accumulatedA = result["PortfolioA"] as List<Double>
    val accumulatedB = result["PortfolioB"] as List<Double>

    if (accumulatedA.last() > accumulatedB.last()) {
        portfolioAOutperformsB++
    } else {
        portfolioBOutperformsA++
    }
}

// Print the results
println("Number of simulations where Portfolio A outperforms Portfolio B: $portfolioAOutperformsB")
println("Number of simulations where Portfolio B outperforms Portfolio A: $portfolioBOutperformsA")
println("Portfolio A outperformed Portfolio B in ${portfolioAOutperformsB.toDouble() / simulations * 100}% of simulations.")
println("Portfolio B outperformed Portfolio A in ${portfolioBOutperformsA.toDouble() / simulations * 100}% of simulations.")

altavir

04/08/2025, 4:17 PM

Not exactly a #C4W52CFEZ topic, more like general #CEXV2QWNM or #CE5HPKBRN. I am not sure why you need Monte Carlo here, since subsequent returns do not depend on each year's result. It is a general binomial model. The expected return is the Bayesian average of the proposed strategies.

altavir

04/08/2025, 4:37 PM

Here is your task in a DataLore notebook: https://datalore.jetbrains.com/notebook/ptQDfQAcrjNxzIO0AEqovZ/pzlRH0fSA6XdwIeKHNk0Ym. If we count simple one-year return, then the average return on strategy A is 0.75*1.20 + 0.25*0.8 = 1.1 the same as 0.75*1.25 + 0.25*0.65 for strategy B. But since you are multiplying results, not just summing them, the result over 30 years could be different.
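The point about multiplying rather than summing can be made concrete with a small sketch. It is not taken from the notebook; the function names are made up for illustration, and it assumes the growth factors and the 75% bull probability quoted in the thread. The arithmetic mean of the one-year factor is 1.1 for both strategies, but the expected log growth per year (which drives the typical compounded outcome) comes out at about 0.081 for A versus about 0.060 for B.

```kotlin
import kotlin.math.exp
import kotlin.math.ln

// Bull-market probability from the thread.
const val BULL_PROB = 0.75

// Arithmetic mean of the one-year growth factor.
fun arithmeticMean(bull: Double, bear: Double): Double =
    BULL_PROB * bull + (1 - BULL_PROB) * bear

// Expected log growth per year; exp() of this is the geometric-mean growth factor.
fun expectedLogGrowth(bull: Double, bear: Double): Double =
    BULL_PROB * ln(bull) + (1 - BULL_PROB) * ln(bear)

fun main() {
    val portfolios = listOf(Triple("A", 1.20, 0.80), Triple("B", 1.25, 0.65))
    for ((name, bull, bear) in portfolios) {
        val mean = arithmeticMean(bull, bear)
        val logGrowth = expectedLogGrowth(bull, bear)
        println("Portfolio $name: arithmetic mean = $mean, " +
            "expected log growth = $logGrowth, geometric mean = ${exp(logGrowth)}")
    }
}
```

Under these assumptions, equal one-year expectations do not imply equal typical outcomes after 30 years of compounding.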

altavir

04/08/2025, 4:40 PM

I've added a second plot. It shows the year-to-year value, and it is the same for both strategies, as expected. But it seems like the higher-risk, higher-gain strategy B is not justified.

altavir

04/08/2025, 4:44 PM

I am not a finance guy (though I sometimes create mathematical and data engineering tools for finance), but I think the picture will be quite different if you add more complex behavior. For example, expected returns could change over the years, or you could withdraw some money periodically.

eenriquelopez

04/08/2025, 4:49 PM

TIL about Datalore, thanks @altavir! The calculation seems correct, IMO. Let me add here the exact text I read in the paper I mentioned. So, if I use Monte Carlo (i.e., basically repeating the same event over and over), I always come up with a similar percentage. The author claims that he is coming up with 90% in favour of Portfolio A, but I don’t see that with the math I am using, and I don’t see any obvious mistake (basically draw a random number for A and for B to decide whether the year is bearish or bullish, and adjust accordingly).

altavir

04/08/2025, 4:52 PM

Well, he says that the simulation is done with AI. I would not rely on it. Especially if he does not show the code.

eenriquelopez

04/08/2025, 4:53 PM

Yeah, the part “this process seemed like magic” says a lot about what might be happening (i.e., a wrong calculation). So yeah, I was basically wondering whether there was any flaw in my code above, which I am tempted to say is correct.

altavir

04/08/2025, 4:55 PM

I did not see obvious flaws. I cleaned it up a bit, but otherwise it is fine. Some more tests with corner cases perhaps.
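One possible shape for such corner-case tests, as a minimal sketch rather than the cleaned-up notebook code: with a bull probability of 1.0 every year is a bull year, so B (1.25 per year) must always beat A (1.20 per year), and with a bull probability of 0.0 A (0.80 per year) must always beat B (0.65 per year). The helper names `simulateFinalValue` and `shareAWins` are hypothetical; the draws are independent for A and B, as in the original post.

```kotlin
import kotlin.random.Random

// Compound one portfolio over `years`, drawing bull/bear independently each year.
fun simulateFinalValue(bull: Double, bear: Double, years: Int, bullProb: Double, rng: Random): Double {
    var value = 1.0
    repeat(years) {
        value *= if (rng.nextDouble() < bullProb) bull else bear
    }
    return value
}

// Fraction of simulations in which A ends above B (independent draws for A and B).
fun shareAWins(bullProb: Double, simulations: Int = 1_000, years: Int = 30, rng: Random = Random(1)): Double {
    var wins = 0
    repeat(simulations) {
        val a = simulateFinalValue(1.20, 0.80, years, bullProb, rng)
        val b = simulateFinalValue(1.25, 0.65, years, bullProb, rng)
        if (a > b) wins++
    }
    return wins.toDouble() / simulations
}

fun main() {
    // Always bull: B compounds faster, so A never wins.
    check(shareAWins(bullProb = 1.0) == 0.0)
    // Always bear: A loses less each year, so A always wins.
    check(shareAWins(bullProb = 0.0) == 1.0)
    println("corner cases passed")
}
```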

eenriquelopez

04/08/2025, 4:56 PM

Excellent. Thanks for the heads-up, @altavir! I will see if I can contact the author and verify his code. Not a life-or-death situation, but the result he proposed did not seem right to me.

altavir

04/08/2025, 5:14 PM

Based on the picture in the article, I think I know what the difference is. He probably uses the same random sequence for both simulations. Let me check it.

altavir

04/08/2025, 5:17 PM

Yep. Check the last cell. If we use exactly the same random sample for both simulations, we get the same result as in the article. It is an interesting result.

altavir

04/08/2025, 5:23 PM

This situation could not be checked with simple statistics, you indeed need a simulation for that.

eenriquelopez

04/08/2025, 5:26 PM

Isn’t the data still the same, regarding the number of times Portfolio A outperforms Portfolio B?

Number of simulations where Portfolio A outperforms Portfolio B: 6283
Number of simulations where Portfolio B outperforms Portfolio A: 3717
Portfolio A outperformed Portfolio B in 62.83% of simulations.
Portfolio B outperformed Portfolio A in 37.169999999999995% of simulations.

altavir

04/08/2025, 5:26 PM

Check the last cell in the notebook

altavir

04/08/2025, 5:27 PM

It uses the same random sequence for both simulations

altavir

04/08/2025, 5:27 PM

Number of simulations where Portfolio A outperforms Portfolio B: 9025
Number of simulations where Portfolio B outperforms Portfolio A: 975
Portfolio A outperformed Portfolio B in 90.25% of simulations.
Portfolio B outperformed Portfolio A in 9.75% of simulations.
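For this particular model, the shared-sequence figure can also be cross-checked without random numbers. This is a sketch under the thread's assumptions, not code from the notebook: when both portfolios see the same 30 years, the final values depend only on the number of bull years k, which follows a Binomial(30, 0.75) distribution. Summing the probabilities of every k for which A ends above B gives a value close to 0.90, consistent with the simulated 90.25%. The function names are made up for illustration.

```kotlin
import kotlin.math.pow

// Probability of exactly k bull years out of n, each occurring with probability p.
fun binomialPmf(n: Int, k: Int, p: Double): Double {
    var coeff = 1.0
    // Builds the binomial coefficient C(n, k) incrementally as a Double.
    for (i in 1..k) coeff = coeff * (n - k + i) / i
    return coeff * p.pow(k) * (1 - p).pow(n - k)
}

// Exact probability that A finishes above B when both see the same market path.
fun probAWinsSharedPath(years: Int, bullProb: Double): Double {
    var total = 0.0
    for (k in 0..years) {
        val a = 1.20.pow(k) * 0.80.pow(years - k)
        val b = 1.25.pow(k) * 0.65.pow(years - k)
        if (a > b) total += binomialPmf(years, k, bullProb)
    }
    return total
}

fun main() {
    println("P(A beats B on a shared path) = ${probAWinsSharedPath(30, 0.75)}")
}
```

With these factors, A ends ahead whenever there are 25 or fewer bull years out of 30, so the probability is simply the binomial mass on k ≤ 25.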

altavir

04/08/2025, 5:28 PM

It is a good example for my statistics students.

eenriquelopez

04/08/2025, 5:31 PM

Ah, I see. So the only difference is how the random data is being selected: val rng1 = Random(123) val rng2 = Random(123) vs. the same one using the current unix time as a seed.

altavir

04/08/2025, 5:34 PM

The difference is that you use the same generator for both, and it takes samples sequentially, so effectively the two portfolios use two independent samples. In the last case I create two pseudo-random sequences that produce the same numbers. It is an ugly solution; pre-generating the samples would be better, but this is easier for a quick and dirty check.
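The pre-generated variant mentioned here could look roughly like the following. It is a sketch, not the notebook's last cell: each simulation draws one bull/bear path up front and applies it to both portfolios, so a single generator suffices. The names `finalValue` and `shareOfPathsWhereAWins` and the seed 123 are illustrative assumptions.

```kotlin
import kotlin.random.Random

// Compound an initial value of 1.0 along a pre-generated bull/bear path.
fun finalValue(bullYears: List<Boolean>, bull: Double, bear: Double): Double =
    bullYears.fold(1.0) { value, isBull -> value * (if (isBull) bull else bear) }

// Fraction of simulated paths on which A ends above B, with both on the same path.
fun shareOfPathsWhereAWins(simulations: Int, years: Int, bullProb: Double, rng: Random): Double {
    var aWins = 0
    repeat(simulations) {
        // One market history per simulation, shared by both portfolios.
        val path = List(years) { rng.nextDouble() < bullProb }
        val a = finalValue(path, 1.20, 0.80)
        val b = finalValue(path, 1.25, 0.65)
        if (a > b) aWins++
    }
    return aWins.toDouble() / simulations
}

fun main() {
    val share = shareOfPathsWhereAWins(10_000, 30, 0.75, Random(123))
    println("Portfolio A outperformed Portfolio B in ${share * 100}% of shared paths.")
}
```

Sharing the path removes the noise that comes from comparing two unrelated market histories, which is why the result differs from the independent-draw version.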

eenriquelopez

04/08/2025, 5:36 PM

Wouldn’t it make more sense, from a mathematical perspective, to always use the same sample? For this comparison we are studying if the portfolios have a certain performance over the same period of time.

altavir

04/08/2025, 5:39 PM

From a real-life point of view, using the same sample makes sense, because you test both on the same "live" data. It is harder to explain in statistical terms, though.

eenriquelopez

04/08/2025, 5:42 PM

My thoughts, too. If we use two different “time” segments, of course the portfolio results might vary. It is still interesting that they vary more when the samples are different, but a real comparison of the performance of two portfolios should be made over the same timespan.
