Research: minimum noticeable probability change in play?

Telok

The eggnog is one third rum.
Joined
Jun 20, 2022
Messages
186
Reaction score
446
I'm trying to find a general range of values for the minimum change in "success rate" that's noticeable by the general population during an activity like playing a game. In plain English: How small of a % change do people notice in play? I haven't been able to really find any research on it, just one paper where it was sort of tangentially mentioned as an aside. I've read like the intros of like thirty plus papers now, checked some of their references, tried dozens of search phrases, and come up with nothing.

I know that I, personally, don't notice a 10% change in success rate during play, but I do notice ones at about 15%. I can see it on a character sheet, calculate it, model it in code... but in the game a +/-10% is basically fuck-all lost-in-the-noise bupkis as far as I can tell. I know that to some extent it's dependent on how the game plays. Something like a d100 or blackjackD20 with no hidden modifiers is super easy, because you know you had 60% last time and 65% this time and rolled a 63. But with limited info or hidden variables, like the D&Ds where it's d20+8 vs ???, that's where the uncertainty comes in: improving or "leveling up" blurs the line between knowing you have a slight statistical improvement and feeling that you've gotten better at something.

Anyone have any leads or ideas short of me running my own damn study?
 

TJS

Legendary Member
Joined
May 5, 2018
Messages
2,574
Reaction score
5,479
I would say probably around 10%.

Less than that may be noticeable if it's something that's rolled often enough, like an attack roll in combat.

The other thing to think about is the variability - something that doesn't significantly improve the average but reduces the variability by introducing a bell curve can have an effect beyond that of just the average increase in success - most notably in a system where your level of success may matter. (but even if it doesn't that extra sense of reliability can influence decision making in play).
 

Fenris-77

Small God of the Dozens
Joined
Jul 9, 2020
Messages
8,496
Reaction score
21,696
I think this has way more to do with the mechanic in question than it does anything about a random % yes or no. Just my two cents.
 

thebigh

Gelatinous noob
Joined
Feb 25, 2022
Messages
306
Reaction score
953
It depends. The difference between 0% and 10% chance of success (or failure) feels huge. The difference between 45% and 55% seems imperceptible.
 

AsenRG

Legendary Member
Joined
Apr 28, 2018
Messages
11,826
Reaction score
13,721
With known or unknown TN, and on how many attempts? That would probably change the outcome quite a bit.
 

Telok

The eggnog is one third rum.
Joined
Jun 20, 2022
Messages
186
Reaction score
446
It depends. The difference between 0% and 10% chance of success (or failure) feels huge. The difference between 45% and 55% seems imperceptible.
The diff between 0% and 10% is huge. It's infinite. But I'm looking for data on less extreme stuff. And the possible variables of all sorts of stuff is why I'm looking for the bottom line base ability of people to detect the changes before trying to address more variables.

I've gotten search terms narrowing down to "change point detection" and "time series data", but it's an important algorithm research field so finding human ability among the 'make/have a better algorithm' stuff is rough.
 

Raleel

The Lemon LeCroix of Mythras
Joined
May 15, 2017
Messages
4,859
Reaction score
11,464
It depends. The difference between 0% and 10% chance of success (or failure) feels huge. The difference between 45% and 55% seems imperceptible.
this is an important distinction though. it's not about the value, it's about the percentage of change vs the base chance. there is also a "notice it" vs "consider it important" distinction, and I think you are going for the latter. I certainly notice a +1%, but as you say, it gets lost in the noise. I don't really care about that.

Some of it is cultural as well, I think. D&D nerds will notice +1s and +2s more readily, as those numbers are common in the culture. Some of it is contextual - if you have a system with a lot of +1s, you might not notice them if they can stack, or you might if they CAN stack, and it's a rare bonus type. if a +1 lets you do a thing that is otherwise impossible, it gains some weight.

also consider, it is resolution dependent. +1 die in SR means something A LOT different than +1 in D&D. It's probably closer to +1% in d100. Consider 5e's advantage (and all the systems before that did something like this) where the +1 is +1 die roll and shifts the odds substantially based on where you are in the success curve.
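
To put numbers on the "depends where you are in the success curve" point, here is a minimal sketch of how much 5e-style advantage (roll 2d20, keep the higher) adds at different target faces. The function names are made up for illustration, and it ignores anything beyond the bare "roll this face or higher on the d20" check.

```python
from fractions import Fraction

def p_success(face_needed: int) -> Fraction:
    """Chance that one d20 shows `face_needed` or higher (1..20)."""
    return Fraction(21 - face_needed, 20)

def p_advantage(face_needed: int) -> Fraction:
    """Roll 2d20 and keep the higher: you only fail if both dice fail."""
    miss = 1 - p_success(face_needed)
    return 1 - miss * miss

def advantage_gain(face_needed: int) -> Fraction:
    """Extra success chance that advantage adds over a single d20."""
    return p_advantage(face_needed) - p_success(face_needed)

if __name__ == "__main__":
    for face in (2, 6, 11, 16, 20):
        print(f"need {face:>2}+: base {float(p_success(face)):.3f}, "
              f"gain {float(advantage_gain(face)):.4f}")
```

The gain works out to p*(1-p), so it peaks at 25 percentage points when the base chance is 50% and shrinks to under 5 points when you need a 2+ or a natural 20.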

So, I don't think the answer is simple. I tend to prefer bigger bonuses to care about when i GM - 15% is probably close to the floor of that, but tend to think more like 20%. But if I am optimizing, I'll for sure be all over every single 1%, in every situation i can get it from.
 

AsenRG

Legendary Member
Joined
Apr 28, 2018
Messages
11,826
Reaction score
13,721
this is an important distinction though. it's not about the value, it's about the percentage of change vs the base chance. there is also a "notice it" vs "consider it important" distinction, and I think you are going for the latter. I certainly notice a +1%, but as you say, it gets lost in the noise. I don't really care about that.
Yeah, in the same vein, the ratio of improvement between 0 and 10% is infinite, but both 0 and 10% chances of success are in the same camp: "don't bother trying...unless it's over 0% and you really MUST":thumbsup:.
So, smaller bonuses have a bigger effect if you have a low starting odds of success, but their effect on the behaviour is only evident when the situation is pressing, or when they end up tipping the roll from "low odds" to "decent odds". In all other cases, it's a wash: you don't take the roll if you can avoid it.
Same for bonuses that end up shifting the roll from "decent odds" to "pretty decent", or from there to "rather good, I'd say" and beyond.
Funny enough, the same bonus might have the strongest effect when it shifts the odds of success from "near certain" to "certain" or "guaranteed success with benefits"...due to the player simply wanting to see what the GM is going to do, for example:grin:!

So, I don't think the answer is simple. I tend to prefer bigger bonuses to care about when i GM - 15% is probably close to the floor of that, but tend to think more like 20%. But if I am optimizing, I'll for sure be all over every single 1%, in every situation i can get it from.
And that's because of the second set of variables I pointed out in my post: how often do you roll. In other words, a roll that's going to come up 50 times per session is going to matter a lot more than a roll that happens once per session, if that...unless the latter is a crucial roll that determines the rest of the session (see: Charisma bonus in D&D when the GM is using reaction tables).
So, as a player, you know you'd be rolling everything on your character sheet for the rest of the campaign. But when you are giving bonii as a Referee, you're giving one bonus for this one roll (and might deny it the next time), so every bonus is only applied once, from your POV...:shade:
 

Gabriel

Legendary Member
Joined
Mar 4, 2019
Messages
2,585
Reaction score
6,562
I would say probably around 10%.

Less than that may be noticeable if its something that's rolled often enough like an attack roll in combat.

The other thing to think about is the variability - something that doesn't significantly improve the average but reduces the variability by introducing a bell curve can have an effect beyond that of just the average increase in success - most notably in a system where your level of success may matter. (but even if it doesn't that extra sense of reliability can influence decision making in play).

I was thinking 5%, but it does depend on how often things are getting rolled for. In a long D&D combat with lots of iterative rolls, that +/-1 on a d20 will be noticed. But if rolls are much less frequent, then 10% is probably the threshold.

The caveat is that any percentage variability can be noticed if the rolls go certain ways. I had a game a few weeks back where I blew half a dozen skill rolls and every single one was failed by exactly 1 point on d%. I definitely noticed that 1%, probably the most keenly aware of that level of granularity I had ever been in a game.

But I'd say generally it's only going to be noticed at a range of 5-10% depending on how much rolling is going on.
 

Telok

The eggnog is one third rum.
Joined
Jun 20, 2022
Messages
186
Reaction score
446
For sure, at game time presentation absolutely matters.

Like in a d10 roll & keep game adding 'reroll 1s' is (super generalized) basically a +1 or +2 and about the same as adding a rolled die, on result numbers going from mid teens to mid 30s (just the sets I checked). Statistically it's really minor, but at the table it makes a huge impression on the players, who remember that 1 turning into an 8 and pushing them into a success. Or a d% game where you know your % and your roll perfectly, so you see the 3% increase and note it, even though it's again a statistically minor change.

Contrariwise, in something like a D&D dungeon crawl you might be hard pressed to tell the difference between a -1 & +6 swing on the d20 charisma checks vs unknown target numbers if rolling for talking to NPCs only happens once every three once-a-week sessions.

I'd noticed a trend on some boards for people to fetishize stuff like a d&d character going from d20+2 to d20+4 vs unknown but assumed targets of 15 to 20, like it was some major power boost to a pc. I got to wondering if people really noticed that sort of change or if it was just whiteroom stats wankery.
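
For reference, the whiteroom arithmetic on that d20+2 to d20+4 example is quick to sketch. This assumes a plain "d20 + bonus meets or beats the target" check with no auto-hit or auto-miss rules, and `hit_chance` is just a name I made up.

```python
def hit_chance(bonus: int, target: int) -> float:
    """P(d20 + bonus >= target), with no special handling of natural 1s or 20s."""
    face_needed = target - bonus          # lowest die face that succeeds
    winning_faces = 21 - face_needed      # how many of the 20 faces succeed
    return min(max(winning_faces, 0), 20) / 20

for target in range(15, 21):
    before = hit_chance(2, target)
    after = hit_chance(4, target)
    print(f"target {target}: {before:.0%} -> {after:.0%} "
          f"(+{after - before:.0%} absolute, x{after / before:.2f} relative)")
```

The absolute change is a flat 10 points at every target in that range, while the relative change grows as the target gets harder, which may be part of why the same +2 gets described so differently.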
 

Telok

The eggnog is one third rum.
Joined
Jun 20, 2022
Messages
186
Reaction score
446
Finally starting to get decent hits with "regime change" and "subjective probability" with some other stuff. So far looks like people are shit at detecting a 60/40 split change over and seriously overestimate the increases.
 

Vile

Legendary Member
Joined
Apr 30, 2017
Messages
304
Reaction score
760
Following this with interest. Personally I'm not really concerned with perception vs. reality, because real modifiers are of course perceived for what they are and thus fulfil both criteria, while "apparent" modifiers feel like trying to fool the players.

I generally stick to 30%-60% modifiers. Any less seems pointless, while any more and I'd probably just waive the roll. Of course, I haven't done the maths (I play RPGs for fun, after all), so I'm still looking for confirmation from someone who has.
 

Mankcam

Coiner of Thread-Falls, & Inadvert Founder of Swo'
Joined
Sep 24, 2017
Messages
3,997
Reaction score
10,613
In a nutshell:
+/- a level on a Skill Level scale
+/-1 on a D20 scale
+/-10% on a D100 scale

For more broad modifiers
+/-5 on a D20 scale
+/-25% on a D100 scale

Of course, I care not for mathematical probabilities and whatnot; I'm just speaking entirely outta my arse on this -
:shade:
 

Telok

The eggnog is one third rum.
Joined
Jun 20, 2022
Messages
186
Reaction score
446
Bear with me for phone posting.

Like with my search for basic perception to make a % chart for a game, this search led me back to a set of core research in the '60s & '70s that people keep referencing & building on. Unlike that search, these people didn't have a crapton of military peeps available as subjects and had limited budgets*. Also different is that the civvy research almost never includes the data at the end of the papers, uses way more discipline-specific jargon, and sometimes doesn't label axes on charts. Label your chart axes, fuckwad.

So I don't have nice numbers to play with this time, just general observations.

1. People have a really hard time discriminating a 60/40 split.
2. They aren't as good as you'd expect at catching a 70-75/30-25 split or a 5% vs 60% hit rate. For stuff around 70/30 they're overlapping numbers in error bars with the 60/40 results.
3. People are really good at noticing a 90/10 split or 50% vs 100% hit rate.
4. People reliably estimate a 60% rate as a 50% rate because they don't get streaks right, and trying to human produce a 50% "random" set reliably put out a 60% set with too few/short streaks.
5. Personal tendencies towards thinking about stuff in aggregate (all events over time & multiple people/trials) vs individual (only consider my own last 10 trials) has a potentially major difference in perception of the rates & changes when combined with other factors.
6. Minimum 6 to 10 trials in a short amount of time to detect any change, increasing up to 20+ trials for some 60/40 sets. Consistent across several studies. 5 or 6 was to be sure of a 50% vs 100% or 90/10 split, 8 to 10 was for a 70/30 type split.
7. Some people simply couldn't tell a difference in a 60/40 and it was worse if you went closer like 54/46.

My personal generalized conclusions.

A. You need 4+ trials to actually tell any difference in rates, and they need to be fairly close in time. Minutes at most. Anything rolled 1/hour you can't tell except massive massive swings (50%v100% or 90/10) or by direct comparison of numbers ("used to fail this roll on 17- and just rolled a 17=success")
B. You need at least a 20% swing in effect for everyone in the audience** to actually tell any difference in rates if you don't have the numbers in your face.
C. Systems that produce memorable success spikes (rerolls or post roll "add another die" type stuff) have a magnifying effect on people noticing rate increases, partially because they require attention & knowledge of the numbers (ref A & B caveats).

For reference, four of the more relevant & useful papers to start with.

Detecting Regime Shifts: The Causes of Under- And Over-Reaction
Cade Massey and George Wu

Detection of Change in Nonstationary, Random Sequences
DONALD M. BARRY AND GORDON F. PITZ

Detecting Regime Shifts: The Role of Construal Levels on System Neglect
Samuel N. Kirshner

Detection of change in nonstationary binary sequences
JOHN THEIOS and JOHN W. BRELSFORD, JR

* "Do we know how good people are at spotting shit from jets?" "No. Let's send a bunch of guys out to the range in jeeps, trucks, and tanks, then spend two weeks flying fighter jets around at different altitudes to spot them." vs "If I grab 20 students and pay them $5, minus 7 cents per miss, estimate misses as... hmm... ok, yeah, that comes in under budget if I can get my roommate to write the software for an $8 pizza."

** People who know probability (especially stats students) and are looking specifically to identify rate changes notice it more. For everyone else you want to err on the high side.
 

Raleel

The Lemon LeCroix of Mythras
Joined
May 15, 2017
Messages
4,859
Reaction score
11,464
Nice job in researching that. Your point #5 usually points me to a person who is trained or experienced vs a person who is not. This might be why I think about it the way I do, as all of my table is trained or experienced in probability and analysis.

I’ll point out that this reinforces the elegance of advantage in 5e.
 

Vile

Legendary Member
Joined
Apr 30, 2017
Messages
304
Reaction score
760
I’ll point out that this reinforces the elegance of advantage in 5e.
Indeed. Does all this point towards advantage/disadvantage as the best modifier system out there?
 

Raleel

The Lemon LeCroix of Mythras
Joined
May 15, 2017
Messages
4,859
Reaction score
11,464
Indeed. Does all this point towards advantage/disadvantage as the best modifier system out there?
Best is probably a strong and indefinite statement, but certainly one of the most noticeable. Play practice says it’s also one of the easiest.
 

Telok

The eggnog is one third rum.
Joined
Jun 20, 2022
Messages
186
Reaction score
446
Best is probably a strong and indefinite statement, but certainly one of the most noticeable. Play practice says it’s also one of the easiest.

As much as D&D 5e annoys me in general, the base idea of the advantage mechanic is easy to use and notice in a way that increases its perceived impact even beyond it being a meaningful modifier to the rolls. I still disagree with the "any opposing modifier cancels all modifiers" implementation for its adding perverse and janky incentives, and I still think "roll three take middle" has a use, but it's an easy and noticeable mechanic.

Even easier and stronger in perception is D&D 5e's 'lucky' feat, which is truly just a more directly player-driven version of the advantage/disadvantage mechanism. It's still roll 2d20 and pick one, but directly player controlled instead of being indirectly controlled through the player putting the character through specific prerequisite actions.

By contrast the D&D 5e halfling species luck trait makes a great example of conclusion B. If you run a D&DBeyond halfling character with the Roll20 web browser plugin on the Roll20 website, then it auto calculates it into your rolls. Functionally it turns your d20 into a d19+1 (& still vs a target 5 to 20), a statistically rather minor difference that you have to look for in order to find it. But at the tabletop with real dice rerolling the 1s becomes much more visible and your perception of the impact is much higher because it's an active thing (roll again) and you're directly shown a fail->success flip (& random rewards from it too), despite it being the exact same %s improvement.
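
A quick sketch of how small that halfling bump is, for anyone who wants the numbers. I'm assuming the trait works as "reroll a natural 1 once and keep the second roll", and comparing it to the d19+1 shorthand (which is what you'd get if you kept rerolling until the die stopped showing a 1). All function names here are mine.

```python
from fractions import Fraction

def p_plain(face_needed: int) -> Fraction:
    """Chance a normal d20 shows `face_needed` or higher (2..20)."""
    return Fraction(21 - face_needed, 20)

def p_reroll_ones_once(face_needed: int) -> Fraction:
    """Natural 1s are rerolled a single time; the second roll stands."""
    # Either the first die succeeds, or it shows a 1 (1 in 20) and the reroll succeeds.
    return p_plain(face_needed) + Fraction(1, 20) * p_plain(face_needed)

def p_d19_plus_1(face_needed: int) -> Fraction:
    """The 'd19+1' shorthand: results uniform on 2..20."""
    return Fraction(21 - face_needed, 19)

if __name__ == "__main__":
    for face in (5, 11, 15, 20):
        print(face,
              float(p_plain(face)),
              float(p_reroll_ones_once(face)),
              float(p_d19_plus_1(face)))
```

Under that assumption the trait is worth at most about 4 percentage points (when you need a 5+) and only 2.5 points at a coin-flip roll, so the two versions are nearly indistinguishable on paper. The difference at the table is all in the visible reroll.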

Contrast again with one of that game's "reroll 1s on damage" abilities. Statistically still a minor increase as it's likely something like an axe (1d12+5 -> 1d11+6 still vs 50 to 200 hp) or damage spell (8d6 -> 8d5+8 vs still lots of hp) but much less impactful because it's a low relative improvement in hp degradation instead of a binary success/fail trial, and much less memorable for the same reason.
 

robertsconley

Legendary Member
Joined
May 3, 2018
Messages
4,189
Reaction score
9,106
In the mid 2010s I did extensive playtesting of my Fantasy Fudge/Fate RPG.

Many players I knew who were not math savvy noticed the outsized benefit of getting a +1 bonus using 4DF, which at its max benefit is 81.48 - 61.73 = 19.75%.
It wasn't noticed right away, but by the time I ran 3 sessions it was being commented on. And it pretty much sunk the whole project. Because of the bell curve it wasn't picked up on right away, as the exact benefit of +1 varied with the initial odds. It came about pretty much after players spent some XP and started noticing their character was not a little better but way, way better.

[Attached image: 1668801389935.png]
 

Raleel

The Lemon LeCroix of Mythras
Joined
May 15, 2017
Messages
4,859
Reaction score
11,464
there are some games that are like that, where they were not especially rigorous with the math and have strange humps or dips. I do like games with tight math, for sure.
 

ffilz

Legendary Member
Joined
Dec 17, 2018
Messages
2,152
Reaction score
3,692
Probabilities are interesting beasts...

When the designer of Cold Iron produced a combat simulator that would run 100s or 1000s of combats between two combatants factoring in several variables, one thing that quickly became clear was that a +1 which in Cold Iron is about +5% near the center of the bell curve (always one 6/20 of a standard deviation no matter where on the curve) turned combat into I think something like a 70-30 split. I did a similar simulator for AD&D combat and found the same thing. So small deltas can be very significant when compounded over several rolls.
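
To illustrate the compounding, here is a toy model, emphatically not Cold Iron or AD&D: two fighters swing once each per round, the first to land a set number of hits wins, and a same-round finish counts as half a win each. Everything here (names, the "hits to win" framing, the default numbers) is an assumption for illustration, and it is computed exactly rather than simulated.

```python
from math import comb

def rounds_pmf(p_hit: float, hits: int, max_rounds: int) -> list:
    """P(the final needed hit lands on round t), for t = 0..max_rounds."""
    return [
        0.0 if t < hits
        else comb(t - 1, hits - 1) * p_hit ** hits * (1 - p_hit) ** (t - hits)
        for t in range(max_rounds + 1)
    ]

def win_chance(p_a: float, p_b: float, hits: int = 5, max_rounds: int = 500) -> float:
    """Chance fighter A lands `hits` hits before fighter B; ties count as half."""
    a = rounds_pmf(p_a, hits, max_rounds)
    b = rounds_pmf(p_b, hits, max_rounds)
    win, b_done = 0.0, 0.0
    for t in range(max_rounds + 1):
        b_done += b[t]                       # P(B has finished by round t)
        win += a[t] * ((1 - b_done) + 0.5 * b[t])
    return win

if __name__ == "__main__":
    for hits in (1, 5, 10, 20):
        print(hits, round(win_chance(0.55, 0.50, hits), 3))
```

In this toy setup a 5-point edge per swing is close to a coin flip when one hit decides the fight, and the overall win chance drifts further from 50/50 the more hits it takes to finish.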
 

Telok

The eggnog is one third rum.
Joined
Jun 20, 2022
Messages
186
Reaction score
446
In the mid 2010s I did extensive playtesting of my Fantasy Fudge/Fate RPG.

Many players I knew who were not math savvy noticed the outsized benefit of getting a +1 bonus using 4DF, which at its max benefit is 81.48 - 61.73 = 19.75%.
It wasn't noticed right away, but by the time I ran 3 sessions it was being commented on. And it pretty much sunk the whole project. Because of the bell curve it wasn't picked up on right away, as the exact benefit of +1 varied with the initial odds. It came about pretty much after players spent some XP and started noticing their character was not a little better but way, way better.

What I think may have been happening is the players were aggregating experience over... 3 sessions would be how many rolls? And they're looking more at something like the 1+ results which went from ~20% of results to ~40% of results, so a full doubling of positive results. In addition the -1 & less results went from ~40% to ~20% of results, halving them. I think the graduated nature of the results may have increased the visibility of the change.

Nothing I found was absolutes or hard limits. What I generalized out was average/trend stuff. The multiple variables of number of trials, hit rate, how much attention people can or are forced to apply, etc., are all interacting in often non-linear ways.
 
Joined
Nov 3, 2022
Messages
2
Reaction score
1
I'm trying to find a general range of values for the minimum change in "success rate" that's noticeable by the general population during an activity like playing a game. In plain English: How small of a % change do people notice in play? I haven't been able to really find any research on it, just one paper where it was sort of tangentially mentioned as an aside. I've read like the intros of like thirty plus papers now, checked some of their references, tried dozens of search phrases, and come up with nothing.

I know that I, personally, don't notice a 10% change in success rate during play, but I do notice ones at about 15%. I can see it on a character sheet, calculate it, model it in code... but in the game a +/-10% is basically fuck-all lost-in-the-noise bupkis as far as I can tell. I know that to some extent it's dependent on how the game plays. Something like a d100 or blackjackD20 with no hidden modifiers is super easy, because you know you had 60% last time and 65% this time and rolled a 63. But with limited info or hidden variables, like the D&Ds where it's d20+8 vs ???, that's where the uncertainty comes in: improving or "leveling up" blurs the line between knowing you have a slight statistical improvement and feeling that you've gotten better at something.

Anyone have any leads or ideas short of me running my own damn study?

I think you overestimate how easy it is for people to notice things even if they are written clearly and unambiguously on their character sheets.

What I mean by this, is that the subjective impression of probability is usually quite different to the reality. For example, if you play a percentile system, in which someone is trying to roll under a number like 45, I guarantee you that a significant number of players will complain that they 'almost always' fail; even though they (should) know they are, in fact, probably only failing a little more than half the time. There is a name for this common bias that I don't remember.
 

Telok

The eggnog is one third rum.
Joined
Jun 20, 2022
Messages
186
Reaction score
446
I think you overestimate how easy it is for people to notice things even if they are written clearly and unambiguously on their character sheets.
4. People reliably estimate a 60% rate as a 50% rate because they don't get streaks right, and trying to human produce a 50% "random" set reliably put out a 60% set with too few/short streaks.

Yup. People guess that 60% rates are really 50% rates, in part because of how badly they underestimate streaks in true randomness.
 

bookstore44

Member
Joined
Nov 19, 2022
Messages
1
Reaction score
2
I saw this post while browsing and I was curious about it myself, so I did a statistical analysis.

So the question is: how well can people sense unknown probabilities based purely off of the successes and failures that they roll? I don't know of a study that looks into players' perceptions like that, and I also haven't played many ttrpgs. I know that in a real game, players can often see the stated probabilities, and when I played boardgames the core gameplay was to hunt for optimal moves based off those stated probabilities, so small numbers could matter a lot. Also, in a real game a player can look at their diceroll and see "I managed to get a hit despite rolling suboptimally. Even from a single roll I can see that this +2 modifier is a significant, visible increase in the target area." But for the purposes of this analysis I'll be assuming the dicerolls are secret and I'll only be trying to see if the change in hitrate itself is noticeable. I've messed around with statistics before and I think a fancy, mathematical analysis should act as an upper bound on how well a person can sense the probability of hitting. (Admittedly I had to look up some of the math on Wikipedia, and I kept getting off-by-one errors, but I think I've sorted that all out now.)

An issue is that I don't know how many rolls a player can keep track of. There's no way a person can remember a hundred rolls in a row without consciously tallying them. However, there's one exception if the probability is very high or very low. If you have a 5% chance of being killed from every hit, then you can remember "in the past several games I died 5 times" and that's basically the same as "I got 5/100 failures". Another loophole is that a player can notice if they have 5 bad days (number of successes is in the lower quartile of expected results) and 0 good ones (number of successes is in the upper quartile of expected results). IF data like that is gathered correctly and IF it is analyzed intelligently, then I think it can be comparable to actually remembering over a hundred rolls. I don't know how good people's intuition is with that. But for the rest of this post I'm mainly going to focus on how well you can do with only 20 rolls. My graphs and tables will go up to 200 rolls for the sake of comparison.

...

One question is:
The gm says they're giving you a modifier, but it's pure placebo. Do you notice?

So we're dealing with a binomial distribution, where n is the number of rolls we make, p is the stated probability of getting a success, and k is the number of successes we actually got.

The strategy that I know goes like this: "if I got fewer than x successes or more than y successes then I'll claim the probabilities are fudged". This is called a "null hypothesis test", where p is our null hypothesis. We pick a confidence level, like 95%. We plug p into our binomial distribution and find the 2.5th percentile and 97.5th percentile. Then we use those as our critical values. You can see the values I got in my "critical values" table. You call the result abnormal if your successes equal the lower bound or less, or if they equal the higher bound or more. I also calculated these intervals using the Clopper-Pearson method, and it sort of acts like an interpolation, so it's more precise than the critical values. You can see that in my Clopper-Pearson confidence interval table. The odds of a false positive aren't exactly 5% like you'd expect because the number of successes is discrete, so I also made a table of the false-positive values.
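
For anyone who wants to reproduce the critical values without opening the attachments, here is a minimal standard-library sketch of the two-tailed test described above. It is not the attached code, and the function names are made up for illustration.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n rolls with success chance p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def lower_tail(k: int, n: int, p: float) -> float:
    """P(k or fewer successes)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

def upper_tail(k: int, n: int, p: float) -> float:
    """P(k or more successes)."""
    return sum(binom_pmf(i, n, p) for i in range(max(k, 0), n + 1))

def critical_values(n: int, p: float, confidence: float = 0.95) -> tuple:
    """Return (low, high): call the result abnormal if successes <= low or >= high."""
    tail = (1 - confidence) / 2
    low = -1                                   # -1 means no lower cutoff fits in the tail
    while lower_tail(low + 1, n, p) <= tail:
        low += 1
    high = n + 1                               # n + 1 means no upper cutoff fits
    while upper_tail(high - 1, n, p) <= tail:
        high -= 1
    return low, high

def false_positive_rate(n: int, p: float, confidence: float = 0.95) -> float:
    """Chance that honest dice still land in the 'abnormal' region."""
    low, high = critical_values(n, p, confidence)
    return lower_tail(low, n, p) + upper_tail(high, n, p)

if __name__ == "__main__":
    print(critical_values(20, 0.5), false_positive_rate(20, 0.5))
```

For 20 rolls at a stated 50%, this gives cutoffs of 5 or fewer and 15 or more successes, with a false-positive rate of about 4.1% rather than the nominal 5%.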

Because we're only looking at the dice results, our results are random. If the dice just randomly gave you 10 successes exactly like you expected with p = 0.5, then there's literally nothing for you to notice. And if the dice just randomly gave you 0 successes, you'll call them fudged even if they're not. So I think a study on the minimal difference a person can notice in a small number of rolls would end up just studying the probability that the dice give an abnormal result, because that's the dominant factor here. Although a study on people can still tell you how many dicerolls people will remember, which strategy they use (tallying the dice? counting successful streaks? I don't know), and where people place the cutoffs for what an "abnormal" result is.

Because of the randomness in the die rolls, it's impossible to make any hard cut off for what a "minimum detectable shift" is. Instead I'll give a range of results. With this test, 20 dicerolls, and a 95% confidence level, you can correctly notice a 40 percentage point difference almost all the time, you can notice a 20 percentage point difference only 40% of the time (it often gives you a low, but still plausible result. A good result for 0.3 overlaps a bad result for 0.5). You can detect a 10 percentage point change like 10-15% of time. You can even "detect" a 0 percentage point change 4% of the time (but since you only "detect" stuff by rolling an abnormal amount of successes, you'll think it's some sort of major change in the probability). You can see these detection rates on my "power / chance-to-reject-null-hypothesis" graph.
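
Those detection rates can be recomputed with a few lines of standard-library Python. This is a sketch of the same idea rather than the attached code: build the "abnormal" region from the stated probability, then ask how often the true probability lands you in it. The function names are my own.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n rolls with success chance p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def rejection_region(n: int, p_null: float, confidence: float = 0.95) -> set:
    """Success counts in the two tails, each tail holding at most (1 - confidence) / 2."""
    tail = (1 - confidence) / 2
    pmf = [binom_pmf(k, n, p_null) for k in range(n + 1)]
    region = set()
    total = 0.0
    for k in range(n + 1):                 # walk up from 0 successes
        total += pmf[k]
        if total > tail:
            break
        region.add(k)
    total = 0.0
    for k in range(n, -1, -1):             # walk down from n successes
        total += pmf[k]
        if total > tail:
            break
        region.add(k)
    return region

def detection_rate(n: int, p_null: float, p_true: float, confidence: float = 0.95) -> float:
    """Chance the roll count looks abnormal when the real success chance is p_true."""
    region = rejection_region(n, p_null, confidence)
    return sum(binom_pmf(k, n, p_true) for k in region)

if __name__ == "__main__":
    for p_true in (0.5, 0.4, 0.3, 0.1):
        print(p_true, round(detection_rate(20, 0.5, p_true), 3))
```

With 20 rolls and a stated 50%, this comes out to roughly 4% for no change, 13% for a 10 point drop, 42% for a 20 point drop, and 99% for a 40 point drop.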

...

A second question is:
The chance of a success might be p, or it might be anything else. Both options are equally likely. Can you correctly guess which one it is more than half of the time?

To do this you can just do a null hypothesis test with the 50% confidence interval from question 1. The chance of a false positive is 50%. The chance of a false negative is less than 50% for any success rate that's not equal to p. Therefore, this strategy lets you pick the right one over 50% of the time no matter how small the change in percentage is.

...

A third question is:
You're not told the true probability. What's the range of plausible guesses?

Again we're dealing with a binomial distribution, but this time p is left as a complete unknown.

For this I'm going to do a bayesian analysis, which I think just means: "calculate through all the possibilities". I'm going to imagine that the gm is picking a random probability p that's uniformly distributed between 0 and 1. This assumption about the possible values for p is called the "prior".

Picking a uniform distribution is a fudge, because irl you wouldn't expect a game or a gm to literally pick any value from 0 to 1 with equal probability. I don't know what the prior should be though, and I think this is good enough. The null hypothesis test we did actually has the same issue, and I've been told it complicates medical testing. Say you take a 99.99% accurate test for a rare disease and get a positive result. So that means there's a 99.99% chance you're sick, right? No, the probability is 0. The test was for smallpox. The test's confidence level only gives you the range of expected results if the null hypothesis is true. I don't know why people call those "confidence levels" if it can't actually tell you your level of confidence in the result. Depending on the prior, you might refuse to believe something despite strong evidence, or vice versa.

Moving on, I calculated the probability of rolling a success rate of 0.05, 0.25, or 0.5 for every value of p between 0 and 1. You can see the result in my "chance of getting a success rate equal to 0.05, 0.25, 0.5" graphs. This is called our "likelihood". In our case the function is C * p^k * (1-p)^(n-k), where k is the number of successes, n is the number of rolls, and C is a constant (the binomial coefficient) that doesn't depend on p and so doesn't matter here. As a function of p, this has the shape of a beta distribution. Also, I know it's impossible to have a success rate of 0.05 when you only roll 5 dice, but the math works out so I plotted those curves anyway.

I then multiply our likelihood function (the probability of getting k for each p) by our prior function (the corresponding probability of getting each of those p's). That gets the final probability of any given p being the culprit for our k successes. This is called the "posterior". Except my prior is a uniform line, so the multiplication does nothing and the posterior ends up being equal to my likelihood function.

Once I have all that, I make an interval to cover 95% of the area. People call this a "credible interval" to distinguish it from confidence intervals. You can pick and choose which 5% to leave out, but it makes the most sense to focus on the bump on the graph and leave out the tails. This is called a "high density interval". I show the results in my "credible interval" table. This table is different from the Clopper-Pearson confidence interval table, because that one shows the range of expected successes given a certain probability p, and this one shows a range of expected values for p given a certain number of successes. For this problem they happen to be similar.

The final result of all this is that if you get 10 successes in 20 rolls, the 95% credible interval stretches from 0.3 to 0.7.
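
The credible interval can be approximated without any stats library by chopping p into small cells. This is a rough grid sketch under the same uniform-prior assumption, not the attached code, and the names and grid size are arbitrary choices of mine.

```python
def posterior_grid(successes: int, rolls: int, steps: int = 2000) -> tuple:
    """Posterior over p at the midpoints of `steps` equal cells, assuming a uniform prior."""
    width = 1 / steps
    ps = [(i + 0.5) * width for i in range(steps)]
    # With a uniform prior the posterior is proportional to the likelihood.
    weights = [p ** successes * (1 - p) ** (rolls - successes) for p in ps]
    total = sum(weights)
    return ps, [w / total for w in weights]

def highest_density_interval(successes: int, rolls: int,
                             mass: float = 0.95, steps: int = 2000) -> tuple:
    """Smallest range of p covering `mass` of the posterior (the posterior has one bump)."""
    ps, probs = posterior_grid(successes, rolls, steps)
    order = sorted(range(steps), key=lambda i: probs[i], reverse=True)
    kept, covered = [], 0.0
    for i in order:                        # take the tallest cells first
        kept.append(ps[i])
        covered += probs[i]
        if covered >= mass:
            break
    return min(kept), max(kept)

if __name__ == "__main__":
    print(highest_density_interval(10, 20))
    print(highest_density_interval(1, 20))
```

For 10 successes in 20 rolls this lands on roughly 0.30 to 0.70, matching the table. A finer grid only changes the later decimals.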

...

So I think the main takeaways from all this are that there isn't any hard cutoff for a minimum detectable difference, it's random whether an abnormal probability actually creates an abnormal amount of successes, even a 20 percentage point difference might not create an abnormal result within 20 rolls, and you need to make hundreds of rolls to really narrow things down. Players aren't going to do any better than this analysis, so they'll need hundreds of rolls also. And so the remaining question is if the player can somehow keep track of how often they're succeeding across hundreds of rolls (assuming they ever do that many rolls with the exact same probability anyway).

Here are some bonus observations:
The randomness means you can't rely on rolling the expected value of hits. If you got a stat increase from 40% to 50% then that gives you a chance to roll a higher than normal amount of hits, but the only thing you can be 95% sure on is that you'll roll at least 6 out of 20 hits.

You can also see how an absolute difference of 10 percentage points is more noticeable near the edges. The 95% interval around 0.05 is half the size of the one around 0.5.

You can also see how a relative difference of percentage points is less noticeable near the edges. The credible interval I made for 1/20 successes goes from 0.003 (rounded to 0.00) to 0.21. That's a factor of roughly 70, close to 2 orders of magnitude. This makes sense, since a single fluke success will take a long time to average out.

...

My graphs, tables, and python code should be attached to this post.
 

Attachments

  • all the tables tab separated.txt
    6 KB · Views: 0
  • all the tables.pdf
    124.8 KB · Views: 0
  • power chance to reject null hypothesis for 95cl.pdf
    35.3 KB · Views: 0
  • binomial distributions.pdf
    49.7 KB · Views: 0
  • chance of getting a success rate equal to 005 025 05.pdf
    35.2 KB · Views: 0
  • binomialanalysiscode.txt
    4.3 KB · Views: 0

Zebraman

Member
Joined
Nov 23, 2022
Messages
21
Reaction score
44
Really interesting thread. One factor which I think has been mentioned, but just to emphasise, is the consequence of the action as a focus for PCs noticing more granular changes. And of course whether the players can actually see the dice rolled makes a difference. Going from 70% to 71% in Gossip Skill will probably get lost in the mix, while going from a 70% to a 71% chance of a fumble or critical will more likely be noticed when it comes up. Although perhaps that's less noticing the probabilities than knowing 70 on the dice used to be a success and now 71 is a success.

The other thing to throw into the mix is differences in non-dice-based percentages, for example damage reduction and hit points. As a GM I feel I notice even small differences in the group's overall hit points as they level up and will adjust future combat encounters accordingly. But then that is an entirely unscientific statement!
 