> Being able to apply statistics is like having a secret superpower.
I totally agree with this sentence. But if you ask for my opinion, merely knowing a list of statistical formulas is not very helpful. Most of the time, people don’t remember the underlying assumptions, so there is a fair chance they will use them in inappropriate situations.
I recommend watching these two YouTube videos. The presenters advocate using simulation/bootstrapping/shuffling methods instead of memorizing formulas.
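For readers who haven't seen the videos, here is a minimal sketch of the "shuffling" idea (a permutation test), assuming the question is whether group A's mean is genuinely higher than group B's. The function name `permutation_test` and the scores are hypothetical, not taken from either talk.

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, trials=10_000, seed=0):
    """One-sided shuffling test: how often does a random relabeling of the
    pooled data give a difference in means at least as large as observed?"""
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        # Split the shuffled pool back into two groups of the original sizes.
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if diff >= observed:
            hits += 1
    return hits / trials


# Hypothetical scores, for illustration only.
treated = [12, 14, 15, 13, 16, 14]
control = [5, 6, 4, 7, 5, 6]
print(permutation_test(treated, control))
```

If the labels didn't matter, shuffling them should often produce a gap as large as the real one; a small fraction suggests the gap is not just luck. Beyond the labels being exchangeable under the null hypothesis, no distributional assumption is needed.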
> The presenters advocate using simulation/bootstrapping/shuffling methods instead of memorizing formulas.
Yeah, I often find it much easier to write a little Python script that runs 10,000 Monte Carlo trials, as opposed to "properly" working things out and then not even being confident enough in my result anyway.
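As an illustration of that kind of script, assume the question is whether 22 heads in 30 flips is suspicious for a fair coin (a made-up example). The sketch below estimates the tail probability with 10,000 simulated trials and, for comparison, computes the exact binomial tail with `math.comb`; the function name is hypothetical.

```python
import math
import random


def prob_at_least_heads(n_flips=30, observed_heads=22, trials=10_000, seed=1):
    """Estimate P(at least `observed_heads` heads in `n_flips` fair flips)
    by brute-force simulation instead of the binomial formula."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # One trial: flip a fair coin n_flips times and count heads.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            extreme += 1
    return extreme / trials


simulated = prob_at_least_heads()
# Exact binomial tail, for comparison with the simulation.
exact = sum(math.comb(30, k) for k in range(22, 31)) / 2**30
print(simulated, exact)
```

Both numbers should land near 0.008; the simulated one will wobble a little depending on the seed.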
mont_tag 6 hours ago
IIRC, Jake's video inspired the examples section in the Python random module docs (https://docs.python.org/3/library/random.html#examples). It takes about 15 minutes with those examples to learn how to put Jake's ideas into practice.
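In the spirit of the resampling examples on that docs page (this is not a copy of them), a percentile-bootstrap confidence interval for a mean needs only `random.choices` and `statistics.fmean`. The function name and the sample are hypothetical.

```python
import random
from statistics import fmean


def bootstrap_mean_ci(data, resamples=10_000, confidence=0.90, seed=2):
    """Percentile-bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    # Each resample draws len(data) values *with replacement* from the data.
    means = sorted(fmean(rng.choices(data, k=len(data))) for _ in range(resamples))
    tail = (1 - confidence) / 2
    lo = means[round(tail * resamples)]
    hi = means[round((1 - tail) * resamples) - 1]
    return lo, hi


# Hypothetical sample, for illustration only.
sample = [23, 31, 35, 40, 42, 47, 51, 58, 64, 90]
print(fmean(sample), bootstrap_mean_ci(sample))
```

Resampling with replacement treats the observed data as a stand-in for the population, so the spread of the resampled means approximates how much the sample mean would vary from one real sample to the next.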
wodenokoto 8 hours ago
While I really liked the video by Vanderplas, I returned to it after a year or two, paused every time he presented a problem, and tried to solve it myself using for loops and hard thinking.
I barely succeeded at any of it. So at that point you might as well just look up the formula instead of bootstrapping.
I’ll give the second one a shot too.
Terr_ 9 hours ago
I think I avoid imposter syndrome in some areas, but Not Enough Real Math is definitely a weak spot.
When people start talking about eigenvalues, I'm just a business-rule caveman with a little discrete-math unga bunga.
This kind of statistical stuff falls somewhere in between.
MrLeap 8 hours ago
Eigenvalues are a topic in linear algebra. They're coefficients you can put in front of some matrices or vectors that change their magnitude.
Linear Algebra was the most useful and fun math class I took in college. Highly recommended if you ever wanna do gamedev. It's more approachable than you probably think.
For me, when people start talking about differential equations, specifically the symbols you'll see in a Wikipedia article about the Navier-Stokes equations, I'm just a business-rule caveman with a little linear algebra zug zug.
vector_spaces 2 hours ago
> Eigenvalues are a topic in linear algebra. They're coefficients you can put in front of some matrices or vectors that change their magnitude.
Multiplying a vector or a matrix by any nonunit scalar changes its magnitude (hence "scalar"!! i.e., something that scales). Not all scalars are eigenvalues. So this isn't quite right.
Think about it geometrically instead. A linear operator transforms a space. Geometrically, the transformation can be one or more of stretching, compressing, or rotating (taking shearing to be a kind of stretching). The directions in the space that remain the same, other than having been scaled by some factor, are the eigenvectors of the transformation. The scaling factor along one of those directions is its eigenvalue.
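To make the geometric picture concrete: v is an eigenvector of a matrix M with eigenvalue λ when M v = λ v, so M only rescales it. Below is a minimal pure-Python sketch for the 2x2 case, assuming real eigenvalues; the helper names `mat_vec` and `eigen_2x2` are hypothetical, and real code would normally call a library routine such as NumPy's `numpy.linalg.eig`.

```python
import math


def mat_vec(m, v):
    """Multiply a 2x2 matrix (row-major tuple of tuples) by a 2-vector."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


def eigen_2x2(m):
    """Real eigenpairs of a 2x2 matrix via the characteristic polynomial
    lam^2 - trace*lam + det = 0. Assumes the eigenvalues are real."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues (e.g. a pure rotation)")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        # Any nonzero solution of (M - lam*I) v = 0 is an eigenvector.
        if b != 0:
            v = (b, lam - a)
        elif c != 0:
            v = (lam - d, c)
        else:
            # Diagonal matrix: the coordinate axes are the eigenvectors.
            v = (1.0, 0.0) if abs(lam - a) <= abs(lam - d) else (0.0, 1.0)
        pairs.append((lam, v))
    return pairs


M = ((2, 1), (1, 2))
for lam, v in eigen_2x2(M):
    print(lam, v, mat_vec(M, v))  # M v equals lam * v: same direction, rescaled
print(mat_vec(M, (1, 0)))  # (2, 1): this vector's direction changes
```

For this M the eigenvalues are 3 and 1 with directions (1, 1) and (1, -1), while a vector like (1, 0) is turned as well as stretched, which is the distinction drawn above.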
roenxi 3 hours ago
Studying more statistics is often a smart move. In this case, though, Mr. Miller led with the most important part: if there are two numbers (like 7 and 5) in a statistical context, they might effectively be the same number. That throws a lot of people into such a tailspin that they never really recover after making the obvious mistake of thinking they are different.
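One concrete reading of the 7-versus-5 example, assuming the numbers are event counts over the same exposure (same time window or traffic) and modelling each as Poisson: conditional on the total of 12, the first count is Binomial(12, 0.5) if the underlying rates are equal. This conditional test is one standard option, not necessarily the method the article uses; the function name is hypothetical.

```python
from math import comb


def poisson_counts_pvalue(x, y):
    """Two-sided exact test of 'both counts come from the same Poisson rate'
    (equal exposure assumed). Conditional on n = x + y, x ~ Binomial(n, 0.5)
    under that null hypothesis."""
    n = x + y
    pmf = [comb(n, k) / 2**n for k in range(n + 1)]
    observed = pmf[x]
    # Sum the probability of every outcome no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-12)))


print(poisson_counts_pvalue(7, 5))
```

For 7 and 5 this gives a two-sided p-value of 3172/4096, about 0.77, so the data are entirely consistent with the two rates being the same.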
The powerful heuristic for the less technically inclined is to say "well, this evidence isn't conclusive until someone who knows statistics has tried to shoot it down".
bob1029 11 hours ago
I'd add z-score (standard score) to your tool belt. The ability to identify or reject outliers is invaluable when trying to stabilize real-world business processes.
For example, if you are building heuristics that determine if a customer's bank account is "reasonably active", you may not want to consider very small transactions unless that is typical activity for a given customer.
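A minimal sketch of that idea, assuming each customer's past transaction amounts are available as a list and using the sample standard deviation as the spread. The function names, the threshold of 3, and the history are hypothetical.

```python
from statistics import mean, stdev


def z_score(history, amount):
    """Standard score of `amount` relative to one customer's past amounts."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        raise ValueError("history has no spread; z-score undefined")
    return (amount - mu) / sigma


def is_unusual(history, amount, threshold=3.0):
    # Flags both unusually large and unusually small transactions.
    return abs(z_score(history, amount)) > threshold


# Hypothetical transaction history for one customer.
history = [20, 22, 19, 21, 20, 23, 18, 21, 20]
print(is_unusual(history, 21), is_unusual(history, 500), is_unusual(history, 0.5))
# False True True
```

With only a handful of past transactions, the mean and standard deviation are themselves noisy, so in practice a minimum history length would be worth enforcing before trusting the flag.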
FYI, using this stuff without understanding test power is dangerous and can lead to bad decisions made with false confidence.
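One way to get a feel for test power is to simulate it: assume a true effect, run many fake experiments, and count how often the test notices. The scenario below (conversion rates of 10% vs 12%, 1,000 users per arm, a pooled two-proportion z-test at alpha = 0.05) and all names are hypothetical.

```python
import random
from statistics import NormalDist


def two_proportion_z(successes_a, successes_b, n):
    """Pooled two-proportion z statistic for two groups of equal size n."""
    pooled = (successes_a + successes_b) / (2 * n)
    se = (pooled * (1 - pooled) * 2 / n) ** 0.5
    return 0.0 if se == 0 else (successes_b - successes_a) / n / se


def simulated_power(p_a, p_b, n, alpha=0.05, sims=1_000, seed=3):
    """Fraction of simulated experiments in which a two-sided z-test rejects
    'no difference' when the true rates really are p_a and p_b."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(sims):
        # Simulate one experiment: n Bernoulli trials per arm.
        a = sum(rng.random() < p_a for _ in range(n))
        b = sum(rng.random() < p_b for _ in range(n))
        if abs(two_proportion_z(a, b, n)) > critical:
            rejections += 1
    return rejections / sims


# Hypothetical scenario: a real lift from 10% to 12%, 1,000 users per arm.
print(simulated_power(0.10, 0.12, 1_000))
```

Under these assumptions the estimate should come out roughly around 0.3: a real lift would usually be missed, so a non-significant result here says little about whether the lift exists.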
mcphage 12 hours ago
The article "How Not To Sort By Average Rating" by the same author (and also linked in this article) is really good, and definitely changed my thinking about any kind of "sort by best to worst" list: https://www.evanmiller.org/how-not-to-sort-by-average-rating...
Joker_vD 12 hours ago
Hm. I wonder how well "Score = [Positive ratings] / ([Total ratings] + 1)" would fare.
mcphage 12 hours ago
It'll help some, but I don't think enough: it's way, way easier to get a good score on a small number of ratings than on a large number. And the number of ratings spans several orders of magnitude. For instance, on Amazon you can do a search and get back products with fewer than 10 ratings alongside products with over 10,000 ratings.
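The linked article recommends ranking by the lower bound of the Wilson score confidence interval for the fraction of positive ratings. A minimal sketch comparing it with the +1 heuristic above (function names hypothetical; 95% confidence assumed):

```python
from statistics import NormalDist


def plus_one_score(positive, total):
    """The heuristic proposed above: positive / (total + 1)."""
    return positive / (total + 1)


def wilson_lower_bound(positive, total, confidence=0.95):
    """Lower bound of the Wilson score interval for the true positive fraction."""
    if total == 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = positive / total
    centre = p_hat + z * z / (2 * total)
    margin = z * ((p_hat * (1 - p_hat) + z * z / (4 * total)) / total) ** 0.5
    return (centre - margin) / (1 + z * z / total)


for pos, tot in [(5, 5), (80, 100)]:
    print(pos, tot, round(plus_one_score(pos, tot), 3), round(wilson_lower_bound(pos, tot), 3))
```

With these inputs the +1 score ranks 5 out of 5 above 80 out of 100, while the Wilson lower bound ranks 80 out of 100 higher, because it accounts for how uncertain a small sample is rather than just shrinking it slightly.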
cmdrmac 10 hours ago
This is certainly a very useful resource - even for a seasoned data scientist!
snitzr 12 hours ago
Why isn't 7 greater than 5?
DeepSeaTortoise 11 hours ago
Statistics gave him the superpower of predicting the future:
Yes, and informative. I was looking at the article and thought everything made sense, but I could tell I was missing something about this line...
senkora 11 hours ago
Treat them as two draws from possibly different, independent distributions.
The question is whether the distribution that drew 7 “stochastically dominates” the distribution that drew 5. You may or may not be able to conclude that based on the available data and assumptions about the distributions.
For example, if you assume that the two distributions are approximately normal with very small variances, then you can probably conclude that the distribution that drew 7 stochastically dominates the distribution that drew 5. But if you assume that the variances are large, then you probably can’t conclude that.
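With equal variances, one normal distribution stochastically dominates another exactly when its mean is larger, so under the normal assumption the question reduces to comparing means. A minimal sketch, assuming a single draw from each distribution and the same known standard deviation (both assumptions mine, for illustration; the function name is hypothetical):

```python
from statistics import NormalDist


def prob_gap_by_chance(x=7.0, y=5.0, sigma=1.0):
    """One-sided p-value for 'x came from a higher-mean normal than y',
    assuming each value is a single draw with the same, known sigma."""
    # The difference of two independent draws has sd sigma * sqrt(2).
    z = (x - y) / (sigma * 2 ** 0.5)
    return 1 - NormalDist().cdf(z)


for sigma in (0.5, 3.0):
    print(sigma, round(prob_gap_by_chance(sigma=sigma), 3))
```

A small sigma (0.5) gives a p-value well under 0.01, while a large one (3.0) gives roughly 0.32, matching the intuition above.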
dlivingston 12 hours ago
Sounds like you should read the article. :)
Kidding. The idea is that there may be some statistical uncertainty associated with the measurement of 7, and also of 5, and so the "real" value of 7 may actually be less than the "real" value of 5.
Jtsummers 12 hours ago
Converting the math in here to code isn't very hard.
Hussell 11 hours ago
Statisticians have a bunch of tricks to transform the formulas into more easily computable forms, e.g., calculating both the average and the standard deviation in a single pass through the data instead of one pass for the average and a second for the standard deviation. Converting the math in here to efficient code isn't very easy.
glitchc 11 hours ago
You mean Welford's algorithm. Since code was requested:
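A minimal Python sketch of Welford's single-pass update (not the commenter's original code). Besides needing only one pass, it avoids the catastrophic cancellation that the naive one-pass formula (sum of squares minus n times the squared mean) suffers when values share a large offset; the function name `welford` is hypothetical.

```python
import math


def welford(values):
    """Single-pass mean and sample standard deviation (Welford's method)."""
    count, mean, m2 = 0, 0.0, 0.0  # m2: running sum of squared deviations
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the *updated* mean
    if count < 2:
        raise ValueError("need at least two values")
    return mean, math.sqrt(m2 / (count - 1))


print(welford([2, 4, 4, 4, 5, 5, 7, 9]))
```

For that input the mean is 5 and the sample standard deviation is sqrt(32/7), about 2.14.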
Jake Vanderplas - Statistics for Hackers https://www.youtube.com/watch?v=Iq9DzN6mvYA
John Rauser - Statistics Without the Agonizing Pain https://www.youtube.com/watch?v=5Dnw46eC-0o
https://knowyourmeme.com/memes/fight-club-57-movie
https://en.m.wikipedia.org/wiki/Stochastic_dominance
https://jonisalonen.com/2013/deriving-welfords-method-for-co...
There are others like this out there.
It is a good frequentist's toolbox, but it is not immediately translatable to code, no.