Suppose today is sunny. What’s tomorrow? It might be sunny again, it
might rain — but here’s the assumption that makes the whole machinery
work: tomorrow depends only on today, not on whether it rained last
Tuesday or stayed sunny all of June. The past doesn’t matter once you
know where you are right now. That memoryless property is
the entire Markov trick, and the surprising payoff is that whole
systems of weather, gamblers’ bankrolls, and even the web graph fall
neatly into the same little bit of algebra.
So how do you write that down? You list the states (Sun, Rain) and
give a transition probability for each ordered pair. From Sun: stay
sunny with probability 0.8, switch to rain with 0.2. From Rain: dry
out with probability 0.3, stay wet with 0.7. Stack those rows and you
have a transition matrix PP — each row sums
to 1 because something has to happen tomorrow.
Now walk one day forward. Today is definitely sunny, so the
distribution is π0=(1,0)\pi_0 = (1, 0). Tomorrow’s
sun probability is “chance I started in Sun × chance Sun→Sun” plus
“chance I started in Rain × chance Rain→Sun”, which is
1⋅0.8+0⋅0.3=0.81 \cdot 0.8 + 0 \cdot 0.3 = 0.8. The rain
half works the same way and lands on 0.2. So
π1=(0.8,0.2)\pi_1 = (0.8, 0.2). That whole
calculation is just the row-vector matrix product
π1=π0P\pi_1 = \pi_0 P.
Do it again. π2=π1P\pi_2 = \pi_1 P gives
(0.8⋅0.8+0.2⋅0.3,0.8⋅0.2+0.2⋅0.7)=(0.70,0.30)(0.8 \cdot 0.8 + 0.2 \cdot 0.3,\; 0.8 \cdot 0.2 + 0.2 \cdot 0.7) = (0.70, 0.30).
One more step lands on (0.65,0.35)(0.65, 0.35), then
(0.625,0.375)(0.625, 0.375), then
(0.6125,0.3875)(0.6125, 0.3875). The numbers are clearly
crawling towards something — and that something is
(0.6,0.4)(0.6, 0.4). Sixty percent sun, forty
percent rain, forever, regardless of what happens this afternoon.
That limit is called the stationary distribution,
written π\pi. It’s the unique mix of states
the chain settles into no matter where you started. The defining
property is the cleanest equation in the course: applying one more
step changes nothing.
πP=π\pi P = \pi
Read literally, that says π\pi is a left
eigenvector of PP with eigenvalue 1. The
probabilistic story (long-run occupancy) and the linear-algebra
story (special eigenvector) are the same story.
Why does the algebra promise a unique answer? Because for “nice”
chains, a theorem (Perron–Frobenius) guarantees exactly one
stationary distribution with all-positive entries. “Nice” means two
things in plain English: you can get from any state to any other
given enough time (no isolated islands), and the chain isn’t stuck
in a clockwork cycle that revisits states only on multiples of some
period. Weather satisfies both — sun and rain are reachable from
each other, and the self-loops break any rigid periodicity.
Not every chain mixes like that. Swap to the Absorbing
chain preset: state Sink has a row of (0,0,1)(0, 0, 1) — once you enter, you never leave. There’s no
unique stationary distribution that forgets the start; everything
eventually piles up at the sink with probability 1. Absorbing chains
are how you model “gambler goes broke” or “particle reaches the
boundary” — the interesting question shifts from “long-run mix” to
“how long until I’m absorbed, and from which non-sink state?”
Try it yourself.
So what is this good for? Anywhere you have a system that bounces
between states with fixed transition probabilities. The
Gamblers’ ruin demo further down is a Markov chain
on bankrolls with two absorbing states (broke, target). And the
PageRank toy at the bottom of this page is the
headline application — it models a random surfer clicking links on
the web, and the PageRank of a page is literally its entry in the
stationary distribution of that chain. The featured solution
“Stationary distribution of a 3-state Markov chain via the
eigenvalue 1” works the πP=π\pi P = \pi
arithmetic on a three-state weather model end-to-end; read it next
to see the same machinery one dimension up.
t=0t = 0∥π−π∗∥1=0.800\|\pi - \pi^*\|_1 = 0.800
Markov chain "Weather (sun/rain)" with 2 states; after 0 steps, the distribution is 0.8000 (L1) away from its stationary distribution.
Time between Markov steps. Lower = faster animation.
1002000
step 0
Reflection
Pre-interview preview. Akwasi’s first-person reflection on this course is pending —
track at issue #45.
Bayes’ theorem
Sliders for prior, likelihood, and false-positive rate; live posterior and 2×2
table update as you drag.
Bayes' theorem on a population of 10000: with prior P(D) = 0.0010, sensitivity 0.990, and specificity 0.990, the posterior P(D|+) is 0.0902. When the prior is small the posterior after a positive test is often much lower than the test's accuracy suggests.
Base rate of the condition in the population before testing.
0.00001.0000
Probability the test is positive given the person has the condition.
0.0001.000
Probability the test is negative given the person does not have the condition.
0.0001.000
Population:
pop = 10000
Birthday paradox simulator
Drag the room size and watch the collision probability climb past 50% near
23 people.
Walk me through this step by step↓
Imagine a room of 23 strangers. What are the chances that two of them
share a birthday? Most people guess one or two percent — there are
365 days, and 23 people feels tiny. The real answer is just over 50
percent. This walkthrough is about why that gap exists, and
it turns out to be one specific counting mistake everyone makes.
The mistake is asking the wrong question. Without realising it, your
gut estimates “what is the chance someone in the room matches
me?” — but the real puzzle is “what is the chance any
two people in the room match each other?” Those are very
different questions.
Here is the trick. With 23 people, how many pairs are there?
Pick any two people and you have a pair. The count is
23×22/2=25323 \times 22 / 2 = 253. So you are not
doing 23 comparisons against the calendar — you are running 253
mini-comparisons between people, all at once. Now 365 days suddenly
feels a lot less roomy.
To compute the probability we count the easy thing first: the chance
that nobody matches anyone. Then collision probability is
just 1 minus that. (This “count the complement” move is one of the
handful of probability tricks worth memorising — the “bad” event is
usually one tidy product; the “good” event is a messy union.)
Walk through 23 people one at a time. Person 1 picks any day — no
conflict possible. Person 2 needs to avoid one taken day, so they
have 364/365364/365 of the calendar free. Person
3 needs to avoid two taken days: 363/365363/365.
Multiply them all the way to person 23.
That product is roughly
(364/365)(363/365)⋯(343/365)≈0.493(364/365)(363/365)\cdots(343/365) \approx 0.493.
So the probability nobody shares is about 49.3%, and the probability
that at least one pair does share is about 50.7%. That is
where the magic “23 people gives 50%” number actually comes from —
it is just this multiplication, no fancy math required.
The bigger lesson is the rule of thumb. Whenever you have
D equally-likely options (calendars, hash outputs, fingerprint
buckets), you start expecting matches after about
D\sqrt{D} samples, not D. The
square root falls out of the pair-counting we just did:
n2/2n^2/2 pairs need to fit into
D slots, so a collision is likely once
n≈Dn \approx \sqrt{D}.
For birthdays, 365≈19\sqrt{365} \approx 19, and
the exact threshold for a 50/50 collision is 23 — close to the
square-root estimate, exactly as predicted. The demo’s chart shows
both the exact curve and the
1−e−n2/(2D)1 - e^{-n^2/(2D)} approximation. Past
about n = 10 they agree to within a percent, so feel free
to use the simpler one for back-of-the-envelope estimates.
Try it yourself.
So the trick is this. Linear intuition asks “out of 365 days, how
many will match me?” Pair intuition asks “out of 253 pairs, how many
will match each other?” Pairs grow as
n2n^2, days stay flat at D, and
the crossover where pairs catch up to days is right at
D\sqrt{D}. That single shift in question
explains every “birthday-style” collision result you will ever meet.
The featured problem “Birthday paradox — why 23 people gives a
50% collision probability” below walks the full algebra and
derives the
2ln2\sqrt{2 \ln 2} constant for the exact
50% crossover. For a different counting trick where indicator
variables turn a messy “how long until all coupons collected?”
problem into a one-line sum, see the coupon collector featured
problem in the same section.
Central Limit Theorem demo: drawing 2000 sample means of size n = 30 from a Uniform(0, 1) distribution. As n grows the histogram of sample means approaches a normal curve regardless of the underlying distribution.
Distribution
How many i.i.d. draws are averaged into each sample mean. CLT predicts the distribution of means approaches normal as n grows.
1200
How many sample means are drawn for the histogram. More samples → smoother histogram.
1005000
Seed for the deterministic PRNG so runs are reproducible and shareable.
1100000
n = 30, samples = 2000
Combinations & permutations
Calculator + visual enumeration for nPk and nCk with small n.
Erdős-Rényi random graph G(n = 50, p = 0.050), seed 42: above the p ≈ 1/n threshold (supercritical, giant component). Each of the 50 nodes has expected degree 2.45.
Number of vertices in the graph. Edges are drawn between every unordered pair independently with probability p.
5100
Per-pair edge probability. The phase transition sits near p ≈ 1/n: below it the graph is dust; above it a giant component emerges.
0.0001.000
PRNG seed for reproducible sampling. Change the seed to draw a different random graph from the same G(n, p) distribution.
1100000
50 nodes, 59 edges
Gambler’s ruin / Monte Carlo
Biased coin walks to ruin or fortune; theoretical vs simulated absorption
probabilities side by side.
Gambler's ruin random walk on {0..50} starting at wealth k=25 with win probability p=0.50. Simulating 50 walks; 0 completed so far. Analytical P(ruin) ≈ 0.500, empirical 0.000.
Upper absorbing barrier — the gambler wins by reaching N.
5100
Initial wealth; clamped to [1, N-1].
149
Probability of a +1 step. p=0.5 is a fair walk.
0.300.70
How many independent random walks to simulate.
10500
PRNG seed (mulberry32). Same seed → same walks.
1100000
walks 0 / 50
Hypothesis testing
Visualise α, β, p-values, and effect size for a one-sided z-test.
Scenario
z=1.643z = 1.643p=0.1003p = 0.1003fail to reject H0\text{fail to reject H0}
Testing H₀: μ = 0.00 vs two-sided alternative with x̄ = 0.30, σ = 1.00, sample size 30. Observed z = 1.643, p = 0.1003. At α = 0.05, fail to reject H0.
Alternative
α
Observed sample mean of the data.
-3.003.00
Hypothesised population mean under H₀.
-3.003.00
Known population standard deviation.
0.15.0
Number of observations. Bigger samples → smaller standard error.
1200
n = 30
Monte Carlo π estimator
Drop random darts into a unit square; convergence of 4·hits/total to π.