Esc to close · to navigate · to open

FA19

Discrete & Probabilistic Systems

Counting structures, modelling random processes, and the surprising amount of analysis that falls out of carefully placed independence assumptions.

Concepts learned

Tech

  • Python
  • NumPy
  • Matplotlib
  • Jupyter

Hero demo Markov Chain Visualizer

Walk me through this step by step
  1. Suppose today is sunny. What’s tomorrow? It might be sunny again, it might rain — but here’s the assumption that makes the whole machinery work: tomorrow depends only on today, not on whether it rained last Tuesday or stayed sunny all of June. The past doesn’t matter once you know where you are right now. That memoryless property is the entire Markov trick, and the surprising payoff is that whole systems of weather, gamblers’ bankrolls, and even the web graph fall neatly into the same little bit of algebra.

  2. So how do you write that down? You list the states (Sun, Rain) and give a transition probability for each ordered pair. From Sun: stay sunny with probability 0.8, switch to rain with 0.2. From Rain: dry out with probability 0.3, stay wet with 0.7. Stack those rows and you have a transition matrix P — each row sums to 1 because something has to happen tomorrow.

    P = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}
  3. Now walk one day forward. Today is definitely sunny, so the distribution is \pi_0 = (1, 0). Tomorrow’s sun probability is “chance I started in Sun × chance Sun→Sun” plus “chance I started in Rain × chance Rain→Sun”, which is 1 \cdot 0.8 + 0 \cdot 0.3 = 0.8. The rain half works the same way and lands on 0.2. So \pi_1 = (0.8, 0.2). That whole calculation is just the row-vector matrix product \pi_1 = \pi_0 P.

  4. Do it again. \pi_2 = \pi_1 P gives (0.8 \cdot 0.8 + 0.2 \cdot 0.3,\; 0.8 \cdot 0.2 + 0.2 \cdot 0.7) = (0.70, 0.30). One more step lands on (0.65, 0.35), then (0.625, 0.375), then (0.6125, 0.3875). The numbers are clearly crawling towards something — and that something is (0.6, 0.4). Sixty percent sun, forty percent rain, forever, regardless of what happens this afternoon.

  5. That limit is called the stationary distribution, written \pi. It’s the unique mix of states the chain settles into no matter where you started. The defining property is the cleanest equation in the course: applying one more step changes nothing.

    \pi P = \pi

    Read literally, that says \pi is a left eigenvector of P with eigenvalue 1. The probabilistic story (long-run occupancy) and the linear-algebra story (special eigenvector) are the same story.

  6. Why does the algebra promise a unique answer? Because for “nice” chains, a theorem (Perron–Frobenius) guarantees exactly one stationary distribution with all-positive entries. “Nice” means two things in plain English: you can get from any state to any other given enough time (no isolated islands), and the chain isn’t stuck in a clockwork cycle that revisits states only on multiples of some period. Weather satisfies both — sun and rain are reachable from each other, and the self-loops break any rigid periodicity.

  7. Not every chain mixes like that. Swap to the Absorbing chain preset: state Sink has a row of (0, 0, 1) — once you enter, you never leave. There’s no unique stationary distribution that forgets the start; everything eventually piles up at the sink with probability 1. Absorbing chains are how you model “gambler goes broke” or “particle reaches the boundary” — the interesting question shifts from “long-run mix” to “how long until I’m absorbed, and from which non-sink state?”

  8. Try it yourself.

  9. So what is this good for? Anywhere you have a system that bounces between states with fixed transition probabilities. The Gamblers’ ruin demo further down is a Markov chain on bankrolls with two absorbing states (broke, target). And the PageRank toy at the bottom of this page is the headline application — it models a random surfer clicking links on the web, and the PageRank of a page is literally its entry in the stationary distribution of that chain. The featured solution “Stationary distribution of a 3-state Markov chain via the eigenvalue 1” works the \pi P = \pi arithmetic on a three-state weather model end-to-end; read it next to see the same machinery one dimension up.

t = 0\|\pi - \pi^*\|_1 = 0.800

Markov chain "Weather (sun/rain)" with 2 states; after 0 steps, the distribution is 0.8000 (L1) away from its stationary distribution.

600

Time between Markov steps. Lower = faster animation.

step 0

Reflection

Pre-interview preview. Akwasi’s first-person reflection on this course is pending — track at issue #45.

Bayes’ theorem

Sliders for prior, likelihood, and false-positive rate; live posterior and 2×2 table update as you drag.

P(D|+) = 0.0902P(D|-) = 0.0000

Bayes' theorem on a population of 10000: with prior P(D) = 0.0010, sensitivity 0.990, and specificity 0.990, the posterior P(D|+) is 0.0902. When the prior is small the posterior after a positive test is often much lower than the test's accuracy suggests.

0.0010

Base rate of the condition in the population before testing.

0.990

Probability the test is positive given the person has the condition.

0.990

Probability the test is negative given the person does not have the condition.

Population:
pop = 10000

Birthday paradox simulator

Drag the room size and watch the collision probability climb past 50% near 23 people.

Walk me through this step by step
  1. Imagine a room of 23 strangers. What are the chances that two of them share a birthday? Most people guess one or two percent — there are 365 days, and 23 people feels tiny. The real answer is just over 50 percent. This walkthrough is about why that gap exists, and it turns out to be one specific counting mistake everyone makes.

  2. The mistake is asking the wrong question. Without realising it, your gut estimates “what is the chance someone in the room matches me?” — but the real puzzle is “what is the chance any two people in the room match each other?” Those are very different questions.

  3. Here is the trick. With 23 people, how many pairs are there? Pick any two people and you have a pair. The count is 23 \times 22 / 2 = 253. So you are not doing 23 comparisons against the calendar — you are running 253 mini-comparisons between people, all at once. Now 365 days suddenly feels a lot less roomy.

  4. To compute the probability we count the easy thing first: the chance that nobody matches anyone. Then collision probability is just 1 minus that. (This “count the complement” move is one of the handful of probability tricks worth memorising — the “bad” event is usually one tidy product; the “good” event is a messy union.)

  5. Walk through 23 people one at a time. Person 1 picks any day — no conflict possible. Person 2 needs to avoid one taken day, so they have 364/365 of the calendar free. Person 3 needs to avoid two taken days: 363/365. Multiply them all the way to person 23.

  6. That product is roughly (364/365)(363/365)\cdots(343/365) \approx 0.493. So the probability nobody shares is about 49.3%, and the probability that at least one pair does share is about 50.7%. That is where the magic “23 people gives 50%” number actually comes from — it is just this multiplication, no fancy math required.

  7. The bigger lesson is the rule of thumb. Whenever you have D equally-likely options (calendars, hash outputs, fingerprint buckets), you start expecting matches after about \sqrt{D} samples, not D. The square root falls out of the pair-counting we just did: n^2/2 pairs need to fit into D slots, so a collision is likely once n \approx \sqrt{D}.

  8. For birthdays, \sqrt{365} \approx 19, and the exact threshold for a 50/50 collision is 23 — close to the square-root estimate, exactly as predicted. The demo’s chart shows both the exact curve and the 1 - e^{-n^2/(2D)} approximation. Past about n = 10 they agree to within a percent, so feel free to use the simpler one for back-of-the-envelope estimates.

  9. Try it yourself.

  10. So the trick is this. Linear intuition asks “out of 365 days, how many will match me?” Pair intuition asks “out of 253 pairs, how many will match each other?” Pairs grow as n^2, days stay flat at D, and the crossover where pairs catch up to days is right at \sqrt{D}. That single shift in question explains every “birthday-style” collision result you will ever meet.

  11. The featured problem “Birthday paradox — why 23 people gives a 50% collision probability” below walks the full algebra and derives the \sqrt{2 \ln 2} constant for the exact 50% crossover. For a different counting trick where indicator variables turn a messy “how long until all coupons collected?” problem into a one-line sum, see the coupon collector featured problem in the same section.

n = 1P(\text{collision}) = 0.000

Birthday paradox with 365 days in the year: the probability of at least one shared birthday hits 50% at n = 23 people.

365

Size of the birthday alphabet. 365 = real calendar; 256 = one-byte hash.

50

Upper bound of people swept on the x-axis.

0.50

Horizontal reference line; n* is the smallest n that meets this.

200

Pause between successive n increments in the animation.

n = 1

Central limit theorem

Pick a base distribution, stack the sample means, watch a Gaussian emerge.

\mu = 0.500\sigma/\sqrt{n} = 0.053\text{empirical } \sigma = 0.052

Central Limit Theorem demo: drawing 2000 sample means of size n = 30 from a Uniform(0, 1) distribution. As n grows the histogram of sample means approaches a normal curve regardless of the underlying distribution.

Distribution
30

How many i.i.d. draws are averaged into each sample mean. CLT predicts the distribution of means approaches normal as n grows.

2000

How many sample means are drawn for the histogram. More samples → smoother histogram.

42

Seed for the deterministic PRNG so runs are reproducible and shareable.

n = 30, samples = 2000

Combinations & permutations

Calculator + visual enumeration for nPk and nCk with small n.

\binom{6}{2}=15P_{6,2}=30\binom{6}{2}=\frac{6!}{2!(6-2)!}

Scenario "Pick 6 of 20". There are 15 ways to choose 2 items from 6 (unordered), or 30 ways if order matters.

6

Total number of items to choose from.

2

How many items are chosen.

C(6, 2) = 15

Erdős-Rényi random graph

Tune the edge probability p and watch the giant component appear past the threshold.

n = 50p = 0.050\text{mean deg} = 2.36\text{largest comp} = 45 / 50

Erdős-Rényi random graph G(n = 50, p = 0.050), seed 42: above the p ≈ 1/n threshold (supercritical, giant component). Each of the 50 nodes has expected degree 2.45.

50

Number of vertices in the graph. Edges are drawn between every unordered pair independently with probability p.

0.050

Per-pair edge probability. The phase transition sits near p ≈ 1/n: below it the graph is dust; above it a giant component emerges.

42

PRNG seed for reproducible sampling. Change the seed to draw a different random graph from the same G(n, p) distribution.

50 nodes, 59 edges

Gambler’s ruin / Monte Carlo

Biased coin walks to ruin or fortune; theoretical vs simulated absorption probabilities side by side.

P(\text{ruin}) = 0.500\text{Empirical} = 0.000E[T] = 625

Gambler's ruin random walk on {0..50} starting at wealth k=25 with win probability p=0.50. Simulating 50 walks; 0 completed so far. Analytical P(ruin) ≈ 0.500, empirical 0.000.

50

Upper absorbing barrier — the gambler wins by reaching N.

25

Initial wealth; clamped to [1, N-1].

0.50

Probability of a +1 step. p=0.5 is a fair walk.

50

How many independent random walks to simulate.

42

PRNG seed (mulberry32). Same seed → same walks.

walks 0 / 50

Hypothesis testing

Visualise α, β, p-values, and effect size for a one-sided z-test.

Scenario
z = 1.643p = 0.1003\text{fail to reject H0}

Testing H₀: μ = 0.00 vs two-sided alternative with x̄ = 0.30, σ = 1.00, sample size 30. Observed z = 1.643, p = 0.1003. At α = 0.05, fail to reject H0.

Alternative
α
0.30

Observed sample mean of the data.

0.00

Hypothesised population mean under H₀.

1.0

Known population standard deviation.

30

Number of observations. Bigger samples → smaller standard error.

n = 30

Monte Carlo π estimator

Drop random darts into a unit square; convergence of 4·hits/total to π.

\hat\pi = 0.0000SE = 0.0000n = 0

Monte Carlo π estimation with seed 42: 0 of 10000 samples drawn so far, current estimate π̂ ≈ 0.0000 (true π ≈ 3.14159).

42

PRNG seed (mulberry32). Same seed → same sequence of darts.

10000

How many darts to throw in total.

100

Darts thrown per animation frame.

n 0 / 10000

PageRank toy

A tiny web graph where power iteration converges to the dominant eigenvector of the transition matrix.

\text{iterations} = 26\text{converged} = \text{yes}\text{top rank} = 0.308 \text{ at node } 0

PageRank on the Web example graph (7 nodes, 13 edges), damping d = 0.85, up to 200 power-iteration steps. Node size and color encode each node's rank.

0.85

Probability of following a link; 1 − d is the teleport probability.

200

Cap on power-iteration steps before bailing out.

7 nodes, 13 edges

What’s coming

The author’s written reflection lands alongside the v4 interview. The demos and featured problems above are wired and playable today.