SP20
Computational Methods
Numerical root-finding, linear-system solvers, and time-stepping ODEs — algorithms that let a computer chase analytic problems past the point of paper.
Concepts learned
Tech
- Python
- NumPy
- SciPy
- Matplotlib
Hero demo Lorenz Solver (RK4)
RK4 trajectory for the lorenz attractor system, integrated with step size dt = 0.010 from t = 0 to t = 50. The butterfly: chaotic 3D flow projected onto the xz-plane.
Reflection
Pre-interview preview. Akwasi’s first-person reflection on this course is pending — track at issue #46.
Bisection method
Bracket-and-halve root finding with a live convergence trace.
Walk me through this step by step
-
Suppose I ask you to find x such that x² = 2. Your calculator returns 1.41421356… in a millisecond, but how does it actually do that? You can’t solve it on paper. Bisection is the simplest answer — the slowest, most reliable algorithm in the toolkit, and the one every “fancy” method falls back to when it gets confused.
-
The whole trick is one observation about continuous functions. If f(a) is negative and f(b) is positive, then the graph has to cross zero somewhere between a and b. We don’t know where yet, but we know the root is trapped in the interval. Bisection’s job is to shrink that trap until it is small enough to call the answer.
-
Look at f(x) = x^2 - 2 on the interval [0, 2]. We have f(0) = -2 (negative) and f(2) = 2 (positive), so a root is trapped inside. Cut the interval in half. The midpoint is 1, and f(1) = -1 is still negative. So the sign change must happen between 1 and 2 — we just threw half the work away.
-
Do it again on [1, 2]: midpoint 1.5, f(1.5) = 0.25 (positive), so the root is now in [1, 1.5]. Again on [1, 1.5]: midpoint 1.25, f(1.25) = -0.4375, so root is in [1.25, 1.5]. Four steps in we are at [1.375, 1.4375] — already two decimal digits of \sqrt{2}.
-
Every step exactly halves the width of the bracket. After 10 steps the trap is 1/1024 of its original size. After 20 steps it is one in a million. After 50 it is smaller than a double-precision float can even represent. That is the entire convergence story — the bracket width is (b_0 - a_0) / 2^n after n steps, and you stop when it is below your tolerance.
-
To get one more decimal digit of precision, you need just over three more bisection steps (because 2³ = 8 and 2⁴ = 16, sandwich 10 between them). So 21 steps gets you 6 decimal digits, 30 gets you 9, 50 gets you ~15. The demo’s default tolerance is 10^{-6}, which is why it halts around iteration 21.
-
The catch is that bisection is slow. Every step buys you the same constant factor of improvement — one bit. Newton’s method, which we will meet next, can buy you doubling-digits per step when it works. But bisection’s superpower is that it always works. It never diverges, never oscillates, never needs a derivative. It cannot fail as long as its preconditions hold.
-
Those preconditions are everything: f must be continuous, and the starting interval must have a sign change. Skip either and the IVT argument breaks. A jump discontinuity inside [a, b] can let f(a) \cdot f(b) < 0 be true with no actual root in between — the graph “crosses” the axis by teleporting over it.
-
Try it yourself.
-
So that is bisection: the bracketing baseline that every faster root-finder is compared to. The very next demo, Newton’s method, throws away bracketing in exchange for using the derivative to take much bigger steps — and the featured problem “Quadratic convergence of Newton’s method near a simple root” counts exactly how much faster Newton is when it works. Read them back-to-back: bisection gives you one bit per step always; Newton gives you doubling-digits per step until it doesn’t.
Bisection root finder on f(x) = x² − 2, bracketed by [0.000, 2.000], with tolerance 1e-6 and up to 30 iterations.
Newton’s method
Tangent-line iteration with quadratic-convergence visualisation.
Walk me through this step by step
-
Bisection earns you one bit of precision per step. If you also know the slope of the function, you can do much better — converging in about five steps instead of twenty for the same accuracy. That is Newton’s method, and the leap from “one bit per step” to “doubling digits per step” is one of the most useful speedups in all of numerical computing.
-
The geometric idea: at your current guess, draw the tangent line to the graph. The tangent crosses the x-axis at one specific point — that crossing becomes your next guess. Up close, any smooth function looks almost like a straight line, so the tangent’s prediction of where the graph hits zero is almost exactly right. Then you repeat from the new spot.
-
A worked example with f(x) = x^2 - 2 and starting guess x_0 = 1.5: we have f(1.5) = 0.25 and slope f'(1.5) = 3. The tangent line crosses zero at 1.5 - 0.25/3 \approx 1.4167. That’s our next guess. Check it: f(1.4167) \approx 0.0069 — already two decimal digits closer to \sqrt{2} from one step.
-
The pattern that produces is the famous Newton formula:
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}Read it as: “from where you are now, step backward by (current value) divided by (current slope).” If the value is already small, the step is small. If the slope is steep, the step is small. If the slope is gentle, the step is big.
-
What makes Newton fast is the famous doubling rule: each step squares the error rather than halving it. If you start with one digit of accuracy, the next step gives two, then four, then eight, then sixteen. With the square-root preset, five iterations take you from 1.5 to \sqrt{2} accurate to a dozen digits — past anything a single-precision float could represent.
-
Why does this work? Near a smooth root, the function genuinely is almost a straight line, and the tangent is that line. So the new error is roughly the previous error squared — small numbers squared get tiny fast. This is what mathematicians call quadratic convergence, and it is the gold standard for iterative algorithms.
-
The catch is that Newton is fragile. If your starting guess is far from the root, or the slope is nearly horizontal, or the function has a kink, Newton can wander off in surprising directions. It can cycle. It can shoot to infinity. It can land on a different root than the one you wanted. Bisection, in contrast, just slowly does its job.
-
The “Bad seed (cycle)” preset shows the worst case. With f(x) = x^3 - 2x + 2 seeded at x_0 = 0: f(0) = 2, slope = -2, tangent crosses zero at 0 - 2/(-2) = 1. Now from x = 1: f(1) = 1, slope = 1, tangent crosses at 1 - 1/1 = 0. We’re back where we started, and the iteration ping-pongs between 0 and 1 forever.
-
Try it yourself.
-
So the trade is concrete: bisection gives you one bit per step always; Newton gives you doubling-digits per step until it doesn’t. The featured problem “Quadratic convergence of Newton’s method near a simple root” below derives the error-squaring rule from a Taylor expansion. For the slow-but-bulletproof counterpart that needs no derivative, see the bisection demo just above. Production root finders like Brent’s method actually cheat and combine both — Newton when it is racing, bisection as a safety net when it isn’t.
Newton's method on x² − 2, starting at x₀ = 1.50, hunting for a root with tolerance 1e-6 over up to 20 iterations.
Numerical integration comparator
Side-by-side midpoint, trapezoidal, and Simpson’s rule error analysis.
Quadrature: integrating Quadratic (x^2) from 0.00 to 1.00 with 8 subintervals using the midpoint rule.
Monte Carlo integration
Compare convergence rates of Monte Carlo integration against the trapezoidal rule on the same integrand.
Walk me through this step by step
-
Imagine you want to know the area of an oddly-shaped pond in a rectangular field. You can’t measure it directly, but you can throw a thousand pebbles at random across the field and count how many splash into the water. The fraction that splash, multiplied by the field’s area, is a surprisingly good estimate of the pond’s area. Monte Carlo integration is exactly that idea, dressed up for any function you can evaluate. No calculus required — just averages and a good random number generator.
-
Suppose you want to compute the integral of f(x) from a to b. Instead of slicing the interval into neat strips like the trapezoidal or Simpson’s rule, you pick N points at random inside [a, b] and evaluate f at each one. The average f-value tells you roughly how tall the function is on average, and multiplying by the interval width (b - a) gives you the area underneath. That is the whole algorithm. The cleverness is in believing it actually converges.
-
Let’s estimate \int_0^1 x^2\, dx by hand. The exact answer is 1/3 \approx 0.3333. Pick four random x-values in [0, 1] — say 0.17, 0.42, 0.68, 0.91 — and square them: 0.029, 0.176, 0.462, 0.828. The average is 0.374, and the interval width is 1, so our estimate is 0.374. With only four darts we are already in the right neighbourhood, off by about 12%. Bump the count to 1000 and the error typically drops below 2%.
-
Writing the recipe down:
\int_a^b f(x)\, dx \approx \frac{b - a}{N} \sum_{i=1}^{N} f(x_i)where the points x_i are drawn uniformly from [a, b]. That’s it — no derivatives, no smoothness assumptions, no fancy weights. The same formula generalises to two, three, or twenty dimensions: replace (b - a) with the volume of the bounding box and draw points uniformly inside it. The integrator in this demo is roughly thirty lines of code.
-
Why does averaging random samples work at all? The Law of Large Numbers is the engine: as N grows, the sample average converges to the true average value of f on the interval. Each new dart you throw nudges the running mean a little closer to the right answer. More darts, more accurate average — that is the whole intuition. The same theorem underpins every opinion poll, every clinical trial, and every casino’s pricing model.
-
The catch is how slowly it converges. The error shrinks like 1/\sqrt{N}, not like 1/N. To gain one decimal digit of accuracy you must multiply N by 100. Going from 1000 samples to 100,000 only buys you one extra digit. The trapezoidal rule, by contrast, gives roughly two extra digits per ten-times-finer mesh in 1D. So for a smooth one-dimensional integrand, Monte Carlo is genuinely worse than the deterministic rules.
-
So why bother? Because deterministic rules suffer the curse of dimensionality. To halve the trapezoidal-rule error in 20 dimensions you need roughly a million times more grid points. Monte Carlo does not care — its 1/\sqrt{N} rate is independent of dimension. By around four or five dimensions Monte Carlo starts winning, and by twenty it is the only thing that fits in memory. Financial risk modelling, particle physics, and graphics rendering all live in that high-dimensional regime.
-
A nice bonus: the sample variance gives you a free error bar. The demo reports a 95% confidence half-width of about 1.96 \cdot s / \sqrt{N}, where s is the sample standard deviation of the f-values. So you do not just get an estimate — you get an honest “I’m 95% sure the true value lies within plus-or-minus this much” range. Plain quadrature has nothing comparable without a separate error analysis. With Monte Carlo the uncertainty is baked in.
-
Try it yourself.
-
Read this one paired with the Numerical integration comparator above — they are the two halves of numerical integration. Quadrature spends a careful, deterministic budget of function evaluations and crushes smooth low-dimensional integrals. Monte Carlo spends a sloppy random budget and crushes high-dimensional ones. The same Strang chapter connects them: deterministic rules optimise the placement of N points; Monte Carlo gives up and lets randomness do the placing. Pick the right tool for your dimension and you will never look back.
Monte Carlo integration of x squared on [0.00, 1.00] using 1000 random samples. The exact integral equals 0.3333; each estimate averages f(x) over uniformly drawn x_i.
Gaussian elimination — step by step
Watch row operations reduce an augmented matrix to row-echelon form.
Gaussian elimination on the 3x3 simple (no swap needed) 3x3 system. 4 row operations (pivot + eliminate) drive the matrix to upper-triangular form.
LU decomposition
Animate the LU factorisation step by step on a 4×4 matrix.
LU decomposition of "Well-conditioned". det(A) = 12.000, factored cleanly. For b = (1, 2, 3), x = (-0.17, 0.00, 0.83).
Least squares curve fitting
Click to drop points; fit polynomials of varying degree with sum-of-squared-residuals shown live.
Fitting y = mx + b to 24 points from preset "Linear" using closed-form normal equations. Estimated slope 1.537 vs true 1.50. R² = 0.994.
Lagrange vs cubic spline
Add interpolation nodes; compare Runge’s-phenomenon-prone Lagrange against natural cubic splines.
Interpolating runge's function (equispaced) through 11 nodes with Lagrange polynomial and natural cubic spline.
Eigenvalue power iteration
Iterate Ax → x until the dominant eigenvector emerges.
Power iteration on matrix preset "Symmetric 2×2" — well-conditioned symmetric matrix, dominant eigenvalue ≈ 3.618. v ← Av/‖Av‖ converges to the dominant eigenvector at rate |λ₂/λ₁|. Currently λ ≈ 3.500.
Discrete Fourier transform
Stem plots for a signal’s DFT amplitude and phase, with windowing options.
Discrete Fourier transform of the single sinusoid (k=4) signal sampled at N = 64 points (sample rate 64 Hz). Peak frequency component sits at bin k = 4, or about 4.00 Hz.
Featured problems
What’s coming
The author’s written reflection lands alongside the v5 interview. The demos and featured problems above are wired and playable today.