1. Preface
This sample provides a concise preview of the book’s learning experience: a short Preface outlining the approach and objectives, followed by the complete first chapter as a concrete example. The full book blends mathematics, Python, and modern data science practice into an executable learning journey.
1.1. Approach
- Emphasize intuition-first mathematics with clear geometric and probabilistic perspectives.
- Translate concepts into code early to build a strong mental model of computation.
- Use small, focused examples and figures to make abstract ideas tangible.
1.2. Objectives
- Provide the mathematical foundations most relevant to data science and machine learning.
- Build practical skills in Python and its scientific stack for modeling and analysis.
- Foster reproducible, executable workflows that scale from teaching to real projects.
1.3. Content Overview
- Python essentials; numerical thinking and error analysis.
- Vectors, norms, linear maps; eigenvalues, SVD, and low-rank structure.
- Differentiation, chain rule, and optimization landscapes.
- Probability, statistical learning basics, and linear models end-to-end.
- Stochastic optimization, spectral methods, autodiff, and modern ML motifs.
The example chapter below demonstrates the book’s narrative style and how code and math reinforce each other.
2. Why Math With Code (and Code With Math)
We learn math faster, deeper, and more honestly when every idea is paired with a small experiment that can try to break it.
- Why couple every concept to a runnable check
- Reproducibility: seeds, assertions, and invariants
- Vectorization over loops; shapes as first‑class citizens
- Set up tiny numerical experiments to verify a claim
- Use seeds and sanity checks for reproducibility
- Translate a statement into code and back into math
2.1. Concept & Intuition
We will use a simple “Math ↔ Code” loop: state a claim, test it with a minimal experiment, and then adjust our intuition based on what actually happens.
This keeps us honest about edge cases, finite‑sample quirks, and numerical precision.
Two habits power this loop:
- Make statements falsifiable. If you say “the sample mean approaches the true mean,” decide how you’ll measure “approaches” and what deviations are acceptable for a given sample size (a minimal example of such a check follows this list).
- Prefer tiny, vectorized experiments. Short code makes it easy to read output and spot when reality diverges from expectation.
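To make the first habit concrete, here is a minimal sketch (an added illustration, not the chapter’s own code) that turns “the sample mean approaches the true mean” into a pass/fail check: the tolerance shrinks like \(1/\sqrt{n}\), and the constant 5 is an arbitrary illustrative choice.

import numpy as np

rng = np.random.default_rng(0)            # local, seeded RNG for reproducibility
n = 10_000                                # sample size for this particular check
x = rng.standard_normal(size=n)           # standard normal draws: true mean mu = 0.0
tol = 5 / np.sqrt(n)                      # acceptable deviation at this n (illustrative constant)
err = abs(x.mean() - 0.0)                 # observed deviation from the true mean
print(f"n={n} |mean-mu|={err:.6f} tol={tol:.6f}")
assert err < tol                          # falsifiable: a failure here would demand a closer look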
2.2. Formalism
Setup (i.i.d. data). Let \(X_1, X_2, \dots\) be independent and identically distributed with finite expectation \(\mathbb{E}[X] = \mu\). The sample mean of the first \(n\) observations is \(\bar X_n = \tfrac{1}{n}\sum_{i=1}^n X_i\).
Convergence in probability. A sequence \(Y_n\) converges in probability to \(Y\) if for every \(\varepsilon > 0\), the probability of deviating by more than \(\varepsilon\) vanishes: \(\Pr(|Y_n - Y| > \varepsilon) \to 0\) as \(n \to \infty\).
Law of Large Numbers (LLN). Under the setup above, the sample mean converges in probability to the true expectation: for every \(\varepsilon > 0\), \(\Pr(|\bar X_n - \mu| > \varepsilon) \to 0\) as \(n \to \infty\); in symbols, \(\bar X_n \xrightarrow{p} \mu\).
Convergence rate intuition. For light-tailed distributions the typical error of \(\bar X_n\) decays on the order of \(1/\sqrt{n}\). Heavy tails slow convergence dramatically and may even invalidate the LLN if \(\mathbb{E}[X]\) does not exist.
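To connect the definition directly to something computable, the sketch below (an added illustration, not one of the chapter’s demos) estimates \(\Pr(|\bar X_n - \mu| > \varepsilon)\) by Monte Carlo for standard normal data with a fixed \(\varepsilon\); the estimate should shrink toward zero as \(n\) grows. The values of \(\varepsilon\) and the number of trials are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(7)                       # local, seeded RNG
mu, eps, trials = 0.0, 0.05, 500                     # true mean, threshold, Monte Carlo repetitions
for n in (100, 1_000, 10_000):
    samples = rng.standard_normal(size=(trials, n))  # trials independent datasets of size n
    means = samples.mean(axis=1)                     # one sample mean per dataset
    p_hat = np.mean(np.abs(means - mu) > eps)        # estimated deviation probability
    print(f"n={n:>6d}  estimated P(|mean-mu| > {eps}) = {p_hat:.3f}")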
2.3. From Math → Code
We turn the LLN into a quick numerical check. Keep the code small and the outputs legible so patterns jump out at a glance.
In [1]: import numpy as np
In [2]: rs = np.random.default_rng(42) # reproducible RNG
In [3]: def lln_demo(dist, n_values=(100, 1_000, 10_000)):
   ...:     """Check how the sample mean approaches the expectation."""
   ...:     # dist: small dict with 'mean' (float) and 'gen'(rng, size) -> array
   ...:     mu = dist['mean']
   ...:     for n in n_values:
   ...:         x = dist['gen'](rs, size=n)   # draw n samples
   ...:         m = x.mean()                  # compute sample mean
   ...:         err = abs(m - mu)             # distance to expectation
   ...:         print(f"n={n:>6d} mean={m:+.6f} |mean-mu|={err:.6f}")
   ...:         assert np.isfinite(m)         # finite output sanity
In [4]: normal = {
   ...:     'mean': 0.0,   # E[X] = 0 for standard normal
   ...:     'gen': lambda rng, size: rng.standard_normal(size=size).astype(np.float64),
   ...: }
In [5]: lln_demo(normal)
n= 100 mean=+0.023189 |mean-mu|=0.023189
n= 1000 mean=-0.004349 |mean-mu|=0.004349
n= 10000 mean=-0.002052 |mean-mu|=0.002052
What to observe:
- Errors typically decrease as n grows (not strictly monotone; randomness persists).
- Using float64 keeps rounding noise low; float32 makes small differences noisier.
- Heavy‑tailed distributions may need much larger n for the same stability (see the short sketch after this list).
- If you rerun with a different seed, the overall pattern persists, but individual errors wobble; randomness never fully disappears.
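To see the heavy‑tail point in action without jumping ahead to the Cauchy exercise, the sketch below (an added illustration reusing np, lln_demo, and the seeded rs from the session above) uses a Pareto II/Lomax generator with shape 1.5, an arbitrary choice whose mean exists (it equals 2.0) but whose variance is infinite; the errors shrink far more sluggishly than in the normal case.

heavy = {
    'mean': 2.0,   # Lomax (Pareto II) with shape a=1.5 has mean 1/(a-1) = 2.0
    'gen': lambda rng, size: rng.pareto(1.5, size=size).astype(np.float64),
}
lln_demo(heavy)    # expect larger, more erratic |mean-mu| than for the normal case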
2.4. From Code → Math
Let’s empirically check that variance is non‑negative via the identity \(\operatorname{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \ge 0\).
In [6]: def variance_nonneg_demo(rng, n=50_000):
   ...:     x = rng.standard_normal(size=n).astype(np.float64)  # samples
   ...:     ex = x.mean()                                       # E[X] estimate
   ...:     ex2 = (x * x).mean()                                # E[X^2] estimate
   ...:     var_est = ex2 - ex * ex                             # Var~ = E[X^2] - (E[X])^2
   ...:     print(f"E[X]={ex:+.4f}, E[X^2]={ex2:+.4f}, Var~={var_est:+.6f}")
   ...:     assert var_est > -1e-12  # allow tiny negatives from rounding
In [7]: variance_nonneg_demo(rs)
E[X]=+0.0004, E[X^2]=+0.9808, Var~=+0.9807
We didn’t “prove” the identity; we turned it into a falsifiable check. Small negative estimates, which can appear when \(\mathbb{E}[X^2]\) and \((\mathbb{E}[X])^2\) are large and nearly cancel, are a clue about rounding and finite samples: increase n, switch to float64, and confirm that the estimate hugs zero from above. The sketch below makes the cancellation scenario explicit.
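The near‑cancellation is easiest to provoke with data that are nearly constant relative to a large offset. The sketch below (an added illustration; the offset 1e4 and spread 1e-3 are arbitrary choices) compares the naive \(\mathbb{E}[X^2] - (\mathbb{E}[X])^2\) formula with np.var, which centers the data before squaring, under float32 and float64.

import numpy as np

rng = np.random.default_rng(3)
x64 = 1e4 + 1e-3 * rng.standard_normal(size=50_000)   # huge offset, tiny spread: true var = 1e-6
for dtype in (np.float32, np.float64):
    x = x64.astype(dtype)
    naive = (x * x).mean() - x.mean() ** 2             # E[X^2] - (E[X])^2: two big numbers cancel
    centered = x.var()                                  # np.var subtracts the mean first
    print(f"{np.dtype(dtype).name}: naive={naive:+.3e}  centered={centered:+.3e}")

In float32 the naive estimate is swamped by rounding and may even come out negative, while the centered computation stays non‑negative and of roughly the right magnitude; in float64 both are fine at this scale.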
2.5. Visualization: How the sample mean stabilizes
We now visualize the running mean \(\bar X_n\) as n grows. Expect noisy early behavior and gradual stabilization around the true mean \(\mu=0\). The x‑axis is logarithmic to reveal early‑n dynamics. See [fig-gen-ch01-lln] for the exact script.
The figure plots the running mean under the Law of Large Numbers; it shows in detail (a minimal plotting sketch follows this list):
- Early samples dominate: for very small n, individual draws swing \(\bar X_n\) widely; the log x‑axis makes these fluctuations visible.
- Averaging tames noise: as n increases, the line narrows around \(\mu=0\), reflecting the \(1/\sqrt{n}\) scaling of typical error for light‑tailed data.
- Randomness persists: the running mean never becomes exactly flat; instead, it wanders within a shrinking band. A different seed yields a different path with the same overall behavior.
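A minimal sketch of such a running-mean plot, assuming matplotlib is available (the book’s own script is the one referenced as [fig-gen-ch01-lln]; this version is only an illustration):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)                     # reproducible RNG
n = 10_000
x = rng.standard_normal(size=n)                     # standard normal draws, mu = 0
running_mean = np.cumsum(x) / np.arange(1, n + 1)   # running mean X-bar_n for n = 1..10_000

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(np.arange(1, n + 1), running_mean, lw=1)
ax.axhline(0.0, color="k", ls="--", lw=1, label=r"$\mu = 0$")
ax.set_xscale("log")                                # log x-axis reveals early-n dynamics
ax.set_xlabel("n (log scale)")
ax.set_ylabel(r"running mean $\bar{X}_n$")
ax.legend()
fig.tight_layout()
plt.show()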
2.6. Common Pitfalls & Stability
- Seed management: create and pass a local RNG (np.random.default_rng) instead of relying on global state; this makes behavior reproducible across functions.
- Dtypes: prefer float64 for teaching clarity; use float32 only when memory/speed is binding and you can tolerate extra numeric noise.
- Shapes and vectorization: sketch array shapes before coding; most numeric experiments become clearer and faster without Python loops (see the short sketch after this list).
- Heavy tails: Cauchy‑like data violate the assumptions behind the LLN as commonly stated; sample means wander. Make this visible deliberately.
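As a small illustration of the shapes-first habit (an added sketch, not the chapter’s code; the trial count and sample size are arbitrary), the repeated-experiment computation below is written as a single (trials, n) array with the reduction axis spelled out, rather than as nested Python loops:

import numpy as np

rng = np.random.default_rng(0)
trials, n = 1_000, 500
x = rng.standard_normal(size=(trials, n))   # shape (trials, n): one row per repetition
means = x.mean(axis=1)                      # shape (trials,): sample mean of each row
spread = means.std()                        # scalar: empirical std of the sample mean
print(f"empirical std of the mean: {spread:.4f}  (theory: 1/sqrt(n) = {1/np.sqrt(n):.4f})")
assert means.shape == (trials,)             # shape check as a lightweight invariant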
2.7. Exercises
- LLN across distributions (builds on lln_demo(…) from “From Math → Code”): run lln_demo(normal) as shown, then repeat with a Bernoulli generator {'mean': p, 'gen': lambda rng, size: (rng.uniform(size=size) < p).astype(float)} for \(p \in \{0.2, 0.5, 0.8\}\) and n_values=(100, 1_000, 10_000). Report \(\big|\,\bar X_n - \mu\,\big|\) per n in a compact table; comment on the rough \(1/\sqrt{n}\) trend.
- Heavy tails vs LLN (reuses lln_demo(…)): use a Cauchy generator rng.standard_cauchy(size=…) with mean set to np.nan (undefined) and run n_values=(10**2, 10**3, 10**4). Print the sample mean and the sample median for each \(n\); note that means wander while medians stabilize. Briefly explain why the usual LLN statement does not apply when \(\mathbb{E}[X]\) does not exist.
- Variance non‑negativity under dtypes (builds on variance_nonneg_demo(…)): modify the helper to accept a dtype and run 50 trials each for float32 and float64 with \(n=20{,}000\). Count how many times the estimate fell below \(-10^{-10}\). Summarize counts in a two‑row table; explain the role of rounding and why increasing \(n\) reduces the issue.
- Reproducibility checklist (independent of prior code): write a 6–8 line snippet that fixes a seed, asserts finite outputs, and prints labeled values with units for any mini‑experiment you run later. Use it to wrap one of the demos above and show the labeled outputs.
2.8. Where We’re Heading Next
We’ll formalize numeric error (machine epsilon, conditioning vs stability) and adopt a checklist of sanity checks and performance baselines that we’ll reuse throughout the book.
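As a tiny preview (an added sketch), NumPy already exposes the machine epsilon we will formalize next; the asserts below check that a full epsilon is visible next to 1.0 while half an epsilon rounds away.

import numpy as np

# Machine epsilon: the spacing between 1.0 and the next larger representable float
for dtype in (np.float32, np.float64):
    eps = np.finfo(dtype).eps
    one = dtype(1.0)
    print(f"{np.dtype(dtype).name}: eps = {eps:.3e}")
    assert one + eps != one                # a full eps is visible next to 1.0
    assert one + dtype(eps / 2) == one     # half an eps rounds back to 1.0 (round-to-nearest-even)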