1. Preface

This sample provides a concise preview of the book’s learning experience: a short Preface outlining the approach and objectives, followed by the complete first chapter as a concrete example. The full book blends mathematics, Python, and modern data science practice into an executable learning journey.

1.1. Approach

  • Emphasize intuition-first mathematics with clear geometric and probabilistic perspectives.

  • Translate concepts into code early to build a strong mental model of computation.

  • Use small, focused examples and figures to make abstract ideas tangible.

1.2. Objectives

  • Provide the mathematical foundations most relevant to data science and machine learning.

  • Build practical skills in Python and its scientific stack for modeling and analysis.

  • Foster reproducible, executable workflows that scale from teaching to real projects.

1.3. Content Overview

  • Python essentials; numerical thinking and error analysis.

  • Vectors, norms, linear maps; eigenvalues, SVD, and low-rank structure.

  • Differentiation, chain rule, and optimization landscapes.

  • Probability, statistical learning basics, and linear models end-to-end.

  • Stochastic optimization, spectral methods, autodiff, and modern ML motifs.

The example chapter below demonstrates the book’s narrative style and how code and math reinforce each other.

2. Why Math With Code (and Code With Math)

We learn math faster, deeper, and more honestly when every idea is paired with a small experiment that can try to break it.

You’ll Need
  • Python 3.10+

  • NumPy (for arrays, RNG)

  • Matplotlib (optional, for quick visuals)

At a Glance
  • Why couple every concept to a runnable check

  • Reproducibility: seeds, assertions, and invariants

  • Vectorization over loops; shapes as first‑class citizens

You’ll Learn
  • Set up tiny numerical experiments to verify a claim

  • Use seeds and sanity checks for reproducibility

  • Translate a statement into code and back into math

2.1. Concept & Intuition

We will use a simple “Math ↔ Code” loop: state a claim, test it with a minimal experiment, and then adjust our intuition based on what actually happens.

This keeps us honest about edge cases, finite‑sample quirks, and numerical precision.

Two habits power this loop:

  • Make statements falsifiable. If you say “the sample mean approaches the true mean,” decide how you’ll measure “approaches” and what deviations are acceptable for a given sample size.

  • Prefer tiny, vectorized experiments. Short code makes it easy to read output and spot when reality diverges from expectation.
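
To make the loop concrete before we get to the LLN, here is a minimal sketch (our own example, not part of the chapter's session): the claim is the triangle inequality \(\lVert x + y\rVert \le \lVert x\rVert + \lVert y\rVert\), and the experiment tries to break it on random vectors. The seed, vector length, and tolerance are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)  # local, seeded RNG so the check is reproducible

# Claim: ||x + y|| <= ||x|| + ||y|| for all vectors x, y (triangle inequality).
# Falsifiable check: try many random pairs; a tiny tolerance absorbs rounding.
for _ in range(1_000):
    x = rng.standard_normal(8)
    y = rng.standard_normal(8)
    lhs = np.linalg.norm(x + y)
    rhs = np.linalg.norm(x) + np.linalg.norm(y)
    assert lhs <= rhs + 1e-12, (lhs, rhs)
print("triangle inequality held for 1,000 random pairs")

If the assertion ever fired, the loop would hand us a concrete counterexample to study, which is exactly the point of making the statement falsifiable.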

2.2. Formalism

Setup (i.i.d. data). Let \(X_1, X_2, \dots\) be independent and identically distributed with finite expectation \(\mathbb{E}[X] = \mu\). The sample mean of the first \(n\) observations is \(\bar X_n = \tfrac{1}{n}\sum_{i=1}^n X_i\).

Convergence in probability. A sequence \(Y_n\) converges in probability to \(Y\) if for every \(\varepsilon > 0\), the probability of deviating by more than \(\varepsilon\) vanishes: \(\Pr(|Y_n - Y| > \varepsilon) \to 0\) as \(n \to \infty\).

Law of Large Numbers (LLN). Under the setup above, the sample mean converges in probability to the true expectation:

\[\bar X_n \xrightarrow[]{\mathbb{P}} \mu.\]

Convergence rate intuition. For light-tailed distributions the typical error of \(\bar X_n\) decays on the order of \(1/\sqrt{n}\). Heavy tails slow convergence dramatically and may even invalidate the LLN if \(\mathbb{E}[X]\) does not exist.
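
As a bridge to the next section, here is a minimal sketch (our own, with arbitrary choices of \(\varepsilon\), seed, sample sizes, and number of repetitions) that estimates the deviation probability \(\Pr(|\bar X_n - \mu| > \varepsilon)\) for standard normal data and watches it shrink as \(n\) grows:

import numpy as np

rng = np.random.default_rng(0)
eps, trials = 0.05, 1_000        # tolerance and number of Monte Carlo repetitions

for n in (100, 1_000, 10_000):
    # Each row is one experiment of size n; its mean is one draw of X_bar_n.
    means = rng.standard_normal((trials, n)).mean(axis=1)
    p_hat = (np.abs(means) > eps).mean()   # estimate of P(|X_bar_n - 0| > eps)
    print(f"n={n:>6d}  P(|mean - mu| > {eps}) ~ {p_hat:.3f}")

For standard normal data the standard deviation of \(\bar X_n\) is \(1/\sqrt{n}\), so once \(\varepsilon\) is several times \(1/\sqrt{n}\) the estimated probability should be close to zero.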

2.3. From Math → Code

We turn the LLN into a quick numerical check. Keep the code small and the outputs legible so patterns jump out at a glance.

In [1]: import numpy as np

In [2]: rs = np.random.default_rng(42)  # reproducible RNG

In [3]: def lln_demo(dist, n_values=(100, 1_000, 10_000)):
   ...:     """Check how the sample mean approaches the expectation."""
   ...:     # dist: dict with 'mean' (float) and 'gen' (rng, size) -> array
   ...:     mu = dist['mean']
   ...:     for n in n_values:
   ...:         x = dist['gen'](rs, size=n)  # draw n samples
   ...:         m = x.mean()  # compute sample mean
   ...:         err = abs(m - mu)  # distance to expectation
   ...:         print(f"n={n:>6d}  mean={m:+.6f}  |mean-mu|={err:.6f}")
   ...:         assert np.isfinite(m)  # finite output sanity

In [4]: normal = {
   ...:     'mean': 0.0,  # E[X] = 0 for standard normal
   ...:     'gen': lambda rng, size: rng.standard_normal(size=size).astype(np.float64),
   ...: }

In [5]: lln_demo(normal)
n=   100  mean=+0.023189  |mean-mu|=0.023189
n=  1000  mean=-0.004349  |mean-mu|=0.004349
n= 10000  mean=-0.002052  |mean-mu|=0.002052

What to observe:

  • Errors typically decrease as n grows (not strictly monotone—randomness persists).

  • Using float64 keeps rounding noise low; float32 makes small differences noisier.

  • Heavy‑tailed distributions may need much larger n for the same stability.

  • If you rerun with a different seed, the overall pattern persists, but individual errors wobble — randomness never fully disappears.
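
The float32 and fresh-seed points above can be checked directly. Here is a minimal side sketch (it uses its own RNGs with arbitrary seeds 123 and 7, so it leaves the session state above untouched):

import numpy as np

rng32 = np.random.default_rng(123)           # separate RNG for the float32 variant
for n in (100, 1_000, 10_000):
    x = rng32.standard_normal(n).astype(np.float32)   # lower-precision samples
    print(f"float32  n={n:>6d}  |mean-mu|={abs(x.mean()):.6f}")

rng_b = np.random.default_rng(7)             # a different seed: same pattern, new numbers
for n in (100, 1_000, 10_000):
    x = rng_b.standard_normal(n)
    print(f"float64  n={n:>6d}  |mean-mu|={abs(x.mean()):.6f}")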

Sanity Box
  • Always fix a seed for demos.

  • Assert invariants (finite results, expected shapes, non‑negativity where applicable).

  • Print units or scales; numbers without context mislead.

2.4. From Code → Math

Let’s empirically check that variance is non‑negative via the identity

\[\mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 \ge 0.\]

In [6]: def variance_nonneg_demo(rng, n=50_000):
   ...:     x = rng.standard_normal(size=n).astype(np.float64)  # samples
   ...:     ex = x.mean()  # E[X]
   ...:     ex2 = (x * x).mean()  # E[X^2]
   ...:     var_est = ex2 - ex * ex  # Var~
   ...:     print(f"E[X]={ex:+.4f}, E[X^2]={ex2:+.4f}, Var~={var_est:+.6f}")
   ...:     assert var_est > -1e-12  # allow tiny negatives from rounding

In [7]: variance_nonneg_demo(rs)
E[X]=+0.0004, E[X^2]=+0.9808, Var~=+0.9807

We didn’t “prove” the identity; we turned it into a falsifiable check. Seeing small negative estimates is a clue about rounding and finite samples: increase n, switch to float64, and confirm that the estimate hugs zero from above.
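
To see where negative estimates actually come from, here is a minimal sketch (our own, not part of the chapter's session): shifting the data by a large constant makes \(\mathbb{E}[X^2]\) and \((\mathbb{E}[X])^2\) nearly cancel, which can swamp the true variance in float32. The shift of \(10^4\), the helper name, and the seed are arbitrary choices.

import numpy as np

def shifted_variance(rng, shift, dtype, n=50_000):
    # Same identity as above, but on data with a large mean, so the two
    # terms in E[X^2] - (E[X])^2 are huge and nearly cancel.
    x = (rng.standard_normal(size=n) + shift).astype(dtype)
    ex = x.mean()
    ex2 = (x * x).mean()
    return ex2 - ex * ex   # true variance is 1; float32 may miss badly, even go negative

for dtype in (np.float32, np.float64):
    est = shifted_variance(np.random.default_rng(0), shift=1e4, dtype=dtype)
    print(f"{dtype.__name__:>8s}: Var~ = {est:+.6f}")

The float64 estimate should stay near 1, while the float32 estimate illustrates catastrophic cancellation: both terms carry absolute rounding error far larger than the quantity we are trying to measure.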

2.5. Visualization: How the sample mean stabilizes

We now visualize the running mean \(\bar X_n\) as n grows. Expect noisy early behavior and gradual stabilization around the true mean \(\mu=0\). The x‑axis is logarithmic to reveal early‑n dynamics. A sketch of a script that produces such a figure follows Figure 1.

The figure below shows the running mean under the Law of Large Numbers. In detail:

  • Early samples dominate: for very small n, individual draws swing \(\bar X_n\) widely; the log x‑axis makes these fluctuations visible.

  • Averaging tames noise: as n increases, the line narrows around \(\mu=0\), reflecting the \(1/\sqrt{n}\) scaling of typical error for light‑tailed data.

  • Randomness persists: the running mean never becomes exactly flat; instead, it wanders within a shrinking band. A different seed yields a different path with the same overall behavior.

Running mean against sample size on a log x-axis, with dashed zero line
Figure 1. LLN in action: running mean vs. sample size
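
For reference, here is a minimal sketch of a script that produces a figure like Figure 1 (the book's exact plotting script may differ; Matplotlib is the optional dependency listed above, and the seed and sample size are arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)                      # fixed seed for a reproducible path
n = 10_000
x = rng.standard_normal(n)                           # i.i.d. standard normal draws, mu = 0
running_mean = np.cumsum(x) / np.arange(1, n + 1)    # X_bar_1, X_bar_2, ..., X_bar_n

plt.figure(figsize=(6, 3))
plt.plot(np.arange(1, n + 1), running_mean, lw=1)
plt.axhline(0.0, color='k', ls='--', lw=1)           # dashed line at the true mean
plt.xscale('log')                                    # log x-axis reveals early-n swings
plt.xlabel('sample size n')
plt.ylabel('running mean')
plt.title('LLN in action: running mean vs. sample size')
plt.tight_layout()
plt.show()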

2.6. Common Pitfalls & Stability

  • Seed management: create and pass a local RNG (np.random.default_rng) instead of relying on global state; this makes behavior reproducible across functions.

  • Dtypes: prefer float64 for teaching clarity; use float32 only when memory/speed is binding and you can tolerate extra numeric noise.

  • Shapes and vectorization: sketch array shapes before coding; most numeric experiments become clearer and faster without Python loops.

  • Heavy tails: Cauchy‑like data violate the assumptions behind LLN as commonly stated — sample means wander; make this visible deliberately.
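
A minimal sketch that bundles the first three habits (explicit RNG argument, float64 by default, shapes sketched up front, no Python loops); the function name and sizes are our own:

import numpy as np

def mean_of_squares(rng, n=10_000):
    # Seed management: the RNG comes in as an argument, never via global state.
    x = rng.standard_normal(n)         # shape (n,), float64 by default
    ms = (x * x).mean()                # vectorized: one array expression, no Python loop
    assert x.shape == (n,)             # shape invariant, written down before coding
    assert np.isfinite(ms)             # finite-output sanity check
    return ms                          # estimates E[X^2] = 1 for standard normal data

print(mean_of_squares(np.random.default_rng(0)))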

2.7. Exercises

  • LLN across distributions (builds on lln_demo(…) from “From Math → Code”): run lln_demo(normal) as shown, then repeat with a Bernoulli generator {'mean': p, 'gen': lambda rng, size: (rng.uniform(size=size) < p).astype(float)} for \(p \in \{0.2, 0.5, 0.8\}\) and n_values=(100, 1_000, 10_000). Report \(\big|\,\bar X_n - \mu\,\big|\) per n in a compact table; comment on the rough \(1/\sqrt{n}\) trend.

  • Heavy tails vs LLN (reuses lln_demo(…)): use a Cauchy generator rng.standard_cauchy(size=…) with mean set to np.nan (undefined) and run n_values=(10**2, 10**3, 10**4). Print the sample mean and the sample median for each \(n\); note that means wander while medians stabilize. Briefly explain why the usual LLN statement does not apply when \(\mathbb{E}[X]\) does not exist.

  • Variance non‑negativity under dtypes (builds on variance_nonneg_demo(…)): modify the helper to accept a dtype and run 50 trials each for float32 and float64 with \(n=20{,}000\). Count how many times the estimate fell below \(-10^{-10}\). Summarize counts in a two‑row table; explain the role of rounding and why increasing \(n\) reduces the issue.

  • Reproducibility checklist (independent of prior code): write a 6–8 line snippet that fixes a seed, asserts finite outputs, and prints labeled values with units for any mini‑experiment you run later. Use it to wrap one of the demos above and show the labeled outputs.

2.8. Where We’re Heading Next

We’ll formalize numeric error (machine epsilon, conditioning vs stability) and adopt a checklist of sanity checks and performance baselines that we’ll reuse throughout the book.