Simulating Random Variables


Lecture 10

February 26, 2024

Review

Bayesian Statistics

  • Focused on conditional probability (conditioned on the data)
  • Bayes’ Theorem to update priors using likelihood.
  • Priors can be finicky: use predictive simulations.
  • Outstanding question: how do we sample from the posterior (versus just taking the MAP?)

Random Variable Simulation

Why Simulate?

  • We want to see implications of a probability model.
  • We want to test statistical procedures (synthetic data simulation).
  • Easier than computing integrals (Monte Carlo).
  • Computational efficiency (e.g. stochastic gradient descent).

Generally: Turns calculus/analytical problems into data summary problems.

Example: Posterior Sampling

\(p(\theta)\) and \(p(y | \theta)\) are often “nice” distributions, but nothing says \[p(\theta | y) \propto p(\theta) \times p(y | \theta)\]

has to be!

Code
normal_samples = rand(Normal(2, 0.5), 10)

lik(σ) = prod(pdf(Normal.(2, σ), normal_samples))
prior(σ) = pdf(LogNormal(log(0.25), 0.25), σ)
posterior(σ) = lik(σ) * prior(σ)

σ_range = 0.1:0.01:1
plot(σ_range, posterior.(σ_range) / maximum(posterior.(σ_range)), color=:black, label="Posterior", linewidth=3)
plot!(σ_range, lik.(σ_range) / maximum(lik.(σ_range)), color=:blue, label="Likelihood", linewidth=3, linestyle=:dash)
plot!(σ_range, prior.(σ_range) / maximum(prior.(σ_range)), color=:orange, label="Prior", linewidth=3, linestyle=:dot)
xlabel!(L"$\sigma$")
ylabel!("Scaled Log-Density")
plot!(size=(550, 500))
Figure 1: Comparison of prior, posterior, and likelihood.

Built-In Sampling Functions

R: sample, rnorm, rbinom, etc.

Julia: rand, Distributions.rand

Python: numpy.random.rand, scipy.stats.xx.rand

What Are These Functions Doing?

Think of a biased coin with probability of heads \(\theta\).

Want to obtain a Bernoulli random variable.

What can we do without using a built-in Bernoulli function?

Coin Flip Simulation

Given heads probability \(\theta\):

  1. Draw \(u \sim Uniform(0, 1)\).
  2. If \(u < \theta\), return H.
  3. Else return T.

What About Discrete Variables?

How can we generalize this strategy for discrete distributions with category probabilities \(\theta_1, \theta_2, \ldots, \theta_n\)?

Discrete Variable Simulation

Given category probabilities \(\theta_1, \theta_2, \ldots, \theta_n\):

  1. Draw \(u \sim Uniform(0, 1)\).
  2. If \(u < \theta_1\), return 1.
  3. If \(u < \theta_1 + \theta_2\), return 2,
  4. \(\vdots\)
  5. Else return \(n\).

Generalization: Quantile Transform Method

Given \(U \sim Uniform(0, 1)\), target CDF \(F\), \(X = F^{-1}(U)\) has CDF \(F\).

Why?

\[\mathbb{P}(X \leq a) = \mathbb{P}(F^{-1}(U) \leq a) = \mathbb{P}(U \leq F(a)) = F(a)\]

In other words, if we can generate uniform variables and calculate quantiles, can generate non-uniform variables.

Problem Solved…Right?

  • Often don’t have quantile functions in closed form.
  • They also often don’t have nice numerical solutions.

Using the PDF

Suppose the PDF \(f\) has support on \([a, b]\) and \(f(x) \leq M\).

What could we do to sample \(X \sim f(\cdot)\)?

Figure 2: Beta Distribution

Rejection Sampling Algorithm

Given pdf \(f\) for \(X\), upper bound \(M\) on \(f\) in \([a, b]\):

  1. Simulate uniformly: \(y \sim [a, b]\), \(u \sim [0, 1]\).
  2. If \(Mu < f(y)\), keep \(y\) as a sample of \(X\).
  3. Otherwise reject.

Rejection Sampling Visualization

Figure 3: Example of rejection sampling for a Beta distribution

Rejection Sampling Efficiency

Important: Only kept 27% of proposals.

(a) Results from uniform rejection sampling
(b)
Figure 4

More General Rejection Sampling

Use a proposal density \(g(\cdot)\) which “covers” target \(f(\cdot)\) and is easy to sample from.

Sample from \(g\), reject based on \(f\).

Figure 5: Using a t distribution as proposal density for normal.

General Rejection Sampling Algorithm

Suppose \(f(x) \leq M g(x)\) for some \(1 < M < \infty\).

  1. Simulate a proposal \(y \sim g(x)\) (e.g. by quantile method).
  2. Simulate \(u \sim \text{Unif}(0, 1)\)
  3. If \[u < \frac{f(y)}{M g(y)},\] accept \(y\); otherwise reject.

Rejection Sampling Example

Figure 6: Example of rejection sampling for a Normal distribution

This time we kept 35% of the proposals.

Rejection Sampling Efficiency

  1. Probability of accepting a sample is \(1/M\), so the “tighter” the proposal distribution coverage the more efficient the sampler.
  2. Need to be able to compute \(M\) and sample from the proposal.

Finding a good proposal and computing \(M\) may not be easy (or possible) for complex distributions!

Bimodal Rejection Sampling Example

Code
# specify target distribution
mixmod = MixtureModel(Normal, [(-1, 0.75), (1, 0.4)], [0.5, 0.5])
x = -5:0.01:5
p1 = plot(x, pdf.(mixmod, x), lw=3, color=:red, xlabel=L"$x$", ylabel="Density", label="Target")
plot!(p1, x, 2.5 * pdf.(Normal(0, 1.5), x), lw=3, color=:blue, label="Proposal (M=2.5)")
plot!(p1, size=(550, 500))

nsamp = 10_000
M = 2.5
u = rand(Uniform(0, 1), nsamp)
y = rand(Normal(0, 1.5), nsamp)
g = pdf.(Normal(0, 1.5), y)
f = pdf.(mixmod, y)
keep_samp = u .< f ./ (M * g)
p2 = histogram(y[keep_samp], normalize=:pdf, xlabel=L"$x$", ylabel="Density", label="Kept Samples", legend=:topleft)
plot!(p2, x, pdf.(mixmod, x), linewidth=3, color=:black, label="True Target")
density!(y[keep_samp], label="Sampled Density", color=:red)
plot!(p2, size=(550, 500))

display(p1)
display(p2)
(a) rejection sampling for a mixture model
(b)
Figure 7

Random Number Generators

Where Do “Random” Samples Come From?

There’s no such thing as a truly random number generator!

Need to generate samples in a deterministic way that have random-like properties.

XKCD Cartoon 221

Source: XKCD 221

Pseudorandom Number Generators

We want:

  • Number of \(U_i \in [a, b] \subset [0, 1]\) is \(\propto b-a\)
  • No correlation between successive \(U_i\).
  • No detectable dependencies in longer series.

There are several of these implemented in modern programming languages: typical default is the Mersenne Twister.

Example: Rotations

\[U_{t+1} = U_t + \alpha \mod 1\]

  • If \(\alpha \neq k/n\) is irrational, this never repeats (no \(m\) such that \(m\alpha = 1\).)
  • If \(\alpha = k/n\) is rational, this repeats, but with a long period for large \(n\).

(P)RNGs and Chaos

We can get similar dynamics from area-preserving chaotic dynamical systems:

  • Long periods with dense orbits (well-mixing);
  • Area-perserving (uniformly distributed);
  • Rapid divergence of nearby points (sensitivity to initial conditions);

Example: Arnold Cat Map

\[ \begin{align*} \phi_{t+1} &= 2U_t + \phi_t \mod 1 \\ U_{t+1} &= \phi_t + U_t \mod 1 \end{align*} \]

Report only \((U_t)\): get hard to predict uniformly distributed data.

Arnold Cat Map

Source: Wikipedia

Cat Map vs Mersenne Twister

Code
function cat_map(N)
    U = zeros(N)
    ϕ = zeros(N)
    U[1], ϕ[1] = rand(Uniform(0, 1), 2)
    for i = 2:N
        ϕ[i] = rem(U[i-1] + 2 * ϕ[i-1], 1.0)
        U[i] = rem(U[i-1] + ϕ[i-1], 1.0)
    end
    return U
end

N = 1_000
Z = cat_map(N)
U = rand(MersenneTwister(1), N) # Mersenne Twister is not the default in Julia, but is in other languages, so using it here by explicitly setting it as the generator

p1 = scatter(Z[1:end-1], Z[2:end], xlabel=L"$U_t$", ylabel=L"$U_{t+1}$", title="Cat Map", label=false, size=(500, 500))
p2 = scatter(U[1:end-1], U[2:end], xlabel=L"$U_t$", ylabel=L"$U_{t+1}$", title="Mersenne Twister", label=false, size=(500, 500))

display(p1)
display(p2)
(a) Comparison of the Arnold cat map and output from the Mersenne Twister
(b)
Figure 8

Random Seeds

Pseudo-random numbers are deterministic, so can be repeated. The sequence depends on a seed.

  • If you don’t set a seed explicitly, the computer will choose one (usually based on the date/time stamp when you execute the script or program).
  • Setting a seed resets the sequence.

Set seeds to ensure reproducibility.

But…Be Careful With Seeds

  • Possible for a statistical procedure to appear to work with just one seed; test across several.
  • Seeds may match, but RNG algorithm may be different across different languages/versions.

Key Points and Upcoming Schedule

Key Points: Bayesian Statistics

  • Use prior predictive simulations to refine priors.
  • Priors matter less when likelihood is highly informative.

Key Points: Random Numbers

  • Can generate uniform distributions using pseudorandom number generators (unstable dynamical systems).
  • Transform into other distributions using:
    • Quantile method;
    • Rejection method.
  • Default functions work well; only really have to worry about any of this when sampling from “non-nice” distributions

Upcoming Schedule

Next Classes

Next Week: Monte Carlo Simulation and the Bootstrap

Term Project

  • Can work in groups of 1-2
  • Proposal due 3/21.
  • Max three pages, should include background and problem statement, data overview, proposed probability models, and research plan.
  • Deliverables include presentations (in class, last 2-3 sessions) and written report (during finals week).

References

References (Scroll for Full List)