Bayesian Workflow Example


Lecture 09

February 24, 2024

Review

Bayes’ Rule

\[ \underbrace{{p(\theta | y)}}_{\text{posterior}} = \frac{\overbrace{p(y | \theta)}^{\text{likelihood}}}{\underbrace{p(y)}_\text{normalization}} \overbrace{p(\theta)}^\text{prior} \]

Bayesian Model Components

A fully specified Bayesian model includes:

  1. Prior distributions over the parameters, \(p(\theta)\)
  2. Probability model for the data given the parameters (the likelihood), \(p(y | \theta)\)t

Think: Prior provides proposed explanations, likelihood re-weights based on ability to produce the data.

Bayes and Parametric Uncertainty

Frequentist: Parametric uncertainty is purely the result of sampling variability

Bayesian: Parameters have probabilities based on consistency with data and priors.

Think: how “likely” is a set of parameters to have produced the data given the specified data generating process?

Bayesian Updating

  • The posterior is a “compromise” between the prior and the data.
  • The posterior mean is a weighted combination of the data and the prior mean.
  • The weights depend on the prior and the likelihood variances.
  • More data usually makes the posterior more confident.

Bayesian Example: Local Sea Level Extremes

San Francisco Tide Gauge Data

Code
# read in data and get annual maxima
function load_data(fname)
    date_format = DateFormat("yyyy-mm-dd HH:MM:SS")
    # This uses the DataFramesMeta.jl package, which makes it easy to string together commands to load and process data
    df = @chain fname begin
        CSV.read(DataFrame; header=false)
        rename("Column1" => "year", "Column2" => "month", "Column3" => "day", "Column4" => "hour", "Column5" => "gauge")
        # need to reformat the decimal date in the data file
        @transform :datetime = DateTime.(:year, :month, :day, :hour)
        # replace -99999 with missing
        @transform :gauge = ifelse.(abs.(:gauge) .>= 9999, missing, :gauge)
        select(:datetime, :gauge)
    end
    return df
end

dat = load_data("data/surge/h551.csv")

# detrend the data to remove the effects of sea-level rise and seasonal dynamics
ma_length = 366
ma_offset = Int(floor(ma_length/2))
moving_average(series,n) = [mean(@view series[i-n:i+n]) for i in n+1:length(series)-n]
dat_ma = DataFrame(datetime=dat.datetime[ma_offset+1:end-ma_offset], residual=dat.gauge[ma_offset+1:end-ma_offset] .- moving_average(dat.gauge, ma_offset))

# group data by year and compute the annual maxima
dat_ma = dropmissing(dat_ma) # drop missing data
dat_annmax = combine(dat_ma -> dat_ma[argmax(dat_ma.residual), :], groupby(DataFrames.transform(dat_ma, :datetime => x->year.(x)), :datetime_function))
delete!(dat_annmax, nrow(dat_annmax)) # delete 2023; haven't seen much of that year yet
rename!(dat_annmax, :datetime_function => :Year)
select!(dat_annmax, [:Year, :residual])
dat_annmax.residual = dat_annmax.residual / 1000 # convert to m

# make plots
p1 = plot(
    dat_annmax.Year,
    dat_annmax.residual;
    xlabel="Year",
    ylabel="Annual Max Tide Level (m)",
    label=false,
    marker=:circle,
    markersize=5,
    tickfontsize=16,
    guidefontsize=18
)
p2 = histogram(
    dat_annmax.residual,
    normalize=:pdf,
    orientation=:horizontal,
    label=:false,
    xlabel="PDF",
    ylabel="",
    yticks=[],
    tickfontsize=16,
    guidefontsize=18
)

l = @layout [a{0.7w} b{0.3w}]
plot(p1, p2; layout=l, link=:y, ylims=(1, 1.7), bottom_margin=5mm, left_margin=5mm)
plot!(size=(1000, 450))
Figure 1: Annual maxima surge data from the San Francisco, CA tide gauge.

Proposed Probability Model

\[ \begin{align*} & y \sim LogNormal(\mu, \sigma) \tag{likelihood}\\ & \left. \begin{aligned} & \mu \sim Normal(0, 1) \\ & \sigma \sim HalfNormal(0, 5) \end{aligned} \right\} \tag{priors} \end{align*} \]

Want to find:

\[p(\mu, \sigma | y) \propto p(y | \mu, \sigma) p(\mu)p(\sigma)\]

Are Our Priors Reasonable?

Key idea: what do the priors imply for observable variables?

Let’s simulate data from the prior predictive distribution to see we get plausible outcomes.

\[y \sim p(\tilde{y}) = \int_{\Theta} p(\tilde{y} | \theta) p(\theta) d\theta\]

Prior Predictive Check

Code
# sample from priors
μ_sample = rand(Normal(0, 1), 1_000)
σ_sample = rand(truncated(Normal(0, 5), 0, +Inf), 1_000)

# define return periods and cmopute return levels for parameters
return_periods = 2:100
return_levels = zeros(1_000, length(return_periods))
for i in 1:1_000
    return_levels[i, :] = quantile.(LogNormal(μ_sample[i], σ_sample[i]), 1 .- (1 ./ return_periods))
end

plt_prior_1 = plot(; yscale=:log10, yticks=10 .^ collect(0:2:16), ylabel="Return Level (m)", xlabel="Return Period (yrs)",
    tickfontsize=16, legendfontsize=18, guidefontsize=18, bottom_margin=10mm, left_margin=10mm, legend=:topleft)
for idx in 1:1_000
    label = idx == 1 ? "Prior" : false
    plot!(plt_prior_1, return_periods, return_levels[idx, :]; color=:black, alpha=0.1, label=label)
end
plt_prior_1
Figure 2: Prior predictive check of return periods with revised model

Let’s Revise the Prior

\[ \begin{align*} & y \sim LogNormal(\mu, \sigma) \tag{likelihood}\\ & \left. \begin{aligned} & \mu \sim Normal(0, 0.5) \\ & \sigma \sim HalfNormal(0, 0.1) \end{aligned} \right\} \tag{priors} \end{align*} \]

Prior Predictive Check 2

Code
# sample from priors
μ_sample = rand(Normal(0, 0.5), 1_000)
σ_sample = rand(truncated(Normal(0, 0.1), 0, +Inf), 1_000)

return_periods = 2:100
return_levels = zeros(1_000, length(return_periods))
for i in 1:1_000
    return_levels[i, :] = quantile.(LogNormal(μ_sample[i], σ_sample[i]), 1 .- (1 ./ return_periods))
end

plt_prior_2 = plot(; ylabel="Return Level (m)", xlabel="Return Period (yrs)", tickfontsize=16, legendfontsize=18, guidefontsize=18, bottom_margin=10mm, left_margin=10mm)
for idx in 1:1_000
    label = idx == 1 ? "Prior" : false
    plot!(plt_prior_2, return_periods, return_levels[idx, :]; color=:black, alpha=0.1, label=label)
end
plt_prior_2
Figure 3: Prior predictive check of return periods with revised model

Compute Posterior

Code
ll(μ, σ) = sum(logpdf(LogNormal(μ, σ), dat_annmax.residual))
lprior1(μ, σ) = logpdf(Normal(0, 1), μ) + logpdf(truncated(Normal(0, 5), 0, Inf), σ)
lprior2(μ, σ) = logpdf(Normal(0, 0.5), μ) + logpdf(truncated(Normal(0, 0.1), 0, Inf), σ)
lposterior1(μ, σ) = ll(μ, σ) + lprior1(μ, σ)
lposterior2(μ, σ) = ll(μ, σ) + lprior2(μ, σ)

p_map1 = optimize(p -> -lposterior1(p[1], p[2]), [0.0, 0.0], [1.0, 1.0], [0.5, 0.5]).minimizer
p_map2 = optimize(p -> -lposterior2(p[1], p[2]), [0.0, 0.0], [1.0, 1.0], [0.5, 0.5]).minimizer

μ = 0.15:0.005:0.35
σ = 0.04:0.01:0.1
posterior1_vals = @. lposterior1', σ)
posterior2_vals = @. lposterior2', σ)

p_post1 = contour(μ, σ, posterior1_vals, 
    levels=100, 
    clabels=false, 
    cbar=false, lw=1, 
    fill=(true,cgrad(:grays,[0,0.1,1.0])),
    title = "Diffuse Prior"
)
scatter!(p_post1, [p_map1[1]], [p_map1[2]], label="MLE", markersize=10, marker=:star)
xlabel!(p_post1, L"$\mu$")
ylabel!(p_post1, L"$\sigma$")
plot!(p_post1, size=(600, 500))

p_post2 = contour(μ, σ, posterior2_vals, 
    levels=100, 
    clabels=false, 
    cbar=false, lw=1, 
    fill=(true,cgrad(:grays,[0,0.1,1.0])),
    title = "More Informed Priors"
)
scatter!(p_post2, [p_map2[1]], [p_map2[2]], label="MAP", markersize=10, marker=:star)
xlabel!(p_post2, L"$\mu$")
ylabel!(p_post2, L"$\sigma$")
plot!(p_post2, size=(600, 500))

display(p_post1)
display(p_post2)
(a) Posterior samples from surge model.
(b)
Figure 4
p_map1 = [0.25470601822948175, 0.05548961712923218]
p_map2 = [0.2546874075933288, 0.05542213686160764]

Assess MAP Fit

Code
p1 = histogram(
    dat_annmax.residual,
    normalize=:pdf,
    legend=:false,
    ylabel="PDF",
    xlabel="Annual Max Tide Level (m)",
    tickfontsize=16,
    guidefontsize=18,
    bottom_margin=5mm, left_margin=5mm
)
plot!(p1, LogNormal(p_map2[1], p_map2[2]),
    linewidth=3,
    color=:red)
xlims!(p1, (1, 1.7))
plot!(p1, size=(600, 450))

return_periods = 2:500
return_levels = quantile.(LogNormal(p_map2[1], p_map2[2]), 1 .- (1 ./ return_periods))

# function to calculate exceedance probability and plot positions based on data quantile
function exceedance_plot_pos(y)
    N = length(y)
    ys = sort(y; rev=false) # sorted values of y
    nxp = xp = [r / (N + 1) for r in 1:N] # exceedance probability
    xp = 1 .- nxp
    return xp, ys
end
xp, ys = exceedance_plot_pos(dat_annmax.residual)

p2 = plot(return_periods, return_levels, linewidth=3, color=:blue, label="Model Fit", tickfontsize=16, legendfontsize=18, guidefontsize=18, bottom_margin=5mm, left_margin=5mm, right_margin=10mm, legend=:bottomright)
scatter!(p2, 1 ./ xp, ys, label="Observations", color=:black, markersize=5)
xlabel!(p2, "Return Period (yrs)")
ylabel!(p2, "Return Level (m)")
xlims!(-1, 300)
plot!(p2, size=(600, 450))

display(p1)
display(p2)
(a) Checks for model fit.
(b)
Figure 5

What About The Posterior Distribution?

One of the points of Bayesian statistics is we get a distribution over parameters.

Sampling from this distribution is often more involved.

Exception: Conjugate Priors

When the mathematical forms of the likelihood and the prior(s) are conjugate, the posterior is a nice closed-form distribution.

Examples:

  • Normal \(p(y | \mu)\), Normal \(p(\mu)\) ⇒ Normal \(p(\mu | y)\)
  • Binomial \(p(y | \theta)\), Beta \(p(\theta)\), ⇒ Beta \(p(\theta | y)\)

Sampling using conjugate priors is called Gibbs sampling.

When Does The Prior Matter?

In general, priors matter more for:

  • Less data (likelihood less informative);
  • More complex models (more degrees of freedom).

Always justify and test your priors. Explicitly compare the prior to the posterior to see whether your inferences are driven by the prior or by the data (probability model).

Probabilistic Programming Languages

Overview of PPLs

  • Speciality “languages” for specifying probability models.
  • Rely on automatic differentiation to compile likelihood/posterior functions.
  • Many frameworks developing over the last few years:
    • Python: pyMC3
    • Julia: Turing.jl
    • Cross-platform: Stan

Specifying Extreme Example with Turing.jl

using Turing
## y: observed data
## can also specify covariates or auxiliary data in the function if used
@model function tide_model(y)
    # specify priors
    μ ~ Normal(0, 0.5)
    σ ~ truncated(Normal(0, 0.1), 0, Inf)
    # specify likelihood
    y ~ LogNormal(μ, σ)
    # returning y allows us (later) to generate predictive simulations
    return y 
end

Finding MLE and MAP

m = tide_model(dat_annmax.residual)
θ_mle = maximum_likelihood(m)
θ_map = maximum_a_posteriori(m)

@show θ_mle;
@show θ_map;
@show p_map2;
θ_mle = [0.25471224255244257, 0.055489643349750276]
θ_map = [0.2546874075936701, 0.05542213631132044]
p_map2 = [0.2546874075933288, 0.05542213686160764]

More PPL Tips

  • Parameterization can matter (more when we talk about simulation from posterior than MLE/MAP); read documentation and tips and don’t feel shy about checking Reddit/forums
  • Sometimes can use external models: easy in Turing, more difficult in Stan and not sure of current status in pyMC3.
  • Packages rely on a lot of dependencies which may not be trivial to install.

More PPL Tips

When are PPLs useful?

  1. Readable model code;
  2. Complex models (hierarchical models, external models with infeasible parameters, etc)
  3. “Full Bayes” (haven’t discussed yet, but generating samples from the posterior distribution).

Key Points and Upcoming Schedule

Key Points: Bayesian Workflow

  • Use prior predictive simulations to refine priors.
  • Priors matter less when likelihood is highly informative.
  • Can use PPLs to specify models without formally writing out likelihoods.

Upcoming Schedule

Next Classes

Wednesday: Random variate generation and sampling from distributions.

Next Week: Monte Carlo and the Bootstrap.

Term Project

  • Can work in groups of 1-2
  • Proposal due 3/21.
  • Max three pages, should include background and problem statement, data overview, proposed probability models, and research plan.
  • Deliverables include presentations (in class, last 2-3 sessions) and written report (during finals week).

References

References (Scroll for Full List)