Prior distributions over the parameters, \(p(\theta)\)
Probability model for the data given the parameters (the likelihood), \(p(y | \theta)\)
Think: the prior provides proposed explanations, and the likelihood re-weights them based on their ability to produce the data (combined via Bayes' rule below).
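Combining the two through Bayes' rule gives the posterior:
\[
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \propto p(y \mid \theta)\, p(\theta).
\]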
Bayes and Parametric Uncertainty
Frequentist: Parametric uncertainty is purely the result of sampling variability
Bayesian: Parameters have probabilities based on consistency with data and priors.
Think: how “likely” is a set of parameters to have produced the data given the specified data generating process?
Bayesian Updating
The posterior is a “compromise” between the prior and the data.
The posterior mean is a weighted combination of the data and the prior mean.
The weights depend on the prior and the likelihood variances.
More data usually makes the posterior more confident (see the conjugate normal example below).
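To see this concretely, consider the standard conjugate normal illustration (a textbook example, not this lecture's model): with prior \(\theta \sim \text{Normal}(\mu_0, \tau^2)\) and observations \(y_1, \ldots, y_n \sim \text{Normal}(\theta, \sigma^2)\) with \(\sigma^2\) known,
\[
\mathbb{E}[\theta \mid y] = \frac{\frac{1}{\tau^2}\mu_0 + \frac{n}{\sigma^2}\bar{y}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}},
\]
so the prior mean \(\mu_0\) and the sample mean \(\bar{y}\) are weighted by their precisions (inverse variances), and the weight on the data grows with \(n\).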
Bayesian Example: Local Sea Level Extremes
San Francisco Tide Gauge Data
# read in data and get annual maxima
using CSV, DataFrames, DataFramesMeta, Dates, Statistics  # packages assumed loaded in the original setup cell
using Plots, Plots.PlotMeasures                           # PlotMeasures provides `mm` for plot margins

function load_data(fname)
    date_format = DateFormat("yyyy-mm-dd HH:MM:SS")
    # This uses the DataFramesMeta.jl package, which makes it easy to string together commands to load and process data
    df = @chain fname begin
        CSV.read(DataFrame; header=false)
        rename("Column1" => "year", "Column2" => "month", "Column3" => "day", "Column4" => "hour", "Column5" => "gauge")
        # need to reformat the decimal date in the data file
        @transform :datetime = DateTime.(:year, :month, :day, :hour)
        # replace -99999 with missing
        @transform :gauge = ifelse.(abs.(:gauge) .>= 9999, missing, :gauge)
        select(:datetime, :gauge)
    end
    return df
end
dat = load_data("data/surge/h551.csv")

# detrend the data to remove the effects of sea-level rise and seasonal dynamics
ma_length = 366
ma_offset = Int(floor(ma_length / 2))
moving_average(series, n) = [mean(@view series[i-n:i+n]) for i in n+1:length(series)-n]
dat_ma = DataFrame(
    datetime=dat.datetime[ma_offset+1:end-ma_offset],
    residual=dat.gauge[ma_offset+1:end-ma_offset] .- moving_average(dat.gauge, ma_offset)
)

# group data by year and compute the annual maxima
dat_ma = dropmissing(dat_ma) # drop missing data
dat_annmax = combine(dat_ma -> dat_ma[argmax(dat_ma.residual), :],
    groupby(DataFrames.transform(dat_ma, :datetime => x -> year.(x)), :datetime_function))
deleteat!(dat_annmax, nrow(dat_annmax)) # delete 2023; haven't seen much of that year yet
rename!(dat_annmax, :datetime_function => :Year)
select!(dat_annmax, [:Year, :residual])
dat_annmax.residual = dat_annmax.residual / 1000 # convert to m

# make plots
p1 = plot(
    dat_annmax.Year, dat_annmax.residual;
    xlabel="Year", ylabel="Annual Max Tide Level (m)",
    label=false, marker=:circle, markersize=5,
    tickfontsize=16, guidefontsize=18
)
p2 = histogram(
    dat_annmax.residual;
    normalize=:pdf, orientation=:horizontal,
    label=false, xlabel="PDF", ylabel="", yticks=[],
    tickfontsize=16, guidefontsize=18
)
l = @layout [a{0.7w} b{0.3w}]
plot(p1, p2; layout=l, link=:y, ylims=(1, 1.7), bottom_margin=5mm, left_margin=5mm)
plot!(size=(1000, 450))
Figure 1: Annual maxima surge data from the San Francisco, CA tide gauge.
When conjugate priors make each parameter's full conditional distribution tractable, sampling each parameter in turn from its full conditional is called Gibbs sampling (a minimal sketch follows).
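As a minimal sketch (not from the lecture code; the model and hyperparameters m0, s0, a0, b0 are illustrative assumptions), a Gibbs sampler for a normal model with semi-conjugate priors alternates draws from the two full conditionals:

using Distributions, Random, Statistics

# Sketch: y_i ~ Normal(μ, σ²) with priors μ ~ Normal(m0, s0²) and
# σ² ~ InverseGamma(a0, b0); both full conditionals are closed-form.
function gibbs_normal(y; m0=0.0, s0=10.0, a0=2.0, b0=1.0, iters=5_000)
    n = length(y)
    μ, σ² = mean(y), var(y)                   # initialize at the data moments
    samples = Matrix{Float64}(undef, iters, 2)
    for t in 1:iters
        # μ | σ², y is Normal with a precision-weighted mean (conjugacy)
        prec = 1 / s0^2 + n / σ²
        m = (m0 / s0^2 + sum(y) / σ²) / prec
        μ = rand(Normal(m, sqrt(1 / prec)))
        # σ² | μ, y is InverseGamma (conjugacy)
        σ² = rand(InverseGamma(a0 + n / 2, b0 + sum(abs2, y .- μ) / 2))
        samples[t, :] .= (μ, σ²)
    end
    return samples
end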
When Does The Prior Matter?
In general, priors matter more for:
Less data (likelihood less informative);
More complex models (more degrees of freedom).
Always justify and test your priors. Explicitly compare the prior to the posterior to see whether your inferences are driven by the prior or by the data (probability model); a small conjugate-model demonstration follows.
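For instance (a hedged toy demonstration, not from the lecture; all numbers are arbitrary), the conjugate normal posterior mean from the earlier example shows the prior losing influence as the sample grows:

using Distributions, Random, Statistics

Random.seed!(1)
# posterior mean for the conjugate normal model with known data sd σ
post_mean(y, μ0, τ, σ) = (μ0 / τ^2 + sum(y) / σ^2) / (1 / τ^2 + length(y) / σ^2)

μ0, τ, σ = 0.0, 1.0, 1.0   # prior mean, prior sd, data sd (illustrative)
for n in (5, 50, 5000)
    y = rand(Normal(2.0, σ), n)   # data generated far from the prior mean
    println("n = $n: posterior mean ≈ $(round(post_mean(y, μ0, τ, σ); digits=3))")
end

With n = 5 the posterior mean sits well below the data-generating value; by n = 5000 the prior contribution is negligible.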
Probabilistic Programming Languages
Overview of PPLs
Specialty “languages” for specifying probability models.
Rely on automatic differentiation to compile likelihood/posterior functions.
Many frameworks have been developed over the last few years:
Python: PyMC3
Julia: Turing.jl
Cross-platform: Stan
Specifying the Extremes Example with Turing.jl
using Turing

## y: observed data
## can also specify covariates or auxiliary data in the function if used
@model function tide_model(y)
    # specify priors
    μ ~ Normal(0, 0.5)
    σ ~ truncated(Normal(0, 0.1), 0, Inf)
    # specify likelihood
    y ~ LogNormal(μ, σ)
    # returning y allows us (later) to generate predictive simulations
    return y
end
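As a hedged sketch of prior predictive simulation with this model (the variable names below are illustrative, not from the lecture code): sample the parameters from their priors, then simulate synthetic annual maxima for each draw.

# draw parameter samples from the prior alone
prior_chain = sample(tide_model(dat_annmax.residual), Prior(), 1_000)
μs, σs = vec(prior_chain[:μ]), vec(prior_chain[:σ])
# simulate one synthetic record of annual maxima per prior draw
n = length(dat_annmax.residual)
y_prior_pred = [rand(LogNormal(m, s), n) for (m, s) in zip(μs, σs)]

Comparing these simulated records against physically plausible tide levels is one way to refine the priors before fitting.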
Finding MLE and MAP
m = tide_model(dat_annmax.residual)
θ_mle = maximum_likelihood(m)
θ_map = maximum_a_posteriori(m)
@show θ_mle;
@show θ_map;
Parameterization can matter (more for posterior simulation, which we'll discuss later, than for MLE/MAP); read the documentation and tips, and don't be shy about checking Reddit/forums.
Some PPLs can call external models: this is easy in Turing.jl, more difficult in Stan, and the current status in PyMC3 is unclear.
These packages rely on many dependencies, which may not be trivial to install.
More PPL Tips
When are PPLs useful?
Readable model code;
Complex models (hierarchical models, external models with infeasible parameters, etc.);
“Full Bayes” (not discussed yet: generating samples from the posterior distribution).
Key Points and Upcoming Schedule
Key Points: Bayesian Workflow
Use prior predictive simulations to refine priors.
Priors matter less when likelihood is highly informative.
Can use PPLs to specify models without formally writing out likelihoods.
Upcoming Schedule
Next Classes
Wednesday: Random variate generation and sampling from distributions.
Next Week: Monte Carlo and the Bootstrap.
Term Project
Can work in groups of 1-2
Proposal due 3/21.
Max three pages, should include background and problem statement, data overview, proposed probability models, and research plan.
Deliverables include presentations (in class, last 2-3 sessions) and written report (during finals week).