Lecture 08
February 19, 2024
\[\underbrace{h_t}_{\substack{\text{hare} \\ \text{pelts}}} \sim \text{LogNormal}(\log(\underbrace{p_H}_{\substack{\text{trap} \\ \text{rate}}} H_t), \sigma_H)\] \[l_t \sim \text{LogNormal}(\log(p_L L_t), \sigma_L)\]
\[ \begin{align*} \frac{dH}{dt} &= H_t b_H - H_t (L_t m_H) \\ H_T &= H_1 + \int_1^T \frac{dH}{dt}dt \end{align*} \]
\[ \begin{align*} \frac{dL}{dt} &= L_t (H_t b_L) - L_t m_L \\ L_T &= L_1 + \int_1^T \frac{dL}{dt}dt \end{align*} \]
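To make the integration step concrete, here is a minimal sketch (not the lecture's code) that advances these equations with forward Euler; the function name, step size, and parameter values are all illustrative assumptions.

# Forward-Euler integration of the hare-lynx equations above (illustrative sketch).
# All parameter values and initial populations below are made up for demonstration.
function simulate_populations(b_H, m_H, b_L, m_L, H1, L1, T; dt=0.01)
    H, L = H1, L1
    nsteps = round(Int, (T - 1) / dt)   # integrate from t = 1 to t = T
    for _ in 1:nsteps
        dH = H * b_H - H * (L * m_H)    # dH/dt
        dL = L * (H * b_L) - L * m_L    # dL/dt
        H += dt * dH
        L += dt * dL
    end
    return H, L                         # H_T, L_T
end

H_T, L_T = simulate_populations(0.5, 0.02, 0.01, 0.8, 30.0, 4.0, 20)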
So far: no way to use prior information about parameters (other than bounds on MLE optimization).
For example: what “trap rates” are more plausible?
Original version (Bayes, 1763):
\[P(A | B) = \frac{P(B | A) \times P(A)}{P(B)} \quad \text{if} \quad P(B) \neq 0.\]
“Modern” version (Laplace, 1774):
\[\underbrace{{p(\theta | y)}}_{\text{posterior}} = \frac{\overbrace{p(y | \theta)}^{\text{likelihood}}}{\underbrace{p(y)}_\text{normalization}} \overbrace{p(\theta)}^\text{prior}\]
The version of Bayes’ rule that matters most for (approximately) 95% of Bayesian statistics:
\[p(\theta | y) \propto p(y | \theta) \times p(\theta)\]
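As a toy illustration (hypothetical numbers, not from the lecture), the proportionality can be applied pointwise over a discrete set of candidate parameter values and then normalized:

# Hypothetical discrete example with three candidate values of θ.
prior = [0.25, 0.5, 0.25]                       # p(θ)
likelihood = [0.1, 0.4, 0.7]                    # p(y | θ) for some fixed data y
unnormalized = likelihood .* prior              # p(y | θ) p(θ)
posterior = unnormalized ./ sum(unnormalized)   # normalization: divide by p(y)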
“The posterior is the prior times the likelihood…”
Bayesian credible intervals are straightforward to interpret: \(\theta\) is in \(I\) with probability \(\alpha\).
Choose \(I\) such that \[p(\theta \in I | \mathbf{y}) = \alpha.\]
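For example, if we already have draws from the posterior (a placeholder θ_samples below; in practice they come from a sampler), an equal-tailed credible interval \(I\) is just a pair of quantiles:

using Statistics
α = 0.95
θ_samples = rand(10_000)                                  # placeholder for posterior draws of θ
ci = quantile(θ_samples, [(1 - α) / 2, 1 - (1 - α) / 2])  # equal-tailed α-credible interval I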
A fully specified Bayesian model includes both a prior distribution \(p(\theta)\) for the parameters and a likelihood \(p(y | \theta)\) for the data given the parameters.
Think: Prior provides proposed explanations, likelihood re-weights based on ability to produce the data.
Bayesian models lend themselves to generative simulation, since new data \(\tilde{y}\) can be generated through the posterior predictive distribution:
\[p(\tilde{y} | \mathbf{y}) = \int_{\Theta} p(\tilde{y} | \theta) p(\theta | \mathbf{y}) d\theta\]
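In practice this integral is approximated by simulation: draw \(\theta\) from the posterior, then draw \(\tilde{y}\) from the data model given that \(\theta\). A generic sketch, where both the posterior draws and the Normal(θ, 1) data model are placeholder assumptions:

using Distributions
θ_samples = rand(Normal(0, 1), 10_000)            # placeholder posterior draws of θ
y_rep = [rand(Normal(θ, 1)) for θ in θ_samples]   # one replicated ỹ per posterior draw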
One perspective: Priors should reflect “actual knowledge” independent of the analysis (Jaynes, 2003)
Another: Priors are part of the probability model, and can be specified/changed accordingly based on predictive skill (Gelman et al., 2017; Gelman & Shalizi, 2013)
We would like to understand if a coin-flipping game is fair. We’ve observed the following sequence of flips: H, H, H, T, H, H, H, H, H (eight heads and one tail).
The data-generating process here is straightforward: we can represent a coin flip with a heads-probability of \(\theta\) as a sample from a Bernoulli distribution,
\[y_i \sim \text{Bernoulli}(\theta).\]
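Encoding the observed flips and their Bernoulli log-likelihood might look like the following sketch; the name flip_ll matches the plotting code later in the lecture, but this particular implementation is an assumption.

using Distributions
flips = ["H", "H", "H", "T", "H", "H", "H", "H", "H"]   # observed sequence
y = [f == "H" ? 1 : 0 for f in flips]                   # 1 = heads, 0 = tails
flip_ll(θ) = sum(logpdf.(Bernoulli(θ), y))              # log-likelihood of all flips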
Suppose that we spoke to a friend who knows something about coins, and she tells us that it is extremely difficult to make a passable weighted coin which comes up heads more than 75% of the time.
Since \(\theta\) is bounded between 0 and 1, we’ll use a Beta distribution for our prior, specifically \(\text{Beta}(5,5)\).
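Continuing the sketch above, the log-prior (named flip_lprior to match the plotting code below) and a quick check that Beta(5,5) puts little mass above 0.75, consistent with our friend's claim:

flip_lprior(θ) = logpdf(Beta(5, 5), θ)    # log-density of the Beta(5, 5) prior
p_tail = 1 - cdf(Beta(5, 5), 0.75)        # prior probability that θ > 0.75 (small)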
Combining the prior and likelihood using Bayes’ rule lets us calculate the maximum a posteriori (MAP) estimate:
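Continuing the sketch, here are the unnormalized log-posterior and simple grid estimates of the MAP and MLE (the names match the plotting code below; a numerical optimizer would also work). With eight heads, one tail, and a Beta(5,5) prior, the posterior is conjugate: Beta(5 + 8, 5 + 1).

flip_lposterior(θ) = flip_ll(θ) + flip_lprior(θ)   # unnormalized log-posterior
θ_grid = 0:0.001:1
θ_mle = θ_grid[argmax(flip_ll.(θ_grid))]           # maximum likelihood estimate
θ_map = θ_grid[argmax(flip_lposterior.(θ_grid))]   # maximum a posteriori estimate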
using Plots, LaTeXStrings

θ_range = 0:0.01:1
# compare the log-posterior, log-likelihood, and log-prior across the parameter range
plot(θ_range, flip_lposterior.(θ_range), color=:black, label="Posterior", linewidth=3)
plot!(θ_range, flip_ll.(θ_range), color=:black, label="Likelihood", linewidth=3, linestyle=:dash)
plot!(θ_range, flip_lprior.(θ_range), color=:black, label="Prior", linewidth=3, linestyle=:dot)
# mark the point estimates
vline!([θ_map], color=:red, label="MAP", linewidth=2)
vline!([θ_mle], color=:blue, label="MLE", linewidth=2)
xlabel!(L"$\theta$")
ylabel!("Log-Density")
plot!(size=(1000, 450))

Frequentist: Parametric uncertainty is purely the result of sampling variability.
Bayesian: Parameters have probabilities based on consistency with data and priors.
Think: how “likely” is a set of parameters to have produced the data given the specified data generating process?
Next Week: Sampling! Specifically, Monte Carlo.
Homework 2 due Friday (2/21).
Quiz: Due Monday (all on today’s lecture).
Project: Will discuss Monday, start thinking about possible topics.