Homework 3: Bayesian Models and Monte Carlo

BEE 4850/5850, Spring 2025

Due Date

Friday, 3/14/25, 9:00pm

To do this assignment in Julia, you can find a Jupyter notebook with an appropriate environment in the homework’s GitHub repository. Otherwise, you are responsible for setting up an appropriate package environment in the language of your choice. Make sure to include your name and NetID on your solution.

Overview

Instructions

The goal of this homework assignment is to practice developing and working with probability models for data.

Problem 1 asks you to quantify uncertainties for a simple model using Bayesian statistics and rejection sampling.
Problem 2 asks you to use Monte Carlo simulation to calculate annual expected flooding damages.
Problem 3 (only required for students in BEE 5850) asks you to conduct cost-benefit analyses of housing elevation levels under future flooding uncertainty using Monte Carlo simulations.

Load Environment

The following code loads the environment and makes sure all needed packages are installed. This should be at the start of most Julia scripts.

import Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate()

The code below loads the packages included in the environment for use in the rest of the notebook. The comment next to each package describes the functionality it provides, which should help you find similar packages in other languages.

using Random # random number generation and seed-setting
using DataFrames # tabular data structure
using CSV # reads/writes .csv files
using Distributions # interface to work with probability distributions
using Plots # plotting library
using StatsBase # statistical quantities like mean, median, etc
using StatsPlots # some additional statistical plotting tools
using Optim # optimization tools

Problems

Scoring

Problem 1 is worth 13 points.
Problem 2 is worth 7 points.
Problem 3 is worth 5 points.

Problem 1

Consider the class-cheating testing procedure from Homework 1. As a reminder, the procedure is as follows:

Each student flips a fair coin, with the results hidden from the interviewer. The student answers honestly if the coin comes up heads. Otherwise, if the coin comes up tails, the student flips the coin again and answers “I did cheat” if it comes up heads and “I did not cheat” if it comes up tails.
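To make the data-generating process concrete, the procedure can be simulated directly. The following is a minimal sketch, not part of the assignment: the function names `respond` and `simulate_class`, the seed, and the cheating probability of 0.2 are all hypothetical choices made for illustration.

```julia
using Random

# One student's answer under the randomized-response procedure.
# `cheated` is the student's hidden true status; `true` means "I did cheat".
function respond(rng, cheated::Bool)
    if rand(rng, Bool)            # first flip is heads: answer honestly
        return cheated
    else                          # first flip is tails: second flip decides
        return rand(rng, Bool)    # heads -> "I did cheat", tails -> "I did not cheat"
    end
end

# Number of "I did cheat" answers in a class of n students, assuming each
# student cheated independently with probability θ (an assumption of this sketch).
function simulate_class(rng, n, θ)
    return count(respond(rng, rand(rng) < θ) for _ in 1:n)
end

rng = MersenneTwister(42)
yes_count = simulate_class(rng, 100, 0.2)   # θ = 0.2 is a hypothetical value
```

Repeating `simulate_class` many times for a fixed `θ` gives a feel for how much of the spread in the counts comes from the coin flips rather than from cheating itself.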

Suppose that, after conducting the procedure on your class of 100 students, you get 40 “Yes, I cheated” responses. You are skeptical that your class is broadly engaged in cheating and have a prior belief that the probability of cheating $\theta$ is unlikely to be high. We’ll express this prior using a beta distribution, $\text{Beta}(3, 40)$, which is shown below. You would like to understand the range of cheating propensity which is consistent with your prior beliefs and the outcomes from the detection procedure.

Figure 1: The Beta(3, 40) prior used in Problem 1.

In this problem:

Construct a Bayesian probability model (based on the data-generating process from Homework 1 and our prior on the cheating probability) for the “Yes, I cheated” counts. You can compute the likelihood based on the combination of the fair coin flip and the “true” cheating probability $\theta$.
Plot the posterior $p(\theta | y)$ using a grid approximation (looping over values of $\theta$).
Use rejection sampling to draw 1,000 samples from the posterior and plot the distribution of samples. How well are you capturing the posterior you obtained from the grid approximation? Would you prefer to have more samples?
Report the expected value of $\theta$ and its 90% credible interval. What conclusions can you draw about cheating probability and whether your belief in your class was warranted? How much uncertainty exists as a result of this privacy-protecting procedure?
Repeat the analysis with a less constrained prior of your choosing. How does that change your inferences and conclusions? What does this tell you about the effectiveness of the procedure?
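The mechanics of a grid approximation and of rejection sampling can be sketched on a deliberately simplified stand-in. The example below assumes a toy model in which responses are observed directly (a plain binomial likelihood), with hypothetical data (`n = 20`, `y = 6`) and an illustrative Beta(2, 2) prior; it is not the randomized-response likelihood or the prior from this problem, and the function name `rejection_sample` is made up for the sketch.

```julia
using Distributions
using Random
using Statistics

# Hypothetical toy setup: y "yes" answers out of n direct (non-randomized)
# responses, with an illustrative Beta(2, 2) prior.
n, y = 20, 6
prior = Beta(2, 2)
loglik(θ) = logpdf(Binomial(n, θ), y)

# Grid approximation: evaluate prior × likelihood on a grid, then normalize.
θ_grid = range(0.001, 0.999; length=999)
unnorm = [pdf(prior, θ) * exp(loglik(θ)) for θ in θ_grid]
post_grid = unnorm ./ (sum(unnorm) * step(θ_grid))   # approximate density values

# Rejection sampling with the prior as the proposal: a draw θ is accepted
# with probability L(θ) / L(θ_max), where θ_max = y / n maximizes the likelihood.
function rejection_sample(rng, prior, loglik, loglik_max, nsamples)
    samples = Float64[]
    while length(samples) < nsamples
        θ = rand(rng, prior)
        if log(rand(rng)) < loglik(θ) - loglik_max
            push!(samples, θ)
        end
    end
    return samples
end

rng = MersenneTwister(1)
samples = rejection_sample(rng, prior, loglik, loglik(y / n), 1_000)

grid_mean = sum(θ_grid .* post_grid) * step(θ_grid)
sample_mean = mean(samples)
ci90 = quantile(samples, [0.05, 0.95])   # equal-tailed 90% interval
```

Because this toy model is conjugate, its exact posterior is Beta(8, 16), which gives a convenient check on both approximations. Using the prior as the proposal keeps the acceptance rule simple, but it can reject most draws when the prior and the likelihood disagree.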

Problem 2

You have been asked by a client to assess the risks of flooding to their home (which is valued at $400,000 and only floods when the stream water levels exceed 1.5m). They would like to know the probability distribution of annual flood damages to their home if they do not take any preventative floodproofing measures. After some hydrologic analysis, you have developed a 40-year record of annual maximum water levels (in m) at the nearby stream.

You also have a depth-damage function for the fraction of the home’s value which is damaged at varying flood depths:

\[d(h) = \mathbb{I}[h > 0] \frac{1}{1+\exp(-k(h-x_0))},\]

where $k=1.25$ and $x_0=2$¹. The graph of this function is given in the figure below.

¹ As a reminder, the indicator function $\mathbb{I}[h > 0]$ is $0$ when the condition is not satisfied (in this case, when $h \leq 0$) and $1$ when it is satisfied.

Depth-damage function in Problems 2 and 3.
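The depth-damage function can be written as a short Julia function. This is a sketch that reads the logistic term as a function of the flood depth $h$; the name `depth_damage` and the keyword arguments are illustrative choices, not names used elsewhere in the course materials.

```julia
# Fraction of the home's value damaged at flood depth h (in m).
# The defaults k = 1.25 and x0 = 2 are the values given in the text.
function depth_damage(h; k=1.25, x0=2.0)
    h > 0 || return 0.0                      # indicator: no damage without flooding
    return 1 / (1 + exp(-k * (h - x0)))      # logistic curve in the flood depth
end
```

For example, `depth_damage(2.0)` returns `0.5`, since the logistic curve is at its midpoint when $h = x_0$.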

In this problem:

Fit a log-normal distribution to the water depth data by maximizing likelihood.
Use 10,000 samples from this distribution to estimate the expected damages from an annual maximum flood event using Monte Carlo simulation. Report your estimate as well as its standard error. Would you want to use more samples? Why or why not?
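The two steps above can be sketched on synthetic data. Everything specific here is an assumption made for illustration: the `levels` vector is a simulated stand-in rather than the actual 40-year record, and `g` is a placeholder quantity of interest rather than the damage calculation. The fit uses `fit_mle` from Distributions.jl; a numerical optimizer such as Optim could be used to maximize the log-likelihood instead.

```julia
using Distributions
using Random
using Statistics

rng = MersenneTwister(2025)

# Hypothetical stand-in for the 40-year record (NOT the homework data).
levels = rand(rng, LogNormal(0.3, 0.4), 40)

# Maximum-likelihood fit; for a log-normal this matches the mean and the
# uncorrected standard deviation of the log-transformed data.
fitted = fit_mle(LogNormal, levels)
mu_hat, sigma_hat = params(fitted)

# Placeholder quantity of interest: exceedance above a hypothetical 1 m threshold.
g(x) = max(x - 1.0, 0.0)

# Monte Carlo estimate of E[g(X)] and its standard error.
N = 10_000
draws = g.(rand(rng, fitted, N))
mc_estimate = mean(draws)
mc_se = std(draws) / sqrt(N)
```

The standard error shrinks in proportion to $1/\sqrt{N}$, so halving it takes roughly four times as many samples.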

Problem 3

GRADED FOR 5850 STUDENTS ONLY

Next, your client would like advice on whether it would be cost-effective to elevate their home to reduce flood risks. To address this question, you will need to calculate the net present value (NPV) of flooding damages, which converts damages over time to a present value using a discount rate.² For example, if your discount rate is 4%, a dollar next year is worth the equivalent of about 96 cents today. More generally, with a discount rate of $\gamma$ (as a decimal), a dollar of benefits in $t$ years is worth $\$(1-\gamma)^t$ today.

² Discount rates reduce future monetary values to reflect the time-value of money; that is, you would rather have a bit less money today than none today and more next year, as you could save or invest that money. The actual choice of discount rates is the subject of much economic theory and plays an important role in environmental decision-making, particularly for multi-generational investments such as climate mitigation. For example, with a relatively large discount rate (such as 7%), all costs and benefits after a decade are effectively zero, which means it never appears cost-effective to e.g. reduce fossil fuel emissions.

The NPV of a sequence of money $x_t$ with discount rate $\gamma$ is the sum of all of the discounted values, that is,

\[NPV = \sum_{t=0}^T x_t (1-\gamma)^t,\]

where $T$ is the time horizon. We’ll use a discount rate of 4% in this problem, which is typical for this type of problem, and a design horizon of 30 years.
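The NPV sum translates directly into code. A minimal sketch, assuming the cash flows are stored in a vector whose first element corresponds to year $t = 0$; the function name `npv` is an illustrative choice.

```julia
# NPV of a sequence of values x, where x[1] is year t = 0, x[2] is year t = 1, etc.
# Uses the (1 - γ)^t discount factor from the formula above.
npv(x, γ) = sum(x_t * (1 - γ)^(t - 1) for (t, x_t) in enumerate(x))

# Example: $1,000 per year over years 0 through 30 at a 4% discount rate.
example_npv = npv(fill(1_000.0, 31), 0.04)
```

With a constant annual value, the sum is a geometric series, so the result can be checked against $1000 \, (1 - 0.96^{31}) / 0.04$.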

For this problem, we will assume that the cost of elevating the house by $\Delta h$ m is $C(\Delta h) = \mathbb{I}[\Delta h > 0](100{,}000 + 2{,}000\Delta h)$.
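As a sketch, the cost function can be coded as follows, reading the indicator as applying to the elevation height (so that not elevating costs nothing). The name `elevation_cost` is hypothetical.

```julia
# Up-front cost (in dollars) of elevating the house by Δh meters.
# Assumes the indicator means "no cost when the house is not elevated".
elevation_cost(Δh) = Δh > 0 ? 100_000.0 + 2_000.0 * Δh : 0.0
```

For instance, `elevation_cost(0.5)` gives `101000.0`, while `elevation_cost(0)` gives `0.0`.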

Be careful about the quantity you’re estimating. Calculating the benefits of elevating by $\Delta h$ m requires calculating how the elevation changes the flooding damages over the design horizon relative to a no-heightening baseline.

In this problem:

Using the extreme water level distribution from Problem 2, use Monte Carlo simulation to estimate the NPV of the benefits of elevation levels between 0 and 5m (you don’t have to consider increments finer than 0.5m). What elevation heights pass the cost-benefit test? You can assume that the costs of elevation are all up front (that is, in year 0).
What height (including a possible elevation of 0m) maximizes the NPV of net benefits (benefits - costs)?³

³ This is an example of Monte Carlo optimization as you are using Monte Carlo simulation to estimate the optimization objective function.