Figure 5: 2015 detrended tide gauge data from the Norfolk, VA tide gauge.
Block Maxima
Block Maxima
Given independent and identically-distributed random variables \(X_1, X_2, \ldots, X_{mk}\), what is the distribution of maxima of “blocks” of size \(m\):
\[\tilde{X}_i = \max_{(i-1)m < j \leq im} X_j,\]
for \(i = 1, 2, \ldots, k\)?
How Are Maxima Distributed?
Consider the CDF \(F_X\) of \(X\) and let \(Y_n = \max \{X_1, \ldots, X_n \}\).
Figure 6: 2015 detrended tide gauge data from the Norfolk, VA tide gauge.
Stability Postulate
By analogy with sum stability, postulate that for a max-stable process,
\[F^n(z) = F(a_n z + b_n)\]
for some constants \(a_n, b_n \geq 0\).
Extremal Types Theorem
Let \(X_1, \ldots, X_n\) be a sample of i.i.d. random variables.
If a limiting distribution for \(Y = \max\{X_1, \ldots, X_n\}\) exists, it can only by given as a Generalized Extreme Value (GEV) distribution:
\[H(y) = \exp\left\{-\left[1 + \xi\left(\frac{y-\mu}{\sigma}\right)\right]^{-1/\xi}\right\},\] defined for \(y\) such that \(1 + \xi(y-\mu)/\sigma > 0\).
Figure 7: Shape of the GEV distribution with different choices of \(\xi\).
GEV Types
\(\xi < 0\): extremes are bounded (the Weibull distribution comes up in the context of temperature and wind speed extremes).
\(\xi > 0\): tails are heavy, and there is no expectation if \(\xi > 1\). Common for streamflow, storm surge, precipitation.
The Gumbel distribution (\(\xi = 0\)) is common for extremes from normal distributions, doesn’t occur often in real-world data.
Return Levels
Return Levels are a central (if poorly named) concept in risk analysis.
The \(T\)-year return level is the value expected to be observed on average once every \(T\) years.
From a GEV fit to annual maxima: \(T\)-year return level is the \(1-1/T\) quantile.
Return Periods
The return period of an extreme value is the inverse of the exceedance probability.
Example: The 100-year return period has an exceedance probability of 1%, e.g. the 0.99 quantile.
Return levels are associated with the analogous return period.
San Francisco Tide Gauge Data
Code
# load SF tide gauge data# read in data and get annual maximafunctionload_data(fname) date_format =DateFormat("yyyy-mm-dd HH:MM:SS")# This uses the DataFramesMeta.jl package, which makes it easy to string together commands to load and process data df =@chain fname begin CSV.read(DataFrame; header=false)rename("Column1"=>"year", "Column2"=>"month", "Column3"=>"day", "Column4"=>"hour", "Column5"=>"gauge")# need to reformat the decimal date in the data file@transform:datetime =DateTime.(:year, :month, :day, :hour)# replace -99999 with missing@transform:gauge =ifelse.(abs.(:gauge) .>=9999, missing, :gauge)select(:datetime, :gauge)endreturn dfendd_sf =load_data("data/surge/h551.csv")# detrend the data to remove the effects of sea-level rise and seasonal dynamicsma_length =366ma_offset =Int(floor(ma_length/2))moving_average(series,n) = [mean(@view series[i-n:i+n]) for i in n+1:length(series)-n]dat_ma =DataFrame(datetime=d_sf.datetime[ma_offset+1:end-ma_offset], residual=d_sf.gauge[ma_offset+1:end-ma_offset] .-moving_average(d_sf.gauge, ma_offset))# group data by year and compute the annual maximadat_ma =dropmissing(dat_ma) # drop missing datadat_annmax =combine(dat_ma -> dat_ma[argmax(dat_ma.residual), :], groupby(transform(dat_ma, :datetime =>x->year.(x)), :datetime_function))delete!(dat_annmax, nrow(dat_annmax)) # delete 2023; haven't seen much of that year yetrename!(dat_annmax, :datetime_function =>:Year)select!(dat_annmax, [:Year, :residual])dat_annmax.residual = dat_annmax.residual /1000# convert to m# make plotsp1 =plot( dat_annmax.Year, dat_annmax.residual; xlabel="Year", ylabel="Annual Max Tide Level (m)", label=false, marker=:circle, markersize=5)p2 =histogram( dat_annmax.residual, normalize=:pdf, orientation=:horizontal, label=:false, xlabel="Count", ylabel="", yticks=[])l =@layout [a{0.7w} b{0.3w}]plot(p1, p2; layout=l, link=:y, ylims=(1, 1.7), bottom_margin=5mm, left_margin=5mm)plot!(size=(1000, 450))# find GEV fit# for most distributions we could use Distributions.fit(), but this isn't implemented in Distributions.jl for GEVinit_θ = [1.0, 1.0, 1.0]gev_lik(θ) =-sum(logpdf(GeneralizedExtremeValue(θ[1], θ[2], θ[3]), dat_annmax.residual))θ_mle = Optim.optimize(gev_lik, init_θ).minimizer
Figure 11: GEV fit to annual maxima of San Francisco Tide Gauge Data
GEV Q-Q Plot
Code
p1 =qqplot(GeneralizedExtremeValue(θ_mle[1], θ_mle[2], θ_mle[3]), dat_annmax.residual, linewidth=3, markersize=5, xlabel="Theoretical Quantile", ylabel="Empirical Quantile")plot!(p1, size=(600, 450))return_periods =2:500# get GEV return levelsreturn_levels =quantile.(GeneralizedExtremeValue(θ_mle[1], θ_mle[2], θ_mle[3]), 1.- (1./ return_periods))# fit lognormal to get return levels for comparisonlognormal_fit =fit(LogNormal, dat_annmax.residual)return_levels_lognormal =quantile.(lognormal_fit, 1.- (1./ return_periods))# function to calculate exceedance probability and plot positions based on data quantilefunctionexceedance_plot_pos(y) N =length(y) ys =sort(y; rev=false) # sorted values of y nxp = xp = [r / (N +1) for r in1:N] # exceedance probability xp =1.- nxpreturn xp, ysendxp, ys =exceedance_plot_pos(dat_annmax.residual)p2 =plot(return_periods, return_levels, linewidth=5, color=:gold4, label="GEV Model Fit", bottom_margin=5mm, left_margin=5mm, right_margin=10mm, legend=:bottomright)plot!(p2, return_periods, return_levels_lognormal, linewidth=5, color=:darkred, label="LogNormal Model Fit")scatter!(p2, 1./ xp, ys, label="Observations", color=:black, markersize=6)xlabel!(p2, "Return Period (yrs)")ylabel!(p2, "Return Level (m)")xlims!(-1, 300)plot!(p2, size=(600, 450))display(p1)display(p2)
(a) GEV fit to annual maxima of San Francisco Tide Gauge Data
(b)
Figure 12
Be Careful About The Shape Parameter!
\(\xi\) can be difficult to constrain;
Often lots of uncertainty (large standard errors);