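MCAR: A strong assumption that is generally implausible. If it holds, complete-case analysis is valid, because the observed data are fully representative of the complete data.
MAR: More plausible than MCAR. Complete-case analysis can still be justified when the analysis conditions on the observed variables that drive missingness, since the conditional distributions in the observed data are then unbiased estimates of the conditional distributions in the complete data.
MNAR: Deleting incomplete cases is a bad idea, because the observed data do not follow the same conditional distribution as the complete data. Missingness can itself be informative, so try to model the missingness mechanism.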
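The three mechanisms can be compared with a small simulation. The following is a minimal sketch, not part of the original example: the variable names, sample size, and logistic missingness probabilities are all assumptions chosen for illustration. It drops incomplete cases under each mechanism and looks at what the remaining data say.

```julia
using Random, Statistics

Random.seed!(1)
n = 100_000

# Assumed toy model: z is always observed, y = z + noise may go missing.
z = randn(n)
y = z .+ randn(n)            # true mean of y is 0, true slope of y on z is 1

logistic(t) = 1 / (1 + exp(-t))

# MCAR: every y has the same 30% chance of being missing.
obs_mcar = rand(n) .>= 0.3

# MAR: missingness depends only on the observed variable z.
obs_mar = rand(n) .>= logistic.(2 .* z)

# MNAR: missingness depends on the (possibly unobserved) value of y itself.
obs_mnar = rand(n) .>= logistic.(2 .* y)

# Complete-case estimate of the slope of y on z.
slope(keep) = cov(z[keep], y[keep]) / var(z[keep])

mean_mcar  = mean(y[obs_mcar])   # expected to be close to the true mean, 0
mean_mar   = mean(y[obs_mar])    # marginal mean is pulled downward
slope_mar  = slope(obs_mar)      # conditional relationship stays close to 1
slope_mnar = slope(obs_mnar)     # conditional relationship is distorted

println((; mean_mcar, mean_mar, slope_mar, slope_mnar))
```

Under these assumptions, deletion is harmless for MCAR, preserves the conditional relationship (but not the marginal mean) for MAR, and distorts even the conditional relationship for MNAR.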
Methods for Dealing with Missing Data
Imputation: substitute values for missing data before analysis;
Averaging: find expected values over all possible values of the missing variables.
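The difference between the two approaches can be sketched on a toy vector. This is only an illustration under stated assumptions: the data are made up, the quantity of interest `f` is hypothetical, and the missing values are assumed to follow a normal distribution fitted to the observed values (which is only reasonable under MCAR).

```julia
using Random, Statistics

Random.seed!(2)

# Made-up data with two missing entries.
v = [1.0, missing, 3.0, 5.0, missing]

# Summaries of the observed values.
observed = collect(skipmissing(v))
μ, σ = mean(observed), std(observed)

# Hypothetical quantity of interest: the mean of the squared values.
f(x) = mean(x .^ 2)

# Imputation: fill each gap once (here with the observed mean), then analyze.
v_imputed = coalesce.(v, μ)
f_imputed = f(v_imputed)

# Averaging: take the expectation of f over plausible values of the gaps,
# approximated by Monte Carlo draws from the assumed Normal(μ, σ) model.
draws = [f(coalesce.(v, μ .+ σ .* randn(length(v)))) for _ in 1:5_000]
f_averaged = mean(draws)

println((; f_imputed, f_averaged))
```

In this sketch the averaged value comes out larger than the single-imputation value, because filling every gap with the mean ignores the spread of the values that could have been there.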
Multiple Imputation Example
Example Quality Data
Code
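using CSV, DataFrames

# Load the air quality data
dat = CSV.read("data/airquality/airquality.csv", DataFrame)
# Rename the "Solar.R" column so it can be accessed as dat.Solar
rename!(dat, :"Solar.R" => :Solar)
# Add indicator columns flagging missing Ozone and Solar values
dat.Miss_Ozone = ismissing.(dat.Ozone)
dat.Miss_Solar = ismissing.(dat.Solar)
# Show rows 2-5 and the first seven columns
dat[2:5, 1:7]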