Trajectory Regression Attribution Analysis |
|
Trajectory regression analysis is a way to determine a
simple relationship between an air quality parameter measured at a specific
receptor location and the amount of time that air spends over potential
source regions on its way to the receptor location. The dependent variable
in the multiple linear regression is the air quality parameter, and the
independent variables are the estimated times that air spends over each of
a group of specific source regions. Back-trajectory analysis is used to
estimate the amount of time an air parcel spends over each source region. |
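Written out, the regression described above might take the following form. This is a sketch of the model only; the symbols are assumptions introduced here and do not come from the assessment itself. c_i is the measured air quality parameter for sample i, t_ij is the estimated time that air spent over source region j before arriving at the receptor for sample i, b_j is the regression coefficient for region j, b_0 is the optional intercept, and the last term is the residual.

```latex
c_i = b_0 + \sum_{j=1}^{J} b_j \, t_{ij} + \varepsilon_i
```

Under this reading, the product b_j t_ij is the part of the measured value attributed to source region j for sample i, and dropping b_0 gives the no-intercept form of the regression.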
|
Implicit in the trajectory regression attribution
analysis approach is the concept that the amount of time air spends over a
region determines that region's contribution to pollutants measured at the
receptor site. This approach is clearly too simplistic to capture the
effects of varying atmospheric factors that are known to modulate
concentrations (e.g. washout by precipitation, enhanced chemistry in
clouds, etc.). Another source of uncertainty is error in the
trajectory-estimated position and movement of air over the source regions
prior to arriving at the receptor site. In spite of these sources of error,
long-term average source region contributions determined by trajectory
regression attribution analysis have been found to be reasonable and useful.
[add references to papers by Gebhart & others] |
|
The trajectory regression analysis for the Causes of Haze
Assessment is done two ways: with an additive intercept term and with no
intercept. The intercept term is thought to account both for contributions
from beyond the selected source regions (like a global background value) and
for statistical noise from imprecise parameters and an imperfect model.
Regression without the intercept forces the source regions to account for
some of the background contribution, so it will tend to overestimate some of
the source region contributions. Regression with an intercept may
underestimate some source region contributions by absorbing statistical
noise into the intercept term. The most reasonable attribution results from
the trajectory regression method therefore likely lie within the range set
by the regressions with and without an intercept. |
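The two fits can be illustrated with a minimal sketch in plain Python. The data below are hypothetical (hours over two made-up source regions, and concentrations constructed with a background of 0.5), and the helper names `solve` and `fit` are introduced only for this illustration; this is not the code used for the assessment.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: source regions are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit(times, conc, intercept):
    """Least-squares coefficients; the intercept (if any) comes first."""
    rows = [([1.0] if intercept else []) + list(t) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, conc)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: hours over two source regions for six samples.
# Concentrations were built as 0.5 + 0.1 * t1 + 0.02 * t2 (no noise).
times = [(10, 40), (30, 20), (5, 60), (50, 10), (20, 30), (40, 50)]
conc = [2.3, 3.9, 2.2, 5.7, 3.1, 5.5]

with_intercept = fit(times, conc, intercept=True)      # [b0, b1, b2]
without_intercept = fit(times, conc, intercept=False)  # [b1, b2]

for j, pair in enumerate(zip(with_intercept[1:], without_intercept), start=1):
    low, high = sorted(pair)
    print(f"region {j}: coefficient between {low:.4f} and {high:.4f}")
```

In this noise-free example the fit with an intercept recovers the background term, while the fit without one pushes that background into larger source region coefficients, which is the overestimation described above.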
|
Statistical Uncertainty and Level of Significance |
|
A feature of the regression analysis methodology is that
an uncertainty level and a statistical significance are estimated for each
regression coefficient. The uncertainty is the standard error of the
regression coefficient, meaning that there is roughly a 68% probability
(assuming normally distributed errors) that the “true” coefficient is within
plus or minus the uncertainty of the regression coefficient value. The
magnitudes of the attribution uncertainties are shown in the attribution bar
plots as a vertical line extending above and below the top edge of each
source region's attribution bar. If the uncertainty line is comparable in
magnitude to the attribution bar height (e.g. half the bar height or more),
then the attribution should be considered uncertain. Another way to assess
statistical uncertainty is with the significance, or p value, of each
coefficient. The p value is the probability of obtaining a coefficient at
least as far from zero as the one observed if the true coefficient were
zero. A p value reported as 0.0000 indicates that there is less than a 0.01%
chance of that happening. Larger p values mean the coefficient is less
clearly distinguishable from zero (e.g. p = 0.0010 → 0.1% chance; p = 0.0100
→ 1% chance; and p = 0.1000 → 10% chance of such a coefficient arising when
the true value is zero), so the smaller the p value, the more reliable the
regression attribution is from a statistical perspective. |
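As a rough illustration of how the standard error and p value of each coefficient can be obtained, here is a sketch in plain Python. The data are synthetic, the function names are hypothetical, and the p value uses a normal approximation to keep the example dependency-free. A t distribution with n minus k degrees of freedom would be the rigorous choice, especially for small sample counts.

```python
import math
from statistics import NormalDist


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def coefficient_statistics(times, conc):
    """Return (coefficient, standard error, approximate p value) per term.

    The first term is the intercept; the rest follow the region order.
    """
    rows = [[1.0] + list(t) for t in times]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, conc)) for i in range(k)]
    inv = invert(xtx)
    coefs = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [y - sum(b * x for b, x in zip(coefs, r))
                 for r, y in zip(rows, conc)]
    variance = sum(e * e for e in residuals) / (n - k)  # residual variance
    results = []
    for i, b in enumerate(coefs):
        se = math.sqrt(variance * inv[i][i])
        z = abs(b) / se
        p = 2.0 * (1.0 - NormalDist().cdf(z))  # two-sided, normal approximation
        results.append((b, se, p))
    return results


# Synthetic data: region A drives the concentration, region B does not.
times = [(10, 10), (10, 10), (10, 30), (10, 30),
         (30, 10), (30, 10), (30, 30), (30, 30)]
conc = [2.2, 1.8, 2.1, 1.9, 3.8, 4.2, 3.9, 4.1]

stats = coefficient_statistics(times, conc)
for name, (b, se, p) in zip(["intercept", "region A", "region B"], stats):
    print(f"{name}: coefficient {b:.4f} +/- {se:.4f}, p = {p:.4f}")
```

In this constructed example, region A gets a small standard error relative to its coefficient and a p value near zero, while region B's coefficient is indistinguishable from zero and its p value is large.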
|
Source Regions |
|
Selection of source regions for trajectory regression can
be done in any number of ways. Some approaches are more likely than others
to produce a statistically meaningful result. In general, source regions at
a greater distance from the site should be larger than those near the site.
It is generally good to avoid source region boundaries that bisect regions
of high emissions density, but it is also good to choose source regions that
are of interest to the ultimate user, in our case the WRAP. The approach
chosen to define source regions for the trajectory regression for the Causes
of Haze Assessment is outlined below. |
|
 | Divide the state containing the monitoring site into quadrants (i.e.
NE, SE, SW, & NW) with the origin at the monitoring site. This will
provide the first 4 source regions, though for some sites it may be
effectively reduced to as few as one source region. |
|
 | Treat every state bordering the state containing the monitoring site as
a separate source region (i.e. neighboring states). This will give two to
six additional source regions. |
|
 | Group all other states beyond the bordering states into four quadrants
(see example maps). The boundaries will be the same regardless of the
monitoring site, except that neighboring states are excluded. This will
provide up to 4 additional source regions. |
|
 | Include Mexico, Canada, the Pacific Ocean, the Gulf of Mexico, and the
Atlantic Ocean as separate source regions. This adds five more source
regions. |
|
 | This gives a total of 9 to 19 source regions for sites in the western
contiguous states. |
|
 | For the states of Alaska & Hawaii, divide each state into quadrants and
everything outside of the state into quadrants, for a total of 8 source
regions. |
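The quadrant step in the list above, combined with the idea of counting the time air spends in each region, can be sketched as follows. This is only an illustration under stated assumptions: the function names and site coordinates are hypothetical, trajectory endpoints are assumed to be one hour apart, longitudes are in degrees east (negative in the western hemisphere), and every endpoint is assumed to fall inside the state containing the site. A real implementation would first test which state, country, or ocean region contains each endpoint.

```python
from collections import Counter


def quadrant(lat, lon, site_lat, site_lon):
    """Return NE, SE, SW, or NW for a point relative to the monitoring site."""
    north_south = "N" if lat >= site_lat else "S"
    east_west = "E" if lon >= site_lon else "W"
    return north_south + east_west


def residence_hours(endpoints, site_lat, site_lon, hours_per_endpoint=1.0):
    """Tally the hours a back trajectory spends in each in-state quadrant."""
    tally = Counter()
    for lat, lon in endpoints:
        tally[quadrant(lat, lon, site_lat, site_lon)] += hours_per_endpoint
    return dict(tally)


# Hypothetical site and four hourly back-trajectory endpoints (lat, lon).
site = (36.0, -112.0)
endpoints = [(37.0, -111.0), (35.0, -113.0), (36.5, -111.5), (35.5, -111.0)]

hours = residence_hours(endpoints, *site)
print(hours)
```

The per-region hours produced this way are the kind of independent variables the regression uses, one value per source region per sample.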
|
|
Back-Trajectory Analysis |
|