Documentation for Climate Outcome Likelihood Tool
General information
This tool uses historic data to determine the spread of precipitation
possible over a specified amount of time. More specifically, the tool
provides the likelihood of recovering a deficit and reaching normal, or
of reaching a certain precipitation threshold, by some future date.
This tool uses data from the
Global Historical Climatology Network (GHCN),
accessed through the Applied Climate Information System (ACIS). We
suggest using stations with a record that begins in 1920 or earlier and
continues through the present, for two reasons: 1) a longer record
provides more data points and thus more meaningful results in the
probability distribution; and 2) we anticipate this tool will be used
to make decisions related to drought. A record spanning 1920 to present
includes several of California's major droughts, so the resulting
distribution will reflect these droughts.
Dates provided must be sequential. The start date (From) must be
within the 5 years prior to today and the end date (ending) must be
within the 5 years following today. This range is set to keep
performance timely over the web. The middle date (To) is set to "today"
by default but can vary based on user needs. However, only recovery
periods that have fewer missing days than specified are included in the
analysis. For example, suppose today is April 4, 2015, the middle date
is set to Jan 1, 2015, the end date is set to the end of the current
water year (Sept 30, 2015), and missing days is set to 5. The Jan 1,
2015 to Sept 30, 2015 period will not be featured in the probability
distribution because there are too many missing days (every day after
today has yet to be observed), even though some observations have been
made in the period.
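
The date rules above can be sketched as a small check. This is only an
illustration, not the tool's actual validation code: the function name
and messages are hypothetical, and "within 5 years" is assumed to mean
5 * 365 days.

```python
from datetime import date, timedelta

# Assumption: "within 5 years" is approximated as 5 * 365 days.
MAX_SPAN = timedelta(days=5 * 365)

def validate_dates(start, middle, end, today):
    """Return a list of problems with the three form dates (empty if valid)."""
    problems = []
    if not (start < middle < end):
        problems.append("dates must be sequential: From < To < ending")
    if not (today - MAX_SPAN <= start <= today):
        problems.append("From must fall within the 5 years before today")
    if not (today <= end <= today + MAX_SPAN):
        problems.append("ending must fall within the 5 years after today")
    return problems

today = date(2015, 4, 4)
ok = validate_dates(date(2014, 10, 1), date(2015, 4, 4), date(2015, 9, 30), today)
bad = validate_dates(date(2005, 10, 1), date(2015, 4, 4), date(2015, 9, 30), today)
```

Here `ok` is an empty list, while `bad` reports that the From date is
more than 5 years before today.
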
Analyses were performed using Python and graphed using Highcharts, a
JavaScript-based charting library. The Django framework was used to
interface between the Python analyses on the server side and JavaScript
on the client side. Data are acquired via calls to
ACIS web services, abbreviated here as ACISws.
Observed data option

Make an ACISws call to retrieve the precipitation sum between the
user-selected start date (From) and the middle date (To) less 1 day,
since precipitation has typically not yet been observed or entered into
the system for today. Also retrieve the 1981-2010 normal precipitation
for this period, based on a sum of all daily normal values. Subtract
the observed value from the normal value to determine whether a surplus
or deficit is present: a surplus will be a negative value and a deficit
positive. Metadata (station name, state, period of record) are also
gathered at this time.

Retrieve precipitation sums via an ACISws call for all periods in the
station record that are equivalent to the "recovery period" (the
period between the dates in the form labeled "to" and "ending") and
that have fewer missing days than the number specified by the user. For
example, if the user-specified recovery period was 2015-04-05 to
2015-09-30 with 5 missing days, sums for all April 5 to Sept 30 periods
in the station record that have fewer than 5 missing days would be
retrieved. The number of such periods present in the record determines
the number of values used in the analysis. An array of these "recovery
period" sums is created. The 1981-2010 normal for this period is also
retrieved.
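
The missing-day filter can be illustrated locally. In the tool the sums
come from ACISws; the sketch below instead assumes a hypothetical
dictionary of daily values per year, with None marking a missing day,
purely to show which periods are kept.

```python
def recovery_period_sums(daily_by_year, max_missing):
    """Sum each year's recovery period, skipping years with too many gaps.

    daily_by_year maps a year to its list of daily precipitation values
    (inches), with None marking a missing day. This layout is hypothetical.
    """
    sums = []
    for year in sorted(daily_by_year):
        values = daily_by_year[year]
        missing = sum(1 for v in values if v is None)
        if missing < max_missing:  # "fewer missing days than specified"
            sums.append(sum(v for v in values if v is not None))
    return sums

record = {
    1998: [0.0, 0.2, None, 0.1],   # 1 missing day: kept
    1999: [None, None, 0.5, 0.0],  # 2 missing days: dropped
    2000: [0.3, 0.0, 0.0, 0.4],    # complete: kept
}
sums = recovery_period_sums(record, max_missing=2)
```
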

Calculate deciles (10th to 100th percentile by multiples of 10) for
the array of "recovery period" sums. This is done using Python's
numpy.percentile function to calculate each decile.
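
As a minimal sketch of the decile step, assuming a made-up array of
recovery period sums (the values below are illustrative only):

```python
import numpy as np

# Hypothetical recovery-period sums in inches.
sums = np.array([1.2, 3.4, 0.8, 5.6, 2.1, 4.3, 2.9, 6.7, 3.8, 1.9])

# 10th, 20th, ..., 100th percentiles.
levels = np.arange(10, 101, 10)
deciles = np.percentile(sums, levels)
```

The last entry of `deciles` (the 100th percentile) is the largest sum
in the array.
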

Create a normalized histogram of the "recovery period" sums. Determine
the edge values for 1-inch bins based on the minimum and maximum values
present in the recovery period array. Use the numpy.histogram function
to place array values in bins and normalize the histogram
(density=True) such that the total area of the bars is equal to 1.
Because the bins are 1 inch wide, each bar height represents the
probability of receiving a precipitation amount within that bin.
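
A sketch of the histogram step follows. The data are made up, and
placing the bin edges on whole inches is an assumption; the source only
says the edges are based on the minimum and maximum values.

```python
import numpy as np

# Hypothetical recovery-period sums in inches.
sums = np.array([1.2, 3.4, 0.8, 5.6, 2.1, 4.3, 2.9, 6.7, 3.8, 1.9])

# Assumption: bin edges fall on whole inches spanning the data range.
low = np.floor(sums.min())
high = max(np.ceil(sums.max()), low + 1)  # guarantee at least one bin
edges = np.arange(low, high + 1, 1.0)

heights, edges = np.histogram(sums, bins=edges, density=True)

# With 1-inch bins each bar's area equals its height, so the heights
# themselves sum to 1 and can be read as probabilities.
total_area = np.sum(heights * np.diff(edges))
```
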

Prepare data to be graphed as a cumulative distribution function: Sort
the array of "recovery period" sums from lowest to highest. Calculate
the proportional values of the samples:
p = 1. * arange(len(data)) / (len(data) - 1)
Plot the sorted data against p.
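
The proportional values can be checked on a small made-up array. This
sketch assumes that arange in the expression above refers to
numpy.arange.

```python
import numpy as np

# Hypothetical recovery-period sums in inches.
sums = np.array([1.2, 3.4, 0.8, 5.6, 2.1])

data = np.sort(sums)
p = 1.0 * np.arange(len(data)) / (len(data) - 1)
# p runs from 0 for the smallest sum to 1 for the largest.
```
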

Determine likelihood of a particular outcome:

If user has selected "amelioration": Take the surplus/deficit
value calculated in step 1 and add it to the normal for the
recovery period retrieved in step 2. This provides the amount
needed to recover any deficit and reach normal by the end of the
recovery period. Next, determine how many of the accumulations in
the recovery period array are greater than or equal to the amount
needed, i.e., those that would recover any deficit in the observed
period as well as reach normal for the recovery period. To get
likelihood of recovery as a percentage:
(number of accumulations in record ≥ amount needed) / (total accumulations) * 100
If there are no values in the record that will recover the deficit
and reach normal for the recovery period, likelihood will be 0%. If
there is a large surplus of precipitation in the observed period
and/or all of the accumulations in the recovery period would allow
for recovering any deficit and reaching normal, then likelihood of
recovery will be 100%.

If user has selected "custom threshold": Determine how many values
in the recovery period array are greater than or equal to the
proposed threshold. To get likelihood as a percentage:
(number of accumulations in record ≥ threshold) / (total accumulations) * 100
The opposite comparison (accumulations less than the threshold) is
used to calculate the likelihood of not reaching a threshold.
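
Both likelihood calculations reduce to counting accumulations at or
above some amount. A minimal sketch, with a hypothetical function name
and made-up inputs:

```python
def likelihood_percent(accumulations, amount):
    """Percent of accumulations greater than or equal to amount."""
    if not accumulations:
        raise ValueError("no accumulations to evaluate")
    hits = sum(1 for a in accumulations if a >= amount)
    return 100.0 * hits / len(accumulations)

# Hypothetical inputs, in inches.
recovery_sums = [1.2, 3.4, 0.8, 5.6, 2.1, 4.3, 2.9, 6.7, 3.8, 1.9]
deficit = 1.5            # normal minus observed; positive means deficit
recovery_normal = 2.0    # normal for the recovery period

# Amelioration: recover the deficit and reach normal for the recovery period.
amount_needed = deficit + recovery_normal
recover = likelihood_percent(recovery_sums, amount_needed)

# Custom threshold, and the complementary chance of not reaching it.
reach = likelihood_percent(recovery_sums, 3.0)
not_reach = 100.0 - reach
```

With these inputs, 4 of the 10 sums are at least 3.5 inches, so
`recover` is 40.0; 5 of the 10 are at least 3.0 inches, so `reach` and
`not_reach` are both 50.0.
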

Display data on graph: Graph the probability density function based on
data produced in step 4. Include the "amount needed" to recover the
deficit and reach normal as a vertical line if "amelioration" was
chosen. If "custom threshold" was chosen, display the threshold as a
vertical line. Graph the cumulative distribution function based on data
produced in step 5. On both the PDF (probability density function) and
CDF (cumulative distribution function) graphs, display the deciles
calculated in step 3 as vertical grey lines to help the user determine
the precipitation amounts associated with various percentiles. Display
the 1981-2010 normal for the recovery period in red as well.
Random daily samples option
A word on daily samples: This method does not account for day-to-day
dependence in precipitation, i.e., if precipitation occurs on one day,
there is a greater likelihood it will occur on the subsequent day. For this
reason, we have found that the sampling method often displays a lower
maximum than the observed method. It is an interesting exercise to
compare the observed distribution with the sampled distribution to
view their similarities and differences. We welcome any recommendation
for an improved sampling method.

Make an ACISws call to retrieve the precipitation sum between the
user-selected start date (From) and the middle date (To) less 1 day,
since precipitation has typically not yet been observed or entered into
the system for today. Also retrieve the 1981-2010 normal precipitation
for this period, based on a sum of all daily normal values. Subtract
the observed value from the normal value to determine whether a surplus
or deficit is present: a surplus will be a negative value and a deficit
positive. Metadata (station name, state, period of record) are also
gathered at this time.

Retrieve daily precipitation values via an ACISws call for all periods
in the station record that are equivalent to the "recovery period",
the period between the dates in the form labeled "to" and "ending".
Missing data are not filtered here since they are handled inherently
within the sampling. For example, if the user-specified recovery period
was 2015-04-05 to 2015-09-30, daily values for all April 5 to Sept 30
periods in the station record would be retrieved. An array of the
"recovery period" daily values is created for each period in the
station record. An array of all years in the station record is created
as well. The 1981-2010 normal for this period is also retrieved.

Perform sampling: For each day in the recovery period (in this example,
2015-04-05 through 2015-09-30), a random year is selected using the
Python random module and the precipitation value for that particular
day is added to a running sum. This is repeated until the last day of
the recovery period is reached. Sampling for the example period might
look like this:
Sample 1: 1942-04-05 + 2008-04-06 + 1955-04-07 + 1997-04-08 + ... + 2010-09-30
Sample 2: 2012-04-05 + 1978-04-06 + 1922-04-07 + 1946-04-08 + ... + 1969-09-30
This process is repeated 1,000 times and yields an array of 1,000 sums
representing these "synthetic periods". The number of samples was
chosen so that the analysis runs through a web interface in less than
5 seconds. If a missing value is encountered while sampling, another
year is chosen at random. This process is repeated up to 100 times. If
a value cannot be found after 100 tries (for example, if there is no
value for any April 6 in the station record), the process terminates
and sampling cannot be conducted on the selected station.
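
The sampling procedure can be sketched as below. This is an
illustration, not the tool's code: the data layout (a dictionary of
daily values per year, None for missing), the function name, and the
reading of "100 tries" as 100 total draws per day are all assumptions.

```python
import random

def sample_sums(daily_by_year, n_days, n_samples=1000, max_tries=100, seed=None):
    """Build synthetic period sums by drawing each day from a random year.

    daily_by_year maps a year to a list of n_days daily values, with None
    for a missing day. This layout and these names are hypothetical.
    """
    rng = random.Random(seed)
    years = list(daily_by_year)
    sums = []
    for _ in range(n_samples):
        total = 0.0
        for day in range(n_days):
            for _attempt in range(max_tries):
                value = daily_by_year[rng.choice(years)][day]
                if value is not None:
                    total += value
                    break
            else:
                # No usable value after max_tries draws for this day.
                raise ValueError("sampling failed for day index %d" % day)
        sums.append(total)
    return sums

record = {
    1942: [0.0, 0.1, 0.0],
    1955: [0.3, None, 0.2],
    1997: [0.0, 0.0, 0.5],
}
sums = sample_sums(record, n_days=3, n_samples=1000, seed=1)
```

Because each day is drawn independently, a wet day in one sample says
nothing about the next day, which is the limitation noted at the start
of this section.
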

Calculate deciles (10th to 100th percentile by multiples of 10) for the
array of sampled sums. This is done using Python's numpy.percentile
function to calculate each decile.

Create a normalized histogram of the sampled precipitation
accumulations. Determine the edge values for 1-inch bins based on the
minimum and maximum values present in the sample array. Use the
numpy.histogram function to place array values in bins and normalize
the histogram (density=True) such that the total area of the bars is
equal to 1. Because the bins are 1 inch wide, each bar height
represents the probability of receiving a precipitation amount within
that bin.

Prepare data to be graphed as a cumulative distribution function: Sort
the array of sampled sums from lowest to highest. Calculate the
proportional values of the samples:
p = 1. * arange(len(data)) / (len(data) - 1)
Plot the sorted data against p.

Determine likelihood of a particular outcome:

If user has selected "amelioration": Take the surplus/deficit
value calculated in step 1 and add it to the normal for the
recovery period retrieved in step 2. This provides the amount
needed to recover any deficit and reach normal by the end of the
recovery period. Next, determine how many of the accumulation
values in the sampled recovery period array are greater than or
equal to the amount needed, i.e., those that would recover any
deficit in the observed period as well as reach normal for the
recovery period. To get likelihood of recovery as a percentage:
(number of accumulations in record ≥ amount needed) / (total accumulations) * 100
If there are no values in the sampled array that will recover the
deficit and reach normal for the recovery period, likelihood will be
0%. If there is a large surplus of precipitation in the observed
period and/or all of the accumulations in the recovery period
would allow for recovering any deficit and reaching normal, then
likelihood of recovery will be 100%.

If user has selected "custom threshold": Determine how many values
in the sampled recovery period array are greater than or equal to
the proposed threshold. To get likelihood as a percentage:
(number of accumulations in record ≥ threshold) / (total accumulations) * 100
The opposite comparison (accumulations less than the threshold) is
used to calculate the likelihood of not reaching a threshold.

Display data on graph: Graph the probability density function based on
data produced in step 5. Include the "amount needed" to recover the
deficit and reach normal as a vertical line if "amelioration" was
chosen. If "custom threshold" was chosen, display the threshold as a
vertical line. Graph the cumulative distribution function based on data
produced in step 6. On both the PDF and CDF graphs, display the deciles
calculated in step 4 as vertical grey lines to help the user determine
the precipitation amounts associated with various percentiles. Display
the 1981-2010 normal for the recovery period in red as well.
Analog data option

Make an ACISws call to retrieve the precipitation sum between the
user-selected start date (From) and the middle date (To) less 1 day,
since precipitation has typically not yet been observed or entered into
the system for today. Do this for all like periods in the station
record as well. Also retrieve the 1981-2010 normal precipitation for
this period, based on a sum of all daily normal values. Subtract the
observed value (most current period) from the normal value to determine
whether a surplus or deficit is present: a surplus will be a negative
value and a deficit positive. Metadata (station name, state, period of
record) are also gathered at this time.

Determine what decile range the observed value lies in. Using the
array of precipitation values for the observed period, calculate
deciles (10th to 100th percentile by multiples of 10). This is done
using Python's numpy.percentile function to calculate each decile. In
this example, we will say the "observed" period selected was
2014-10-01 through 2015-04-05 (recall that accumulation is taken for
this period less 1 day). 7.4 inches of precipitation were observed
2014-10-01 through 2015-04-04; this lies in the 50th-60th percentile
for the period.

From the web form, determine what the user wants as analogs: ± 1, 2,
or 3 deciles. For this example, we will say the user selected
"± 1 decile". Since the observed value is in the 50th-60th
percentile, we will grab all the 10-01 to 04-04 periods in the
station's record that have accumulations that fall within that
percentile range, as well as those in the 40th-50th percentile
(-1 decile) and the 60th-70th percentile (+1 decile). We will say,
for the sake of brevity, that the station being used has a short record
(not recommended) and there are 2 values in the 40th-50th percentile
(periods of 10-01 to 04-04 ending 2001, 2005), 2 values in the
50th-60th percentile (periods ending 2011, 2015), and 2 values in the
60th-70th percentile (periods ending 1998, 2004). Thus, the 10-01 to
04-04 periods ending in 2001, 2005, 2011, 1998, and 2004 are
considered the "analogs" to the current accumulation value ending in
2015, by the definition of precipitation totals within ± 1
decile of the observed period. Periods with a number of missing days
greater than the "missing" amount specified will not be given as
analogs.
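
One way to sketch the analog selection is shown below. The function
name, the data, and the handling of values that fall exactly on a
decile boundary are assumptions, and the missing-day filter is omitted
for brevity.

```python
import numpy as np

def find_analogs(totals_by_year, current_year, spread=1):
    """Return years whose observed-period total lies within `spread`
    deciles of the current year's total (hypothetical helper)."""
    years = sorted(totals_by_year)
    totals = np.array([totals_by_year[y] for y in years], dtype=float)
    # Decile boundaries: 10th, 20th, ..., 90th percentiles.
    bounds = np.percentile(totals, np.arange(10, 100, 10))
    # Decile index 0..9 per year; index 5 is the 50th-60th percentile range.
    decile = np.searchsorted(bounds, totals, side="right")
    current = int(decile[years.index(current_year)])
    return [y for y, d in zip(years, decile)
            if y != current_year and abs(int(d) - current) <= spread]

# Hypothetical observed-period totals (inches), keyed by ending year.
totals = {2006: 3.0, 2007: 9.0, 2008: 5.0, 2009: 1.0, 2010: 7.0,
          2011: 10.0, 2012: 2.0, 2013: 8.0, 2014: 4.0, 2015: 6.0}

analogs_1 = find_analogs(totals, current_year=2015, spread=1)
analogs_2 = find_analogs(totals, current_year=2015, spread=2)
```

In this made-up record the 2015 total falls in the 50th-60th percentile
range, so `analogs_1` is `[2008, 2010]` and widening to ± 2 deciles
gives `[2008, 2010, 2013, 2014]`.
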

Next, we look at the future "recovery" period selected by the user. In
this example we will use 2015-04-05 through 2015-09-30. To build the
probability distributions that will be graphed, we utilize the
recovery period beginning in each analog year calculated for the
"observed" period. Thus, the recovery periods would be 2001-04-05 to
2001-09-30, 2005-04-05 to 2005-09-30, 2011-04-05 to 2011-09-30,
1998-04-05 to 1998-09-30, and 2004-04-05 to 2004-09-30 in this short
record station example. An ACISws request is made for precipitation
accumulation totals for each of these periods and the values are
placed in an array. If any of these periods have a number of missing
values exceeding the specified threshold for missing data, that
period would not be included in the array for analog recovery
periods. The normal for the "recovery" period is retrieved via
ACISws as well.
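
The mapping from analog years to recovery periods can be sketched as
follows. The helper name is hypothetical, and the sketch assumes
neither date is Feb 29 (date.replace would raise an error when shifting
Feb 29 to a non-leap year).

```python
from datetime import date

def analog_recovery_periods(analog_years, start, end):
    """Shift the user's recovery period into each analog year.

    analog_years are the years in which the analog observed periods end,
    which is also the year each analog recovery period begins.
    """
    periods = []
    for year in analog_years:
        offset = year - start.year
        periods.append((start.replace(year=year),
                        end.replace(year=end.year + offset)))
    return periods

periods = analog_recovery_periods(
    [2001, 2005, 2011, 1998, 2004],
    start=date(2015, 4, 5),
    end=date(2015, 9, 30),
)
```
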

Next, a normalized histogram of the "analog" accumulations is created.
In the simple example given here, there are only five accumulations
to graph, so the histogram will not be very informative. For this
reason, we recommend doing all analyses, especially analogs, on
stations with long station records, preferably 50-60 years or more.
To create the histogram, determine the edge values for 1-inch bins
based on the minimum and maximum values present in the recovery
period array. Use the numpy.histogram function to place array values
in bins and normalize the histogram (density=True) such that the total
area of the bars is equal to 1. Because the bins are 1 inch wide, each
bar height represents the probability of receiving a precipitation
amount within that bin.

Prepare data to be graphed as a cumulative distribution function: Sort
the array of analog sums from lowest to highest. Calculate the
proportional values of the samples:
p = 1. * arange(len(data)) / (len(data) - 1)
Plot the sorted data against p.

Determine likelihood of a particular outcome:

If user has selected "amelioration": Take the surplus/deficit
value calculated in step 1 and add it to the normal for the
recovery period retrieved in step 4. This provides the amount
needed to recover any deficit and reach normal by the end of the
recovery period. Next, determine how many of the accumulation
values in the analog recovery period array are greater than or
equal to the amount needed, i.e., those that would recover any
deficit in the observed period as well as reach normal for the
recovery period. To get likelihood of recovery as a percentage:
(number of accumulations in record ≥ amount needed) / (total accumulations) * 100
If there are no values in the analog array that will recover the
deficit and reach normal for the recovery period, likelihood will be
0%. If there is a large surplus of precipitation in the observed
period and/or all of the accumulations in the recovery period
would allow for recovering any deficit and reaching normal, then
likelihood of recovery will be 100%.

If user has selected "custom threshold": Determine how many values
in the analog recovery period array are greater than or equal to
the proposed threshold. To get likelihood as a percentage:
(number of accumulations in record ≥ threshold) / (total accumulations) * 100
The opposite comparison (accumulations less than the threshold) is
used to calculate the likelihood of not reaching a threshold.

Display data on graph: Graph the probability density function based on
data produced in step 5. Include the "amount needed" to recover the
deficit and reach normal as a vertical line if "amelioration" was
chosen. If "custom threshold" was chosen, display the threshold as a
vertical line. Graph the cumulative distribution function based on data
produced in step 6. On both the PDF and CDF graphs, display the
1981-2010 normal for the recovery period in red. Deciles are not
calculated or shown for the analog option due to the small number of
values graphed.