Tuesday, October 27, 2015

Scale Independent Hydrologic Metrics

Scale independent hydrologic metrics are quantities that capture a watershed's hydrologic response without depending on the size of the watershed; put another way, they measure hydrologic response in ways that do not scale with watershed area. However, even these metrics generally break down at very fine or very large scales (e.g. pore scale or continental scale). Scale independent metrics are useful for relating hydrologically similar watersheds of different sizes to one another and for identifying fundamental hydrologic properties of a watershed. For example, the relation between storage and discharge of a watershed can be described using scale invariant recession metrics. I put together a table of four basic and well-known scale independent hydrologic metrics that can be calculated for a watershed using commonly available data such as streamflow, precipitation, and temperature.

Metric name | Quantity | Description
--- | --- | ---
Runoff ratio | $\frac{\overline{Q}}{\overline{P}}$ | $\overline{Q}$ and $\overline{P}$ are long-term average streamflow and precipitation for the watershed. The runoff ratio represents the fraction of precipitation that contributes to streamflow in a watershed, assuming the only hydrologic input is direct precipitation. Depending on the time period used to calculate it, the runoff ratio can indicate surface properties of a watershed (impervious to highly permeable) or, over longer periods, losses to evapotranspiration.
Snowfall/precipitation-day ratio | $\frac{N_{S}}{N_{P}}$ | $N_{S}$ is the number of days in a given year that had snowfall and $N_{P}$ is the number of days that had precipitation. Sometimes these are counted using a threshold amount of snow or precipitation, e.g. the number of days with snowfall above 1 cm or rainfall above 1 mm. $N_{S}$ may also be calculated as the number of days when the temperature was below freezing. Also called the snow day ratio, this metric quantifies the dominance of snow as the major form of precipitation for a watershed.
Streamflow elasticity | $\mathrm{median}\left(\frac{dQ/\overline{Q}}{dP/\overline{P}}\right)$ | $\overline{Q}$ and $\overline{P}$ are long-term average streamflow and precipitation for the catchment. The derivatives of $Q$ and $P$ are usually not available because these variables are not measured continuously, so $dQ$ and $dP$ are estimated by differencing each day's value from the previous day's value. Alternatively, the difference can be calculated by subtracting the annual mean value from each day's value. Streamflow elasticity indicates how sensitive the watershed is to precipitation, that is, how much $Q$ will change for a given change in $P$. A high value indicates an elastic or sensitive watershed, whereas a low value indicates an inelastic or insensitive watershed.
Baseflow index | $\frac{\overline{Q_{bf}}}{\overline{Q_{sf}}}$ | $\overline{Q_{bf}}$ is the long-term average of the baseflow component of streamflow, usually estimated using a digital filter, and $\overline{Q_{sf}}$ is the long-term average streamflow for the catchment. The baseflow index (BFI) represents the long-term average proportion of total streamflow that is baseflow (i.e. groundwater discharge). A BFI close to one means that nearly all flow comes from slow subsurface discharge to the stream, whereas a BFI near zero means that the stream gets most of its input from fast overland flow or preferential flow during precipitation events. Thus the BFI can be related to land cover, soil/aquifer permeability, and other watershed characteristics.
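
To make the table concrete, here is a rough sketch of how the four metrics might be computed from daily series using only the Python standard library. Several things here are my assumptions rather than anything fixed by the definitions above: the function names, the made-up sample data, counting a snow day as a precipitation day with temperature below freezing, and using a single forward pass of a Lyne-Hollick-style recursive filter with alpha = 0.925 as the "digital filter" for baseflow. Streamflow and precipitation are assumed to be in the same units (e.g. mm/day) so the ratios are dimensionless.

```python
from statistics import mean, median


def runoff_ratio(q, p):
    """Long-term mean streamflow over long-term mean precipitation."""
    return mean(q) / mean(p)


def snow_day_ratio(p, temp, p_min=1.0, freezing=0.0):
    """N_S / N_P. Assumption: a snow day is a precipitation day below freezing."""
    precip_day_temps = [t for pr, t in zip(p, temp) if pr > p_min]
    snow_days = sum(1 for t in precip_day_temps if t < freezing)
    return snow_days / len(precip_day_temps)


def streamflow_elasticity(q, p):
    """Median of (dQ / mean Q) / (dP / mean P) from day-to-day differences."""
    q_mean, p_mean = mean(q), mean(p)
    ratios = []
    for i in range(1, len(q)):
        dq, dp = q[i] - q[i - 1], p[i] - p[i - 1]
        if dp != 0:  # the ratio is undefined when precipitation did not change
            ratios.append((dq / q_mean) / (dp / p_mean))
    return median(ratios)


def baseflow_index(q, alpha=0.925):
    """Mean baseflow over mean streamflow, using one forward filter pass."""
    quick = 0.0        # assumed starting quickflow
    baseflow = [q[0]]  # so the first day is treated as all baseflow
    for i in range(1, len(q)):
        quick = alpha * quick + 0.5 * (1 + alpha) * (q[i] - q[i - 1])
        quick = min(max(quick, 0.0), q[i])  # keep 0 <= quickflow <= streamflow
        baseflow.append(q[i] - quick)
    return mean(baseflow) / mean(q)


# Made-up daily series (mm/day for Q and P, degrees C for temperature).
q = [1.0, 2.0, 4.0, 3.0, 2.5, 2.0, 3.5, 2.5]
p = [0.0, 5.0, 10.0, 0.0, 2.0, 0.0, 8.0, 0.0]
temp = [-2.0, -1.0, 3.0, 5.0, -4.0, 1.0, 2.0, 0.5]

print("runoff ratio:", runoff_ratio(q, p))
print("snow day ratio:", snow_day_ratio(p, temp))
print("streamflow elasticity:", streamflow_elasticity(q, p))
print("baseflow index:", baseflow_index(q))
```

Eight days of invented data obviously says nothing about a real watershed; with real records you would feed in multi-year daily series, and baseflow filters are often run in several passes rather than the single pass sketched here.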

Scale independent metrics matter in hydrology because they relate to key hydrologic processes and important watershed characteristics, which makes them useful for investigating hydrologic change and variability. They are also commonly used to classify hydrologically similar watersheds. This is an abbreviated list of straightforward metrics that can easily be calculated for a gaged watershed. Can you think of other scale independent metrics? Feel free to post your thoughts in the comments.

Wednesday, October 7, 2015

Why you should learn Python's multiprocessing module

Python multiprocessing basics

This post is not an in-depth guide to multiprocessing in Python or even a brief intro. Rather, it is intended to give you the motivation to bother learning it. When I recently experimented with Python's cross-platform multiprocessing module I was pleasantly surprised at how easy it was to use. I was able to quickly parallelize iterative tasks in Python. For example, I have a script that runs the same text parsing on files in multiple directories. Using a simple tool from the multiprocessing module allowed me to run the text processing in multiple directories simultaneously. This saved a lot of time, since the alternative was looping over the list of directories and running the same command in each directory one at a time. You might have heard that Python is not a good language for parallel processing because of the Global Interpreter Lock (GIL), but the multiprocessing module sidesteps the lock by running separate processes, each with its own interpreter.

Simple example using Pool

The multiprocessing module has several objects that may be useful for you; one that stands out is the Pool class. Pool lets you use multiple processors to run a set of independent tasks in parallel with very little code. To use it, first create a Pool object and then call its map method, which applies a single-parameter function to every element of a Python collection such as a list (for a dictionary, map iterates over the keys). You pass map both the function and the collection. Check it out:

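import multiprocessing as mp

a_list = range(8)

def f(x):
    # Print the input; each call runs in one of the worker processes.
    print(x)

if __name__ == "__main__":
    # Create 8 worker processes, then apply f to every element of a_list.
    pool = mp.Pool(processes=8)
    pool.map(f, a_list)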

When run produces,

$ python mp.py 

In this arbitrary and simple example the last two lines of code do all the work. These two lines create the pool object and apply the function "f" to each element in "a_list", spread across 8 worker processes running in parallel. The pool map method is analogous to the built-in Python map function. Also notice that f simply prints the input parameter x, but when run the output is not in the original order of a_list. This is the expected result because 8 processes are running in parallel. Pool.map does not guarantee the order in which the function is applied to the elements of the collection you pass it. This means that the tasks you assign to the pool must be completely independent from one another.
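
One detail worth adding about ordering: although the workers may run the function on the elements in any order, the list that Pool.map returns is in the same order as the input. Here is a minimal sketch of that, where "square" and "parallel_squares" are names I made up for illustration and the "with" form of Pool assumes Python 3.3 or later:

```python
import multiprocessing as mp


def square(x):
    # Hypothetical worker: return a value instead of printing it.
    return x * x


def parallel_squares(values, processes=4):
    # The with-block shuts the worker processes down once map has returned.
    with mp.Pool(processes=processes) as pool:
        # map blocks until every task is done and keeps the input order.
        return pool.map(square, values)


if __name__ == "__main__":
    print(parallel_squares(range(8)))
```

Running this prints [0, 1, 4, 9, 16, 25, 36, 49] every time, no matter which worker finished first. So if you need ordered results, have the function return a value and collect what map hands back rather than relying on the order of side effects such as printing.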

You can easily extend this example to fit more complicated workflows, and as you can see it is incredibly simple. Chances are you have access to a multi-core/hyper-threaded processor that you are under-utilizing. So don't be scared any more: this Python module allows anyone to use multiprocessing in a very simple way. Any time you invest in learning this module will pay off in runtime saved later. Cheers