Tuesday, May 5, 2015

Updates to R package emdatr: More than 21000 Natural Disasters since 1900

The International Disaster Database (EMDAT) from the Center for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses on human life and property resulting from select natural and man-made disasters. The goal of the emdatr package is to improve the EMDAT database by promoting its use, shedding light on its limitations, and making analysis of the data easier using R.

Major updates in emdatr v0.3:
  • The latest data from EMDAT has been included; additional data on loss 'normalization' has also been updated where available.
  • Vignette has been changed from a PDF to a markdown document.
Here is the emdatr package home page on CRAN.

Thursday, June 26, 2014

Updates to R package raincpc: Global Daily Rainfall for over 35 years

The Climate Prediction Center's (CPC) global rainfall data, 1979 - present, 50 km resolution, is one of the few high-quality, long-term, observation-based, daily rainfall products available for free. Although raw data is available at CPC's ftp site, obtaining and processing the data is not easy since there are over 12000 files, and formats and names of these files have changed over time.

The latest version of the raincpc package provides functionality to download, process and visualize over 35 years of global daily rainfall data from CPC. The vignette demonstrates the use of this package, including the extraction and display of regional rainfall data.

Following are some graphics from the raincpc vignette.






Thursday, June 19, 2014

New R package hazus: Damage functions from FEMA's HAZUS software for use in modeling financial losses from natural disasters

Damage Functions (DFs) translate physical damage to property, resulting from natural disasters, to financial damage. FEMA in the USA has developed several thousand DFs, and these serve as a benchmark in natural catastrophe modeling, both in academia and industry. However, these DFs and their documentation are buried within FEMA's HAZUS software and are not easily accessible for analysis and visualization.

The hazus package provides more than 1300 raw DFs used by FEMA's HAZUS software and also functionality to extract and visualize DFs specific to the flood hazard.

Here is the link to the package home on CRAN. Below is a graphic from the package vignette in R markdown.


Sunday, May 25, 2014

New R package rainfreq: Rainfall Frequency (or Design Storm) Estimates from the US National Weather Service

Rainfall estimates at a desired frequency (e.g., 1% annual chance or 100-year return period) and duration (e.g., 24-hour) are often required in the design of dams and other hydraulic structures, catastrophe risk modeling, and environmental planning and management. One major source of such estimates for the USA is the NOAA National Weather Service. Raw data is available at 1-km resolution and comes as a huge number of GIS files.
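As a quick illustration of the frequency terminology above, the base-R sketch below converts a return period to an annual chance, and then to the chance of at least one exceedance over a planning horizon. It assumes independent years, and the 30-year horizon is a hypothetical value chosen only for illustration; none of this is part of the rainfreq package.

```r
# Annual exceedance probability for a given return period (in years).
annual_chance <- function(return_period) 1 / return_period

# Chance of at least one exceedance in n_years, ASSUMING independent years.
chance_over_horizon <- function(return_period, n_years) {
  1 - (1 - annual_chance(return_period))^n_years
}

annual_chance(100)            # 0.01, i.e., the 1% annual chance event
chance_over_horizon(100, 30)  # roughly 0.26 over a hypothetical 30-year horizon
```

So a "100-year" rainfall is far from a once-in-a-lifetime event when viewed over the life of a structure.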

The new R package rainfreq provides functionality to easily access and analyze the 1-km GIS files provided by NWS' PF Data Server for the entire USA. This package also comes with datasets on record point rainfall measurements provided by NWS.

Here is the rainfreq package home page on CRAN. Here are some graphics from the package vignette.



Tuesday, May 13, 2014

Updates to R package emdatr: Global Disaster Losses from the EMDAT Database

The EMDAT database provides valuable information on human and financial losses from natural disasters around the world. Some of the issues with the EMDAT data are the lack of easy access to the entire dataset, static and inconsistent summary reports, and the lack of auxiliary financial and demographic data. The emdatr package addresses some of these issues.

Major updates in emdatr v0.2:

  • Data has been updated to include the whole of 2013.
  • Data is now hosted on bitbucket.org and only a sample is provided with the package. Package has the functionality to extract the entire data.
  • A new vignette which explains the raw data clean-up and enhancement procedure and which also demonstrates use of the package.
Here is the emdatr package home page on CRAN. Below is a summary graphic on number of natural disasters by decade obtained using the package.


Wednesday, May 7, 2014

New R package dams: Dams in the United States

The dams package provides functionality to access over 74,000 dams in the National Inventory of Dams (NID) from the US Army Corps of Engineers, the single largest source of dams in the United States. Each dam has 64 attributes such as geographical, structural, hydraulic and operational characteristics.

Obtaining data directly from NID has to be done manually and the website's GUI is not user-friendly - only a couple of thousand records can be displayed at a time, and there is no option to save these records to a file. Data was obtained manually from NID's website and then cleaned up. The dams package comes with a sample of the cleaned data, and the `extract_nid` function from the package can be used to obtain all of the cleaned data.

Here is the dams package home page on CRAN. Here are some graphics from the package vignette.




Monday, April 7, 2014

Dams in the United States from the National Inventory of Dams (NID) Database

There is no database containing information on all the dams in the United States. The single largest source is the National Inventory of Dams (NID) from the US Army Corps of Engineers which claims to have more than 80,000 dams. I downloaded the entire data from NID and also cleaned it up.

I am in the process of creating an R package for this dataset and will shortly have a post on it.

Here are some graphics.



Tuesday, January 21, 2014

The Tornado Project: Annual Tornado Frequency by Location

The goal of this open source R-based analysis, as mentioned earlier (first post, second post), is to bring consistency and transparency to the analyses of publicly available Tornado data.

The latest addition to the project is the analysis of local tornado occurrence probability. The graphics below show the average number of tornadoes per year within the United States since 1980. The average number appears to increase with the addition of the recent data.



The project home page is here - http://rationshop.github.io/tornado_r/

Any help or comments or contributions appreciated.

Wednesday, January 8, 2014

USA Drought of 2013: Analysis of High-resolution Rainfall Data Using R

The ongoing drought in California and other parts of the Southwestern United States has been reported extensively by newspapers and government sites.

Although rainfall deficit is technically meteorological drought, and drought could be of several other types (such as hydrological, agricultural, etc.), the attempt here is to demonstrate the use of R in the analysis of high resolution rainfall data. Using 4-km rainfall data from the PRISM Climate Group for 1895-2013, the total for 2013 is compared with the long-term and near-term historical averages.


Spatial patterns compare well with those from the Drought Monitor from the University of Nebraska.

The entire code and all the graphics are available on GitHub - https://github.com/RationShop/rain_prism

This effort is part of The Rain Project.

Any comments or help appreciated.

Monday, January 6, 2014

The Rain Project: An R-based Open Source Analysis of Publicly Available Rainfall Data

Rainfall data used by researchers in academia and industry does not always come in the same format. Data is often in atypical formats and spread across an extremely large number of files, and there is not always guidance on how to obtain, process and visualize the data. This project attempts to resolve this issue by serving as a hub for the processing of such publicly available rainfall data using R.
The goal of this project is to reformat rainfall data from their native format to a consistent format, suitable for use in data analysis. Within this project site, each dataset is intended to have its own wiki. Eventually, an R package would be developed for each data source.
Currently R code is available to process data from three sources - Climate Prediction Center (global coverage), US Historical Climatology Network (USA coverage) and APHRODITE (Asia/Eurasia and Middle East).

The project home page is here - http://rationshop.github.io/rain_r/
If you are aware of other sources and would like to add them to this list (and/or would like to add the R code) please let me know. Any other comments or help appreciated.

Sunday, December 29, 2013

The Tornado Project: Counting Tornadoes

The goal of this open source R-based analysis, as mentioned in my earlier post, is to bring consistency and transparency to the analyses of publicly available Tornado datasets.

The project home page is on my GitHub site - https://github.com/RationShop/tornado_r/wiki/The-Tornado-Project

The latest analysis is on the reproduction of stats from literature studies. I was able to reproduce the stats, approximately if not exactly, from several studies. The only issue was with the paper by Simmons et al. I am pretty sure there is something wrong with the numbers published in Simmons et al. Here is the link to the specific analysis - https://github.com/RationShop/tornado_r/wiki/Counting-Tornadoes

Any help or comments or contributions appreciated.

Monday, December 9, 2013

The Tornado Project

Goal

Bring consistency and transparency to the analyses of publicly available Tornado datasets using an R-based open source analysis.

Issues

Tornado data available from the NOAA Storm Prediction Center and other government agencies around the world has been the focus of many studies (e.g., see the references below). However, there are a number of issues with these studies:
  • The data itself is changing - both in quantity (additional data added every season) and quality (quality control measures appear to have been applied in the recent past, particularly to the data prior to the 1990s). Hence, it is not possible to exactly, or sometimes even approximately, reproduce the results of these studies.
  • Since the data size is annually changing, it makes more sense to have the analyses revised annually as well. 
  • With the exception of a few (thanks to R user Prof. James Elsner and colleagues), none of the studies provide the code used in their analysis. Moreover, one of the recent studies (Simmons et al 2013) appears to have several errors.
  • Data prior to 1950 in the United States appears to be available only on microfilm. Hopefully, through or due to this effort, some day this data becomes more widely available.

Specific Objectives
  1. Create an R package 
    • The package would come with raw and cleaned Tornado data
  2. Functionality provided by the R package would include:
    • reproduction/replication of summary statistics presented by literature studies. 
    • adjustment of historical monetary losses for inflation and other factors based on literature
    • creation of stochastic and probabilistic models of Tornado hazard, based on literature.
  3. Extend the above to Tornado data from other parts of the world.

Project home


Any help or comments appreciated.

References
  •  NOAA Storm Prediction Center (SPC)
    • Main page - http://www.spc.noaa.gov/gis/svrgis/
    • Summary statistics 1950-99 - http://www.spc.noaa.gov/archive/tornadoes/ustdbmy.html
    • Summary statistics 2000-present - http://www.spc.noaa.gov/climo/online/monthly/newm.html
  • Simmons, Sutter & Pielke, 2013, "Normalized tornado damage in the United States: 1950 - 2011", Environmental Hazards, 12(2), pp. 132-147.
  • Elsner, Murnane, Jagger & Widen, 2013, "A spatial point process model for violent tornado occurrence in the U.S. Great Plains", Mathematical Geosciences, 45(6), pp. 667-679. Code available at - http://rpubs.com/jelsner/4205
  • Verbout, Brooks, Leslie, & Schultz, 2006, "Evolution of the U.S. Tornado Database: 1954-2003", Weather and Forecasting, pp. 86-93.
  • Boruff, Easoz, Jones, Landry, Mitchem & Cutter, 2003, "Tornado hazards in the United States", Climate Research, 24, pp. 103-117.
  • Brooks & Doswell, 2001, "Some aspects of the international climatology of tornadoes by damage classification", Atmospheric Research, 56, pp. 191-201.
  • Grazulis, 1993, "A 110-Year Perspective of Significant Tornadoes", The Tornado: Its Structure, Dynamics, Prediction, and Hazards, Geophysical Monograph 79, pp. 467-474.

Tuesday, November 26, 2013

New R package raincpc: Obtain and Analyze Global Rainfall data from the Climate Prediction Center (CPC)

The Climate Prediction Center's (CPC) daily rainfall data for the entire world, 1979 - present & 50-km resolution, is one of the few high quality and long term observation-based rainfall products. Data is available at CPC's ftp site. However, it is a lot of data and there is no software to analyze and visualize the data.

Some issues with size/format of the CPC data:

  • too many files (365/366 files per year * 34 years, separate folder for each year)
  • each file has 360 rows and 720 columns
  • file naming conventions have changed over time - one format prior to 2006 and a couple of different formats afterwards
  • file formats have changed over time - gzipped files prior to 2008 and plain binary files afterwards
  • downloading multiple files simultaneously from the CPC ftp site, using wget, does not seem to work properly
  • there is no software/code readily available to easily process/visualize the data
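To put the first two items in perspective, here is a rough base-R sketch that counts the daily files for 1979-2012 and lays out the cell centers of a 360 x 720 grid. The grid origin (latitudes from 89.75S to 89.75N, longitudes from 0.25E to 359.75E) is an assumption made for illustration and is not stated in this post; please check it against CPC's own documentation.

```r
# One file per day: count the days in the 34 years from 1979 through 2012.
days <- seq(as.Date("1979-01-01"), as.Date("2012-12-31"), by = "day")
n_files <- length(days)                    # 12419 files

# A 360 x 720 grid corresponds to 0.5-degree cells covering the globe.
cell_size <- 180 / 360                     # 0.5 degrees
# ASSUMPTION: cell centers start at 89.75S and 0.25E.
lat <- seq(-89.75, 89.75, by = cell_size)  # 360 latitudes
lon <- seq(0.25, 359.75, by = cell_size)   # 720 longitudes
n_cells <- length(lat) * length(lon)       # 259200 values per daily file
```

That is more than 12000 files with about a quarter of a million values each, which is why some automation helps.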

The R package `raincpc` makes life easier by providing functionality to download and process the data from CPC's ftp site. Some features of this new package are:

  • Data for any time period during 1979-present can be downloaded and processed
  • Just two functions required: one to download the data (`cpc_get_rawdata`) and another to process the downloaded data (`cpc_read_rawdata`)
  • Making spatial maps using the processed data is easy, via ggplot

Here are some examples on how to obtain and visualize the data - https://github.com/RationShop/raincpc

Below are the relevant CRAN and GitHub sites:
Please let me know if you find any errors or if you have any comments or suggestions.

Friday, November 22, 2013

New R package emdatr: Global Disaster Losses from the EM-DAT Database

The International Disaster Database EM-DAT from the Center for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses on human life and property resulting from natural and man-made disasters. This database has over 20,000 country-level records from the early 1900s to the present. Data is available for free from EMDAT.

Some issues with EM-DAT data and reports:
  • Country names used by EMDAT are not always the same as those used by ISO 3166 convention. This issue is relevant when making spatial maps using R. 
  • Information such as GDP and population from the year of occurrence of the disaster have to be used to "normalize" or adjust monetary losses from the past. The EMDAT database does not provide such information. 
  • Annual reports published by EMDAT are not consistent with one another in terms of the number of disasters per year or the total number of people affected/killed. For instance, the number of disasters in 2002 was reported as 428 in the ADSR 2012 report, but the same number in the 2011, 2010, 2009 and 2008 reports is 421, 421, 422 and 421, respectively!
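To make the "normalization" idea in the second item concrete, below is a minimal base-R sketch of one simple form of it: scaling each past loss by the growth in GDP between the year of the disaster and a reference year. All numbers and column names are hypothetical, and the procedure actually used in the package may differ.

```r
# HYPOTHETICAL losses and GDP values, for illustration only (not EMDAT data).
losses <- data.frame(
  year = c(1980, 2000),
  loss = c(100, 300),    # reported loss, in millions
  gdp  = c(500, 2000)    # GDP in the year of the disaster
)
gdp_ref <- 4000          # GDP in the chosen reference year

# Scale each loss by GDP growth between the event year and the reference year.
losses$loss_norm <- losses$loss * gdp_ref / losses$gdp
losses
```

With these made-up values the older, smaller loss becomes the larger one after normalization (800 versus 600), which is the kind of reordering that makes the auxiliary data worth having.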
R package emdatr:
  • comes with pre-processed and cleaned EMDAT data
  • includes above-mentioned additional country-level information
  • has functionality to extract desired subsets of the data
  • enables much more than conventional spreadsheet analyses, thanks to the graphics and modeling capabilities provided by R
  • makes the analysis transparent, which helps address the problems with the presentation of summary statistics
The home page for the package presents some examples and graphics. Please see - https://github.com/RationShop/emdatr

Below are the relevant CRAN and GitHub sites:
Please let me know if you find any errors or if you have any comments or suggestions.

Monday, November 18, 2013

Towards the R package emdat: Losses from Global Disasters, Part 1

The International Disaster Database, EM-DAT from the Center for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses on human life and property resulting from natural and man-made disasters. This database has over 20,000 country-level records from the early 1900s to the present. Data is available for free from EMDAT.

Some Cons:

  • Some cleaning of the data is required. For instance, the country names used by EMDAT are not always the same as those used by ISO. Hence, making spatial maps in R would involve fixing these names.
  • Information such as GDP, population and Consumer Price Index have to be used to "normalize" or adjust monetary losses from the past. The EMDAT database does not provide such information. 
  • Annual reports published by EMDAT are not consistent with one another in terms of number of disasters per year or the total number of people affected/killed.
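The country-name clean-up in the first item could be sketched with a simple lookup table in base R. The name pairs below are hypothetical examples, not taken from EMDAT, and `fix_country` is a helper invented for this illustration.

```r
# HYPOTHETICAL examples of database names that differ from ISO-style names.
name_fix <- c(
  "Iran Islam Rep" = "Iran (Islamic Republic of)",
  "Korea Rep"      = "Korea, Republic of"
)

# Replace names listed in the lookup table; leave all other names untouched.
fix_country <- function(x) {
  out <- unname(name_fix[x])    # NA where no correction is listed
  ifelse(is.na(out), x, out)
}

fix_country(c("Korea Rep", "India"))
```

Keeping the corrections in one table makes it easy to review them and to extend the list as new mismatches turn up.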

My goal is to create an R package which comes with pre-processed and cleaned EMDAT data and which would also include above-mentioned additional country-level information. Moreover, this R package would have the functionality to extract and analyze the data.

Here is a preliminary analysis of the data. Please see - https://github.com/RationShop/emdat/blob/master/publish_part1.md

All code and graphics are at my GitHub site - https://github.com/RationShop/emdat

Sunday, November 17, 2013

New R package sheldusr: Losses from Natural Disasters in the United States

The SHELDUS database is a database on human and property losses from natural disasters in the United States. Although the data is free, downloading the data is tedious and so is cleaning and analyzing it. The new R package sheldusr comes with the cleaned and pre-processed SHELDUS data and includes functionality to retrieve data for any desired time period or natural hazard(s) of interest.

The home page for the package presents some examples and graphics. Please see - https://github.com/RationShop/sheldusr

Below are the relevant CRAN and GitHub sites:
This is my first package and I think it could be improved in several ways. I hope to make these improvements in the coming months. In the meantime, please let me know if you find any errors or if you have any comments or suggestions.

Saturday, November 9, 2013

Towards the R package sheldus: Part 2: Losses from Natural Disasters in the US

In my earlier post I summarized the work on my upcoming R package on the SHELDUS database. This is a database on human and property losses from natural disasters in the United States. Although the data is free, downloading the data is tedious and so is cleaning and analyzing it. My goal is to package the data efficiently so it could be accessed and analyzed easily through R. And some day, develop one package for loss information analysis from various databases (e.g., SHELDUS, EM-DAT, and FEMA's Flood Insurance Program).

Status
  • At this point the code has all the data retrieval functionality available through the GUI from SHELDUS - only in a matter of seconds!
    • retrieval by year
    • retrieval by hazard type
    • adjustment for inflation
  • [TODO] Presidential Disaster Declarations data and also data prior to 1960 need to be included (these amount to about 40,000 records of the total 820,000 records)
In this second post, I extract the entire SHELDUS data and create some interesting graphics. Please see - https://github.com/RationShop/sheldus/blob/master/publish_part2.md

All the graphics and code are available at my GitHub site - https://github.com/RationShop/sheldus

Any help or comments appreciated.

Thursday, November 7, 2013

Towards the R package sheldus, Part 1: Natural Disaster Losses in the US in 2012

The SHELDUS database, short for Spatial Hazard Events and Losses Database in the United States (http://webra.cas.sc.edu/hvri/products/sheldus.aspx), from the University of South Carolina, is a database on human and property losses from natural disasters in the United States. Data from this database includes county-level information on property losses, crop losses, injuries and fatalities from 18 different types of natural hazards (hurricanes, droughts, floods, etc.) from about 1960 to the present.

Pros
  • Data is free
  • Inflation adjusted losses are also available
Cons
  • Downloading the approximately 200 MB data (as of Nov 2013) from the GUI is tedious. There does not seem to be an easy way to download the entire data all at once. 
  • Currently only one reference year can be chosen for inflation adjustment. What if someone wanted multiple years or wanted to update their data next year? They would have to download the entire data set again through the clunky GUI!
  • Sharing the data and analysis of the entire data is not easy because of its size and layout.
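The reference-year limitation goes away once the nominal losses and a price index are both available locally. Below is a minimal base-R sketch of that idea; the index values are hypothetical and `adjust_loss` is a helper made up for this illustration, not a function from SHELDUS or from the planned package.

```r
# HYPOTHETICAL price index values, for illustration only.
cpi <- c("1990" = 50, "2000" = 75, "2012" = 100)

# Express a nominal loss from loss_year in the dollars of ref_year.
adjust_loss <- function(loss, loss_year, ref_year) {
  loss * cpi[[as.character(ref_year)]] / cpi[[as.character(loss_year)]]
}

# The same 1990 loss expressed in two different reference years.
sapply(c(2000, 2012), function(ref) adjust_loss(10, 1990, ref))
```

With these made-up values, a loss of 10 in 1990 becomes 15 in year-2000 dollars and 20 in year-2012 dollars, with no need to download anything again.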
Goal
  • Build an R package which would come with the entire SHELDUS data. 
  • Create functions to display and analyze the data.  
  • [future maybe] Combine this with other disaster damage info (e.g., FEMA's NFIP - https://github.com/RationShop/nfip).
Status/TODOs
  • Most of the code for cleaning and formatting the data, IO and graphics is ready.
  • Instead of plain text, I use the binary format (and a few tricks) reducing the data size to 30 MB (from ~ 200 MB!).
  • [TODO] Ability to retrieve data for multiple years/perils at once.
  • [TODO] Code for inflation adjustment.
  • [TODO] Presidential Disaster Declarations data and data prior to 1960 need to be included.
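For readers curious about the size reduction mentioned in the second item, here is a small base-R sketch comparing a plain-text CSV with R's compressed binary format. The table is randomly generated and hypothetical, `saveRDS` is assumed here as a stand-in for "the binary format", and the sketch does not attempt to reproduce the additional tricks or the 200 MB to 30 MB figure.

```r
# HYPOTHETICAL table standing in for the cleaned data.
df <- data.frame(
  county = rep(c("A", "B", "C", "D"), times = 2500),
  year   = rep(1960:1969, each = 1000),
  loss   = round(runif(10000, 0, 1e6), 2)
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
write.csv(df, csv_file, row.names = FALSE)  # plain text
saveRDS(df, rds_file)                       # binary, gzip-compressed by default

file.size(csv_file) / file.size(rds_file)   # size ratio; varies with the data
identical(readRDS(rds_file), df)            # the round trip preserves the data
```

The ratio depends heavily on how repetitive the data is, so real county-level records may compress quite differently from this random example.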
I will be revising the code towards building a complete R package. Along the way I will be doing several QA/QC checks and will be posting on my analyses.
In this first post, I will be looking at losses in 2012. Please see - https://github.com/RationShop/sheldus/blob/master/publish_part1.md

All the graphics and code are available at my GitHub site - https://github.com/RationShop/sheldus

Any help or comments appreciated.

Moving to Blogger from Tumblr!

Welcome! I used to blog on Tumblr (http://gopigoteti.tumblr.com/). But from now on I am going to blog here!