Tuesday, November 26, 2013

New R package raincpc: Obtain and Analyze Global Rainfall data from the Climate Prediction Center (CPC)

The Climate Prediction Center's (CPC) daily rainfall data for the entire world, available from 1979 to the present at 50-km resolution, is one of the few high-quality, long-term, observation-based rainfall products. The data is available at CPC's FTP site. However, the data set is large and there is no readily available software to analyze and visualize it.

Some issues with size/format of the CPC data:

  • too many files (365/366 files per year * 34 years, separate folder for each year)
  • each file has 360 rows and 720 columns
  • file naming conventions have changed over time - one format prior to 2006 and a couple of different formats afterwards
  • file formats have changed over time - gzipped files prior to 2008 and plain binary files afterwards
  • downloading multiple files simultaneously from the CPC ftp site, using wget, does not seem to work properly
  • there is no software/code readily available to easily process/visualize the data
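
To get a feel for the numbers in the list above, here is a rough back-of-the-envelope sketch in Python (it is not part of raincpc, which is an R package). It counts the daily files for 1979 through 2012, which is an assumed reading of the "34 years" above, and derives the grid spacing implied by 360 rows and 720 columns, assuming the grid spans the whole globe.

```python
import calendar

# Assumption: "34 years" above means 1979 through 2012 inclusive.
FIRST_YEAR, LAST_YEAR = 1979, 2012

# One file per day, so count the days in each year (365 or 366).
n_files = sum(366 if calendar.isleap(y) else 365
              for y in range(FIRST_YEAR, LAST_YEAR + 1))

# Assumption: the 360 x 720 grid covers 180 degrees of latitude
# and 360 degrees of longitude.
N_ROWS, N_COLS = 360, 720
lat_step = 180.0 / N_ROWS
lon_step = 360.0 / N_COLS

print(n_files)             # 12419 daily files
print(N_ROWS * N_COLS)     # 259200 values per file
print(lat_step, lon_step)  # 0.5 0.5 (degrees)
```

Under these assumptions, a spacing of half a degree is consistent with the 50-km resolution mentioned earlier.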

The R package `raincpc` makes life easier by providing functionality to download and process the data from CPC's ftp site. Some features of this new package are:

  • Data for any time period during 1979-present can be downloaded and processed
  • Just two functions required: one to download the data (`cpc_get_rawdata`) and another to process the downloaded data (`cpc_read_rawdata`)
  • Making spatial maps using the processed data is easy, via ggplot

Here are some examples of how to obtain and visualize the data - https://github.com/RationShop/raincpc

Below are the relevant CRAN and GitHub sites:
Please let me know if you find any errors or if you have any comments or suggestions.

Friday, November 22, 2013

New R package emdatr: Global Disaster Losses from the EM-DAT Database

The International Disaster Database, EM-DAT, from the Centre for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses of human life and property resulting from natural and man-made disasters. This database has over 20,000 country-level records from the early 1900s to the present. Data is available for free from EMDAT.

Some issues with EM-DAT data and reports:
  • Country names used by EMDAT are not always the same as those used by the ISO 3166 convention. This issue is relevant when making spatial maps using R.
  • Information such as GDP and population from the year of occurrence of the disaster have to be used to "normalize" or adjust monetary losses from the past. The EMDAT database does not provide such information. 
  • Annual reports published by EMDAT are not consistent with one another in terms of the number of disasters per year or the total number of people affected/killed. For instance, the number of disasters in 2002 was reported as 428 in the ADSR 2012 report, but the same number in the 2011, 2010, 2009 and 2008 reports is 421, 421, 422 and 421, respectively!
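
The "normalization" mentioned in the second bullet can be sketched as a simple scaling. The following Python snippet is only an illustration, not emdatr's code: it assumes that a loss is rescaled by the ratio of GDP in a reference year to GDP in the year of the disaster, and the function name and all numbers are made up.

```python
def normalize_loss(loss, gdp_event_year, gdp_reference_year):
    """Scale a historical loss by GDP growth since the disaster (assumed method)."""
    if gdp_event_year <= 0:
        raise ValueError("GDP in the event year must be positive")
    return loss * gdp_reference_year / gdp_event_year

# Made-up numbers: a loss of 2 (billion USD) when GDP was 100,
# expressed relative to a reference year in which GDP is 250.
normalized = normalize_loss(2.0, 100.0, 250.0)
print(normalized)  # 5.0
```

The same pattern would apply to population or any other country-level series, which is why having that information alongside the loss records is useful.
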
R package emdatr:
  • comes with pre-processed and cleaned EMDAT data
  • includes above-mentioned additional country-level information
  • has functionality to extract desired subsets of the data
  • makes it possible, through R's graphics and modeling capabilities, to accomplish much more than conventional spreadsheet analyses
  • makes the analysis transparent, which could help address the problems with the presentation of summary statistics
The home page for the package presents some examples and graphics. Please see - https://github.com/RationShop/emdatr

Below are the relevant CRAN and GitHub sites:
Please let me know if you find any errors or if you have any comments or suggestions.

Monday, November 18, 2013

Towards the R package emdat: Losses from Global Disasters, Part 1

The International Disaster Database, EM-DAT, from the Centre for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses of human life and property resulting from natural and man-made disasters. This database has over 20,000 country-level records from the early 1900s to the present. Data is available for free from EMDAT.

Some Cons:

  • Some cleaning of the data is required. For instance, the country names used by EMDAT are not always the same as those used by ISO. Hence, making spatial maps in R would involve fixing these names.
  • Information such as GDP, population and Consumer Price Index have to be used to "normalize" or adjust monetary losses from the past. The EMDAT database does not provide such information. 
  • Annual reports published by EMDAT are not consistent with one another in terms of number of disasters per year or the total number of people affected/killed.
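
The name cleaning described in the first bullet is essentially a lookup. Below is a minimal Python sketch, not the package's R code; the mapping entries are hypothetical examples of mismatched spellings rather than values taken from EM-DAT.

```python
# Hypothetical examples of source spellings mapped to ISO 3166 alpha-3 codes.
NAME_TO_ISO3 = {
    "united states": "USA",
    "korea rep": "KOR",
    "iran islam rep": "IRN",
}

def to_iso3(name):
    """Return the ISO 3166 alpha-3 code for a country name, or None if unknown."""
    key = " ".join(name.lower().split())  # normalize case and whitespace
    return NAME_TO_ISO3.get(key)

records = ["United States", "Korea  Rep", "Atlantis"]
unmatched = [r for r in records if to_iso3(r) is None]
print(unmatched)  # names that still need a manual fix
```

Listing the unmatched names first makes it easy to see how much manual fixing remains before any map is drawn.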

My goal is to create an R package that comes with pre-processed and cleaned EMDAT data and also includes the above-mentioned additional country-level information. Moreover, this R package would have the functionality to extract and analyze the data.

Here is a preliminary analysis of the data. Please see - https://github.com/RationShop/emdat/blob/master/publish_part1.md

All code and graphics are at my GitHub site - https://github.com/RationShop/emdat

Sunday, November 17, 2013

New R package sheldusr: Losses from Natural Disasters in the United States

The SHELDUS database is a database on human and property losses from natural disasters in the United States. Although the data is free, downloading the data is tedious and so is cleaning and analyzing it. The new R package sheldusr comes with the cleaned and pre-processed SHELDUS data and includes functionality to retrieve data for any desired time period or natural hazard(s) of interest.

The home page for the package presents some examples and graphics. Please see - https://github.com/RationShop/sheldusr

Below are the relevant CRAN and GitHub sites:
This is my first package and I think it could be improved in several ways. I hope to make these improvements in the coming months. In the meantime, please let me know if you find any errors or if you have any comments or suggestions.

Saturday, November 9, 2013

Towards the R package sheldus: Part 2: Losses from Natural Disasters in the US

In my earlier post I summarized the work on my upcoming R package on the SHELDUS database. This is a database on human and property losses from natural disasters in the United States. Although the data is free, downloading the data is tedious and so is cleaning and analyzing it. My goal is to package the data efficiently so it can be accessed and analyzed easily through R. Some day, I hope to develop one package for loss information analysis from various databases (e.g., SHELDUS, EM-DAT, and FEMA's Flood Insurance Program).

Status
  • At this point the code has all the data retrieval functionality available through the GUI from SHELDUS - only in a matter of seconds!
    • retrieval by year
    • retrieval by hazard type
    • adjustment for inflation
  • [TODO] Presidential Disaster Declarations data and also data prior to 1960 need to be included (these amount to about 40,000 records of the total 820,000 records)
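
The retrieval options listed above amount to filtering records. As a rough Python sketch (the actual code is written in R, and the record fields and values here are invented for illustration):

```python
# Invented records; field names are assumptions, not SHELDUS column names.
records = [
    {"year": 2011, "hazard": "Flooding", "property_loss": 1.5},
    {"year": 2012, "hazard": "Hurricane", "property_loss": 4.0},
    {"year": 2012, "hazard": "Flooding", "property_loss": 0.5},
]

def retrieve(records, years=None, hazards=None):
    """Keep records matching the given years and hazard types (None means all)."""
    return [
        r for r in records
        if (years is None or r["year"] in years)
        and (hazards is None or r["hazard"] in hazards)
    ]

floods_2012 = retrieve(records, years={2012}, hazards={"Flooding"})
print(sum(r["property_loss"] for r in floods_2012))  # 0.5
```
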
In this second post, I extract the entire SHELDUS data and create some interesting graphics. Please see - https://github.com/RationShop/sheldus/blob/master/publish_part2.md

All the graphics and code are available at my GitHub site - https://github.com/RationShop/sheldus

Any help or comments appreciated.

Thursday, November 7, 2013

Towards the R package sheldus, Part 1: Natural Disaster Losses in the US in 2012

The SHELDUS database, short for Spatial Hazard Events and Losses Database in the United States (http://webra.cas.sc.edu/hvri/products/sheldus.aspx), from the University of South Carolina, is a database on human and property losses from natural disasters in the United States. Data from this database includes county-level information on property losses, crop losses, injuries and fatalities from 18 different types of natural hazards (hurricanes, droughts, floods, etc.) from about 1960 to the present.

Pros
  • Data is free
  • Inflation adjusted losses are also available
Cons
  • Downloading approximately 200 MB of data (as of Nov 2013) from the GUI is tedious. There does not seem to be an easy way to download the entire data set all at once.
  • Currently only one reference year can be chosen for inflation adjustment. Anyone who wants multiple reference years, or wants to update their data next year, has to download the entire data set again through the clunky GUI!
  • Sharing the data and analysis of the entire data is not easy because of its size and layout.
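
The limitation of a single reference year goes away if the nominal losses are stored together with a price index. A minimal Python sketch, assuming adjustment by the ratio of price-index values (the index numbers below are made up, and this is not necessarily how SHELDUS computes its adjusted losses):

```python
# Made-up price index values by year (not real CPI data).
price_index = {1990: 50.0, 2000: 80.0, 2012: 100.0}

def adjust(loss, loss_year, reference_year, index=price_index):
    """Express a nominal loss from loss_year in reference_year dollars."""
    return loss * index[reference_year] / index[loss_year]

# The same 1990 loss expressed in two different reference years,
# without downloading anything again.
print(adjust(10.0, 1990, 2000))  # 16.0
print(adjust(10.0, 1990, 2012))  # 20.0
```
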
Goal
  • Build an R package which would come with the entire SHELDUS data. 
  • Create functions to display and analyze the data.  
  • [future maybe] Combine this with other disaster damage info (e.g., FEMA's NFIP - https://github.com/RationShop/nfip).
Status/TODOs
  • Most of the code for cleaning and formatting the data, IO and graphics is ready.
  • Instead of plain text, I use the binary format (and a few tricks) reducing the data size to 30 MB (from ~ 200 MB!).
  • [TODO] Ability to retrieve data for multiple years/perils at once.
  • [TODO] Code for inflation adjustment.
  • [TODO] Presidential Disaster Declarations data and data prior to 1960 need to be included.
I will be revising the code towards building a complete R package. Along the way, I will be doing several QA/QC checks and posting about my analyses.
In this first post, I will be looking at losses in 2012. Please see - https://github.com/RationShop/sheldus/blob/master/publish_part1.md

All the graphics and code are available at my GitHub site - https://github.com/RationShop/sheldus

Any help or comments appreciated.

Moving to Blogger from Tumblr!

Welcome! I used to blog on Tumblr (http://gopigoteti.tumblr.com/). But from now on I am going to blog here!