Thursday, November 7, 2013

Towards the R package sheldus, Part 1: Natural Disaster Losses in the US in 2012

The SHELDUS database, short for Spatial Hazard Events and Losses Database in the United States (http://webra.cas.sc.edu/hvri/products/sheldus.aspx), from the University of South Carolina, is a database on human and property losses from natural disasters in the United States. It includes county-level information on property losses, crop losses, injuries and fatalities from 18 different types of natural hazards (hurricanes, droughts, floods, etc.) from about 1960 to the present.

Pros
  • Data is free
  • Inflation adjusted losses are also available
Cons
  • Downloading the approximately 200 MB of data (as of Nov 2013) through the GUI is tedious. There does not seem to be an easy way to download the entire data set at once.
  • Currently only one reference year can be chosen for inflation adjustment. Anyone who wants multiple reference years, or wants to update their data next year, has to download the entire data set again through the clunky GUI!
  • Sharing the data and analysis of the entire data is not easy because of its size and layout.
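One way around the single-reference-year limitation would be to keep the nominal losses and rescale them locally with a price index. The following is only a minimal sketch in R: the `price_index` values are made up for illustration (they are not real CPI figures), and `adjust_losses` is a hypothetical helper, not part of the sheldus code.

```r
# Hypothetical price index keyed by year (illustrative values only)
price_index <- c("2000" = 100, "2012" = 130, "2013" = 132)

# Convert nominal losses recorded in `from_year` into `to_year` dollars
adjust_losses <- function(loss, from_year, to_year, index = price_index) {
  from <- index[as.character(from_year)]
  to <- index[as.character(to_year)]
  if (anyNA(from) || anyNA(to)) stop("year missing from price index")
  unname(loss * to / from)
}

# Toy loss records (not real SHELDUS data)
losses <- data.frame(year = c(2000, 2012), property_loss = c(1000, 2000))

# Any reference year can be chosen without downloading the data again
losses$loss_2013 <- adjust_losses(losses$property_loss, losses$year, 2013)
losses$loss_2012 <- adjust_losses(losses$property_loss, losses$year, 2012)
```

With this layout, switching to a new reference year only requires adding one value to the index rather than fetching the whole data set again.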
Goal
  • Build an R package that ships with the entire SHELDUS data set.
  • Create functions to display and analyze the data.  
  • [future maybe] Combine this with other disaster damage info (e.g., FEMA's NFIP - https://github.com/RationShop/nfip).
Status/TODOs
  • Most of the code for cleaning and formatting the data, IO and graphics is ready.
  • Instead of plain text, I store the data in a binary format (plus a few tricks), which reduces the data size to 30 MB (from ~200 MB!).
  • [TODO] Ability to retrieve data for multiple years/perils at once.
  • [TODO] Code for inflation adjustment.
  • [TODO] Presidential Disaster Declarations data and data prior to 1960 need to be included.
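To illustrate the plain-text versus binary comparison mentioned above, here is a rough sketch using base R serialization. It is an assumption about one possible approach, not the actual method or the "few tricks" used for sheldus, and the `toy` data frame is a made-up stand-in for the real records.

```r
# Toy stand-in for county-level loss records (not real SHELDUS data)
set.seed(1)
toy <- data.frame(
  fips = sprintf("%05d", sample(1:3000, 1e4, replace = TRUE)),
  year = sample(1960:2012, 1e4, replace = TRUE),
  property_loss = round(runif(1e4, 0, 1e6), 2),
  stringsAsFactors = FALSE
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# Plain-text copy versus compressed binary copy
write.csv(toy, csv_file, row.names = FALSE)
saveRDS(toy, rds_file, compress = "xz")

# Compare on-disk sizes in bytes
file.info(csv_file)$size
file.info(rds_file)$size

# Check that the data frame survives the round trip unchanged
identical(readRDS(rds_file), toy)
```

How much space is saved depends on the data; the sizes should be checked on the real files rather than assumed from this toy example.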
I will be revising the code towards building a complete R package. Along the way, I will run several QA/QC checks and post my analyses.
In this first post, I look at losses in 2012. Please see https://github.com/RationShop/sheldus/blob/master/publish_part1.md

All the graphics and code are available at my GitHub site - https://github.com/RationShop/sheldus

Any help or comments would be appreciated.

2 comments:

Unknown said...

Interesting citation and proposal.

No doubt, there is enough for you to do without taking on still more work. But it would be interesting to consider a broader range of energy-related events:
(1) pipelines (crude oil and natural gas transmission and distribution events)
(2) electrical system events (transmission and distribution)
(3) nuclear.

Although nuclear accidents are rare, Fukushima highlights the importance of considering how tsunamis and earthquakes can propagate into radiological and reactor accidents. Hence, this is less about data and more about how to employ cat data to infer risk.

I'm interested in this effort and will be following it. If I can be of any assistance, please reach out.

Gopi Goteti said...

Thanks Stephen! Are you aware of any database or other source of data for such energy-related events? If so, please let me know. The biggest problem seems to be getting reliable and consistent data.