The International Disaster Database, EM-DAT, from the Center for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses of human life and property resulting from select natural and man-made disasters. The goal of the emdatr package is to improve the EMDAT database by promoting its use, shedding light on its limitations, and making analysis of the data easier using R.

Major updates in emdatr v0.3:
The latest data from EMDAT has been included, and the additional data on loss 'normalization' has also been updated where available.
Vignette has been changed from a PDF to a markdown document.
The Climate Prediction Center's (CPC) global rainfall data, 1979 - present at 50-km resolution, is one of the few high-quality, long-term, observation-based, daily rainfall products available for free. Although the raw data is available at CPC's ftp site, obtaining and processing it is not easy: there are over 12,000 files, and the formats and names of these files have changed over time.
The latest version of the raincpc package provides functionality to download, process and visualize over 35 years of global daily rainfall data from CPC. The vignette demonstrates the use of this package, including the extraction and display of regional rainfall data.
Damage Functions (DFs) translate physical damage to property, resulting from natural disasters, into financial damage. FEMA in the USA developed several thousand DFs, and these serve as a benchmark in natural catastrophe modeling, both in academia and industry. However, these DFs and their documentation are buried within FEMA's HAZUS software and are not easily accessible for analysis and visualization.
The hazus package provides more than 1300 raw DFs used by FEMA's HAZUS software, along with functionality to extract and visualize the DFs specific to the flood hazard.
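The idea behind a flood DF can be sketched in a few lines of R. The depth-damage pairs below are made-up numbers for illustration, not actual HAZUS curves:

```r
# A damage function (DF) maps flood depth (ft) to damage as a
# percentage of the structure's value. Values below are illustrative only.
depth      <- c(0, 2, 4, 6, 8)       # flood depth in feet
damage_pct <- c(0, 10, 25, 45, 60)   # percent of structure value

# linear interpolation gives a damage estimate at any depth
df_lookup <- approxfun(depth, damage_pct, rule = 2)

# financial damage for a $200,000 structure flooded to 5 feet
loss <- 200000 * df_lookup(5) / 100
loss  # 70000
```

The real HAZUS curves are specific to occupancy type, foundation type, number of stories, and so on; this sketch only shows the depth-to-damage interpolation idea.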
Rainfall estimates at desired frequency (e.g., 1% annual chance or 100-year return period) and duration (e.g., 24-hour) are often required in the design of dams and other hydraulic structures, catastrophe risk modeling, environmental planning and management. One major source of such estimates for the USA is the NOAA National Weather Service. Raw data is available at 1-km resolution and comes as a huge number of GIS files. The new R package rainfreq provides functionality to easily access and analyze the 1-km GIS files provided by NWS' PF Data Server for the entire USA. This package also comes with datasets on record point rainfall measurements provided by NWS.
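The frequency terminology above follows a simple relationship: the return period is the reciprocal of the annual exceedance probability. A quick sketch in R:

```r
# a 1% annual chance event is the "100-year" event: T = 1 / p
return_period   <- function(p) 1 / p
exceedance_prob <- function(t) 1 / t

return_period(0.01)   # 100
exceedance_prob(100)  # 0.01

# probability of at least one exceedance in n years: 1 - (1 - p)^n
p_in_n_years <- function(p, n) 1 - (1 - p)^n
p_in_n_years(0.01, 30)  # about 0.26 - chance of a "100-year" event in 30 years
```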
The EMDAT database provides valuable information on human and financial losses from natural disasters around the world. Some of the issues with the EMDAT data are the inability to access the entire dataset, static and inconsistent summary reports, and the lack of auxiliary financial and demographic data. The emdatr package addresses some of these issues. Major updates in emdatr v0.2:
Data has been updated to include the whole of 2013.
Data is now hosted on bitbucket.org and only a sample is provided with the package. Package has the functionality to extract the entire data.
A new vignette explains the raw data clean-up and enhancement procedure and also demonstrates use of the package.
Here is the emdatr package home page on CRAN. Below is a summary graphic of the number of natural disasters per decade, obtained using the package.
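As a rough sketch of how such a by-decade summary is computed (with mock years standing in for the actual EMDAT records):

```r
# count disasters by decade; 'year' here is mock data standing in
# for the event-year column of the EMDAT records
set.seed(1)
year <- sample(1950:2013, 500, replace = TRUE)

# integer division maps each year to the start of its decade
decade <- (year %/% 10) * 10
table(decade)  # counts per decade, ready for a barplot
```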
The dams package provides functionality to access over 74,000 dams in the National Inventory of Dams (NID) from the US Army Corps of Engineers, the single largest source of dams in the United States. Each dam has 64 attributes such as geographical, structural, hydraulic and operational characteristics.
Obtaining data directly from NID has to be done manually, and the website's GUI is not user-friendly - only a couple of thousand records can be displayed at a time, and there is no option to save these records to a file. The data was obtained manually from NID's website and then cleaned up. The dams package comes with a sample of the cleaned data, and the `extract_nid` function from the package can be used to obtain all of it.
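Once extracted, the data can be subset like any data frame. A sketch with mock, NID-style records (the column names below are illustrative, not the actual 64 NID attributes):

```r
# mock stand-in for a slice of the NID table; the real data has
# 64 attributes per dam and the columns here are made up
nid <- data.frame(
  dam_name  = c("A", "B", "C", "D"),
  state     = c("CA", "CA", "TX", "CO"),
  height_ft = c(30, 120, 45, 80)
)

# the kind of query one might run after calling extract_nid()
tall_ca <- subset(nid, state == "CA" & height_ft > 50)
nrow(tall_ca)  # 1
```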
There is no database containing information on all the dams in the United States. The single largest source is the National Inventory of Dams (NID) from the US Army Corps of Engineers which claims to have more than 80,000 dams. I downloaded the entire data from NID and also cleaned it up.
I am in the process of creating an R package for this dataset and will shortly have a post on it.
The goal of this open source R-based analysis, as mentioned earlier (first post, second post) is to bring consistency and transparency to the analyses of publicly available Tornado data.
The latest addition to the project is the analysis of local tornado occurrence probability. The graphics below show the average number of tornadoes per year within the United States since 1980. The average appears to increase with the addition of the recent data.
The ongoing drought in California and other parts of Southwestern United States has been reported extensively by newspapers and government sites.
Although rainfall deficit is technically meteorological drought, and drought could be of several other types (such as hydrological, agricultural, etc.), the attempt here is to demonstrate the use of R in the analysis of high resolution rainfall data. Using 4-km rainfall data from the PRISM Climate Group for 1895-2013, the total for 2013 is compared with the long-term and near-term historical averages.
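The comparison can be sketched as follows, with mock annual totals standing in for the actual 4-km PRISM grids:

```r
# mock annual rainfall totals (inches) for one location, 1895-2013;
# the real analysis would aggregate the PRISM daily/monthly grids
set.seed(42)
years  <- 1895:2013
totals <- rnorm(length(years), mean = 20, sd = 5)
names(totals) <- years

long_term_avg <- mean(totals[as.character(1895:2012)])
near_term_avg <- mean(totals[as.character(1983:2012)])  # last 30 years

# percent-of-normal for 2013 relative to each baseline
pct_long <- 100 * totals["2013"] / long_term_avg
pct_near <- 100 * totals["2013"] / near_term_avg
```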
Rainfall data used by researchers in academia and industry does not always come in the same format. Data is often in atypical formats, spread across an extremely large number of files, and there is not always guidance on how to obtain, process and visualize it. This project attempts to resolve this issue by serving as a hub for the processing of such publicly available rainfall data using R.
The goal of this project is to reformat rainfall data from their native format to a consistent format, suitable for use in data analysis. Within this project site, each dataset is intended to have its own wiki. Eventually, an R package would be developed for each data source.
Currently R code is available to process data from three sources - Climate Prediction Center (global coverage), US Historical Climatology Network (USA coverage) and APHRODITE (Asia/Eurasia and Middle East).
The project home page is here - http://rationshop.github.io/rain_r/
If you are aware of other sources and would like to add them to this list (and/or would like to add the R code) please let me know. Any other comments or help appreciated.
The latest analysis is on the reproduction of statistics from studies in the literature. I was able to reproduce the statistics, approximately if not exactly, from several studies. The only issue was with the paper by Simmons et al.; I am pretty sure there is something wrong with the numbers published there. Here is the link to the specific analysis - https://github.com/RationShop/tornado_r/wiki/Counting-Tornadoes
Any help or comments or contributions appreciated.
Bring consistency and transparency to the analyses of publicly available Tornado datasets using an R-based open source analysis.
Tornado data available from the NOAA Storm Prediction Center and other government agencies around the world has been the focus of many studies (e.g., see below references). However, there are a number of issues with these studies:
The data itself is changing - both in quantity (additional data added every season) and quality (quality control measures appear to have been applied in the recent past, particularly to the data prior to the 1990s). Hence, it is not possible to exactly, or sometimes even approximately, reproduce the results of these studies.
Since the data changes annually, it makes more sense to have the analyses revised annually as well.
With the exception of a few (thanks to R user Prof. James Elsner and colleagues), none of the studies provide the code used in their analysis. Moreover, one of the recent studies (Simmons et al 2013) appears to have several errors.
Data prior to 1950 in the United States appears to be available only on microfilm. Hopefully, through or due to this effort, some day this data becomes more widely available.
Create an R package
The package would come with raw and cleaned Tornado data
Functionality provided by the R package would include:
reproduction/replication of summary statistics presented in literature studies
adjustment of historical monetary losses for inflation and other factors, based on literature
creation of stochastic and probabilistic models of Tornado hazard, based on literature
Extend the above to Tornado data from other parts of the world.
Simmons, Sutter & Pielke, 2013, "Normalized tornado damage in the United States: 1950 - 2011", Environmental Hazards, 12(2), pp. 132-147.
Elsner, Murnane, Jagger & Widen, 2013, "A spatial point process model for violent tornado occurrence in the U.S. Great Plains", Mathematical Geosciences, 45(6), pp. 667-679. Code available at - http://rpubs.com/jelsner/4205
Verbout, Brooks, Leslie, & Schultz, 2006, "Evolution of the U.S. Tornado Database: 1954-2003", Weather and Forecasting, pp. 86-93.
Boruff, Easoz, Jones, Landry, Mitchem & Cutter, 2003, "Tornado hazards in the United States", Climate Research, 24, pp. 103-117.
Brooks & Doswell, 2001, "Some aspects of the international climatology of tornadoes by damage classification", Atmospheric Research, 56, pp. 191-201.
Grazulis, 1993, "A 110-Year Perspective of Significant Tornadoes", The Tornado: Its Structure, Dynamics, Prediction, and Hazards, Geophysical Monograph 79, pp. 467-474.
The Climate Prediction Center's (CPC) daily rainfall data for the entire world, 1979 - present at 50-km resolution, is one of the few high-quality, long-term, observation-based rainfall products. Data is available at CPC's ftp site. However, it is a lot of data and there is no software to analyze and visualize it.
Some issues with size/format of the CPC data:
too many files (365/366 files per year * 34 years, separate folder for each year)
each file has 360 rows and 720 columns
file naming conventions have changed over time - one format prior to 2006 and a couple of different formats afterwards
file formats have changed over time - gzipped files prior to 2008 and plain binary files afterwards
downloading multiple files simultaneously from the CPC ftp site, using wget, does not seem to work properly
there is no software/code readily available to easily process/visualize the data
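As a sketch of what reading one of the plain binary grids involves (the 4-byte little-endian float layout below is an assumption for illustration - the raincpc package handles the actual formats):

```r
# each daily file is a flat binary grid, 360 rows x 720 columns
nrows <- 360; ncols <- 720

# create a mock binary file standing in for a downloaded CPC file
vals <- runif(nrows * ncols, 0, 50)
f <- tempfile()
writeBin(as.numeric(vals), f, size = 4, endian = "little")

# read it back into a matrix, one row per latitude band
raw  <- readBin(f, what = "numeric", n = nrows * ncols,
                size = 4, endian = "little")
rain <- matrix(raw, nrow = nrows, ncol = ncols, byrow = TRUE)
dim(rain)  # 360 720
```

Multiply this by two file formats, several naming conventions, and 12,000+ files, and the case for a package becomes clear.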
The R package `raincpc` makes life easier by providing functionality to download and process the data from CPC's ftp site. Some features of this new package are:
Data for anytime period during 1979-present can be downloaded and processed
Just two functions required: one to download the data (`cpc_get_rawdata`) and another to process the downloaded data (`cpc_read_rawdata`)
Making spatial maps using the processed data is easy, via ggplot
The International Disaster Database EM-DAT from the Center for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses of human life and property resulting from natural and man-made disasters. This database has over 20,000 country-level records from the early 1900s to the present. Data is available for free from EMDAT.
Some issues with EM-DAT data and reports:
Country names used by EMDAT are not always the same as those used by ISO 3166 convention. This issue is relevant when making spatial maps using R.
Information such as GDP and population from the year of occurrence of the disaster have to be used to "normalize" or adjust monetary losses from the past. The EMDAT database does not provide such information.
Annual reports published by EMDAT are not consistent with one another in terms of the number of disasters per year or the total number of people affected/killed. For instance, the number of disasters in 2002 was reported as 428 in the ADSR 2012 report, but the same number in the 2011, 2010, 2009 and 2008 reports is 421, 421, 422 and 421, respectively!
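Fixing the country names amounts to a lookup-and-replace before joining with map data. A sketch, with illustrative (not verbatim) name pairs:

```r
# map EMDAT-style country names to ISO 3166-style names; the pairs
# below illustrate the kind of fix-ups needed, not an exhaustive list
iso_fix <- c("Iran Islam Rep"   = "Iran",
             "Korea Rep"        = "South Korea",
             "Tanzania Uni Rep" = "Tanzania")

emdat_country <- c("India", "Iran Islam Rep", "Korea Rep")

# replace only the names that have a known fix
fixed <- emdat_country
m <- match(fixed, names(iso_fix))
fixed[!is.na(m)] <- iso_fix[m[!is.na(m)]]
fixed  # "India" "Iran" "South Korea"
```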
R package emdatr:
comes with pre-processed and cleaned EMDAT data
includes above-mentioned additional country-level information
has functionality to extract desired subsets of the data
through the graphics and modeling capabilities provided by R, much more can be accomplished with this package and the R language than with conventional spreadsheet analyses
by making the analysis transparent, the problems with the presentation of summary statistics can be addressed
The International Disaster Database, EM-DAT, from the Center for Research on the Epidemiology of Disasters (CRED, Belgium) is often used as a reference for losses of human life and property resulting from natural and man-made disasters. This database has over 20,000 country-level records from the early 1900s to the present. Data is available for free from EMDAT.
Some cleaning of the data is required. For instance, the country names used by EMDAT are not always the same as those used by the ISO convention. Hence, making spatial maps in R would involve fixing these names.
Information such as GDP, population and Consumer Price Index have to be used to "normalize" or adjust monetary losses from the past. The EMDAT database does not provide such information.
Annual reports published by EMDAT are not consistent with one another in terms of number of disasters per year or the total number of people affected/killed.
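The "normalization" of past losses mentioned above can be sketched as a simple index-ratio adjustment; the index values below are placeholders, not actual CPI figures:

```r
# adjust a historical loss to reference-year dollars using a
# price-index ratio; index values here are illustrative only
cpi <- c("1980" = 82.4, "2013" = 233.0)

adjust_loss <- function(loss, from_year, to_year, index) {
  loss * index[as.character(to_year)] / index[as.character(from_year)]
}

# a $1M loss in 1980, expressed in 2013 dollars
unname(adjust_loss(1e6, 1980, 2013, cpi))  # about 2.83 million
```

GDP- and population-based normalization follows the same ratio pattern, just with different indices.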
My goal is to create an R package which comes with pre-processed and cleaned EMDAT data and which would also include above-mentioned additional country-level information. Moreover, this R package would have the functionality to extract and analyze the data.
The SHELDUS database is a database of human and property losses from natural disasters in the United States. Although the data is free, downloading it is tedious, and so is cleaning and analyzing it. The new R package sheldusr comes with the cleaned and pre-processed SHELDUS data and includes functionality to retrieve data for any desired time period or natural hazard(s) of interest.
This is my first package and I think it could be improved in several ways. I hope to make these improvements in the coming months. In the meantime, please let me know if you find any errors or if you have any comments or suggestions.
In my earlier post I summarized the work on my upcoming R package on the SHELDUS database. This is a database on human and property losses from natural disasters in the United States. Although the data is free, downloading the data is tedious and so is cleaning and analyzing it. My goal is to package the data efficiently so it could be accessed and analyzed easily through R. And some day, develop one package for loss information analysis from various databases (e.g., SHELDUS, EM-DAT, and FEMA's Flood Insurance Program).
At this point the code provides all the data retrieval functionality available through the SHELDUS GUI - only in a matter of seconds!
retrieval by year
retrieval by hazard type
adjustment for inflation
[TODO] Presidential Disaster Declarations data, and also data prior to 1960, needs to be included (these amount to about 40,000 of the total 820,000 records)
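The retrieval functionality amounts to filtering the big loss table. A sketch with mock, SHELDUS-style records (column names are illustrative):

```r
# mock stand-in for SHELDUS records; the real data is county-level
# with more fields, and the column names here are made up
sheldus <- data.frame(
  year          = c(1998, 1998, 2005, 2005, 2012),
  hazard        = c("Flood", "Drought", "Hurricane", "Flood", "Tornado"),
  property_loss = c(2e6, 5e5, 9e8, 3e6, 1e7)
)

# retrieval by year
yr2005 <- subset(sheldus, year == 2005)

# retrieval by hazard type, with a loss total
floods <- subset(sheldus, hazard == "Flood")
sum(floods$property_loss)  # 5e+06
```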
The SHELDUS database, short for Spatial Hazard Events and Losses Database in the United States (http://webra.cas.sc.edu/hvri/products/sheldus.aspx), from the University of South Carolina, is a database on human and property losses from natural disasters in the United States. Data from this database includes County-level information on property losses, crop losses, injuries and fatalities from 18 different types of natural hazards (hurricanes, droughts, floods, etc.) from about 1960 to the present.
Data is free
Inflation adjusted losses are also available
Downloading the approximately 200 MB data (as of Nov 2013) from the GUI is tedious. There does not seem to be an easy way to download the entire data all at once.
Currently only one reference year can be chosen for inflation adjustment. What if someone wanted multiple years, or wanted to update their data next year - they would have to download the entire dataset again through the clunky GUI!
Sharing the data and analysis of the entire data is not easy because of its size and layout.
Build an R package which would come with the entire SHELDUS data.