
isdparser is a parser for ISD/ISH NOAA files

This code was liberated from rnoaa to focus on ISD parsing, since it's sorta complicated. It has minimal dependencies, so you can parse your ISD/ISH files without needing the deps that rnoaa needs. rnoaa will use it once it is on CRAN.

ISD/ISH format documentation at ftp://ftp.ncdc.noaa.gov/pub/data/noaa/ish-format-document.pdf

Package API:

  • isd_parse() - parse all lines in a file, with parallel option
  • isd_parse_line() - parse a single line - you choose which lines to parse and how to apply the function to your lines
  • isd_transform() - transform ISD data variables
  • isd_parse_csv() - parse csv format files

isd_parse_csv() parses NOAA ISD csv files, whereas isd_parse() and isd_parse_line() both handle compressed files where each row of data is a string that needs to be parsed.
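To make "each row of data is a string that needs to be parsed" concrete, here is a minimal base R sketch of fixed-width slicing. It is not the package's implementation: `parse_control_fields()` is a hypothetical helper, the start/end positions are assumed from the control section of the ISD format document, and the record is a synthetic prefix assembled from field values shown elsewhere in this README.

```r
# Hypothetical helper: slice the leading fixed-width fields out of one ISD record.
# Positions are assumed from the control section of the ISD format document.
parse_control_fields <- function(line) {
  starts <- c(total_chars = 1, usaf_station = 5, wban_station = 11, date = 16,
              time = 24, date_flag = 28, latitude = 29, longitude = 35)
  ends <- c(4, 10, 15, 23, 27, 28, 34, 41)
  # substring() is vectorised over first/last, so one call extracts every field
  values <- substring(line, starts, ends)
  as.list(stats::setNames(values, names(starts)))
}

# Synthetic record prefix (41 characters), not a complete ISD line
line <- paste0("0054", "024130", "99999", "20160101", "0000", "4",
               "+60750", "+012767")
fields <- parse_control_fields(line)
fields$usaf_station
fields$latitude
```

Every field comes back as a character string, which matches the `<chr>` columns in the isd_parse() and isd_parse_line() output.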

isd_parse_csv() is faster than isd_parse() because parsing each line takes some time, although isd_parse(parallel = TRUE) gets closer to the speed of isd_parse_csv().

Install

Stable from CRAN

install.packages("isdparser")

Dev version

remotes::install_github("ropensci/isdparser")
library("isdparser")

isd_parse_csv: parse a CSV file

Using a csv file included in the package:

path <- system.file('extdata/00702699999.csv', package = "isdparser")
isd_parse_csv(path)
#> # A tibble: 6,843 x 68
#>    station date                source latitude longitude elevation name 
#>      <int> <dttm>               <int>    <dbl>     <dbl>     <dbl> <chr>
#>  1  7.03e8 2017-02-10 14:04:00      4        0         0      7026 WXPO…
#>  2  7.03e8 2017-02-10 14:14:00      4        0         0      7026 WXPO…
#>  3  7.03e8 2017-02-10 14:19:00      4        0         0      7026 WXPO…
#>  4  7.03e8 2017-02-10 14:24:00      4        0         0      7026 WXPO…
#>  5  7.03e8 2017-02-10 14:29:00      4        0         0      7026 WXPO…
#>  6  7.03e8 2017-02-10 14:34:00      4        0         0      7026 WXPO…
#>  7  7.03e8 2017-02-10 14:39:00      4        0         0      7026 WXPO…
#>  8  7.03e8 2017-02-10 14:44:00      4        0         0      7026 WXPO…
#>  9  7.03e8 2017-02-10 14:49:00      4        0         0      7026 WXPO…
#> 10  7.03e8 2017-02-10 14:54:00      4        0         0      7026 WXPO…
#> # … with 6,833 more rows, and 61 more variables: report_type <chr>,
#> #   call_sign <int>, quality_control <chr>, wnd <chr>, cig <chr>, vis <chr>,
#> #   tmp <chr>, dew <chr>, slp <chr>, wind_direction <chr>,
#> #   wind_direction_quality <chr>, wind_code <chr>, wind_speed <chr>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>, automated_atmospheric_condition_code <chr>,
#> #   quality_automated_atmospheric_condition_code <chr>, coverage_code <chr>,
#> #   coverage_quality_code <chr>, base_height_dimension <chr>,
#> #   base_height_quality_code <chr>, cloud_type_code <chr>,
#> #   cloud_type_quality_code <chr>, connective_cloud_attribute <chr>,
#> #   vertical_datum_attribute <chr>, base_height_upper_range_attribute <chr>,
#> #   base_height_lower_range_attribute <chr>, coverage <chr>,
#> #   opaque_coverage <chr>, coverage_quality <chr>, lowest_cover <chr>,
#> #   lowest_cover_quality <chr>, low_cloud_genus <chr>,
#> #   low_cloud_genus_quality <chr>, lowest_cloud_base_height <chr>,
#> #   lowest_cloud_base_height_quality <chr>, mid_cloud_genus <chr>,
#> #   mid_cloud_genus_quality <chr>, high_cloud_genus <chr>,
#> #   high_cloud_genus_quality <chr>, altimeter_setting_rate <chr>,
#> #   altimeter_quality_code <chr>, station_pressure_rate <chr>,
#> #   station_pressure_quality_code <chr>, speed_rate <chr>, quality_code <chr>,
#> #   rem <chr>, eqd <chr>

Download a file first:

path <- file.path(tempdir(), "00702699999.csv")
x <- "https://www.ncei.noaa.gov/data/global-hourly/access/2017/00702699999.csv"
download.file(x, path)
isd_parse_csv(path)
#> # A tibble: 6,843 x 68
#>    station date                source latitude longitude elevation name 
#>      <int> <dttm>               <int>    <dbl>     <dbl>     <dbl> <chr>
#>  1  7.03e8 2017-02-10 14:04:00      4        0         0      7026 WXPO…
#>  2  7.03e8 2017-02-10 14:14:00      4        0         0      7026 WXPO…
#>  3  7.03e8 2017-02-10 14:19:00      4        0         0      7026 WXPO…
#>  4  7.03e8 2017-02-10 14:24:00      4        0         0      7026 WXPO…
#>  5  7.03e8 2017-02-10 14:29:00      4        0         0      7026 WXPO…
#>  6  7.03e8 2017-02-10 14:34:00      4        0         0      7026 WXPO…
#>  7  7.03e8 2017-02-10 14:39:00      4        0         0      7026 WXPO…
#>  8  7.03e8 2017-02-10 14:44:00      4        0         0      7026 WXPO…
#>  9  7.03e8 2017-02-10 14:49:00      4        0         0      7026 WXPO…
#> 10  7.03e8 2017-02-10 14:54:00      4        0         0      7026 WXPO…
#> # … with 6,833 more rows, and 61 more variables: report_type <chr>,
#> #   call_sign <int>, quality_control <chr>, wnd <chr>, cig <chr>, vis <chr>,
#> #   tmp <chr>, dew <chr>, slp <chr>, wind_direction <chr>,
#> #   wind_direction_quality <chr>, wind_code <chr>, wind_speed <chr>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>, automated_atmospheric_condition_code <chr>,
#> #   quality_automated_atmospheric_condition_code <chr>, coverage_code <chr>,
#> #   coverage_quality_code <chr>, base_height_dimension <chr>,
#> #   base_height_quality_code <chr>, cloud_type_code <chr>,
#> #   cloud_type_quality_code <chr>, connective_cloud_attribute <chr>,
#> #   vertical_datum_attribute <chr>, base_height_upper_range_attribute <chr>,
#> #   base_height_lower_range_attribute <chr>, coverage <chr>,
#> #   opaque_coverage <chr>, coverage_quality <chr>, lowest_cover <chr>,
#> #   lowest_cover_quality <chr>, low_cloud_genus <chr>,
#> #   low_cloud_genus_quality <chr>, lowest_cloud_base_height <chr>,
#> #   lowest_cloud_base_height_quality <chr>, mid_cloud_genus <chr>,
#> #   mid_cloud_genus_quality <chr>, high_cloud_genus <chr>,
#> #   high_cloud_genus_quality <chr>, altimeter_setting_rate <chr>,
#> #   altimeter_quality_code <chr>, station_pressure_rate <chr>,
#> #   station_pressure_quality_code <chr>, speed_rate <chr>, quality_code <chr>,
#> #   rem <chr>, eqd <chr>

isd_parse_line: parse lines from an ASCII strings file

path <- system.file('extdata/024130-99999-2016.gz', package = "isdparser")
lns <- readLines(path, encoding = "latin1")
isd_parse_line(lns[1])
#> # A tibble: 1 x 38
#>   total_chars usaf_station wban_station date  time  date_flag latitude longitude
#>   <chr>       <chr>        <chr>        <chr> <chr> <chr>     <chr>    <chr>    
#> 1 0054        024130       99999        2016… 0000  4         +60750   +012767  
#> # … with 30 more variables: type_code <chr>, elevation <chr>,
#> #   call_letter <chr>, quality <chr>, wind_direction <chr>,
#> #   wind_direction_quality <chr>, wind_code <chr>, wind_speed <chr>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>,
#> #   AW1_present_weather_observation_identifier <chr>,
#> #   AW1_automated_atmospheric_condition_code <chr>,
#> #   AW1_quality_automated_atmospheric_condition_code <chr>, REM_remarks <chr>,
#> #   REM_identifier <chr>, REM_length_quantity <chr>, REM_comment <chr>
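Since isd_parse_line() works on one line at a time, you decide which lines to parse and how to loop over them. The following is a minimal sketch using lapply() on the file bundled with the package. The line indices in `keep` are arbitrary and chosen only for illustration.

```r
library(isdparser)

path <- system.file("extdata/024130-99999-2016.gz", package = "isdparser")
lns <- readLines(path, encoding = "latin1")

# Pick an arbitrary subset of lines; the indices are illustrative only
keep <- c(1, 5, 10)

# Parse each selected line to a list rather than a tibble
parsed <- lapply(lns[keep], isd_parse_line, as_data_frame = FALSE)

# Pull one field out of every parsed record
times <- vapply(parsed, function(rec) rec$time, character(1))
```

Keeping each record as a list avoids having to line up columns when different lines carry different additional sections.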

Or, give back a list

head(
  isd_parse_line(lns[1], as_data_frame = FALSE)
)
#> $total_chars
#> [1] "0054"
#> 
#> $usaf_station
#> [1] "024130"
#> 
#> $wban_station
#> [1] "99999"
#> 
#> $date
#> [1] "20160101"
#> 
#> $time
#> [1] "0000"
#> 
#> $date_flag
#> [1] "4"

Optionally don’t include “Additional” and “Remarks” sections in parsed output.

isd_parse_line(lns[1], additional = FALSE)
#> # A tibble: 1 x 31
#>   total_chars usaf_station wban_station date  time  date_flag latitude longitude
#>   <chr>       <chr>        <chr>        <chr> <chr> <chr>     <chr>    <chr>    
#> 1 0054        024130       99999        2016… 0000  4         +60750   +012767  
#> # … with 23 more variables: type_code <chr>, elevation <chr>,
#> #   call_letter <chr>, quality <chr>, wind_direction <chr>,
#> #   wind_direction_quality <chr>, wind_code <chr>, wind_speed <chr>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>

isd_parse: parse an ASCII strings file

Downloading a new file

path <- file.path(tempdir(), "007026-99999-2017.gz")
y <- "ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2017/007026-99999-2017.gz"
download.file(y, path)
isd_parse(path)
#> # A tibble: 6,843 x 72
#>    total_chars usaf_station wban_station date  time  date_flag latitude
#>    <chr>       <chr>        <chr>        <chr> <chr> <chr>     <chr>   
#>  1 0157        007026       99999        2017… 1404  4         +00000  
#>  2 0157        007026       99999        2017… 1414  4         +00000  
#>  3 0157        007026       99999        2017… 1419  4         +00000  
#>  4 0157        007026       99999        2017… 1424  4         +00000  
#>  5 0157        007026       99999        2017… 1429  4         +00000  
#>  6 0144        007026       99999        2017… 1434  4         +00000  
#>  7 0157        007026       99999        2017… 1439  4         +00000  
#>  8 0157        007026       99999        2017… 1444  4         +00000  
#>  9 0172        007026       99999        2017… 1449  4         +00000  
#> 10 0157        007026       99999        2017… 1454  4         +00000  
#> # … with 6,833 more rows, and 65 more variables: longitude <chr>,
#> #   type_code <chr>, elevation <chr>, call_letter <chr>, quality <chr>,
#> #   wind_direction <chr>, wind_direction_quality <chr>, wind_code <chr>,
#> #   wind_speed <chr>, wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>, GF1_sky_condition <chr>, GF1_coverage <chr>,
#> #   GF1_opaque_coverage <chr>, GF1_coverage_quality <chr>,
#> #   GF1_lowest_cover <chr>, GF1_lowest_cover_quality <chr>,
#> #   GF1_low_cloud_genus <chr>, GF1_low_cloud_genus_quality <chr>,
#> #   GF1_lowest_cloud_base_height <chr>,
#> #   GF1_lowest_cloud_base_height_quality <chr>, GF1_mid_cloud_genus <chr>,
#> #   GF1_mid_cloud_genus_quality <chr>, GF1_high_cloud_genus <chr>,
#> #   GF1_high_cloud_genus_quality <chr>, MA1_atmospheric_pressure <chr>,
#> #   MA1_altimeter_setting_rate <chr>, MA1_altimeter_quality_code <chr>,
#> #   MA1_station_pressure_rate <chr>, MA1_station_pressure_quality_code <chr>,
#> #   REM_remarks <chr>, REM_identifier <chr>, REM_length_quantity <chr>,
#> #   REM_comment <chr>, OC1_wind_gust_observation_identifier <chr>,
#> #   OC1_speed_rate <chr>, OC1_quality_code <chr>,
#> #   GA1_sky_cover_layer_identifier <chr>, GA1_coverage_code <chr>,
#> #   GA1_coverage_quality_code <chr>, GA1_base_height_dimension <chr>,
#> #   GA1_base_height_quality_code <chr>, GA1_cloud_type_code <chr>,
#> #   GA1_cloud_type_quality_code <chr>, GE1_sky_condition <chr>,
#> #   GE1_connective_cloud_attribute <chr>, GE1_vertical_datum_attribute <chr>,
#> #   GE1_base_height_upper_range_attribute <chr>,
#> #   GE1_base_height_lower_range_attribute <chr>,
#> #   AW1_present_weather_observation_identifier <chr>,
#> #   AW1_automated_atmospheric_condition_code <chr>,
#> #   AW1_quality_automated_atmospheric_condition_code <chr>
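All columns above are character. The API list describes isd_transform() as transforming ISD data variables. The base R sketch below only illustrates the kind of conversion involved and is not the package's code: `to_typed()` is a hypothetical helper, and the scaling factor of 1000 for coordinates is assumed from the ISD format document. It is consistent with the README's own output, where "+60750" appears as 60.75.

```r
# Raw character values as they appear in parsed output
raw <- list(date = "20160101", latitude = "+60750", longitude = "+012767")

# Hypothetical helper, not the package's implementation: convert a few
# fields to typed values. The 1000 scaling factor is an assumption taken
# from the ISD format document.
to_typed <- function(x) {
  list(
    date = as.Date(x$date, format = "%Y%m%d"),
    latitude = as.numeric(x$latitude) / 1000,
    longitude = as.numeric(x$longitude) / 1000
  )
}

typed <- to_typed(raw)
typed$latitude      # 60.75
typed$longitude     # 12.767
format(typed$date)  # "2016-01-01"
```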

Parallel

isd_parse(path, parallel = TRUE)
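Per-line parsing parallelises well because each record is independent of the others. The sketch below shows that idea with parallel::mclapply() and a stand-in parser. It makes no claim about how isd_parse() is implemented internally: `parse_one()` and the synthetic records are hypothetical.

```r
library(parallel)

# Stand-in parser (hypothetical): extracts the station id from a record string
parse_one <- function(line) substring(line, 5, 10)

# Synthetic records; real input would come from readLines() on an ISD file
lns <- sprintf("0054%06d99999201601010000", 1:100)

# Forking is unavailable on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial <- lapply(lns, parse_one)
forked <- mclapply(lns, parse_one, mc.cores = cores)

# Both approaches return the records in input order
identical(serial, forked)
```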

Progress

Note: progress is not printed if parallel = TRUE

isd_parse(path, progress = TRUE)
#>
#>   |========================================================================================| 100%
#> # A tibble: 2,601 × 42
#>    total_chars usaf_station wban_station       date  time date_flag latitude longitude type_code
#>          <dbl>        <chr>        <chr>     <date> <chr>     <chr>    <dbl>     <dbl>     <chr>
#> 1           54       024130        99999 2016-01-01  0000         4    60.75    12.767     FM-12
#> 2           54       024130        99999 2016-01-01  0100         4    60.75    12.767     FM-12
#> 3           54       024130        99999 2016-01-01  0200         4    60.75    12.767     FM-12
#> 4           54       024130        99999 2016-01-01  0300         4    60.75    12.767     FM-12
#> 5           54       024130        99999 2016-01-01  0400         4    60.75    12.767     FM-12
#> 6           39       024130        99999 2016-01-01  0500         4    60.75    12.767     FM-12
#> 7           54       024130        99999 2016-01-01  0600         4    60.75    12.767     FM-12
#> 8           39       024130        99999 2016-01-01  0700         4    60.75    12.767     FM-12
#> 9           54       024130        99999 2016-01-01  0800         4    60.75    12.767     FM-12
#> 10          54       024130        99999 2016-01-01  0900         4    60.75    12.767     FM-12
#> # ... with 2,591 more rows, and 33 more variables: elevation <dbl>, call_letter <chr>, quality <chr>,
#> #   wind_direction <dbl>, wind_direction_quality <chr>, wind_code <chr>, wind_speed <dbl>,
#> #   wind_speed_quality <chr>, ceiling_height <chr>, ceiling_height_quality <chr>,
#> #   ceiling_height_determination <chr>, ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>, visibility_code_quality <chr>,
#> #   temperature <dbl>, temperature_quality <chr>, temperature_dewpoint <dbl>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <dbl>, air_pressure_quality <chr>,
#> #   AW1_present_weather_observation_identifier <chr>, AW1_automated_atmospheric_condition_code <chr>,
#> #   AW1_quality_automated_atmospheric_condition_code <chr>, N03_original_observation <chr>,
#> #   N03_original_value_text <chr>, N03_units_code <chr>, N03_parameter_code <chr>, REM_remarks <chr>,
#> #   REM_identifier <chr>, REM_length_quantity <chr>, REM_comment <chr>

Additional data

Optionally don’t include “Additional” and “Remarks” sections in parsed output.

isd_parse(path, additional = FALSE)
#> # A tibble: 6,843 x 31
#>    total_chars usaf_station wban_station date  time  date_flag latitude
#>    <chr>       <chr>        <chr>        <chr> <chr> <chr>     <chr>   
#>  1 0157        007026       99999        2017… 1404  4         +00000  
#>  2 0157        007026       99999        2017… 1414  4         +00000  
#>  3 0157        007026       99999        2017… 1419  4         +00000  
#>  4 0157        007026       99999        2017… 1424  4         +00000  
#>  5 0157        007026       99999        2017… 1429  4         +00000  
#>  6 0144        007026       99999        2017… 1434  4         +00000  
#>  7 0157        007026       99999        2017… 1439  4         +00000  
#>  8 0157        007026       99999        2017… 1444  4         +00000  
#>  9 0172        007026       99999        2017… 1449  4         +00000  
#> 10 0157        007026       99999        2017… 1454  4         +00000  
#> # … with 6,833 more rows, and 24 more variables: longitude <chr>,
#> #   type_code <chr>, elevation <chr>, call_letter <chr>, quality <chr>,
#> #   wind_direction <chr>, wind_direction_quality <chr>, wind_code <chr>,
#> #   wind_speed <chr>, wind_speed_quality <chr>, ceiling_height <chr>,
#> #   ceiling_height_quality <chr>, ceiling_height_determination <chr>,
#> #   ceiling_height_cavok <chr>, visibility_distance <chr>,
#> #   visibility_distance_quality <chr>, visibility_code <chr>,
#> #   visibility_code_quality <chr>, temperature <chr>,
#> #   temperature_quality <chr>, temperature_dewpoint <chr>,
#> #   temperature_dewpoint_quality <chr>, air_pressure <chr>,
#> #   air_pressure_quality <chr>
