Chapter 8 Accessing meteorological data
Objectives:
- This tutorial will walk through the steps required to access meteorological data from the Maricopa Agricultural Center.
Pre-requisites:
- Need to have R packages tidyverse, jsonlite, and convertr installed.
- Need to have an internet connection.
8.1 The Maricopa Weather Station
8.1.1 Meteorological data formats
8.1.1.1 Dimensions:
CF standard-name | units |
---|---|
time | days since 1970-01-01 00:00:00 UTC |
longitude | degrees_east |
latitude | degrees_north |
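As a minimal sketch of how the time dimension can be decoded in R, the snippet below converts values expressed in days since 1970-01-01 00:00:00 UTC into date-times. The variable names and sample values are illustrative and are not taken from the Maricopa data.

```r
# Time values in CF units: days since 1970-01-01 00:00:00 UTC (illustrative values)
days_since_epoch <- c(17167, 17167.5, 17168)

# POSIXct counts seconds since the same epoch, so multiply by seconds per day
seconds_per_day <- 24 * 60 * 60
timestamps <- as.POSIXct(days_since_epoch * seconds_per_day,
                         origin = "1970-01-01", tz = "UTC")

# Prints 2017-01-01 00:00:00, 2017-01-01 12:00:00 and 2017-01-02 00:00:00
format(timestamps, "%Y-%m-%d %H:%M:%S")
```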
8.1.1.2 Variable names and units
| CF standard-name | units | bety | isimip | cruncep | narr | ameriflux |
|---|---|---|---|---|---|---|
| air_temperature | K | airT | tasAdjust | tair | air | TA (C) |
| air_pressure | Pa | air_pressure | | | | PRESS (KPa) |
| mole_fraction_of_carbon_dioxide_in_air | mol/mol | | | | | CO2 |
| relative_humidity | % | relative_humidity | rhurs | NA | rhum | RH |
| surface_downwelling_photosynthetic_photon_flux_in_air | mol m-2 s-1 | PAR | | | | PAR (NOT DONE) |
| precipitation_flux | kg m-2 s-1 | cccc | prAdjust | rain | acpc | PREC (mm/s) |
| | degrees | wind_direction | | | | WD |
| wind_speed | m/s | Wspd | | | | WS |
- variable names are from MsTMIP.
- standard_name is CF-convention standard names
- units can be converted by udunits, so these can vary (e.g. the time denominator may change with time frequency of inputs)
- soil moisture for the full column, rather than a layer, is soil_moisture_content
For example, in the MsTMIP-CRUNCEP data, the variable rain should be precipitation_rate.
We want to standardize the units as well, as part of the met2CF.<product> step. I believe we want to use the CF “canonical” units but retain the MsTMIP units any time CF is ambiguous about the units.
The key is to process each type of met data (site, reanalysis, forecast, climate scenario, etc) to the exact same standard. This way every operation after that (extract, gap fill, downscale, convert to a model, etc) will always have the exact same inputs. This will make everything else much simpler to code and allow us to avoid a lot of unnecessary data checking, tests, etc being repeated in every downstream function.
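To illustrate what standardizing units might look like, here is a minimal sketch in base R that converts the AmeriFlux-style units listed in the table above to CF units. The function name to_cf_units and the input values are hypothetical, and this is not the met2CF implementation; it uses plain arithmetic rather than udunits. The precipitation conversion assumes a water density of 1000 kg m-3, so that 1 mm of water over 1 m2 has a mass of 1 kg.

```r
# Hypothetical helper: convert AmeriFlux-style values to CF units
to_cf_units <- function(ta_c, press_kpa, prec_mm_s) {
  data.frame(
    air_temperature    = ta_c + 273.15,     # degrees C -> K
    air_pressure       = press_kpa * 1000,  # kPa -> Pa
    precipitation_flux = prec_mm_s * 1      # mm/s -> kg m-2 s-1 (assumes 1000 kg m-3)
  )
}

# Illustrative input values, not real observations
cf <- to_cf_units(ta_c = c(25, 30),
                  press_kpa = c(101.3, 100.9),
                  prec_mm_s = c(0, 0.002))
print(cf)
```

Once every product passes through a step like this, downstream functions can assume K, Pa and kg m-2 s-1 without checking.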
8.1.2 Using the API to get data
In order to access the data, we need to construct a URL that links to where the data is located on Clowder. The data is then pulled down using the Clowder API, which “receives requests and sends responses”.
8.1.3 The structure of the Geostreams database
The meteorological data that is collected for the TERRA REF project is contained in multiple related tables, also known as a relational database. The first table contains data about the sensor that is collecting data. This is then linked to a stream table, which contains information about a datastream from the sensor. Sensors can have multiple datastreams. The actual weather data is in the third table, the datapoint table. A visual representation of this structure is shown below.
In this vignette, we will be using data from a weather station at the Maricopa Agricultural Center, with datapoints for the month of January 2017 from a certain sensor. These data are five minute summaries aggregated from observations taken every second.
8.1.4 Creating the URLs for all data table types
All URLs have the same beginning (https://terraref.org/clowder/api/geostreams), then additional information is added for each type of data table as shown below.
- Station: /sensors?sensor_name=[name]
- Sensor: /sensors/[sensor number]/streams
- Datapoints: /datapoints?stream_id=[datapoints number]&since=[start date]&until=[end date]
A certain time period can be specified for the datapoints.
For example, below are the URLs for the particular data being used in this vignette. These can be pasted into a browser to see how the data is stored as text using JSON.
- Station: https://terraref.org/clowder/api/geostreams/sensors?sensor_name=UA-MAC+AZMET+Weather+Station
- Sensor: https://terraref.org/clowder/api/geostreams/sensors/438/streams
- Datapoints: https://terraref.org/clowder/api/geostreams/datapoints?stream_id=46431&since=2017-01-02&until=2017-01-31
Possible sensor numbers for a station are found on the page for that station under “id:”, and then datapoints numbers are found on the sensor page under “stream_id:”.
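The URL patterns above can be assembled programmatically. The following is a minimal sketch in base R; the helper names station_url, streams_url and datapoints_url are hypothetical and are not part of any TERRA REF package.

```r
base_url <- "https://terraref.org/clowder/api/geostreams"

# Hypothetical helpers that assemble the three URL types described above
station_url <- function(name) {
  # Spaces in the station name are written as "+" in the query string
  paste0(base_url, "/sensors?sensor_name=", gsub(" ", "+", name, fixed = TRUE))
}

streams_url <- function(sensor_id) {
  paste0(base_url, "/sensors/", sensor_id, "/streams")
}

datapoints_url <- function(stream_id, since, until) {
  paste0(base_url, "/datapoints?stream_id=", stream_id,
         "&since=", since, "&until=", until)
}

# Reproduce the three example URLs used in this vignette
station_url("UA-MAC AZMET Weather Station")
streams_url(438)
datapoints_url(46431, "2017-01-02", "2017-01-31")
```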
The table below lists the names of some stations and data streams that have available meteorological data, with their associated stream ids.
stream id | name |
---|---|
3212 | Irrigation Observations |
46431 | Weather Observations (5 min bins) |
3208 | EnvironmentLogger sensor_weather_station |
3207 | EnvironmentLogger sensor_par |
748 | EnvironmentLogger sensor_spectrum |
3210 | EnvironmentLogger sensor_co2 |
4806 | UIUC Energy Farm SE |
4807 | UIUC Energy Farm CEN |
4805 | UIUC Energy Farm NE |
Here is the JSON representation of a single five-minute observation:
[
  {
    "geometry": {
      "type": "Point",
      "coordinates": [
        33.0745666667,
        -111.9750833333,
        0
      ]
    },
    "start_time": "2016-08-30T00:06:24-07:00",
    "type": "Feature",
    "end_time": "2016-08-30T00:10:00-07:00",
    "properties": {
      "precipitation_rate": 0.0,
      "wind_speed": 1.6207870370370374,
      "surface_downwelling_shortwave_flux_in_air": 0.0,
      "northward_wind": 0.07488770951583902,
      "relative_humidity": 26.18560185185185,
      "air_temperature": 300.17606481481516,
      "eastward_wind": 1.571286062845733,
      "surface_downwelling_photosynthetic_photon_flux_in_air": 0.0
    }
  }
]
8.1.5 Querying weather sensor data stream
The data are five-minute summaries aggregated from observations taken once per second (1/s).
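To make the idea of aggregating 1/s observations concrete, here is a minimal sketch in base R that averages simulated once-per-second wind speeds into five-minute bins. The data are synthetic, and the use of a simple mean is an assumption; the actual procedure used to produce the weather station summaries is not described here.

```r
# Simulated 1/s wind speed observations covering 15 minutes (not real data)
seconds_elapsed <- 0:899
start <- as.POSIXct("2017-01-02 00:00:00", tz = "UTC")
wind_speed <- 1 + (seconds_elapsed %% 300) / 300

# Assign each observation to a five-minute (300 s) bin
bin_index <- seconds_elapsed %/% 300

# Average the observations within each bin
five_min <- aggregate(wind_speed ~ bin_index,
                      data = data.frame(wind_speed, bin_index),
                      FUN = mean)

# Label each bin with the time at which it starts
five_min$start_time <- start + five_min$bin_index * 300
print(five_min)
```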
8.1.6 Download data using the command line
Data can be downloaded from Clowder using the command-line program curl. If the following is typed into the command line, it will download the datapoints data that we’re interested in as a file, which we have chosen to call spectra.json.
8.1.6.1 Using R
The following code sets the defaults for showing R code.
This is how you can access the same data in R, using the jsonlite R package and the desired URL to pull the data in. The data is returned as a dataframe with two nested dataframes, called properties and geometry.
library(dplyr)
library(ggplot2)
library(jsonlite)
library(lubridate)
library(magrittr)
library(RCurl)
library(ncdf4)
library(ncdf.tools)
The properties dataframe, which contains the datapoints from this stream, is then pulled out from these data. It is combined with the end time of each observation period, parsed from text into a date-time.
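A minimal sketch of this step, assuming the jsonlite and lubridate packages are installed. To keep the example self-contained it parses an abbreviated copy of the observation shown earlier from a string; passing the datapoints URL to fromJSON instead should return the same structure with one row per observation. The name weather_data matches the dataframe used in the plotting code in this chapter.

```r
library(jsonlite)
library(lubridate)

# Abbreviated copy of one observation from the datapoints stream
json_text <- '[{
  "geometry": {"type": "Point", "coordinates": [33.0745666667, -111.9750833333, 0]},
  "start_time": "2016-08-30T00:06:24-07:00",
  "type": "Feature",
  "end_time": "2016-08-30T00:10:00-07:00",
  "properties": {"wind_speed": 1.6207870370370374,
                 "air_temperature": 300.17606481481516}
}]'

# fromJSON() also accepts a URL in place of the string
weather_all <- fromJSON(json_text)

# The measurements are in the nested properties dataframe
weather_data <- weather_all$properties

# Parse the end of each time period into a date-time (returned in UTC)
weather_data$time <- ymd_hms(weather_all$end_time)

str(weather_data)
```

Because the timestamps carry a -07:00 offset, the parsed times are seven hours later when displayed in UTC.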
8.2 Weather Plots
Create a time series plot for one of the eight variables, wind speed, in the newly created dataframe.
theme_set(ggthemes::theme_few())
ggplot(data = weather_data) +
  geom_point(aes(x = time, y = wind_speed), size = 0.7) +
  labs(x = "Day", y = "Wind speed (m/s)")
8.2.1 High resolution data (1/s) + spectroradiometer
This higher resolution weather data can be used for VNIR calibration, for example. But at 1/s it is very large!
8.2.1.1 Download data
Here we will download the files using the Clowder API, but note that if you have access to the filesystem on Globus, you can directly access the data in the sites/ua-mac/Level_1/EnvironmentLogger folder.
knitr::opts_chunk$set(eval = FALSE)
api_url <- "https://terraref.org/clowder/api"
output_dir <- file.path(tempdir(), "downloads")
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
# Get Spaces from Clowder - without authentication, result will be Sample Data
spaces <- fromJSON(paste0(api_url, '/spaces'))
print(spaces %>% select(id, name))
8.2.1.2 Download netCDF 1/s data from Clowder
8.2.1.3 Using the netCDF 1/s data
One use case is getting the solar spectrum associated with a particular hyperspectral image.
time <- vector()
PAR <- vector()
# outputs holds the paths of the netCDF files downloaded in the previous step
for (i in seq_along(outputs)) {
  print(paste0("Scanning ", outputs[i]))
  ncfile <- nc_open(outputs[i])
  # Read every dimension and variable in the file into a named list
  metdata <- list()
  for (var in c(names(ncfile$dim), names(ncfile$var))) {
    metdata[[var]] <- ncvar_get(ncfile, var)
  }
  lapply(metdata, dim)
  # Time is stored as days since 1970-01-01; convert it to seconds since then
  days <- ncvar_get(ncfile, varid = "time")
  curr_time <- as.numeric(ymd("1970-01-01") + seconds(days * 24 * 60 * 60))
  time <- c(time, curr_time)
  # Append the PAR readings from this file to the running vector
  PAR <- c(PAR, metdata$`par_sensor/Sensor_Photosynthetically_Active_Radiation`)
}
#ggplot() +
#  geom_line(aes(time, PAR)) + theme_bw()
print(ncfile)
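As a sketch of the use case mentioned above, the snippet below picks the 1/s observation closest in time to an image timestamp. It uses simulated values; the names obs_time, obs_par and image_time are hypothetical, and with real data obs_time and obs_par would correspond to the time and PAR vectors built in the loop above.

```r
# Simulated 1/s observation times (seconds since 1970-01-01 UTC) and PAR values
obs_time <- 1483228800 + 0:9
obs_par  <- seq(100, 190, by = 10)

# Hypothetical capture time of a hyperspectral image
image_time <- as.numeric(as.POSIXct("2017-01-01 00:00:06", tz = "UTC")) + 0.4

# Index of the observation closest in time to the image
nearest <- which.min(abs(obs_time - image_time))

# PAR reading and timestamp of the matched observation
obs_par[nearest]
as.POSIXct(obs_time[nearest], origin = "1970-01-01", tz = "UTC")
```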