Chapter 8 Accessing meteorological data

Objectives:

This tutorial will walk through the steps required to access meteorological data from the Maricopa Agricultural Center.

Pre-requisites:

You need the R packages tidyverse, jsonlite, and convertr installed.
You need an internet connection.

8.1 The Maricopa Weather Station

8.1.1 Meteorological data formats

8.1.1.1 Dimensions:

CF standard-name	units
time	days since 1970-01-01 00:00:00 UTC
longitude	degrees_east
latitude	degrees_north
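
As a minimal sketch of what the time dimension means in practice, the base R snippet below converts a value in "days since 1970-01-01 00:00:00 UTC" to a date-time. The helper name days_to_posixct and the input value are illustrative assumptions, not part of any TERRA REF tooling.

```r
# Convert CF-style "days since 1970-01-01 00:00:00 UTC" to POSIXct.
days_to_posixct <- function(days) {
  # 86400 seconds per day; the origin matches the CF time units above
  as.POSIXct(days * 86400, origin = "1970-01-01", tz = "UTC")
}

# Hypothetical value: 17167.5 days after the epoch
example_time <- days_to_posixct(17167.5)
print(format(example_time, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```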

8.1.1.2 Variable names and units

CF standard-name	units	bety	isimip	cruncep	narr	ameriflux
air_temperature	K	airT	tasAdjust	tair	air	TA (C)
air_pressure	Pa	air_pressure				PRESS (KPa)
mole_fraction_of_carbon_dioxide_in_air	mol/mol					CO2
relative_humidity	%	relative_humidity	rhurs	NA	rhum	RH
surface_downwelling_photosynthetic_photon_flux_in_air	mol m-2 s-1	PAR				PAR (NOT DONE)
precipitation_flux	kg m-2 s-1	cccc	prAdjust	rain	acpc	PREC (mm/s)
	degrees	wind_direction				WD
wind_speed	m/s	Wspd				WS

Variable names are from MsTMIP.
standard_name is the CF-convention standard name.
Units can be converted by udunits, so these can vary (e.g. the time denominator may change with the time frequency of the inputs).
Soil moisture for the full column, rather than a layer, is soil_moisture_content.

For example, in the MsTMIP-CRUNCEP data, the variable rain should be precipitation_rate. We want to standardize the units as well as part of the met2CF.<product> step. I believe we want to use the CF “canonical” units but retain the MsTMIP units any time CF is ambiguous about the units.

The key is to process each type of met data (site, reanalysis, forecast, climate scenario, etc) to the exact same standard. This way every operation after that (extract, gap fill, downscale, convert to a model, etc) will always have the exact same inputs. This will make everything else much simpler to code and allow us to avoid a lot of unnecessary data checking, tests, etc being repeated in every downstream function.
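
The standardization idea can be sketched in base R for the AmeriFlux column of the table above. The helper standardize_ameriflux and the input values are hypothetical; only the name mapping and the unit conversions (degrees C to K, kPa to Pa) are taken from the table. This is not the actual met2CF implementation.

```r
# Lookup from AmeriFlux column names to CF standard names, taken from the table above
ameriflux_to_cf <- c(
  TA    = "air_temperature",
  PRESS = "air_pressure",
  RH    = "relative_humidity",
  WS    = "wind_speed"
)

# Hypothetical helper: convert units to the CF units in the table, then rename columns
standardize_ameriflux <- function(met) {
  if ("TA" %in% names(met))    met$TA    <- met$TA + 273.15   # degrees C -> K
  if ("PRESS" %in% names(met)) met$PRESS <- met$PRESS * 1000  # kPa -> Pa
  known <- names(met) %in% names(ameriflux_to_cf)
  names(met)[known] <- unname(ameriflux_to_cf[names(met)[known]])
  met
}

# Made-up input values for illustration only
raw <- data.frame(TA = c(25, 30), PRESS = c(97.1, 96.8), RH = c(26, 22), WS = c(1.6, 2.1))
std <- standardize_ameriflux(raw)
print(std)
```

Because every product would pass through a step like this, downstream code only ever sees CF names and units.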

8.1.2 Using the API to get data

In order to access the data, we need to construct a URL that links to where the data is located on Clowder. The data is then pulled down using the Clowder API, which “receives requests and sends responses”.

8.1.3 The structure of the Geostreams database

The meteorological data that is collected for the TERRA REF project is contained in multiple related tables, also known as a relational database. The first table contains data about the sensor that is collecting data. This is then linked to a stream table, which contains information about a datastream from the sensor. Sensors can have multiple datastreams. The actual weather data is in the third table, the datapoint table. A visual representation of this structure is shown below.

In this vignette, we will be using data from a weather station at the Maricopa Agricultural Center, with datapoints for the month of January 2017 from a certain sensor. These data are five minute summaries aggregated from observations taken every second.
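
The sensor, stream, and datapoint relationship can be illustrated with three toy data frames joined on their shared id columns. All ids, names, and values here are made up for illustration; the real Geostreams tables have more columns.

```r
# Toy versions of the three related tables (all values are hypothetical)
sensors <- data.frame(sensor_id = 1, sensor_name = "example weather station")
streams <- data.frame(
  stream_id = c(10, 11),
  sensor_id = c(1, 1),
  stream_name = c("weather", "irrigation")
)
datapoints <- data.frame(
  stream_id = c(10, 10, 11),
  end_time = c("2017-01-02T00:05:00", "2017-01-02T00:10:00", "2017-01-02T00:05:00"),
  value = c(1.6, 1.8, 0.0)
)

# Join datapoints -> streams -> sensors: one sensor, two streams, three datapoints
joined <- merge(merge(datapoints, streams, by = "stream_id"), sensors, by = "sensor_id")
print(joined)
```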

8.1.4 Creating the URLs for all data table types

All URLs have the same beginning (https://terraref.org/clowder/api/geostreams), then additional information is added for each type of data table as shown below.

Station: /sensors?sensor_name=[name]
Sensor: /sensors/[sensor number]/streams
Datapoints: /datapoints?stream_id=[datapoints number]&since=[start date]&until=[end date]

A certain time period can be specified for the datapoints.
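
As a small sketch, the stream and datapoint URL patterns above can be assembled in base R. The function names streams_url and datapoints_url are illustrative assumptions; the base URL and query parameters are the ones listed above.

```r
base_url <- "https://terraref.org/clowder/api/geostreams"

# Streams that belong to one sensor
streams_url <- function(sensor_id) {
  paste0(base_url, "/sensors/", sensor_id, "/streams")
}

# Datapoints for one stream, optionally limited to a time period
datapoints_url <- function(stream_id, since = NULL, until = NULL) {
  url <- paste0(base_url, "/datapoints?stream_id=", stream_id)
  if (!is.null(since)) url <- paste0(url, "&since=", since)
  if (!is.null(until)) url <- paste0(url, "&until=", until)
  url
}

print(datapoints_url(46431, since = "2017-01-02", until = "2017-01-31"))
```

Leaving since and until unset in this sketch simply omits the time filter from the URL.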

For example, below are the URLs for the particular data being used in this vignette. These can be pasted into a browser to see how the data is stored as text using JSON.

Possible sensor numbers for a station are found on the page for that station under “id:”, and then datapoints numbers are found on the sensor page under “stream_id:”.

The table below lists the names of some stations that have available meteorological data and associated stream ids.

stream id	name
3212	Irrigation Observations
46431	Weather Observations (5 min bins)
3208	EnvironmentLogger sensor_weather_station
3207	EnvironmentLogger sensor_par
748	EnvironmentLogger sensor_spectrum
3210	EnvironmentLogger sensor_co2
4806	UIUC Energy Farm SE
4807	UIUC Energy Farm CEN
4805	UIUC Energy Farm NE

Here is the JSON representation of a single five-minute observation:

[
   {
      "geometry":{
         "type":"Point",
         "coordinates":[
            33.0745666667,
            -111.9750833333,
            0
         ]
      },
      "start_time":"2016-08-30T00:06:24-07:00",
      "type":"Feature",
      "end_time":"2016-08-30T00:10:00-07:00",
      "properties":{
         "precipitation_rate":0.0,
         "wind_speed":1.6207870370370374,
         "surface_downwelling_shortwave_flux_in_air":0.0,
         "northward_wind":0.07488770951583902,
         "relative_humidity":26.18560185185185,
         "air_temperature":300.17606481481516,
         "eastward_wind":1.571286062845733,
         "surface_downwelling_photosynthetic_photon_flux_in_air":0.0
      }
   }
]
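
The eastward_wind and northward_wind components in this observation can be turned into a wind direction. The sketch below assumes the usual meteorological convention (the direction the wind blows from, in degrees clockwise from north); the function name wind_from_direction is illustrative, and the two component values are copied from the observation above.

```r
# Components copied from the observation above (m/s)
eastward_wind  <- 1.571286062845733
northward_wind <- 0.07488770951583902

# Meteorological convention: direction the wind blows FROM, degrees clockwise from north
wind_from_direction <- function(u, v) {
  (atan2(-u, -v) * 180 / pi) %% 360
}

# Speed of the mean wind vector; this need not equal the reported wind_speed,
# which may be an average of the per-second speeds
vector_speed <- sqrt(eastward_wind^2 + northward_wind^2)

print(wind_from_direction(eastward_wind, northward_wind))
print(vector_speed)
```

A positive eastward component with a near-zero northward component corresponds to a wind coming roughly from the west.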

8.1.5 Querying weather sensor data stream

The data represent 5 minute summaries aggregated from 1/s observations.

8.1.6 Download data using the command line

Data can be downloaded from Clowder using the command line program curl. If the following is typed into the command line, it will download the datapoints data that we’re interested in as a file which we have chosen to call spectra.json. The URL is quoted so that the shell does not interpret the & characters.

curl -o spectra.json -X GET "https://terraref.org/clowder/api/geostreams/datapoints?stream_id=46431&since=2017-01-02&until=2017-01-31"

8.1.6.1 Using R

The following code sets the defaults for showing R code.

knitr::opts_chunk$set(cache = FALSE, message = FALSE)

And this is how you can access data from the same stream in R. This uses the jsonlite R package and the desired URL to pull the data in. The result is a dataframe with two nested dataframes, called properties and geometry.

library(dplyr)
library(ggplot2)
library(jsonlite)
library(lubridate)
library(magrittr)
library(RCurl)
library(ncdf4)
library(ncdf.tools)

weather_all <- fromJSON('https://terraref.org/clowder/api/geostreams/datapoints?stream_id=46431&since=2018-04-01&until=2018-08-01', flatten = FALSE)

The properties dataframe, which contains the datapoints from this stream, is then pulled out from these data. It is combined with the end time of each observation period, converted to local (America/Phoenix) time.

weather_data <- weather_all$properties %>% 
  mutate(time = with_tz(ymd_hms(weather_all$end_time), "America/Phoenix"))
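
For readers who want to see what this step does without lubridate, here is a base R sketch on a mock of the parsed data. The mock values and the helper parse_iso_offset are made up for illustration; the timestamp format follows the JSON example shown earlier.

```r
# Mock of the parsed JSON: a vector of end times plus a nested properties data frame
# (values are made up for illustration)
end_time <- c("2016-08-30T00:10:00-07:00", "2016-08-30T00:15:00-07:00")
properties <- data.frame(wind_speed = c(1.62, 1.75), air_temperature = c(300.18, 300.02))

# Base R's %z expects offsets like -0700, so drop the colon first
parse_iso_offset <- function(x) {
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

# Attach the parsed end time to the properties, as the mutate() call above does
weather_mock <- properties
weather_mock$time <- parse_iso_offset(end_time)
print(format(weather_mock$time, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```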

8.2 Weather Plots

Create a time series plot of wind speed, one of the eight variables in the newly created dataframe.

theme_set(ggthemes::theme_few())
ggplot(data = weather_data) +
  geom_point(aes(x = time, y = wind_speed), size = 0.7) +
  labs(x = "Day", y = "Wind speed (m/s)")

8.2.1 High resolution data (1/s) + spectroradiometer

This higher resolution weather data can be used for VNIR calibration, for example. But at 1/s it is very large!

8.2.1.1 Download data

Here we will download the files using the Clowder API, but note that if you have access to the filesystem on Globus, you can directly access the data in the sites/ua-mac/Level_1/EnvironmentLogger folder.

knitr::opts_chunk$set(eval = FALSE)
api_url <- "https://terraref.org/clowder/api"
output_dir <- file.path(tempdir(), "downloads")
dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)

# Get Spaces from Clowder - without authentication, result will be Sample Data
spaces <- fromJSON(paste0(api_url, '/spaces'))
print(spaces %>% select(id, name))

# Get list of (at most 20) Datasets within that Space from Clowder
datasets <- fromJSON(paste0(api_url, '/spaces/', spaces$id, '/datasets'))
print(datasets %>% select(id, name))

# Get list of Files within the first EnvironmentLogger dataset and filter .nc files
# (fromJSON expects a single URL, so only one dataset id is used)
logger_ids <- datasets$id[grepl("EnvironmentLogger", datasets$name)]
files <- fromJSON(paste0(api_url, '/datasets/', logger_ids[1], '/files'))
ncfiles <- files[grepl('environmentlogger.nc', files$filename, fixed = TRUE), ]
print(ncfiles %>% select(id, filename))

8.2.1.2 Download netCDF 1/s data from Clowder
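
A minimal sketch of this step, using base R's download.file, is given below. The helper names are hypothetical, and the '/files/[id]' download path is an assumption about the Clowder API rather than something stated in this chapter. The vector it produces is named outputs because that is what the code in the next section reads.

```r
# Hypothetical helper: build source URLs and local paths for a set of file ids
build_download_plan <- function(api_url, file_ids, filenames, output_dir) {
  data.frame(
    source = paste0(api_url, "/files/", file_ids),  # assumed download endpoint
    output = file.path(output_dir, filenames),
    stringsAsFactors = FALSE
  )
}

# Download each file in binary mode and return the local paths
download_plan <- function(plan) {
  for (i in seq_len(nrow(plan))) {
    message("Downloading ", plan$source[i], " to ", plan$output[i])
    download.file(plan$source[i], destfile = plan$output[i], mode = "wb")
  }
  plan$output
}

# Usage (not run here; ncfiles, api_url and output_dir come from the previous step):
# plan <- build_download_plan(api_url, ncfiles$id, ncfiles$filename, output_dir)
# outputs <- download_plan(plan)
```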

8.2.1.3 Using the netCDF 1/s data

One use case is getting the solar spectrum associated with a particular hyperspectral image.

time <- vector()
PAR <- vector()

for (i in seq_along(outputs)) {
  print(paste0("Scanning ", outputs[i]))
  ncfile <- nc_open(outputs[i])

  # Read every dimension and variable in the file into a named list
  metdata <- list()
  for (var in c(names(ncfile$dim), names(ncfile$var))) {
    metdata[[var]] <- ncvar_get(ncfile, var)
  }
  lapply(metdata, dim)

  # time is read as days since 1970-01-01; convert to seconds since the epoch
  days <- ncvar_get(ncfile, varid = "time")
  curr_time <- as.numeric(ymd("1970-01-01") + seconds(days * 24 * 60 * 60))

  # Append this file's values to the running vectors
  time <- c(time, curr_time)
  PAR <- c(PAR, metdata$`par_sensor/Sensor_Photosynthetically_Active_Radiation`)
}

#ggplot() + 
#  geom_line(aes(time, PAR)) + theme_bw()

print(ncfile)
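
To tie a logger record to a particular hyperspectral image, one option is to pick the record closest in time to the image capture. The sketch below assumes times are stored as seconds since 1970-01-01, as in the loop above; the helper nearest_record and all values are hypothetical.

```r
# Hypothetical helper: index of the logger record closest in time to an image capture
nearest_record <- function(record_times, image_time) {
  which.min(abs(record_times - image_time))
}

# Made-up example: ten 1/s records (seconds since 1970-01-01) and an image timestamp
record_times <- 1483272000 + 0:9
image_time <- 1483272003.4
idx <- nearest_record(record_times, image_time)
print(idx)
```

The returned index could then be used to subset any variable read from the same file, such as PAR.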