Chapter 3 Accessing trait data in R
3.1 Learning Objectives
In this chapter you will learn:
- How to create a summary of available data to query from a TERRA REF season
- How to query a specific trait
- How to visualize query results
In this chapter, we go over how to query TERRA REF trait data using the traits package. The traits package provides a way to query several sources of species trait data, including BETYdb, NCBI, the Coral Trait Database, and others. We use BETYdb as our trait source because it contains the TERRA REF data that we are interested in.
Our example will show how to query for season 6 data and visualize canopy height. In addition to the traits package, we will also be using some of the tidyverse packages, which allow us to manipulate the data in an efficient, understandable way. If you are unfamiliar with tidyverse syntax, we recommend checking out some of the resources here.
3.3 Query for available traits
3.3.1 Getting Started
First, we need to install the traits package from CRAN and load it into our environment, along with the other packages we will use in this tutorial.
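A minimal setup sketch for this step. The text does not list the other packages, so dplyr, lubridate, and ggplot2 below are assumptions based on the data manipulation, date handling, and plotting used later in the chapter.

```r
# One-time installation from CRAN; uncomment on first use.
# The packages other than traits are assumptions (see the note above).
# install.packages(c("traits", "dplyr", "lubridate", "ggplot2"))

library(traits)     # betydb_query()
library(dplyr)      # mutate(), distinct(), and the %>% pipe
library(lubridate)  # date-time parsing and time zone conversion
library(ggplot2)    # plotting
```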
3.3.2 Setting options
The function used to query BETYdb is called betydb_query. To reduce the number of arguments we need to pass to this function, we can set some global options using options. In this case, we set the URL used in the query and the API version.
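The option setting described above might look like the following sketch. The option names betydb_url and betydb_api_version are, to our understanding, the ones the traits package reads. The URL and version values are assumptions, not taken from the text, so substitute the BETYdb instance you actually use.

```r
# Assumed values: replace with the BETYdb instance and API version you use.
options(
  betydb_url         = "https://terraref.ncsa.illinois.edu/bety/",
  betydb_api_version = "v1"
)

# Confirm what later calls to betydb_query() will pick up
getOption("betydb_url")
getOption("betydb_api_version")
```

Setting these once per session keeps each later query call short, since only the filters need to be passed.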
3.3.3 Querying available traits
The TERRA REF database contains trait data for many seasons of observation, and the available data may vary by season. Here, we get a visual summary of the traits and measurement methods available for one season.
First we construct a general query that returns all season 4 data. The function betydb_query takes as arguments key = "value" pairs, which represent columns in the database to query. In this example, we use the sitename column to select season 4 data, and set limit to "none" to return all records. By default, the function will search all tables in the database. To specify a particular table you can use the
The return value of the betydb_query function is just a data.frame, so we can work with it like any other data.frame in R.
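The general query described above could be sketched as follows. It assumes the BETYdb options are already set and that a network connection is available. The pattern "~Season 4" is an assumption about how season 4 site names are stored, not a value given in the text.

```r
library(traits)

# Assumes the betydb_url and betydb_api_version options are already set.
# "~Season 4" is an assumed pattern: the leading "~" asks BETYdb for a
# pattern match rather than an exact match on sitename.
season_4 <- betydb_query(sitename = "~Season 4", limit = "none")

# The result is a data.frame, so the usual inspection tools apply
class(season_4)
dim(season_4)
names(season_4)
```

Because limit is "none", this request can be large and slow. A small numeric limit is a reasonable first try when exploring.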
Let’s plot a time series of all traits returned. First, you might notice that the relevant date columns in the season_4 data.frame are returned as character strings instead of in a date format. Before plotting, let’s get our raw_date column into a proper date format and time zone using functions from
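One way to do this conversion is sketched below with dplyr and lubridate. The helper name to_local_time, the assumption that raw_date holds ISO 8601 strings, and the "America/Phoenix" time zone are all ours rather than the text's. The result is stored in a trans_date column, which is the name the plot in the next section expects.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: parse ISO 8601 strings, then express them in local time.
# The default time zone is an assumption; change it to match your site.
to_local_time <- function(x, tz = "America/Phoenix") {
  with_tz(ymd_hms(x), tzone = tz)
}

# Quick check on a made-up timestamp (not TERRA REF data)
to_local_time("2017-06-01T12:00:00-07:00")

# Apply to the query result when it is available in the session
if (exists("season_4")) {
  season_4 <- season_4 %>%
    mutate(trans_date = to_local_time(raw_date))
}
```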
3.3.4 Plot season 4 summary
Now we can create a plot of all of the trait data collected during season 4, including information about the methods used.
# One panel per trait; points and lines are colored by measurement method
ggplot(data = season_4) +
  geom_point(aes(x = trans_date, y = mean, color = method_name), shape = '.') +
  # One line per cultivar
  geom_line(aes(x = trans_date, y = mean, group = cultivar, color = method_name)) +
  # Each trait gets its own y-axis scale
  facet_wrap(~trait, ncol = 4, scales = "free_y") +
  xlab("Date") +
  ylab("Mean trait value") +
  ggtitle("Season 4 data summary") +
  guides(color = guide_legend(title = "Method", ncol = 1, title.position = "top")) +
  theme_bw() +
  theme(legend.position = "bottom")
We can view more information about these trait measurements by examining unique values in the trait and trait description columns.
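A short sketch of that lookup follows. The helper name list_traits is hypothetical, and the column names trait and trait_description are assumed from the description above.

```r
library(dplyr)

# Hypothetical helper: one row per trait with its description.
# Assumes columns named `trait` and `trait_description`.
list_traits <- function(df) {
  df %>%
    distinct(trait, trait_description) %>%
    arrange(trait)
}

# Apply to the query result when it is available in the session
if (exists("season_4")) {
  print(list_traits(season_4))
}
```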
3.4 Querying a specific trait
3.4.1 Querying season 6 canopy height data
After constructing a general query as above, you may find that you only want to query a specific trait. Here, we query for the canopy height trait by adding the key-value pair trait = "canopy_height" to our query function. Note that the limit is set to return only 250 records, for demonstration purposes.
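The trait-specific query might be written as in this sketch. It assumes the BETYdb options are set and a network connection is available. The "~Season 6" sitename pattern is an assumption, since the text only states the trait filter and the record limit.

```r
library(traits)

# Assumes the BETYdb options are already set. "~Season 6" is an assumed
# sitename pattern; the text only gives the trait and the record limit.
canopy_height <- betydb_query(
  trait    = "canopy_height",
  sitename = "~Season 6",
  limit    = 250
)

nrow(canopy_height)  # at most 250 rows because of the limit
```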
3.4.2 Plotting query results
As before, we need to reformat the raw date column.
Then we can generate a time series plot of just the canopy height data.
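Both steps can be combined in one sketch. The helper name plot_canopy_height is hypothetical, and it assumes columns raw_date (ISO 8601 strings) and mean, along with the "America/Phoenix" time zone.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical helper: convert raw_date, then plot mean canopy height over time.
# Column names and the default time zone are assumptions.
plot_canopy_height <- function(df, tz = "America/Phoenix") {
  df <- df %>%
    mutate(trans_date = with_tz(ymd_hms(raw_date), tzone = tz))

  ggplot(df, aes(x = trans_date, y = mean)) +
    geom_point(size = 0.5) +
    xlab("Date") +
    ylab("Canopy height") +
    ggtitle("Season 6 canopy height") +
    theme_bw()
}

# Draw the plot when the query result is available in the session
if (exists("canopy_height")) {
  print(plot_canopy_height(canopy_height))
}
```

Returning the ggplot object rather than printing it inside the helper leaves room to add layers, such as a facet or a smoother, before drawing.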