Chapter 3 Accessing trait data in R
3.1 Learning Objectives
In this chapter you will learn:
- How to create a summary of available data to query from a TERRA REF season
- How to query a specific trait
- How to visualize query results
3.2 Introduction
In this chapter, we go over how to query TERRA REF trait data using the traits
package. The traits
package is a way to query for various sources of species trait data, including BETYdb, NCBI, Coral Traits Disease and others. In this chapter we use BETYdb as our trait source, as it contains the TERRA REF data that we are interested in.
Our example will show how to query for season 6 data and visualize canopy height. In addition to the traits
package we will also be using some of the tidyverse
packages, which allow us to manipulate the data in an efficient, understandable way. If you are unfamiliar with tidyverse
syntax, we recommend checking out some of the resources here.
3.3 Query for available traits
3.3.1 Getting Started
First, we will need to install and load the traits package from CRAN, and load it into our environment, along with the other packages we will use in this tutorial.
3.3.2 Setting options
The function that is used to query BETYdb is called betydb_query
. To reduce the number of arguments needed to pass into this function, we can set some global options using options
. In this case, we will set the URL used in the query, and the API version.
3.3.3 Querying available traits
The TERRA REF database contains trait data for many other seasons of observation, and available data may vary by season. Here, we get a visual summary of available traits and methods of measurement for a season.
First we construct a general query for the Season 4 data. This returns all season 4 data. The function betydb_query
takes as arguments key = "value"
pairs which represent columns in the database to query. In this example, we set sitename
column for season 4 data, and set the limit to “none” to return all records. By default, the function will search all tables in the database. To specify a particular table you can use the table
argument.
The return value for the betydb_query
function is just a data.frame
so we can work with it like any other data.frame
in R.
Let’s plot a time series of all traits returned. First you might notice that the relevant date columns in the season_4
data.frame are returned as characters instead of a date format. Before plotting, let’s get our raw_date
column into a proper date format and time zone using functions from dplyr
and lubridate
.
3.3.4 Plot season 4 summary
Now we can create a plot of all of the trait data collected during season 4, including information about the methods used.
ggplot(data = season_4) +
geom_point(aes(x = trans_date, y = mean, color = method_name), shape = '.') +
geom_line(aes(x = trans_date, y = mean, group = cultivar, color = method_name)) +
facet_wrap(~trait, ncol = 4, scales = "free_y") +
xlab("Date") +
ylab("Mean trait value") +
ggtitle("Season 4 data summary") +
guides(color = guide_legend(title="Method", ncol = 1, title.position = "top")) +
theme_bw() +
theme(legend.position = "bottom")
We can view more information about these trait measurements by examining unique values in the trait and trait description columns.
3.4 Querying a specific trait
3.4.1 Querying season 6 canopy height data
You may find after constructing a general query as above that you want to only query a specific trait. Here, we query for the canopy height trait by adding the key-value pair trait = "canopy_height"
to our query function. Note that the limit is also set to return only 250 records, shown here for demonstration purposes.
3.4.2 Plotting query results
As before, we need to reformat the raw date column.
canopy_height <- canopy_height %>%
mutate(trans_date = with_tz(ymd_hms(raw_date), "America/Phoenix"))
And we can generate a time series plot of just the canopy height data.