Chapter 5 Retrieve source RGB image files

5.1 Learning Objective

In this chapter you will learn how to:

Find and retrieve a list of available sensor names
Retrieve files by downloading them to your system

5.2 Introduction

In this chapter we show how to use the Python terrautils library to retrieve the list of available sensor names. The terrautils library provides a number of functions for working with data stored in TerraRef. The sensor names we retrieve using terrautils indicate what types of Level 1 data are available.

Our examples also show how to retrieve files of interest from TerraRef by using the available API (Application Programming Interface).

5.3 Getting Started

First, we need to install the terrautils library into the Python environment. We can do this by using the pip utility to install the library from PyPI. Simply run pip install terrautils in a terminal. All of the terrautils functions are then available in Python, although we will use only a small number of them.
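Before moving on, it can help to confirm that the install succeeded. The following is a minimal sketch using only the Python standard library; the helper name is_installed is our own and is not part of terrautils.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Report a hint instead of failing later with an ImportError.
if not is_installed("terrautils"):
    print("terrautils not found; run 'pip install terrautils' first")
```

If the message is printed, the library was probably installed into a different Python environment than the one currently running.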

5.4 Retrieving sensor names

In this section we retrieve the names of the different sensor types that are available. This will help you understand what files may be available beyond those containing RGB image data.

To run Python functions, including those from the terrautils library, within this R Markdown document, we have to install and set up reticulate.

if(!require(reticulate)){
  install.packages("reticulate")
  reticulate::py_install("terrautils")
}

library(reticulate)
use_virtualenv("r-reticulate")

We will first be using the get_sensor_list function to retrieve all the data on available sensors. We will then use the unique_sensor_names function to extract only the sensor names from the data we just retrieved.

from terrautils.products import get_sensor_list, unique_sensor_names

url = 'https://terraref.ncsa.illinois.edu/clowder/'
key = ''

sensors = get_sensor_list(None, url, key)
names = unique_sensor_names(sensors)

The names variable now contains all of the available sensor names.
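Once the sensor names are available, a common next step is to narrow them down to the ones of interest. The sketch below assumes only that the names are a collection of strings; the helper filter_sensor_names and the sample values are hypothetical and stand in for the real result so the example runs without a server connection.

```python
def filter_sensor_names(names, keyword):
    """Return the sorted names that contain keyword, ignoring case."""
    keyword = keyword.lower()
    return sorted(n for n in names if keyword in n.lower())


# Hypothetical sample values; replace with the `names` variable retrieved above.
sample_names = {"RGB GeoTIFFs", "Thermal IR GeoTIFFs", "Laser Scanner 3D LAS"}

rgb_names = filter_sensor_names(sample_names, "rgb")
print(rgb_names)
```

Sorting the result keeps the output stable regardless of the order in which the names are stored.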

5.5 Retrieving the images

Once we have a list of files and their IDs, we can retrieve them one by one. We do this by creating a URL that identifies the file to retrieve, making the API call to retrieve the file contents, and writing the contents to disk.

To create the correct URL we start with the one defined before and append the path ‘api/files/’ followed by the ID of each file. For example, assuming we have a file ID of ‘111’, the final URL for retrieving the file would be:

https://terraref.ncsa.illinois.edu/clowder/api/files/111
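The URL construction can be sketched as a small helper that reproduces the example URL above. The function name build_file_url is our own; it is not part of terrautils or Clowder.

```python
def build_file_url(base_url, file_id):
    """Join the base URL, the 'api/files/' path, and a file ID."""
    # Strip any trailing slash so exactly one slash separates base and path.
    return base_url.rstrip('/') + '/api/files/' + file_id


url = 'https://terraref.ncsa.illinois.edu/clowder/'
print(build_file_url(url, '111'))
# prints https://terraref.ncsa.illinois.edu/clowder/api/files/111
```

Normalizing the trailing slash means the helper gives the same result whether or not the base URL ends with one.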

By looping through each of our files, and using their ID and filename, we can retrieve the files from the server and store them locally.

We stream the data returned from our server request (stream=True in the code below) because the files are likely to be large. If the stream=True parameter were omitted, the file’s entire contents would be loaded into memory in the r variable before they could be written to the local file.

To illustrate how this might work we are going to pre-populate an array of file names and their associated Clowder IDs.

files = [ {"id": "5c507cb74f0c4b0cbe6705f2",
           "filename": "rgb_geotiff_L1_ua-mac_2018-06-02__14-12-05-077_right.tif"}, 
          {"id": "5c507cb84f0cfd2aedf5a75a",
           "filename": "rgb_geotiff_L1_ua-mac_2018-06-02__14-12-05-077_left.tif"},
          {"id": "5c507eaf4f0c4b0cbe6716cd",
           "filename": "rgb_geotiff_L1_ua-mac_2018-05-05__11-35-13-442_left.tif"},
          {"id": "5c507eaf4f0cfd2aedf5b680",
           "filename": "rgb_geotiff_L1_ua-mac_2018-05-05__11-37-40-442_right.tif"}
          ]

The following code shows how to download the image files. First we format the base URL for our query allowing us to reuse it for each file. Next we loop through our array and create a customized URL while making the call to fetch the data using the requests interface. Finally we open the output file and use a loop to write the retrieved data.

import requests
from io import open

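# We are using the same `url` and `key` variables declared in the previous example above.
# The Clowder file endpoint lives under 'api/files/'.
filesurl = url + 'api/files/'
params = {'key': key}

for f in files:
    # Stream the response so large files are not held in memory all at once.
    r = requests.get(filesurl + f["id"], params=params, stream=True)
    # Stop on HTTP errors instead of writing an error page to disk.
    r.raise_for_status()
    with open(f["filename"], 'wb') as o:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                o.write(chunk)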

The images are now stored on the local file system.
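The chunked write loop in the download code above can be factored into a helper and tried without a network connection. This is a minimal sketch: write_chunks is a hypothetical helper of our own, and the small list of byte strings stands in for what r.iter_content(chunk_size=1024) would yield.

```python
import os
import tempfile


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    total = 0
    with open(path, 'wb') as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out.write(chunk)
                total += len(chunk)
    return total


# Stand-in data for a streamed response: two real chunks and one empty chunk.
demo_chunks = [b'abc', b'', b'defg']
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
written = write_chunks(demo_chunks, demo_path)
print(written)  # prints 7
```

Returning the byte count gives a simple way to compare the size of the local file against what was expected.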

5.6 Sample Images

Below are examples of images captured approximately one month apart. The image files for each date are listed in the table.

Date	Image files
May 4, 2018	rgb_geotiff_L1_ua-mac_2018-05-04__13-07-04-077_right.tif, rgb_geotiff_L1_ua-mac_2018-05-04__13-07-04-077_left.tif
Jun 2, 2018	rgb_geotiff_L1_ua-mac_2018-06-02__14-12-05-077_right.tif, rgb_geotiff_L1_ua-mac_2018-06-02__14-12-05-077_left.tif
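The capture date is embedded in each file name, so the interval between the two sets of images can be checked directly. The sketch below assumes the file names follow the pattern shown above, with a YYYY-MM-DD date followed by a double underscore; the helper capture_date is our own.

```python
import re
from datetime import date


def capture_date(filename):
    """Extract the YYYY-MM-DD capture date embedded in a file name."""
    match = re.search(r'(\d{4})-(\d{2})-(\d{2})__', filename)
    if match is None:
        raise ValueError('no date found in ' + filename)
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


may = capture_date('rgb_geotiff_L1_ua-mac_2018-05-04__13-07-04-077_right.tif')
jun = capture_date('rgb_geotiff_L1_ua-mac_2018-06-02__14-12-05-077_right.tif')
print((jun - may).days)  # prints 29
```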