Compile Time Series by Units of Analysis¶
Authors:
- Victor Tang
Reviewed/Edited by:
- Marcos Kavlin
- Dr. Andrew Dean
Purpose¶
This notebook is the third in the SWIFT-C Tutorial. The goal of this notebook is to show you, the user, how to aggregate your the time series information by the landscape units of analysis that were calculated in the previous notebook.
Workflow¶
- Import required packages.
- Load Sentinel 1 imagery.
- Load landscape unit polygons.
- Aggregate the Sentinel 1 time series pixel information by landscape unit.
- Export results.
Notes¶
The composites that were used to run this notebook are placeholders used to demonstrate this workflow. The data required to run the steps demonstrated in this notebook are 10 day Sentinel 1 median composites. In this tutorial the images were produced for the year 2022. The images were grouped into 10 day composites as that was most conducive to a complete year-long time series, while still removing speckle and noise. They were all stored in a folder for the year 2022.
The landscape units used in this tutorial were the output from the previous notebook.
Note Description:
This notebook aggregate pixels in Sentinel-1 image time series by landscape units, which were extracted from segmentation of Sentinel-2 composite. It returns an "average" time series for each of landscape units that represent the general temporal pattern of S1 backscatter for a certain year. The temporal resolution of the "average" time sereis is determined by the interval of Sentinel-1 images, which is 10 days in this case.
1. Import required packages¶
import numpy as np
import pandas as pd
import glob
import geopandas as gpd
import xarray as xr
import rioxarray as rxr
from rasterio.features import geometry_mask
base_dir = f"{path_to_data}/"
2. Load Sentinel-1 data¶
The code below was written with the imagery named using the following naming convention: "an_giang_2022_10d_02.tif"
- In this example 'an_giang' is the province name
- 2022 is the year in question
- 10d specifies the timespan of the composite, 02 specifies what it's order (temporally) is within the year.
The code completes the following steps:
- Lists all the .tif files in the specified directory
- Make a list of the order numbers of each image
- Open each image, select the VH band and append it to an empty list
- Concatenate da_list, using the list of order numbers as a pandas.Index, for the dimension argument.
- Convert the units of Sentinel-1 values from linear to decibel units (for better visualization)
# find all the .tif files in a given folder
fpath_list = glob.glob(base_dir + "*.tif")
# extract image order number from file name
x = [int(item.split("_")[-1].split(".")[0]) for item in fpath_list]
# load all the tif files as a list of xarray DataArray
da_list = []
for fpath in fpath_list:
da = rxr.open_rasterio(fpath) # load Sentinel-1 image
da = da.sel(band=1) # select VH band
da_list.append(da)
# concatenate the list of xarray DataArray
da = xr.concat(da_list, pd.Index(x))
# convert unit of Sentinel-1 values from linear to decibel
da.values = 10 * np.log10(da.values)
3. Load polygons of landscape units¶
Once we have loaded our Sentinel-1 data the next step is to load the landscapes units we need to aggregate the data.
gdf = gpd.read_file("data/an_giang_segmentation.geojson")
gdf = gdf.to_crs(da.rio.crs.to_epsg())
gdf = gdf.set_index("PID")
4. Aggregate pixels by landscape units¶
In order to aggregate our pixels by landscape unit, we need to iterate through the features in our landscape unit geodataframe.
The code below executes the following steps:
- Obtain the number of time steps in our time series.
- Create an empty pandas.DataFrame, in which to store the aggregated values for each landscape unit.
- Iterate through the landscape units:
- Clip the time series data, by the spatial extent of thelandscape unit.
- Obtain the median of the remaining pixels
- Generate a 1-dimensional array of medians, which will be used as the time series for each landscape unit.
- Append this time series to the appropriate index in our empty pandas.DataFrame.
n = da.shape[0] # number of images (i.e. time steps)
df = pd.DataFrame(index=gdf.index, columns=np.arange(1, n + 1))
for pid, row in gdf.iterrows():
# subset image to improve efficiency
xmin, ymin, xmax, ymax = row.geometry.bounds
da2 = da.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))
mask = geometry_mask(
gpd.GeoSeries([row.geometry]),
out_shape=da2.shape[-2:], # get dimension of x and y
transform=da2.rio.transform(),
)
mask = ~mask
data = da2.values[:, mask] # select pixels within polygon mask
df.loc[pid] = np.median(data, axis=-1) # aggregate by median
5. Export the resulting DataFrame¶
Once we've filled our new DataFrame with a median time series for each landscape unit, we save it, to use in the next notebook.
This ouput data is provided so once can run the following notebooks
df.to_csv(f"{output_path}/vh_2022.csv")