STAC and Data Access
D.03.10 HANDS-ON TRAINING - EarthCODE 101 Hands-On Workshop - Example showing how to access data on the EarthCODE Open Science Catalog and working with the Pangeo ecosystem on EDC
TODO: Copy from STAC basics guide to explain what stac is
Packages¶
As best practices dictate, we recommend that you install and import all the necessary libraries at the top of your Jupyter notebook.
from pystac.extensions.storage import StorageExtension
from datetime import datetime
from pystac_client import Client as pystac_client
from odc.stac import configure_rio, stac_load
import pystac
import xarray
What is STAC?¶
(summarised from https://
The SpatioTemporal Asset Catalog (STAC) specification was designed to establish a standard, unified language to talk about geospatial data, allowing it to be more easily searchable and queryable.
STAC has been designed to be simple, flexible, and extensible. STAC is a network of JSON files that reference other JSON files, with each JSON file adhering to a specific core specification depending on which STAC component it is describing. This core JSON format can also be customized to fit differing needs, making the STAC specification highly flexible and adaptable.
Therefore, if the STAC specification seems ‘light’, that is because it is light-- by design. Through this flexibility, different domains and tools can easily utilize the STAC specification and make it their own.
The Extensions section of the spec is where the community collaborates on more detail about specific data types and new functionality.
What is STAC¶
The SpatioTemporal Asset Catalog (STAC) specification was designed to establish a standard, unified language to talk about geospatial data, allowing it to be more easily searchable and queryable.
STAC has been designed to be simple, flexible, and extensible. STAC is a network of JSON files that reference other JSON files, with each JSON file adhering to a specific core specification depending on which STAC component it is describing. This core JSON format can also be customized to fit differing needs, making the STAC specification highly flexible and adaptable.
Therefore, if the STAC specification seems ‘light’, that is because it is light-- by design. Through this flexibility, different domains and tools can easily utilize the STAC specification and make it their own.
The Extensions section of the spec is where the community collaborates on more detail about specific data types and new functionality.
What is a SpatioTemporal Asset¶
A SpatioTemporal Asset is any file that represents information about the Earth captured in a certain place and at a particular time. Examples include spatiotemporal data derived from imagery (from satellites, airplanes, and drones), Synthetic Aperture Radar (SAR), point clouds (from LiDAR, structure from motion, etc.), data cubes, and full-motion video. The key is that the GeoJSON is not the actual ‘thing’, but instead references files and serves as an index to the STAC Assets.
STAC Components¶
There are three component specifications that together make up the core SpatioTemporal Asset Catalog specification. These components are:
Item - A STAC Item is the foundational building block of STAC. It is a GeoJSON feature supplemented with additional metadata that enables clients to traverse through catalogs. Since an item is a GeoJSON, it can be easily read by any modern GIS or geospatial library. One item can describe one or more SpatioTemporal Asset(s). For example, a common practice of using STAC for imagery is that each band in a scene is its own STAC Asset and there is one STAC Item to represent all the bands in a single scene.
Catalog - A Catalog is usually the starting point for navigating a STAC. A catalog.json file contains links to some combination of other STAC Catalogs, Collections, and/or Items. We can think of it like a directory tree on a computer.
Collection - A STAC Collection builds upon the STAC Catalog specification to include additional metadata about a set of items that exist as part of the collection. It extends the parent catalog by adding additional fields to enable the description of information like the spatial and temporal extent of the data, the license, keywords, providers, etc. Therefore, it can easily be extended for additional collection-level metadata.
Each component can be used alone, but they work best in concert with one another.

A STAC Item represents one or more spatiotemporal assets as GeoJSON so that it can be easily searched. The STAC Catalog specification provides structural elements to group STAC Items and Collections. STAC Collections are catalogs that add more required metadata and describe a group of related items. Now, let’s dive into each one of these components a bit more in-depth.
The STAC API specification builds on top of that core and allows us to do dynamic queries over our collections.
Accessing the EarthCODE Catalog¶
This section introduces STAC, the SpatioTemporal Asset Catalog. STAC provides a standardized way to structure metadata about spatialotemporal data. The STAC community are building APIs and tools on top of this structure to make working with spatiotemporal data easier.
Users of STAC will interact most often with Collections and Items (there’s also Catalogs, which group together collections). A Collection is just a collection of items, plus some additional metadata like the license and summaries of what’s available on each item. You can view available collections on the EarthCODE catalog with
Open Science Catalog¶
The Open Science Data Catalog is a publicly accessible platform that enables anyone—whether or not they have a GitHub account—to discover and access Earth Observation research. It provides a transparent and structured way to explore the latest results from EO projects by organizing metadata in a consistent and harmonized format.
Built on the open-source STAC Browser, the catalog allows users to browse and explore interlinked elements such as themes, variables, EO missions, projects, products, workflows, and experiments, all described using STAC-compliant JSON files.
cat = pystac_client.open("https://catalog.osc.earthcode.eox.at/")
cat
Now let’s search and access the same data we were just looking at, SeasFire:
https://
In the examples we’ve seen so far, we’ve just been given a STAC item. How do you find the items you want in the first place? That’s where a STAC API comes in.
A STAC API is some web service that accepts queries and returns STAC objects. The ability to handle queries is what differentiates a STAC API from a static STAC catalog, where items are just present on some file system.
# Todo
Note that the collection points to another collection, which contains the actual data. The EarthCODE STAC extension describes some metadata that enrich the STAC collection https://

Accessing Data from EarthCODE¶
Now we will load the actual data from the STAC item we’ve found...
The SeasFire Cube is a scientific datacube for seasonal fire forecasting around the globe. It has been created for the SeasFire project, that adresses ‘Earth System Deep Learning for Seasonal Fire Forecasting’ and is funded by the European Space Agency (ESA) in the context of ESA Future EO-1 Science for Society Call. It contains almost 20 years of data (2001-2021) in an 8-days time resolution and 0.25 degrees grid resolution. It has a diverse range of seasonal fire drivers. It expands from atmospheric and climatological ones to vegetation variables, socioeconomic and the target variables related to wildfires such as burned areas, fire radiative power, and wildfire-related CO2 emissions.
seasfire = pystac.read_file(
"https://s3.waw4-1.cloudferro.com/EarthCODE/Catalogs/seasfire/seasfire-cube_v0.4/catalog.json"
)
for item in seasfire.get_items():
print(item)
seasfire
# https://s3.waw4-1.cloudferro.com/EarthCODE/Catalogs/seasfire/seasfire-cube_v0.4/seasfire-cube-v.0.4/seasfire-cube-v.0.4.json
seasfire_cube = seasfire.get_item("seasfire-cube-v.0.4")
seasfire_cube
Items and Assets¶
STAC is a metadata standard. It doesn’t really deal with data files directly. Instead, it links to the data files under the “assets” property.
As described in https://
In this scenario any STAC metadata exists purely for discovery and cannot be used for filtering or subsetting (see Future Work for more on that). To search the STAC catalog to find collections of interest you will use the Collection Search API Extension. Depending on the level of metadata that has been provided in the STAC catalog you can search by the name of the collection and possibly by the variables – exposed via the Data Cube Extension.
Read straight to xarray¶
For a collection of interest, the best approach for accessing the data is to construct the lazily-loaded data cube in xarray (or an xarray.DataTree if the Zarr store has more than one group) and filter from there.
To do this you can use the zarr backend directly or you can use the stac backend to streamline even more. The stac backend is mostly useful if the STAC collection uses the xarray extension.
Constructing the lazy data cube is likely to be very fast if there is a consolidated metadata file OR the data is in Zarr-3 and the Zarr metadata fetch is highly parallelized (read more).
Loading and Fetching Data Cube - Xarray¶
http_url = seasfire_cube.assets["data"].href.replace(
"s3://",
f"{seasfire_cube.properties['storage:schemes'][seasfire_cube.assets['data'].extra_fields['storage:refs'][0]]['platform'].rstrip('/')}/",
)
ds = xarray.open_dataset(
http_url,
engine='zarr',
chunks={},
consolidated=True
# storage_options = {'token': 'anon'}
)
ds