What are STACs, COGs, and VRTs?
Overview
As more geospatial data becomes available, it is increasingly important to identify and access just the data needed for a project. Several tools help with different aspects of that: Spatio-Temporal Asset Catalogs (STACs), Cloud-Optimized GeoTIFFs (COGs), and Virtual Rasters (VRTs). In many ways, these tools are similar to ones found at brick-and-mortar libraries. The Spatio-Temporal Asset Catalog functions like a card catalog: it provides metadata and information on where to find resources. Cloud-Optimized GeoTIFFs are like the books themselves, each with a table of contents that makes it easy to find just the pages you are interested in. And just as multi-volume sets of books get around the physical limits on the size of an individual book, Virtual Rasters can reference multiple Cloud-Optimized GeoTIFFs, allowing them to be handled as a single unit.
Spatio-Temporal Asset Catalogs (STACs)
Spatio-Temporal Asset Catalogs are catalogs for spatial data, particularly spatial data that also has a time component. They were designed primarily for raster data, such as satellite imagery, and are transmitted in JSON or GeoJSON format. STACs come in two types: static STACs, which are simply JSON files, and dynamic STACs, where an API connects to a database and provides JSON on request. The dynamic STAC APIs are intended to align with the Open Geospatial Consortium’s Web Feature Service v3 standard (OGC WFS v3, since renamed OGC API - Features), so they can interoperate easily with other geospatial tools.

The foundation of a STAC is the item, for instance a Landsat scene. Each item is a GeoJSON feature with one or more assets associated with it, for instance single-band data in COG format, an XML metadata file, a virtual raster, or a JPG thumbnail. An asset is simply a link to that data.

On top of items are collections. A collection groups items that share common metadata, often the same instrument or processing parameters, and the items in a collection can be delivered together as a GeoJSON feature collection. Collections can also have assets, such as metadata common to all items in the collection or a virtual raster of multiple COGs in the collection. Above collections are catalogs. Catalogs are like a computer’s folder structure: they contain collections and/or other catalogs.

One notable feature of STACs is that, because items are GeoJSON features, you can explore the items of a STAC collection using familiar desktop GIS tools.
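To make the item and asset relationship concrete, here is a minimal sketch of what a STAC item might look like, written as a Python dictionary. Every identifier, date, coordinate, and URL in it is a made-up placeholder rather than a real dataset, and the two helper functions (asset_links and bbox_intersects) are hypothetical names introduced only for this illustration.

```python
import json

# A hand-written, hypothetical STAC item. All ids, dates, coordinates,
# and URLs are placeholders for illustration only.
item = {
    "type": "Feature",  # an item is a GeoJSON feature
    "stac_version": "1.0.0",
    "id": "example-scene-001",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-105.0, 39.0], [-104.0, 39.0], [-104.0, 40.0],
            [-105.0, 40.0], [-105.0, 39.0],
        ]],
    },
    "bbox": [-105.0, 39.0, -104.0, 40.0],  # west, south, east, north
    "properties": {"datetime": "2020-06-01T17:30:00Z"},
    "collection": "example-collection",
    "links": [],
    "assets": {  # each asset is just a link to data
        "red": {
            "href": "https://example.com/scene-001/red.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        },
        "metadata": {
            "href": "https://example.com/scene-001/metadata.xml",
            "type": "application/xml",
        },
        "thumbnail": {
            "href": "https://example.com/scene-001/thumb.jpg",
            "type": "image/jpeg",
        },
    },
}


def asset_links(stac_item):
    """Return {asset key: href} for every asset attached to an item."""
    return {key: asset["href"] for key, asset in stac_item["assets"].items()}


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# A hypothetical area of interest, used to decide whether the item is relevant.
area_of_interest = [-104.5, 39.2, -104.2, 39.6]
links = asset_links(item)
if bbox_intersects(item["bbox"], area_of_interest):
    print(json.dumps(links, indent=2))
```

A dynamic STAC API performs this kind of spatial and temporal filtering on the server; the bounding-box test here only imitates the idea on a single item.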
Cloud-Optimized GeoTIFFs (COGs)
Cloud-Optimized GeoTIFFs (COGs) are based on the TIFF image format with geospatial metadata (GeoTIFF), organized so that the metadata sits at the head of the file. TIFF allows image data to be stored as internal tiles within a single file, and COGs make use of that to store data in tiles (a typical size is 512x512 pixels). Larger tiles produce less metadata to search through, but at the cost of a higher data volume to access one unit of data. Smaller tiles yield more metadata to search through, but transfer less data at a time. Because of the size of the internal metadata header, COGs may perform poorly above a certain number of tiles.

COGs often store not only the full-resolution data but also lower-resolution overviews for different map scales. As a result, a COG that holds 0.5-m data for a 100 km by 100 km area may also include a 512x512 pixel overview appropriate for viewing at the county or state scale. So instead of needing to download tens of gigabytes of data, the GIS client can request and download just one 512x512 pixel image! When a client zooms in, it fetches the appropriately sized overview tile(s), and when fully zoomed in, it fetches the full-resolution data for the region being viewed.

To get a copy of the data for analysis in a desktop GIS client, use a clipping tool to select your area of interest from the COG. The GIS client will then request only the base data tiles that overlap your clipping area. Save that clipped layer and you have downloaded the relevant data for your project! Note that the clipped layer will be a GeoTIFF, but it may not be cloud-optimized and will not have the overviews. Additional tools such as GDAL can be used to create a COG with overviews from your GeoTIFF if desired.
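To put rough numbers on the 100 km by 100 km example, the following sketch works through the arithmetic. It assumes a single band stored at one byte per pixel with no compression, and overviews that halve the resolution at each level; those are illustrative assumptions, not properties of any particular COG, and the function names are invented for this example.

```python
import math


def pixels_per_side(extent_m, resolution_m):
    """Number of pixels needed to span an extent at a given resolution."""
    return round(extent_m / resolution_m)


def tile_count(width_px, height_px, tile_px=512):
    """Number of internal tiles needed to cover an image."""
    return math.ceil(width_px / tile_px) * math.ceil(height_px / tile_px)


def overview_levels(width_px, height_px, tile_px=512):
    """Count factor-of-two overviews until the image fits in one tile."""
    levels = 0
    while width_px > tile_px or height_px > tile_px:
        width_px = math.ceil(width_px / 2)
        height_px = math.ceil(height_px / 2)
        levels += 1
    return levels


side = pixels_per_side(100_000, 0.5)  # 100 km at 0.5 m per pixel
full_bytes = side * side              # assumes 1 band, 1 byte per pixel
tiles = tile_count(side, side)
levels = overview_levels(side, side)
one_tile_bytes = 512 * 512            # same 1 byte per pixel assumption

print(f"full resolution: {side} x {side} pixels, {full_bytes / 1e9:.0f} GB")
print(f"full-resolution tiles: {tiles}")
print(f"overview levels down to a single tile: {levels}")
print(f"one tile: {one_tile_bytes / 1024:.0f} KiB")
```

Under these assumptions the full-resolution grid is 200,000 pixels on a side, about 40 GB uncompressed and split across 152,881 tiles, while the coarsest of nine overview levels fits in a single tile of 256 KiB. Real files are usually compressed and may use different data types, so actual sizes will differ.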
Virtual Rasters (VRTs)
Virtual Rasters (VRTs) are XML files that let desktop GIS clients treat multiple raster sources (e.g. COGs) as one unit. This gives users a single symbology for the layer, rather than separate symbologies for each raster. If several COGs are referenced by a VRT, the desktop GIS client will access only the data it needs from them. A VRT also allows you to clip across the seams between COGs.
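As a rough illustration of what such an XML file contains, the sketch below assembles a small VRT that places equally sized, single-band tiles side by side from west to east. The URLs, tile size, origin, and pixel size are all placeholders, build_vrt is a hypothetical helper, and the coordinate reference system element that a real VRT would normally carry is left out for brevity. In practice a tool such as GDAL's gdalbuildvrt would typically generate the file for you.

```python
import xml.etree.ElementTree as ET


def build_vrt(sources, tile_px, origin_x, origin_y, pixel_size):
    """Build VRT XML for square single-band tiles laid out west to east."""
    width = tile_px * len(sources)
    root = ET.Element(
        "VRTDataset", rasterXSize=str(width), rasterYSize=str(tile_px)
    )
    # GeoTransform: x origin, pixel width, rotation, y origin, rotation,
    # pixel height (negative for a north-up raster).
    transform = ET.SubElement(root, "GeoTransform")
    transform.text = (
        f"{origin_x}, {pixel_size}, 0.0, {origin_y}, 0.0, {-pixel_size}"
    )
    band = ET.SubElement(root, "VRTRasterBand", dataType="Byte", band="1")
    for index, url in enumerate(sources):
        source = ET.SubElement(band, "SimpleSource")
        filename = ET.SubElement(source, "SourceFilename", relativeToVRT="0")
        filename.text = "/vsicurl/" + url  # GDAL prefix for reading over HTTP
        ET.SubElement(source, "SourceBand").text = "1"
        # SrcRect: the window read from the source raster.
        ET.SubElement(
            source, "SrcRect",
            xOff="0", yOff="0", xSize=str(tile_px), ySize=str(tile_px),
        )
        # DstRect: where that window lands in the virtual mosaic.
        ET.SubElement(
            source, "DstRect",
            xOff=str(index * tile_px), yOff="0",
            xSize=str(tile_px), ySize=str(tile_px),
        )
    return ET.tostring(root, encoding="unicode")


# Placeholder inputs: two hypothetical COGs of 1024 x 1024 pixels each.
vrt_xml = build_vrt(
    ["https://example.com/cogs/west.tif", "https://example.com/cogs/east.tif"],
    tile_px=1024,
    origin_x=500000.0,
    origin_y=4400000.0,
    pixel_size=0.5,
)
print(vrt_xml)
```

The VRT itself holds no pixel data; it only records where each source sits within the mosaic, which is why a client can read across a seam and still fetch data only from the sources it needs.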