Build Treecount layer AWS Step Function

The postprocessing-build-treecount-step-function step function orchestrates the upload of a treecount layer and the zonal statistics run on it. It runs zonal statistics on the treecount data against several raster layers. Currently these are object height models, built by postprocessing-build-ohm-step-function, and vegetation index maps, built by postprocessing-build-health-step-function for health mapping. The step function also handles unpublished treecount datasets: it publishes them and then adds the zonal statistics to the published treecount. New zonal statistics can be added to the step function as new states. Processing options are described in the documentation for the individual lambdas.

Lambdas

The step function orchestrates the following lambdas:

Warning: data will not be retained in the input geometry dataset for this particular layer.

  • A PostprocessingBuildVectorStatsFunction to process zonal statistics for a PostGIS vector table using any type of raster layer as a source.

  • A PostprocessingTreeCountLayerFunction to generate the distance calculations between individual trees. The function wraps a couple of queries that transform the data into the right projection and run the nearest neighbour calculation. Because this query can take a long time (>60s for 800k records), this lambda is more likely to time out.

  • A StepFunctionListS3BucketFunction to collect the necessary tiles for feeding into the vector stats lambda.

  • A UtilityGetPrefixFunction to generate the prefix used to find the input files and where data will be stored in S3.
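To make the idea of point-based zonal statistics concrete, here is a minimal sketch in plain Python. It is not the code of PostprocessingBuildVectorStatsFunction, which works on PostGIS tables and GDAL rasters. The function name `point_zonal_stats`, the in-memory list-of-lists raster, and the `origin`, `pixel_size` and `nodata` parameters are all assumptions made for illustration; only the notion of a `buffer` around a point comes from the options documented on this page.

```python
from statistics import mean


def point_zonal_stats(raster, origin, pixel_size, point, buffer=1.0, nodata=None):
    """Summarise raster cells whose centres lie within `buffer` of `point`.

    Hypothetical helper: `raster` is a list of rows (north-up), `origin` is the
    top-left corner in map units, `pixel_size` is the cell size in map units.
    """
    x0, y0 = origin
    px, py = point
    values = []
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            if value == nodata:
                continue  # skip cells without valid data
            # Cell centre in map coordinates; y decreases as the row index grows.
            cx = x0 + (col + 0.5) * pixel_size
            cy = y0 - (row + 0.5) * pixel_size
            if (cx - px) ** 2 + (cy - py) ** 2 <= buffer ** 2:
                values.append(value)
    if not values:
        return None  # no valid pixel inside the buffer
    return {
        "count": len(values),
        "min": min(values),
        "mean": mean(values),
        "max": max(values),
    }


# Tiny 3x3 raster with 1-unit pixels and its top-left corner at (0, 3).
grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
stats = point_zonal_stats(grid, origin=(0, 3), pixel_size=1, point=(1.5, 1.5), buffer=1)
```

A larger `buffer` pulls in more cells around each tree, which smooths the statistic at the cost of mixing in neighbouring canopy or ground pixels.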

Invocation description

The following keys are required:

  • job_id str - job id reference.
  • client_tag str - used for cost tagging.

The following keys are optional:

  • aurora bool - switch to run on Serverless infrastructure, overrides legacy, defaults to True.
  • health_options dict - valid options for health layer.
    • append bool - append data to existing records, otherwise it creates new rows, defaults to False.
    • ALL_TOUCHED bool - specific GDAL option for burn operations, defaults to False.
    • bounds List[float] - search window bbox, defaults to tile dimensions.
    • buffer float | int - size of buffer around point to search for valid pixels, defaults to 1.
    • crs int - override EPSG, defaults to input raster EPSG.
    • date str - set a date filter.
    • geotransform tuple - transform matrix for GDAL objects, defaults to tile matrix.
    • id_column str - set the ID column of the layer dataset (GPKG creates fid as column name).
    • overwrite bool - overwrite existing attributes, overlapping bbox will rewrite data.
    • parameter str - attribute to use for search buffer, similar to buffer option.
    • projection str - geoprojection string for GDAL objects, defaults to tile projection.
    • simplify int - strength of geometry simplification, defaults to 0.
    • threshold float | int - lower limit used for binary masking of tile, defaults to tree ID.
    • XSize float - number of pixels horizontal, defaults to tile X size.
    • YSize float - number of pixels vertical, defaults to tile Y size.
  • height_options dict - valid options for height layer.
    • append bool - append data to existing records, otherwise it creates new rows, defaults to False.
    • ALL_TOUCHED bool - specific GDAL option for burn operations, defaults to False.
    • bounds List[float] - search window bbox, defaults to tile dimensions.
    • buffer float | int - size of buffer around point to search for valid pixels, defaults to 1.
    • crs int - override EPSG, defaults to input raster EPSG.
    • date str - set a date filter.
    • geotransform tuple - transform matrix for GDAL objects, defaults to tile matrix.
    • id_column str - set the ID column of the layer dataset (GPKG creates fid as column name).
    • overwrite bool - overwrite existing attributes, overlapping bbox will rewrite data.
    • parameter str - attribute to use for search buffer, similar to buffer option.
    • projection str - geoprojection string for GDAL objects, defaults to tile projection.
    • simplify int - strength of geometry simplification, defaults to 0.
    • threshold float | int - lower limit used for binary masking of tile, defaults to tree ID.
    • XSize float - number of pixels horizontal, defaults to tile X size.
    • YSize float - number of pixels vertical, defaults to tile Y size.
  • layername str - set a layer name to run the geostatistics on, defaults to "treecount".
  • legacy bool - whether to run against the Serverless or EC2 infrastructure, defaults to False.
  • options dict - valid options for k-Nearest Neighbours.
    • date str - set a date filter, is overridden by order_date.
    • neighbours int - number of neighbouring features to include in aggregation, defaults to 6.
    • skip bool - whether to skip this step, will overwrite all other options.
    • threshold float | int - search radius for neighbours in meters, defaults to 12.
  • order_date str - set a date to subset (filter) the table. Can be searched using job_id.
  • workspace str - set a workspace to specify the table. Can be searched using job_id.
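
The neighbours and threshold keys under options can be illustrated with a small brute-force sketch. This is not the PostGIS query that PostprocessingTreeCountLayerFunction wraps. The function name `neighbour_distances` is hypothetical, and the output names `mindist` and `meandist` are borrowed from the example payload on this page on the assumption that they are the attributes this step produces. Coordinates are assumed to be in a projected CRS with units in meters.

```python
import math


def neighbour_distances(points, neighbours=6, threshold=12.0):
    """For each point, return the min and mean distance to its nearest neighbours.

    Only the `neighbours` closest points that lie within `threshold` meters are
    aggregated. Brute force, O(n^2): fine for a sketch, not for 800k records.
    """
    results = []
    for i, (xi, yi) in enumerate(points):
        # Distances from point i to every other point, closest first.
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        nearest = [d for d in dists[:neighbours] if d <= threshold]
        if nearest:
            results.append(
                {"mindist": nearest[0], "meandist": sum(nearest) / len(nearest)}
            )
        else:
            # No neighbour inside the search radius.
            results.append({"mindist": None, "meandist": None})
    return results


trees = [(0, 0), (3, 4), (0, 10), (100, 100)]
distances = neighbour_distances(trees, neighbours=2, threshold=12)
```

In this sketch an isolated tree, such as the last one in `trees`, gets no distance values because nothing falls inside the search radius.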

Example

{
  "job_id": "20220520151302-1327-0ad42aeec1014698b5efc953b83b5825",
  "order_date": "2021-01-01",
  "options": {
    "date": "2021-01-01",
    "skip": true,
    "neighbours": 6,
    "threshold": 12
  },
  "health_options": {
    "parameter": "mindist",
    "simplify": 0
  },
  "height_options": {
    "date": "2021-01-01",
    "buffer": 2,
    "parameter": "meandist",
    "append": true
  },
  "legacy": true,
  "workspace": "some-workspace"
}
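
An input document like the one above can be assembled and checked before it is submitted. The following is a minimal sketch, assuming the caller wants to fail early when one of the required keys (job_id, client_tag) is missing. The helper name `build_input` and the sample values are hypothetical and not part of the step function itself.

```python
import json

# Keys this page lists as required for an invocation.
REQUIRED_KEYS = ("job_id", "client_tag")


def build_input(job_id, client_tag, **optional):
    """Return the execution input as a JSON string, validating required keys."""
    payload = {"job_id": job_id, "client_tag": client_tag, **optional}
    missing = [key for key in REQUIRED_KEYS if not payload.get(key)]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    return json.dumps(payload)


execution_input = build_input(
    "20220520151302-1327-0ad42aeec1014698b5efc953b83b5825",
    "some-client",  # placeholder value for cost tagging
    order_date="2021-01-01",
    options={"skip": True},
    height_options={"buffer": 2, "append": True},
)

# The string could then be passed as the `input` argument of boto3's
# Step Functions client, e.g. client.start_execution(stateMachineArn=..., input=execution_input).
# The state machine ARN depends on the deployment and is not given on this page.
```

Serialising with `json.dumps` turns Python's `True` into JSON `true`, matching the example above.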