
Generate Inference Data AWS Step Function

The step function machine-learning-generate-inference-data creates the inference tiles that the detection models require. The inference tiles are generated from the original RGB tiles and can be resampled and cropped to masking vector information. Masking works with any polygon file that contains one or more features (for example, a farm delineation). Resampling is controlled by the step function input event. The source tiles are located by the lookup lambdas.

The output consists of S3 objects: the clipped and resampled tiles, stored at the defined output location. Omitting the polygons argument runs the simplified version of the step function, which moves the raster source tiles to a defined output location in the machine learning bucket. Output is always stored in GeoTIFF format.

You can also generate DynamoDB references for the tiles by running the simplified version with upload set to "true".

Lambdas

The step function orchestrates the following lambdas:

  • The prefix generator lambda GenerateTilesPrefix creates the prefix for the tile dataset.
  • The tile bounds reader lambda GenerateTileBounds reads the tiles and generates a GeoJSON with the geographic tile demarcation. The output can be used for intersection operations without loading the raster data.
  • The generator lambda GeneratePolygonIterator builds the array of polygons and their intersected tiles. This lambda feeds the Map State of this step function, which runs the processing in parallel.
  • The GenerateInferenceDataOnPolygon lambda creates the resampled tiles and crops them (if applicable). It runs inside the iterator, once per feature produced by the previous lambda.
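
The intersection step can be illustrated with a minimal sketch. It assumes that polygons and tiles are both reduced to bounding boxes, which is a simplification: the real GeneratePolygonIterator lambda presumably works with the actual geometries in the GeoJSON. The function names and the "polygon_id" and "tiles" keys are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# (min_x, min_y, max_x, max_y) in the coordinate system shared by tiles and polygons
BBox = Tuple[float, float, float, float]


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """Return True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_polygon_iterator(
    polygon_bounds: Dict[str, BBox],
    tile_bounds: Dict[str, BBox],
) -> List[dict]:
    """Pair every polygon with the tiles its bounding box intersects.

    Each returned item stands for one iteration of the Map State.
    Polygons without any intersecting tile are skipped (an assumption).
    """
    items = []
    for polygon_id, polygon_box in polygon_bounds.items():
        tiles = sorted(
            key
            for key, tile_box in tile_bounds.items()
            if bboxes_intersect(polygon_box, tile_box)
        )
        if tiles:
            items.append({"polygon_id": polygon_id, "tiles": tiles})
    return items


# Hypothetical tile and polygon bounds, for illustration only
tiles = {
    "a.tif": (0.0, 0.0, 10.0, 10.0),
    "b.tif": (10.0, 0.0, 20.0, 10.0),
    "c.tif": (30.0, 30.0, 40.0, 40.0),
}
polygons = {
    "farm-1": (5.0, 2.0, 12.0, 8.0),
    "farm-2": (50.0, 50.0, 60.0, 60.0),
}
iterator_items = build_polygon_iterator(polygons, tiles)
```

Because only the tile bounds are compared, no raster data has to be loaded to decide which tiles belong to which polygon.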

Invocation description

The following keys are required to invoke the step function:

  • job_id str - job id reference.
  • prefix str - path to the raster data.
  • output_key str - object path to output S3 location.

The following keys are optional to invoke the step function:

  • clip bool - determines whether the raster data is cropped to the data in polygons or only intersected with it. This parameter is ignored when polygons is not set.
  • polygons str - path to the crop polygons. Accepts a single geometry file or a prefix to a shapefile.
  • resolution str, float - target resolution for resampling. Setting this to "source" uses the source raster resolution. Be aware that a full EPSG:4326 dataset will fail if the resolution is given in UTM units (in other words, centimeters): a warning is raised and the dataset will contain empty files.
  • upload bool - switch that determines whether output is written to S3. Only takes effect in simplified mode.
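
The rules above can be summarised in a small validation sketch. This is not the step function's own code. The defaults used here (clip and upload off, resolution "source") and the "mode" key are assumptions made for illustration.

```python
from typing import Any, Dict

REQUIRED_KEYS = ("job_id", "prefix", "output_key")


def _as_bool(value: Any) -> bool:
    """Accept real booleans as well as the strings "true" / "false"."""
    return str(value).lower() == "true"


def normalize_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Check the required keys and apply the optional-key rules."""
    missing = [key for key in REQUIRED_KEYS if not event.get(key)]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))

    normalized: Dict[str, Any] = {key: event[key] for key in REQUIRED_KEYS}

    polygons = event.get("polygons")
    if polygons:
        # Polygon mode: clip is honoured, upload has no effect.
        normalized["mode"] = "polygons"  # hypothetical key
        normalized["polygons"] = polygons
        normalized["clip"] = _as_bool(event.get("clip", False))
    else:
        # Simplified mode: clip is ignored, upload is honoured.
        normalized["mode"] = "simplified"  # hypothetical key
        normalized["upload"] = _as_bool(event.get("upload", False))

    # "source" keeps the raster's own resolution; anything else must be a positive number.
    resolution = event.get("resolution", "source")
    if resolution != "source":
        resolution = float(resolution)
        if resolution <= 0:
            raise ValueError("resolution must be positive")
    normalized["resolution"] = resolution
    return normalized


simplified = normalize_event(
    {
        "job_id": "job-1",
        "prefix": "path/to/raster/data/",
        "output_key": "path/to/output/",
        "clip": True,  # ignored: no polygons given
        "upload": "true",
        "resolution": "0.5",
    }
)
```

In this sketch a clip value passed without polygons is dropped rather than rejected, which mirrors the description of the parameter being ignored.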

Example

{
  "clip": false,
  "job_id": "20240926085555-1337-e0af5713b68944cbbd0fc05c6f38f79a",
  "output_key": "path/to/output/",
  "polygons": "/path/to/polygon.shp",
  "prefix": "path/to/raster/data/",
  "resolution": "source"
}
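
An event like the one above could be submitted with boto3 roughly as follows. This is a sketch, not the project's own tooling: the helper name start_inference_job is hypothetical, and the state machine ARN must be supplied by the caller because it is not given in this document. The client is passed in as an argument so the helper can be exercised without AWS access.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = ("job_id", "prefix", "output_key")


def start_inference_job(
    sfn_client: Any,
    state_machine_arn: str,
    event: Dict[str, Any],
) -> str:
    """Start one execution of the step function and return its execution ARN.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    """
    for key in REQUIRED_KEYS:
        if key not in event:
            raise KeyError(f"missing required key: {key}")

    # Step Functions expects the execution input as a JSON string.
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(event),
    )
    return response["executionArn"]


# Usage (requires boto3 and AWS credentials; the ARN is a placeholder you must fill in):
#
#   import boto3
#   client = boto3.client("stepfunctions")
#   arn = start_inference_job(client, "<state machine ARN>", event)
```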