Generate Inference Data AWS Step Function
The step function machine-learning-generate-inference-data creates the inference tiles required by the detection models. The inference tiles are generated from the original RGB tiles, which can be resampled and cropped to vector masking data. Masking can use any polygon file with one or more features (for example, farm delineations). Resampling is controlled by the step function's input event. Tiles are located by the lookup lambdas.
The output consists of S3 objects containing the clipped and resampled tiles, stored at the defined output location. If the polygons argument is omitted, a simplified version of the step function runs, which essentially moves the source raster tiles to a defined output location in the machine learning bucket. Output is always stored in GeoTIFF format.
You can also generate DynamoDB references for the tiles by running the simplified version with upload set to "true".
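The branching between the full and simplified variants can be sketched as a small Python function. This is a minimal sketch that assumes the input event arrives as a plain dictionary. The helper names `select_mode` and `writes_dynamodb_references` and the "full"/"simplified" labels are hypothetical. The real step function presumably makes this decision in its own state definition rather than in code like this.

```python
from typing import Any, Mapping


def select_mode(event: Mapping[str, Any]) -> str:
    """Return which variant of the workflow an input event would run.

    Hypothetical labels: "full" intersects (and optionally clips) the tiles
    with the polygons; "simplified" only moves the source tiles to the output.
    """
    # A missing or empty "polygons" value selects the simplified variant.
    return "full" if event.get("polygons") else "simplified"


def writes_dynamodb_references(event: Mapping[str, Any]) -> bool:
    """True when the event runs the simplified variant with upload enabled."""
    # Accept both a JSON boolean and the string "true" for the upload flag.
    upload = event.get("upload", False) in (True, "true")
    return select_mode(event) == "simplified" and upload
```

The only signal is whether polygons is present. For that reason, upload has no effect once polygons is supplied.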
Lambdas
The step function orchestrates the following lambdas:
- The prefix generator lambda GenerateTilesPrefix creates the prefix for the tile dataset.
- The tile bounds reader lambda GenerateTileBounds reads the tiles and generates a GeoJSON file describing each tile's geographic extent. This output can be used for intersection operations without loading the raster data.
- The generator lambda GeneratePolygonIterator builds the array of polygons together with the tiles each one intersects. Its output feeds the Map state of this step function, which processes the polygons in parallel.
- The GenerateInferenceDataOnPolygon lambda creates the resampled tiles and crops them (if applicable). It runs inside the Map state iterator, once for each feature produced by the previous lambda.
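The bounds-reader and iterator steps can be sketched with plain bounding-box arithmetic. This minimal sketch makes several assumptions:

- Each tile's bounds are already known as `(minx, miny, maxx, maxy)` in the same CRS as the polygons. In the real lambdas these would come from reading the rasters.
- The function names, the `tile_key` property and the `{"polygon_id", "tiles"}` item shape are hypothetical.
- Only bounding boxes are compared. An exact polygon/tile intersection would need a geometry library.

```python
from typing import Dict, List, Tuple

Bounds = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def tile_bounds_geojson(tiles: Dict[str, Bounds]) -> dict:
    """Build a GeoJSON FeatureCollection with one rectangle per tile."""
    features = []
    for key, (minx, miny, maxx, maxy) in sorted(tiles.items()):
        # Closed, counter-clockwise outer ring, as RFC 7946 recommends.
        ring = [[minx, miny], [maxx, miny], [maxx, maxy], [minx, maxy], [minx, miny]]
        features.append({
            "type": "Feature",
            "properties": {"tile_key": key},
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


def _bbox_of_feature(feature: dict) -> Bounds:
    # Polygon geometries only: take the extent of the outer ring.
    ring = feature["geometry"]["coordinates"][0]
    xs = [pt[0] for pt in ring]
    ys = [pt[1] for pt in ring]
    return min(xs), min(ys), max(xs), max(ys)


def _intersects(a: Bounds, b: Bounds) -> bool:
    # Boxes that only touch along an edge are not counted as intersecting.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def polygon_iterator(tile_index: dict, polygons: dict) -> List[dict]:
    """Pair every polygon feature with the tiles whose bounds it overlaps."""
    items = []
    for idx, feature in enumerate(polygons["features"]):
        poly_box = _bbox_of_feature(feature)
        hits = [
            tile["properties"]["tile_key"]
            for tile in tile_index["features"]
            if _intersects(poly_box, _bbox_of_feature(tile))
        ]
        # Assumption: polygons that overlap no tile produce no Map iteration.
        if hits:
            items.append({"polygon_id": idx, "tiles": hits})
    return items
```

Each item in the returned list would correspond to one iteration of the Map state. Because the intersection only needs the tile index, the raster data is never opened at this stage.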
Invocation description
The following keys are required to invoke the step function:
- job_id: str - job ID reference.
- prefix: str - path to the raster data.
- output_key: str - object path to the output S3 location.
The following keys are optional to invoke the step function:
- clip: bool - determines whether the raster data is cropped to the features in polygons or only intersected with them. This parameter is ignored when polygons is not present.
- polygons: str - path to the crop polygons; takes a single geometry file or a prefix to a shapefile.
- resolution: str or float - target resolution. Setting this to "source" uses the source raster resolution. Be aware that a dataset entirely in EPSG:4326 will fail if the user sets a resolution in UTM units (in other words, centimeters): a warning is thrown and the dataset will contain empty files.
- upload: bool - whether output should be written to S3. Only takes effect in simplified mode.
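The required and optional keys above can be checked before an execution is started. This is a minimal sketch of such a validation step. The function name `validate_event`, the error messages and the choice to raise `ValueError` are assumptions, not the step function's actual behaviour. It does not try to detect the EPSG:4326 versus UTM resolution mismatch, because that requires knowing the source CRS.

```python
import math
from typing import Any, Dict

REQUIRED_KEYS = ("job_id", "prefix", "output_key")


def _as_bool(value: Any) -> bool:
    # Accept JSON booleans as well as the strings "true"/"false".
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def validate_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Check the required keys and normalise the optional ones (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if not event.get(key)]
    if missing:
        raise ValueError(f"missing required keys: {', '.join(missing)}")
    for key in REQUIRED_KEYS:
        if not isinstance(event[key], str):
            raise ValueError(f"{key} must be a string")

    normalised = dict(event)
    has_polygons = bool(event.get("polygons"))
    # clip is ignored without polygons; upload only applies without polygons.
    normalised["clip"] = has_polygons and _as_bool(event.get("clip", False))
    normalised["upload"] = (not has_polygons) and _as_bool(event.get("upload", False))

    # None means resolution was not given; the real default is not documented here.
    resolution = event.get("resolution")
    if resolution is not None and resolution != "source":
        try:
            resolution = float(resolution)
        except (TypeError, ValueError):
            raise ValueError('resolution must be "source" or a number') from None
        if not math.isfinite(resolution) or resolution <= 0:
            raise ValueError("resolution must be a positive, finite number")
    normalised["resolution"] = resolution
    return normalised
```

Normalising the ignored flags to `False` makes the effective configuration explicit in the execution input. The original event is left untouched.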
Example
{
"clip": false,
"job_id": "20240926085555-1337-e0af5713b68944cbbd0fc05c6f38f79a",
"output_key": "path/to/output/",
"polygons": "/path/to/polygon.shp",
"prefix": "path/to/raster/data/",
"resolution": "source"
}
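The example event could also be assembled programmatically. This sketch only builds and serialises the payload with the standard library. The resulting string is what would be passed as the `input` argument of boto3's Step Functions `start_execution` call, together with the state machine ARN (not given here). The `build_event` helper is hypothetical and only accepts the keys documented above.

```python
import json
from typing import Any

OPTIONAL_KEYS = {"clip", "polygons", "resolution", "upload"}


def build_event(job_id: str, prefix: str, output_key: str, **optional: Any) -> str:
    """Serialise an input event for machine-learning-generate-inference-data (hypothetical helper)."""
    unknown = set(optional) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    event = {"job_id": job_id, "prefix": prefix, "output_key": output_key, **optional}
    # The returned string would be passed as `input` to
    # boto3.client("stepfunctions").start_execution(stateMachineArn=..., input=...).
    return json.dumps(event, sort_keys=True)


# Rebuild the example event shown above.
payload = build_event(
    "20240926085555-1337-e0af5713b68944cbbd0fc05c6f38f79a",
    "path/to/raster/data/",
    "path/to/output/",
    clip=False,
    polygons="/path/to/polygon.shp",
    resolution="source",
)
```

Rejecting unknown keys catches typos such as `resoluton` before an execution is started.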