Training Data AWS Step Function
The step function machine-learning-build-training-data builds the training dataset necessary for training crop detection models. The training data is generated from the annotated treecount and overlapping RGB tiles.
Lambdas
The lambdas orchestrated by the Step function are the following:
- The image finder lambda GetImageTiles creates an Iterator with available tiles for training data generation.
- Tiles can be recorded into DynamoDB with the RunSimpleTile lambda using the simple_handler.
- The bounding boxes of the annotations are generated by the RunAnnotationTask lambda. The size of each box is determined by buffer_size. Results are written to DynamoDB. This lambda is retried during processing if it times out. Each completed iteration consumes one tile from the array of tiles. Certain records are written as payloads; these records are not essential for converting results into geometries.
- The CollectAnnotations lambda collects the information from DynamoDB and generates a file which can be loaded into ML processing. This file contains the bbox coordinates and S3 keys to the tiles.
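To make the role of buffer_size and edge_cases concrete, here is a minimal sketch of how a square box could be derived from an annotated point and tested against a tile. It is not the code of RunAnnotationTask. The names Box, annotation_box and keep_annotation are hypothetical, and treating buffer_size as half the side length of the square is an assumption based on the parameter description (a 2.5 cm buffer giving a 25 cm² box).

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in projected (metric) coordinates."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def annotation_box(x: float, y: float, buffer_size: float) -> Box:
    # Assumption: the point is buffered by buffer_size in every direction,
    # so the side of the square is 2 * buffer_size.
    return Box(x - buffer_size, y - buffer_size, x + buffer_size, y + buffer_size)


def contains(tile: Box, box: Box) -> bool:
    # True when the box lies entirely inside the tile.
    return (tile.xmin <= box.xmin and tile.ymin <= box.ymin
            and tile.xmax >= box.xmax and tile.ymax >= box.ymax)


def intersects(tile: Box, box: Box) -> bool:
    # True when the box and the tile share some area.
    return (box.xmin < tile.xmax and box.xmax > tile.xmin
            and box.ymin < tile.ymax and box.ymax > tile.ymin)


def keep_annotation(tile: Box, box: Box, edge_cases: bool) -> bool:
    # edge_cases=True uses an intersect test, otherwise a contain test.
    return intersects(tile, box) if edge_cases else contains(tile, box)
```

With a 50 m tile and a point 10 cm from its right edge, a 0.4 m buffer yields a box that crosses the tile border, so it would only be kept when edge_cases is true.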
Invocation description
The following keys can be used to invoke the step function:
- job_id (str) - job id reference.
- prefix (str) - prefix to the RGB data.
- buffer_size (float) - size of the square in meters; for a 25 cm² box, use a buffer of 2.5 cm (0.025).
- simple (bool) - run a lambda to record the tiles in DynamoDB. A simplified form, which keeps the original state of the tile.
- culture (str) - specify the name of the label. Defaults to Tree.
- edge_cases (bool) - perform an intersect instead of a contain.
- output_format (str) - set the output format.
- output_key (str) - an output directory to write the annotations to. Preferably this is the same folder as the RGB tiles. Writes to the metadata folder by default.
- points (str) - prefix to the treecount data; use the full path to the shape to access larger datasets. Can also be defined as an AuroraDB table using workspace:layername:date.
- resolution (float | str) - resampling resolution in meters.
- tile_size (int) - output tile dimensions in pixels.
- upload (bool) - store the new tiles in a bucket.
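As an illustration of how these keys could be assembled and submitted, the sketch below builds the JSON input and starts an execution through boto3. The helper names build_input and start_build are hypothetical, the set of required keys is an assumption (the README does not say which keys are mandatory), and the state machine ARN has to be supplied by the caller. Only the default for culture comes from the list above.

```python
import json

# Keys listed in this README.
KNOWN_KEYS = {
    "job_id", "prefix", "buffer_size", "simple", "culture", "edge_cases",
    "output_format", "output_key", "points", "resolution", "tile_size", "upload",
}
# Assumption: these four are treated as mandatory for this sketch only.
REQUIRED_KEYS = ("job_id", "prefix", "points", "buffer_size")


def build_input(**keys) -> str:
    """Validate the invocation keys and return them as a JSON string."""
    unknown = sorted(set(keys) - KNOWN_KEYS)
    if unknown:
        raise ValueError(f"unknown keys: {unknown}")
    missing = [k for k in REQUIRED_KEYS if k not in keys]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if keys["buffer_size"] <= 0:
        raise ValueError("buffer_size must be a positive number of meters")
    # culture defaults to "Tree" when it is not given.
    payload = {"culture": "Tree", **keys}
    return json.dumps(payload)


def start_build(state_machine_arn: str, payload_json: str) -> dict:
    """Start one execution of the step function with the given input."""
    import boto3  # imported lazily so build_input works without AWS dependencies

    client = boto3.client("stepfunctions")
    return client.start_execution(stateMachineArn=state_machine_arn, input=payload_json)
```

Validating before the call keeps misspelled keys from reaching the lambdas, where they would presumably be harder to diagnose.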
Example
A file example:
{
"simple": false,
"culture": "Tree",
"edge_cases": true,
"job_id": "20240919130536-1337-fb3b1c868c4f4fb4bce96fbcc3fe3e9d",
"prefix": "path/to/rgb/20200202/",
"buffer_size": 0.4,
"points": "workspace:treecount:20240902",
"resolution": 0.1,
"tile_size": 500,
"output_key": "test/live_test/",
"output_format": "GTiff",
"upload": true
}
An AuroraDB example:
{
"simple": true,
"culture": "Tree",
"edge_cases": true,
"job_id": "20240919130536-1337-fb3b1c868c4f4fb4bce96fbcc3fe3e9d",
"prefix": "path/to/rgb/20200202/",
"buffer_size": 0.3,
"points": "workspace:treecount:20240902",
"resolution": "source",
"tile_size": null,
"output_key": "test/live_test/",
"output_format": "GTiff",
"upload": true
}
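The points value in these examples uses the workspace:layername:date form. A small sketch of how such a reference could be told apart from a file prefix is shown below. The function name parse_points is hypothetical, and reading the last field as a YYYYMMDD date is an assumption based on the value 20240902 in the examples.

```python
from datetime import date, datetime
from typing import Optional, Tuple


def parse_points(points: str) -> Optional[Tuple[str, str, date]]:
    """Return (workspace, layername, date) for a table reference, else None."""
    parts = points.split(":")
    # A table reference has exactly three colon-separated fields and no slashes;
    # anything else is treated as a file prefix or path.
    if len(parts) != 3 or "/" in points:
        return None
    workspace, layername, stamp = parts
    try:
        # Assumption: the date field is formatted as YYYYMMDD.
        day = datetime.strptime(stamp, "%Y%m%d").date()
    except ValueError:
        return None
    return workspace, layername, day
```

A return value of None means the string should be handled as a prefix or full path to the treecount data.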
WARNING Use a UTM projection! Resolution and buffer size are expressed in meters, so do not ingest data in spherical (geographic) coordinate systems; the processing will fail.
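When the input data is still in longitude and latitude, the matching UTM zone has to be found before reprojecting. The following sketch computes the EPSG code of the WGS 84 / UTM zone for a coordinate using the standard 6-degree zone layout. The function name utm_epsg is hypothetical, the special zone exceptions around Norway and Svalbard are ignored, and the reprojection itself is left to whatever GIS tooling the project already uses.

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """Return the EPSG code of the WGS 84 / UTM zone containing (lon, lat)."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("expected longitude and latitude in degrees")
    # Zones are 6 degrees wide and numbered 1..60 starting at 180 degrees west.
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    # EPSG 326xx covers the northern hemisphere, 327xx the southern.
    base = 32600 if lat >= 0 else 32700
    return base + zone
```

For example, a field at longitude 5.0 and latitude 52.0 falls in zone 31 north, so the function returns 32631.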