Utilities

geoutils.measure_distance(ds, SCL_val, lon, lat, plot=True)

Calculates the distance, in meters, from a specified longitude and latitude to the nearest pixel of the specified scene classification. The scene classification is taken from the first index of the Sentinel-2 SCL band in the fused result. The function returns the minimum distance in meters and, if plot is True, plots that distance as a circle around the specified location.

To use this function, you must pass a Sentinel-2 dataset with the SCL band already fused. Ensure that cloud cover is minimal or nonexistent, because clouds can affect the location of the scene classifications in the SCL band (pixels under cloud are classified as cloud rather than as the underlying surface).

Parameters
  • ds (xr.Dataset) – Dataset to measure, containing the fused Sentinel-2 SCL band

  • SCL_val (int) – Sentinel-2 SCL class value of the classification to measure the distance to (for example, 6 for water)

  • lon (float) – Longitude of the point of interest to measure from

  • lat (float) – Latitude of the point of interest to measure from

  • plot (bool, optional) – Whether to plot a figure of the distance measurement

    Default: True

Return type

float

Returns

Minimum distance, in meters, from the point of interest to the specified classification
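The core computation (minimum distance from a point to every pixel of one SCL class) can be sketched with NumPy alone. This is not the geoutils implementation. It assumes the SCL band has already been extracted as a 2-D array indexed [lat, lon] with 1-D longitude and latitude coordinate vectors in degrees. It also assumes great-circle (haversine) distance with a mean Earth radius of 6,371 km. The names min_distance_to_class and haversine_m are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between points given in degrees (broadcasts)."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def min_distance_to_class(scl, lons, lats, scl_val, lon, lat):
    """Hypothetical helper: minimum distance (m) from (lon, lat) to any pixel with SCL == scl_val.

    scl is a 2-D array indexed [lat, lon]; lons and lats are its 1-D coordinate vectors.
    Returns infinity if the class does not occur in the array.
    """
    rows, cols = np.nonzero(scl == scl_val)  # pixel indices belonging to the class
    if rows.size == 0:
        return float("inf")
    # Distance from the point of interest to every matching pixel centre
    distances = haversine_m(lon, lat, lons[cols], lats[rows])
    return float(distances.min())
```

For example, with scl_val=6 the helper returns the distance to the nearest water pixel. A brute-force search over all matching pixels is simple and exact, but it is linear in the number of matching pixels.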

geoutils.cluster(dataset, n_clusters=5, variable_prefixes=None, save=False, save_path=None)

Performs K-means clustering on an area of interest (AOI) dataset and returns the clustered dataset. Optionally, the clustered image can be saved to a specified directory.

Parameters
  • dataset (xarray.Dataset) – The input dataset containing data to be clustered.

  • n_clusters (int, optional) – The number of clusters to create.

    Default: 5

  • variable_prefixes (list, optional) – A list of variable name prefixes used to select the variables for clustering. If not specified, all variables in the dataset are used.

    Default: None

  • save (bool, optional) – Whether to save the clustered image.

    Default: False

  • save_path (str, optional) – The directory path to save the clustered image to. Required if save is set to True.

    Default: None

Return type

xarray.Dataset

Returns

The clustered dataset with an additional ‘cluster’ DataArray representing the cluster labels.
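K-means here means assigning each pixel to the nearest of n_clusters centres and iteratively moving each centre to the mean of its pixels (Lloyd's algorithm). The sketch below shows that idea on per-pixel features and does not reproduce geoutils itself. It takes a plain dict of equally shaped 2-D arrays standing in for the Dataset's data variables and treats variable_prefixes as a filter on variable-name prefixes, which is an assumption. The function kmeans_pixels and its n_iter and seed parameters are hypothetical.

```python
import numpy as np


def kmeans_pixels(bands, n_clusters=5, variable_prefixes=None, n_iter=50, seed=0):
    """Hypothetical sketch: cluster the pixels of equally shaped 2-D bands; returns a label image."""
    # Keep only variables whose names start with one of the prefixes (assumed semantics)
    names = [n for n in bands
             if variable_prefixes is None or any(n.startswith(p) for p in variable_prefixes)]
    if not names:
        raise ValueError("no variables match the given prefixes")
    shape = np.asarray(bands[names[0]]).shape
    # One row per pixel, one column per selected variable
    X = np.stack([np.asarray(bands[n], dtype=float).ravel() for n in names], axis=1)

    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]  # random initial centres
    for _ in range(n_iter):
        # Assign each pixel to its nearest centre (squared Euclidean distance)
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its pixels; an empty cluster keeps its centre
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                                for k in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(shape)
```

With an actual xarray.Dataset ds whose variables are all 2-D with the same shape, a compatible input is {name: ds[name].values for name in ds.data_vars}. The pairwise distance array holds pixels × n_clusters × variables values, so for large scenes a library implementation is preferable.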

geoutils.plot_clustered_dataset(clustered_dataset, n_clusters)

Plot a clustered dataset using the viridis colormap.

Parameters
  • clustered_dataset (xarray.Dataset) – The clustered dataset to plot.

  • n_clusters (int) – The number of clusters in the clustered dataset.
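One way to display integer cluster labels with viridis is to resample the colormap into exactly n_clusters colours, so that each label gets its own colour. The sketch below assumes the labels are already available as a 2-D integer array, such as the values of the cluster DataArray. The function plot_labels is hypothetical and is not the geoutils implementation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_labels(labels, n_clusters):
    """Hypothetical sketch: show a 2-D label image using viridis split into n_clusters colours."""
    cmap = plt.get_cmap("viridis", n_clusters)  # discrete viridis with n_clusters entries
    fig, ax = plt.subplots()
    # Centre each integer label in its own colour bin
    im = ax.imshow(np.asarray(labels), cmap=cmap, vmin=-0.5, vmax=n_clusters - 0.5,
                   interpolation="nearest")
    fig.colorbar(im, ax=ax, ticks=range(n_clusters), label="cluster")
    ax.set_axis_off()
    return fig, im
```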

ml_utils.chunk_result(img, input_shape, pad_mode='reflect')

Splits an image into tiles of a given size. The image is split row-wise: the first row of tiles is produced from left to right, then the second row, and so on.

Note: padding is added if the image cannot be split into equal parts.

Padding is added on the right and bottom edges of the image. The padding mode is set by pad_mode and defaults to 'reflect'.

The function assumes that the input always has 4 dimensions, corresponding to [channels, time-step, height, width]. Expand any missing dimensions to size 1 before passing the data.

Parameters
  • img – Image to be split up [C x T x H x W]

  • input_shape – Size of each split [size_h, size_w]

  • pad_mode (str, optional) – Padding mode used when the image must be padded

    Default: 'reflect'

Returns

splits – a list containing the split-up images, each of size [C x T x size_h x size_w]

Return type

list
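The pad-then-split procedure described above can be sketched with NumPy. Bottom and right padding bring H and W up to multiples of the tile size, and the padded array is then sliced in row-major order. The sketch assumes a NumPy array input and that pad_mode is passed straight to np.pad. The function name chunk_image is hypothetical and is not the ml_utils code.

```python
import numpy as np


def chunk_image(img, input_shape, pad_mode="reflect"):
    """Hypothetical sketch: split a [C, T, H, W] array into [C, T, size_h, size_w] tiles."""
    size_h, size_w = input_shape
    _, _, h, w = img.shape
    # Padding needed so that H and W become exact multiples of the tile size
    pad_h = (-h) % size_h
    pad_w = (-w) % size_w
    # Pad only the bottom (H) and right (W) edges; channels and time steps are untouched
    img = np.pad(img, ((0, 0), (0, 0), (0, pad_h), (0, pad_w)), mode=pad_mode)
    splits = []
    for top in range(0, img.shape[2], size_h):       # rows of tiles, top to bottom
        for left in range(0, img.shape[3], size_w):  # tiles within a row, left to right
            splits.append(img[:, :, top:top + size_h, left:left + size_w])
    return splits
```

For a 5 × 5 image and 2 × 2 tiles, one row and one column of reflected padding are added, giving nine tiles.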

ml_utils.create_tf_record(chunks, save_path, save_coords=False, save_attrs=False)

Saves “chunked” results in the TFRecord format.

The function assumes that the input always has 4 dimensions, corresponding to [channels, time-step, height, width]. Expand any missing dimensions to size 1 before passing the data.

Parameters
  • chunks (list of xarray datasets) – List of xarray Datasets with identical dimensions

  • save_path (str) – Path to the directory in which to save the TFRecord file

  • save_coords (bool, optional) – Whether to save the coordinates of the datasets

    Default: False

  • save_attrs (bool, optional) – Whether to save the metadata and attributes of the datasets

    Default: False

Returns
  • features_dict (dict) – Dictionary mapping each feature name to the tf.io.FixedLenFeature with which it is stored

  • out_types_dict (dict) – Dictionary mapping each feature name to the data type it must be decoded to

  • shapes_dict (dict) – Dictionary mapping each feature name to the shape it must be decoded to; an empty tuple for scalar data

Return type

tuple (features_dict, out_types_dict, shapes_dict)

ml_utils.load_img(example_proto, features_dict, out_types_dict, shapes_dict)

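Maps a single example from a saved TFRecord file to its features, using the dictionaries saved alongside it (features_dict, out_types_dict and shapes_dict). This function is meant to be used with the tf.data API, as the function passed to Dataset.map when loading the dataset from a TFRecord file.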

Parameters
  • example_proto (str) – A single serialized example (data sample) from the TFRecord file

  • features_dict (dict) – Dictionary mapping each feature name to the tf.io.FixedLenFeature with which it is stored

  • out_types_dict (dict) – Dictionary mapping each feature name to the data type it must be decoded to

  • shapes_dict (dict) – Dictionary mapping each feature name to the shape it must be decoded to; an empty tuple for scalar data

Example

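>>> import tensorflow as tf
>>> # features_dict, out_types_dict and shapes_dict are the dictionaries returned by ml_utils.create_tf_record
>>> dataset = tf.data.TFRecordDataset(save_file_path)
>>> dataset = dataset.map(
...     lambda example_proto: ml_utils.load_img(
...         example_proto,
...         features_dict=features_dict,
...         out_types_dict=out_types_dict,
...         shapes_dict=shapes_dict,
...     )
... )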

ml_utils.combine_bands(example_data, input_bands, output_bands)

Stacks the input bands and the output bands of a single example from the TFRecord dataset.

Parameters
  • example_data (str) – Single example (data sample) from the TFRecord dataset

  • input_bands (list of strings) – List of band names used as model inputs

  • output_bands (list of strings) – List of band names used as model outputs

Example

>>> input_bands = ["S2_RED", "S2_GREEN", "S2_BLUE"]
>>> output_bands = ["S2_SCL"]
>>> dataset = dataset.map(lambda example_data: ml_utils.combine_bands(example_data, input_bands, output_bands))
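The stacking step can be illustrated with NumPy. The sketch assumes that a parsed example is a dict of equally shaped arrays keyed by band name, that bands are stacked along a new last (channel) axis, and that the result is an (inputs, outputs) pair. These are assumptions, not documented behaviour of combine_bands, and stack_bands is a hypothetical name.

```python
import numpy as np


def stack_bands(example, input_bands, output_bands, axis=-1):
    """Hypothetical sketch: stack named bands of one example into an (inputs, outputs) pair."""
    # Band order in the stacked array follows the order of the band lists
    inputs = np.stack([example[b] for b in input_bands], axis=axis)
    outputs = np.stack([example[b] for b in output_bands], axis=axis)
    return inputs, outputs
```

Inside a tf.data pipeline, the same logic would use tf.stack, which takes the same list of values and axis argument as np.stack.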