Utilities
- class geoutils.measure_distance(ds, SCL_val, lon, lat, plot=True)
Calculates the distance from a specified longitude and latitude to the nearest occurrence of the specified scene classification. The scene classification is taken from the first index of the Sentinel-2 SCL band in the fused result. The function returns the distance in meters and, when plot is True, plots it as a circle around the specified location.
To use this function, you must pass a Sentinel-2 dataset with the SCL band already fused. Ensure that clouds are minimal or nonexistent, as they can affect the location of the scene classifications in the SCL band.
- Parameters
ds (xr.Dataset) – Dataset to measure
SCL_val (int) – Sentinel SCL band number representing classification to measure distance to
lon (float) – Longitude of point of interest to measure from
lat (float) – Latitude of point of interest to measure from
plot (bool, optional) – Plot figure of distance measure
Default:True
- Return type
float
- Returns
Minimum distance from point of interest to the specified classification
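The internals of measure_distance are not shown on this page. The following is only a minimal sketch of the underlying idea, assuming the SCL band is available as a 2D array with 1D longitude and latitude coordinates and that a great-circle (haversine) distance is an acceptable approximation. The helper names `haversine_m` and `min_distance_to_class` are hypothetical and are not part of geoutils.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def min_distance_to_class(scl, lons, lats, scl_val, lon, lat):
    """Hypothetical helper: minimum distance from (lon, lat) to any pixel
    whose class code equals scl_val.

    scl  : 2D array [lat, lon] of class codes
    lons : 1D array of pixel-center longitudes
    lats : 1D array of pixel-center latitudes
    Returns NaN if the class does not occur in the scene.
    """
    rows, cols = np.nonzero(scl == scl_val)
    if rows.size == 0:
        return float("nan")
    distances = haversine_m(lon, lat, lons[cols], lats[rows])
    return float(distances.min())


# Toy 3x3 scene: vegetation (4) everywhere except one water pixel (6).
scl = np.array([[4, 4, 4],
                [4, 4, 4],
                [4, 4, 6]])
lons = np.array([10.000, 10.001, 10.002])
lats = np.array([50.002, 50.001, 50.000])
dist = min_distance_to_class(scl, lons, lats, scl_val=6, lon=10.000, lat=50.000)
```

A brute-force scan over all matching pixels is easy to follow but scales with the number of pixels in the class; the actual function may use a different method.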
- class geoutils.cluster(dataset, n_clusters=5, variable_prefixes=None, save=False, save_path=None)
Performs K-means clustering on an area of interest (AOI) dataset and returns the clustered dataset. Optionally, the clustered image can be saved to a specified directory.
- Parameters
dataset (xarray.Dataset) – The input dataset containing data to be clustered.
n_clusters (int, optional) – The number of clusters to create, default is 5.
Default:5
variable_prefixes (list, optional) – A list of variable prefixes to use for clustering. If not specified, all variables in the dataset will be used.
Default:None
save (bool, optional) – Whether to save the clustered image, default is False.
Default:False
save_path (str, optional) – The directory path to save the clustered image to. Required if save is set to True.
Default:None
- Return type
xarray.Dataset
- Returns
The clustered dataset with an additional ‘cluster’ DataArray representing the cluster labels.
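This page does not say which K-means implementation cluster uses. As an illustration of the technique only, the sketch below runs a plain NumPy version of Lloyd's algorithm on a [bands, height, width] array and returns a label image. The helper name `kmeans_labels`, its `n_iter` and `seed` arguments, and the array layout are assumptions, not the geoutils API.

```python
import numpy as np


def kmeans_labels(stack, n_clusters=5, n_iter=20, seed=0):
    """Hypothetical helper: cluster the pixels of a [bands, height, width] array.

    Returns an integer label image of shape [height, width].
    """
    bands, height, width = stack.shape
    pixels = stack.reshape(bands, -1).T.astype(float)  # [n_pixels, bands]

    rng = np.random.default_rng(seed)
    # Initialise centers from distinct pixel values so no two centers coincide.
    # This requires at least n_clusters distinct pixel values in the image.
    unique_pixels = np.unique(pixels, axis=0)
    centers = unique_pixels[rng.choice(len(unique_pixels), size=n_clusters, replace=False)]

    for _ in range(n_iter):
        # Assign each pixel to its nearest center (squared Euclidean distance).
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its pixels; empty clusters stay put.
        for k in range(n_clusters):
            members = pixels[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)

    return labels.reshape(height, width)


# Toy image: two bands, left half dark and right half bright.
stack = np.zeros((2, 4, 6))
stack[:, :, 3:] = 10.0
label_img = kmeans_labels(stack, n_clusters=2)
```

The label values themselves are arbitrary: which half receives label 0 depends on the random initialisation, so only the grouping is meaningful.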
- class geoutils.plot_clustered_dataset(clustered_dataset, n_clusters)
Plot a clustered dataset using the viridis colormap.
- Parameters
clustered_dataset (xarray.Dataset) – The clustered dataset to plot.
- class ml_utils.chunk_result(img, input_shape, pad_mode='reflect')
Splits an image into chunks of a given size. The image is split row-wise: the first row of chunks is produced, then the second row, and so on.
- NOTE - Padding is added when the image cannot be split into equal parts.
Padding is added on the right and the bottom of the image, and the padding mode is 'reflect' by default.
The function assumes that the input always has 4 dimensions, [channels, time-step, height, width]. Expand any missing dimensions to size 1 before passing the data.
- Parameters
img – image to be split up [C x T x H x W]
input_shape – size of each split [size_h, size_w]
pad_mode – padding mode used when the image cannot be split into equal parts
Default:'reflect'
- Returns
splits – a list containing the split-up images
- Return type
list
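The padding and row-wise splitting described above can be sketched with NumPy as follows. This is an illustrative re-implementation under stated assumptions, not the source of chunk_result: the helper name `chunk_image` is hypothetical, and passing pad_mode straight to `np.pad` is an assumption.

```python
import numpy as np


def chunk_image(img, input_shape, pad_mode="reflect"):
    """Hypothetical sketch of row-wise chunking with right/bottom padding.

    img         : array shaped [C, T, H, W]
    input_shape : (size_h, size_w) of each chunk
    """
    size_h, size_w = input_shape
    _, _, height, width = img.shape
    # Padding needed to reach the next multiple of the chunk size.
    pad_h = (-height) % size_h
    pad_w = (-width) % size_w
    # Pad only the bottom (H axis) and the right (W axis).
    padded = np.pad(img, ((0, 0), (0, 0), (0, pad_h), (0, pad_w)), mode=pad_mode)

    chunks = []
    for top in range(0, padded.shape[2], size_h):        # one row of chunks at a time
        for left in range(0, padded.shape[3], size_w):   # left to right within a row
            chunks.append(padded[:, :, top:top + size_h, left:left + size_w])
    return chunks


# A 5x7 image split into 4x4 chunks is padded to 8x8 and yields 4 chunks.
img = np.arange(1 * 1 * 5 * 7).reshape(1, 1, 5, 7)
chunks = chunk_image(img, input_shape=(4, 4))
```

With 'reflect' padding the edge pixel itself is not repeated, so the padded column next to the right edge mirrors the second-to-last original column.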
- class ml_utils.create_tf_record(chunks, save_path, save_coords=False, save_attrs=False)
A function to save “chunked” results in the “tfrecord” format.
The function assumes that the input always has 4 dimensions, [channels, time-step, height, width]. Expand any missing dimensions to size 1 before passing the data.
- Parameters
chunks (list of xarray datasets) – List of xarray datasets with equal dimensions
save_path (str) – Path to the directory to save the tfrecord file
save_coords (bool, optional) – Boolean to save coordinates from the datasets
Default:False
save_attrs (bool, optional) – Boolean to save metadata and attributes from the datasets
Default:False
- Returns
features_dict (dict) – Dictionary mapping feature name to the tf.io.FixedLenFeature as they are stored
out_types_dict (dict) – Dictionary mapping feature name to the data type to which it needs to be decoded
shapes_dict (dict) – Dictionary mapping feature name to the shape it needs to be decoded to, empty tuple for scalar data
- Return type
tuple of three dicts
- class ml_utils.load_img(example_proto, features_dict, out_types_dict, shapes_dict)
Function to map data from a saved tfrecord to the accompanying saved dictionaries. This function is meant to be used in conjunction with the tf.data API when loading in the dataset as a tfrecord.
- Parameters
example_proto (str) – Single example (data sample) from the tfrecord
features_dict (dict) – Dictionary mapping feature name to the tf.io.FixedLenFeature as they are stored
out_types_dict (dict) – Dictionary mapping feature name to the data type to which it needs to be decoded
shapes_dict (dict) – Dictionary mapping feature name to the shape it needs to be decoded to, empty tuple for scalar data
Example
>>> dataset = tf.data.TFRecordDataset(save_file_path)
>>> dataset = dataset.map(
...     lambda example_proto: ml_utils.load_img(
...         example_proto, features_dict=features_dict, out_types_dict=out_types_dict, shapes_dict=shapes_dict
...     )
... )
- class ml_utils.combine_bands(example_data, input_bands, output_bands)
Function to stack the input and output bands from the tfrecord dataset.
- Parameters
example_data (str) – Single example (data sample) from the tfrecord
input_bands (list of strings) – List of bands used as model inputs
output_bands (list of strings) – List of bands used as model outputs
Example
>>> input_bands = ["S2_RED", "S2_GREEN", "S2_BLUE"]
>>> output_bands = ["S2_SCL"]
>>> dataset = dataset.map(lambda example_data: ml_utils.combine_bands(example_data, input_bands, output_bands))
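To make the stacking step concrete, here is a small NumPy analogue. The real combine_bands works on examples read from a tfrecord; this sketch only illustrates the idea on a plain dictionary of arrays. The helper name `stack_bands`, the per-band [T, H, W] shape, and the choice of a leading channel axis (mirroring the [channels, time-step, height, width] layout described on this page) are all assumptions.

```python
import numpy as np


def stack_bands(example, input_bands, output_bands):
    """Hypothetical sketch: build (inputs, outputs) arrays from a dict of bands.

    Each band is assumed to be an array shaped [T, H, W]; the selected bands
    are stacked along a new leading channel axis, giving [C, T, H, W].
    """
    inputs = np.stack([example[name] for name in input_bands], axis=0)
    outputs = np.stack([example[name] for name in output_bands], axis=0)
    return inputs, outputs


# Toy example with one time step and 2x2 pixels per band.
example = {
    "S2_RED": np.zeros((1, 2, 2)),
    "S2_GREEN": np.ones((1, 2, 2)),
    "S2_BLUE": np.full((1, 2, 2), 2.0),
    "S2_SCL": np.full((1, 2, 2), 4.0),
}
inputs, outputs = stack_bands(example, ["S2_RED", "S2_GREEN", "S2_BLUE"], ["S2_SCL"])
```

The order of names in input_bands determines the channel order of the stacked array, so it should match what the model expects.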