Clustering to find homogeneous areas
Short description
This notebook performs clustering analysis on Sentinel-2 satellite data, utilizing the B02, B03 and B04 bands to identify and group areas with similar spectral characteristics for further analysis.
In this notebook, you will search for, select, and obtain Sentinel-2 data for a single day over a neighborhood in Barcelona, Spain. The search is filtered to scenes with at least 99% valid pixels, so the selected data is effectively cloud-free over the study area. The B02, B03 and B04 bands are retrieved over the region of interest, and a clustering analysis is performed on them to group areas with similar spectral characteristics. This example demonstrates how clustering can be applied to Sentinel-2 data to identify and visualize distinct land cover types.
1 - Import spacesense object(s) and other dependencies#
[1]:
from spacesense import Client, geoutils
import datetime
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import json
from skimage import exposure
if "SS_API_KEY" not in os.environ:
    from getpass import getpass
    api_key = getpass('Enter your api key : ')
    os.environ["SS_API_KEY"] = api_key
Enter your api key : ··········
2 - Define AOI and output options#
[2]:
# A neighborhood of Barcelona
aoi = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {
                "coordinates": [
                    [
                        [2.1719121924506055, 41.39760043017927],
                        [2.1647389059867805, 41.39223018500084],
                        [2.1682818096665244, 41.389200339725676],
                        [2.175746693142031, 41.394800520638],
                        [2.1719121924506055, 41.39760043017927]
                    ]
                ],
                "type": "Polygon"
            }
        }
    ]
}
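As a quick sanity check on the AOI, the sketch below computes the polygon's bounding box in plain Python. `polygon_bounds` is a hypothetical helper written for illustration, not part of spacesense; it relies only on the GeoJSON convention that positions are ordered [longitude, latitude].

```python
def polygon_bounds(feature_collection):
    """Return (min_lon, min_lat, max_lon, max_lat) over all Polygon exterior rings.

    Hypothetical helper for illustration; not a spacesense function.
    """
    lons, lats = [], []
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            raise ValueError("only Polygon geometries are handled in this sketch")
        exterior = geometry["coordinates"][0]  # first ring is the exterior
        if exterior[0] != exterior[-1]:
            raise ValueError("GeoJSON rings must be closed (first point == last point)")
        for lon, lat in exterior:              # GeoJSON order: longitude, latitude
            lons.append(lon)
            lats.append(lat)
    return min(lons), min(lats), max(lons), max(lats)


# Same coordinates as the Barcelona AOI defined in this notebook
aoi = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [2.1719121924506055, 41.39760043017927],
                [2.1647389059867805, 41.39223018500084],
                [2.1682818096665244, 41.389200339725676],
                [2.175746693142031, 41.394800520638],
                [2.1719121924506055, 41.39760043017927],
            ]],
        },
    }],
}

min_lon, min_lat, max_lon, max_lat = polygon_bounds(aoi)
print(f"lon: {min_lon:.4f} to {max_lon:.4f}, lat: {min_lat:.4f} to {max_lat:.4f}")
```

The extent obtained this way is roughly the range that the x and y coordinates of the retrieved raster should cover.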
[3]:
# Define the time of interest (TOI): a single day in this example
start_date = "2021-06-16"
end_date = "2021-06-16"
[4]:
client = Client(id="cluster_zones")
3 - Search S2#
[5]:
s2_search_result = client.s2_search(aoi=aoi, start_date=start_date, end_date=end_date, query_filters={"valid_pixel_percentage": {">=": 99}})
s2_search_result.dataframe
WARNING:spacesense.core:start_date and end_date are the same, adding 1 day to end_date
[5]:
 | id | date | tile | valid_pixel_percentage | platform | relative_orbit_number | product_id | datetime | swath_coverage_percentage | no_data | cloud_shadows | vegetation | not_vegetated | water | cloud_medium_probability | cloud_high_probability | thin_cirrus | snow |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | S2B_31TDF_20210616_0_L2A | 2021-06-16 | 31TDF | 99.91 | sentinel-2b | 008 | S2B_MSIL2A_20210616T103629_N0300_R008_T31TDF_2... | 2021-06-16T10:49:42Z | 100.0 | 0.0 | 0.0 | 1.83 | 98.08 | 0.0 | 0.0 | 0.09 | 0.0 | 0.0 |
[ ]:
# Remove duplicate dates from the search result
s2_search_result.filter_duplicate_dates()
4 - Specify bands#
Select only the Sentinel-2 bands of interest. In this urban example, we use the visible bands B02 (blue), B03 (green) and B04 (red).
[6]:
s2_search_result.output_bands = ["B02","B03","B04"]
5 - Obtain S2 data through Fuse function#
[7]:
fuse_result = client.fuse(
    catalogs_list=[s2_search_result]
)
fuse_result.dataset
[7]:
<xarray.Dataset>
Dimensions:  (time: 1, y: 94, x: 93)
Coordinates:
  * time     (time) datetime64[ns] 2021-06-16
  * y        (y) float32 41.4 41.4 41.4 41.4 41.4 ... 41.39 41.39 41.39 41.39
  * x        (x) float32 2.165 2.165 2.165 2.165 ... 2.175 2.175 2.176 2.176
Data variables:
    S2_B02   (time, y, x) float32 ...
    S2_B03   (time, y, x) float32 ...
    S2_B04   (time, y, x) float32 ...
Attributes:
    transform:        [ 1.18459751e-04 0.00000000e+00 2.16473355e+00 0.000...
    crs:              +init=epsg:4326
    res:              [1.18459751e-04 9.09349018e-05]
    descriptions:     ['B02', 'B03', 'B04']
    AREA_OR_POINT:    Area
    _FillValue:       nan
    s2_data_lineage:  {"Data origin": "S3 bucket (ARN=arn:aws:s3:::sentinel-c...
    ulx, uly:         [ 2.16473355 41.39765899]
6 - Look at the RGB image#
[8]:
fuse_result.plot_rgb(all_dates = True, brightness_factor = 2)
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
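The clipping warning above is expected when a brightened image is displayed: matplotlib's `imshow` accepts float RGB values only in [0, 1], and anything outside that range is clipped. The sketch below reproduces the effect on a tiny synthetic array. It assumes the bands hold reflectance-like floats between 0 and 1 and that `brightness_factor` acts as a simple multiplier; both are assumptions about `plot_rgb`, and `to_displayable_rgb` is a hypothetical helper, not a spacesense function.

```python
import numpy as np


def to_displayable_rgb(red, green, blue, brightness_factor=2.0):
    """Stack three 2-D bands into (y, x, 3), brighten, and clip to [0, 1].

    Hypothetical helper; assumes reflectance-like float inputs in [0, 1].
    """
    rgb = np.stack([red, green, blue], axis=-1) * brightness_factor
    # Values above 1 after brightening would trigger the imshow warning,
    # so clip explicitly before plotting.
    return np.clip(rgb, 0.0, 1.0)


# Two synthetic pixels: one dark, one bright enough to saturate when doubled
red = np.array([[0.10, 0.60]])
green = np.array([[0.20, 0.30]])
blue = np.array([[0.05, 0.50]])

rgb = to_displayable_rgb(red, green, blue, brightness_factor=2)
print(rgb.shape)
print(rgb[0, 1])  # red and blue saturate at 1.0 for the bright pixel
```

Clipping explicitly before plotting makes the saturation visible in the data rather than leaving it to the plotting backend.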

7 - Cluster S2 data with the cluster function#
In this example, we use a single image. However, the cluster function also accepts multi-temporal datasets for its K-means clustering.
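To make the idea concrete, here is a minimal NumPy sketch of K-means on a stack of raster layers. It is not the implementation behind `geoutils.cluster`, whose internals are not shown in this notebook. The function name `kmeans_label_map`, the (features, y, x) input layout, the `-1` label for no-data pixels and the deterministic initialization are all assumptions made for illustration. Each pixel is treated as a feature vector with one value per band; for multi-temporal input one would presumably stack one value per band and date.

```python
import numpy as np


def kmeans_label_map(cube, n_clusters, n_iter=20):
    """Cluster a (features, y, x) array with plain K-means (Lloyd's algorithm).

    Returns a (y, x) integer map; pixels containing NaN get the label -1.
    Illustrative sketch only, not the spacesense implementation.
    """
    n_features, height, width = cube.shape
    pixels = cube.reshape(n_features, -1).T      # one row per pixel
    valid = ~np.isnan(pixels).any(axis=1)        # skip no-data pixels
    data = pixels[valid]

    # Deterministic start (a simplification): pick pixels at evenly spaced
    # brightness quantiles. Library implementations typically use k-means++.
    order = np.argsort(data.sum(axis=1))
    picks = ((np.arange(n_clusters) + 0.5) / n_clusters * len(data)).astype(int)
    centers = data[order[picks]].astype(float)

    for _ in range(n_iter):
        # Assignment step: nearest center by Euclidean distance
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        assigned = distances.argmin(axis=1)
        # Update step: move each center to the mean of its assigned pixels
        for k in range(n_clusters):
            members = data[assigned == k]
            if len(members):                     # leave empty clusters in place
                centers[k] = members.mean(axis=0)

    labels = np.full(height * width, -1, dtype=int)
    labels[valid] = assigned
    return labels.reshape(height, width)


# Synthetic 3-band scene: three horizontal stripes with slight noise
rng = np.random.default_rng(0)
cube = np.empty((3, 30, 20))
cube[:, :10] = 0.1
cube[:, 10:20] = 0.4
cube[:, 20:] = 0.8
cube += rng.uniform(-0.02, 0.02, size=cube.shape)
cube[:, 0, 0] = np.nan                           # one no-data pixel

label_map = kmeans_label_map(cube, n_clusters=3)
print(np.unique(label_map))
```

Handling NaN explicitly matters here because the AOI polygon does not fill its bounding raster, so some pixels are likely to carry the fill value.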
[9]:
# Number of clusters (classes) to produce
n_clusters = 3
# Variables to take into account in the clustering
variables_to_use = ["S2_B02", "S2_B03", "S2_B04"]
# In this example we use all the dates in fuse_result.dataset, but you can also select a single date
clustered_dataset_RGB = geoutils.cluster(fuse_result.dataset, n_clusters, variables_to_use)
/usr/local/lib/python3.10/dist-packages/sklearn/cluster/_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
8 - Plot the clustered dataset using the utility function#
[10]:
geoutils.plot_clustered_dataset(clustered_dataset_RGB, n_clusters)
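To quantify the map, one option is to compute the share of pixels that falls into each cluster. The sketch below works on a plain 2-D integer label array. How to extract such an array from `clustered_dataset_RGB` is not shown in this notebook, so `label_map` and `cluster_shares` are hypothetical names used for illustration only.

```python
import numpy as np


def cluster_shares(label_map):
    """Return {cluster_label: fraction_of_pixels} for a 2-D integer label array.

    Hypothetical helper; not part of spacesense.
    """
    labels, counts = np.unique(label_map, return_counts=True)
    total = counts.sum()
    return {int(label): float(count / total) for label, count in zip(labels, counts)}


# Tiny stand-in for a clustered raster with three classes
label_map = np.array([[0, 0, 1],
                      [2, 1, 0]])

shares = cluster_shares(label_map)
for label, share in shares.items():
    print(f"cluster {label}: {share:.1%} of pixels")
```

Multiplying each share by the AOI area would give an approximate area per cluster, provided the pixels are of roughly equal size.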
