unsupervised classification with R

m

January 29, 2016

This post shows three simple ways to perform an unsupervised classification of a raster dataset in R. Before going through the approaches, we need to load the relevant packages and the actual data. You could use the Landsat data from the book “Remote Sensing and GIS for Ecologists”, which can be downloaded here.

library("raster")  
library("cluster")
library("randomForest")

# loading the layerstack  
# here we use a subset of the Landsat dataset from "Remote Sensing and GIS for Ecologists" 
image <- stack("path/to/raster")
plotRGB(image, r=3,g=2,b=1,stretch="hist")

[Figure: RGB image]

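Now we prepare the data for the classifications. First we convert the raster data into a matrix with one row per cell and one column per band, then we remove the cells that contain NA values. We keep the indices of the remaining cells so that the cluster labels can later be written back to the right positions in the raster.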

## return the values of the raster dataset as a matrix (one row per cell, one column per band)
v <- getValues(image)
## row indices of the cells that have no NA in any band
i <- which(complete.cases(v))
v <- v[i, , drop = FALSE]
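Why the cell indices matter can be seen on a toy matrix. The following is only a minimal sketch: the matrix `m` and the stand-in labels are made up for illustration and are not part of the Landsat data.

```r
# toy "image": 5 cells (rows) x 2 bands (columns); values are made up
m <- matrix(c(1, 2, NA, 4, 5,
              10, 20, 30, NA, 50),
            ncol = 2, dimnames = list(NULL, c("band1", "band2")))

# row indices of the cells without NA in any band
keep <- which(complete.cases(m))
m_clean <- m[keep, , drop = FALSE]

# stand-in for cluster labels, one per remaining row
labels <- seq_len(nrow(m_clean))

# write the labels back; cells with missing data stay NA
result <- rep(NA_integer_, nrow(m))
result[keep] <- labels
print(result)
```

Cells 3 and 4 each miss one band, so they are skipped during clustering and remain NA in the result, while the other cells receive their labels at the original positions.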

The first classification method is the well-known k-means method. It separates n observations into k clusters, so that each observation belongs to the cluster with the nearest mean (cluster centre).

## kmeans classification 
E <- kmeans(v, 12, iter.max = 100, nstart = 10)
kmeans_raster <- raster(image)
kmeans_raster[i] <- E$cluster
plot(kmeans_raster)

[Figure: k-means classification]
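The "nearest mean" rule can be checked on a small synthetic example. The sketch below uses made-up two-dimensional points (not the Landsat data) and only base R; the object names are chosen for illustration.

```r
set.seed(42)  # fixes the made-up data and the random starting centres

# made-up data: two well-separated groups of 50 points each
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2))

km <- kmeans(x, centers = 2, iter.max = 100, nstart = 10)

# squared distance from every point to every cluster centre
d2 <- sapply(seq_len(nrow(km$centers)), function(j) {
  rowSums(sweep(x, 2, km$centers[j, ])^2)
})

# index of the nearest centre for every point
nearest <- max.col(-d2, ties.method = "first")

# compare with the cluster assigned by kmeans()
all(nearest == km$cluster)
```

Once the algorithm has converged, the last line should confirm that every point sits in the cluster whose centre is closest to it.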

The second classification method is called clara (Clustering for Large Applications). It works by clustering only a sample of the dataset and then assigning all objects in the dataset to the resulting clusters.

## clara classification 
clus <- clara(v, 12, samples = 500, metric = "manhattan", pamLike = TRUE)
clara_raster <- raster(image)
clara_raster[i] <- clus$clustering
plot(clara_raster)

[Figure: clara classification]
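The sample-then-assign idea can be illustrated on synthetic data. This is a minimal sketch with made-up points; the values for `samples` and `sampsize` are picked for the example and are not a recommendation for real imagery.

```r
library("cluster")

set.seed(1)  # fixes the made-up data

# made-up data: 1000 points in three groups
x <- rbind(matrix(rnorm(600, mean = 0), ncol = 2),
           matrix(rnorm(600, mean = 6), ncol = 2),
           matrix(rnorm(800, mean = 12), ncol = 2))

# cluster 20 sub-samples of 50 points each and keep the best one
cl <- clara(x, k = 3, samples = 20, sampsize = 50,
            metric = "manhattan", pamLike = TRUE)

length(cl$sample)      # observations in the best sub-sample
length(cl$clustering)  # every observation still gets a cluster label
cl$medoids             # the k representative observations (medoids)
```

Only a small sub-sample is clustered at a time, which keeps memory use low, but the `clustering` element still contains one label per row of `x`.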

The third method uses a randomForest model to calculate proximity values for a random sample of cells. These proximity values are then clustered using k-means, and the resulting clusters are used to train a second randomForest model, which classifies the whole image.

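## unsupervised randomForest classification using kmeans
## draw a random sample of 500 cells to keep the proximity matrix small
vx <- v[sample(nrow(v), 500), ]
## without a response, randomForest runs unsupervised; keep only the proximity matrix
rf_prox <- randomForest(vx, ntree = 1000, proximity = TRUE)$proximity

## cluster the proximity values with kmeans
E_rf <- kmeans(rf_prox, 12, iter.max = 100, nstart = 10)
## train a second model with the clusters as classes and apply it to the whole image
rf <- randomForest(vx, as.factor(E_rf$cluster), ntree = 500)
rf_raster <- predict(image, rf)
plot(rf_raster)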

[Figure: randomForest classification]
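The three steps (proximity, clustering, classification) can also be tried on a small synthetic dataset. The sketch below assumes made-up data with three hypothetical "bands" named `b1` to `b3`; it illustrates the workflow and is not tuned in any way.

```r
library("randomForest")

set.seed(7)

# made-up data: 200 observations in two groups, 3 "bands"
x <- rbind(matrix(rnorm(300, mean = 0), ncol = 3),
           matrix(rnorm(300, mean = 4), ncol = 3))
colnames(x) <- c("b1", "b2", "b3")

# step 1: no response given, so randomForest runs in unsupervised mode;
# proximity = TRUE returns an n x n matrix of pairwise proximities
prox <- randomForest(x, ntree = 500, proximity = TRUE)$proximity

# step 2: cluster the rows of the proximity matrix
cl <- kmeans(prox, centers = 2, iter.max = 100, nstart = 10)

# step 3: train a classifier with the clusters as labels
rf_model <- randomForest(x, as.factor(cl$cluster), ntree = 500)

# the classifier can now label observations outside the sample
new_obs <- matrix(c(0, 0, 0,
                    4, 4, 4), ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, colnames(x)))
pred <- predict(rf_model, new_obs)
print(pred)
```

The proximity matrix grows with the square of the sample size, which is why only a sample of cells is used for the first two steps and the trained classifier handles the rest.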

The three classifications are combined into one layer stack and plotted for comparison.

class_stack <- stack(kmeans_raster,clara_raster,rf_raster)
names(class_stack) <- c("kmeans","clara","randomForest")

plot(class_stack)

Comparing the three classifications:

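Looking at the different classifications, we notice that the kmeans and clara classifications show only minor differences, whereas the randomForest classification shows a different picture. Keep in mind that the cluster numbers are arbitrary, so the same class can carry a different number (and colour) in each map.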
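Because cluster numbers are arbitrary, a visual comparison can be backed up with a cross-tabulation of the labels. The following is a minimal sketch with made-up label vectors `a` and `b`; the agreement measure is just one simple choice among many.

```r
# made-up cluster labels for ten cells from two classifications
a <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
b <- c(2, 2, 2, 3, 3, 1, 1, 1, 1, 1)

# rows: labels of a, columns: labels of b
tab <- table(a, b)
print(tab)

# share of cells that fall into the best-matching cluster of b, per cluster of a
agreement <- sum(apply(tab, 1, max)) / length(a)
print(agreement)
```

With the rasters from this post, the same table could be built from `getValues(kmeans_raster)` and `getValues(clara_raster)`. If each row of the table has one dominant column, the two classifications largely agree even though the cluster numbers differ.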

 
