unsupervised classification with R

written by Thorsten Dahms

January 29, 2016

This post shows three simple ways to perform an unsupervised classification on a raster dataset in R. I will show these approaches, but first we need to load the relevant packages and the actual data. You could use the Landsat data from the book “Remote Sensing and GIS for Ecologists”, which is available for download.


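# loading the packages: raster (raster handling), cluster (clara), randomForest
library(raster)
library(cluster)
library(randomForest)
# loading the layerstack
# here we use a subset of the Landsat dataset from "Remote Sensing and GIS for Ecologists"
image <- stack("path/to/raster")
plotRGB(image, r = 3, g = 2, b = 1, stretch = "hist")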


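Now we will prepare the data for the classifications. First we convert the raster data into a matrix with one row per cell and one column per band, then we remove every cell that contains an NA value in any band. We keep the indices of the remaining cells so that the cluster labels can be written back to the right positions later.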

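## return the values of the raster dataset as a matrix (rows = cells, columns = bands)
v <- getValues(image)
## indices of the cells without NA in any band
i <- which(complete.cases(v))
## keep only those complete cells, so that nrow(v) equals length(i)
v <- v[i, , drop = FALSE]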
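
To make the masking step concrete, here is a minimal sketch on a made-up matrix that stands in for the output of getValues(). The names v_toy, i_toy, v_clean and labels are only illustrative and the numbers are invented. It shows that the index vector has one entry per complete cell, which is what allows per-cell results to be written back later.

```r
# Toy stand-in for getValues(image): 5 cells (rows) x 2 bands (columns).
# All values are made up for illustration.
v_toy <- matrix(c(10, 20, NA, 40, 50,
                  11, NA, 31, 41, 51),
                ncol = 2, dimnames = list(NULL, c("band1", "band2")))

# Cells that are complete across all bands
i_toy <- which(complete.cases(v_toy))

# Keep only those cells for clustering
v_clean <- v_toy[i_toy, , drop = FALSE]

# A per-cell result (e.g. cluster labels) can be written back by index;
# cells with missing data simply stay NA
labels <- rep(NA_integer_, nrow(v_toy))
labels[i_toy] <- seq_along(i_toy)  # placeholder labels, not a real clustering
```

The important property is that length(i_toy) and nrow(v_clean) are the same, so a vector with one value per clustered row fits exactly into the indexed cells.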

The first classification method is the well-known k-means method. It separates n observations into k clusters, where each observation belongs to the cluster with the nearest mean. Here we ask for k = 12 clusters.

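## kmeans classification with 12 clusters and 10 random starts
E <- kmeans(v, 12, iter.max = 100, nstart = 10)
## empty raster with the same geometry as the input
kmeans_raster <- raster(image)
## write the cluster labels back to the non-NA cells
kmeans_raster[i] <- E$cluster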
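
The "nearest mean" rule can be checked on a small example. The sketch below uses simulated two-band values (not Landsat data) with an arbitrary seed and only two clusters. It recomputes the squared distance from every point to each cluster centre and confirms that the label returned by kmeans() is the closest one.

```r
set.seed(42)  # arbitrary seed, only to make the sketch repeatable

# Two well-separated groups of made-up 2-band "pixels"
toy <- rbind(matrix(rnorm(100, mean = 0,  sd = 0.5), ncol = 2),
             matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2))

km <- kmeans(toy, centers = 2, iter.max = 100, nstart = 10)

km$centers         # one row of band means per cluster
table(km$cluster)  # number of pixels assigned to each cluster

# Squared Euclidean distance of every pixel to each centre
d <- sapply(1:2, function(k) rowSums(sweep(toy, 2, km$centers[k, ])^2))

# Index of the closest centre per pixel, compared with the kmeans labels
nearest <- max.col(-d)
all(nearest == km$cluster)
```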


The second classification method is called clara (Clustering for Large Applications). It works by clustering only a sample of the dataset and then assigning all objects in the dataset to the resulting clusters.

## clara classification with 12 clusters, based on 500 random subsamples
clus <- clara(v, 12, samples = 500, metric = "manhattan", pamLike = TRUE)
clara_raster <- raster(image)
clara_raster[i] <- clus$clustering
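
A small sketch of the sampling idea, again on simulated values rather than the Landsat subset: clara() only clusters subsamples of sampsize rows, but the returned object still contains a label for every row. The values of samples and sampsize below are arbitrary choices for illustration.

```r
library(cluster)  # provides clara(); shipped as a recommended package with R

set.seed(1)  # arbitrary seed, only to make the sketch repeatable

# 2000 made-up 2-band "pixels" in two well-separated groups
toy <- rbind(matrix(rnorm(2000, mean = 0,  sd = 0.5), ncol = 2),
             matrix(rnorm(2000, mean = 10, sd = 0.5), ncol = 2))

# Cluster 5 subsamples of 50 rows each and keep the best result
cl <- clara(toy, k = 2, samples = 5, sampsize = 50,
            metric = "manhattan", pamLike = TRUE)

cl$medoids             # cluster centres are actual rows of the data
length(cl$clustering)  # every row gets a label, not only the sampled ones
```

Because the cluster centres are medoids, they are real observations from the data, unlike the k-means centres, which are averages.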


The third method uses a Random Forest model to calculate proximity values for a random sample of 500 pixels. These values are clustered using k-means. The clusters are then used to train another Random Forest model, which classifies the whole image.

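## unsupervised randomForest classification using kmeans
## draw a random sample of 500 pixels
vx <- v[sample(nrow(v), 500), ]
## random forest without a response (unsupervised mode) to get the proximity matrix
rf_prox <- randomForest(vx, ntree = 1000, proximity = TRUE)$proximity

## cluster the proximities and train a classifier on the cluster labels
E_rf <- kmeans(rf_prox, 12, iter.max = 100, nstart = 10)
rf <- randomForest(vx, as.factor(E_rf$cluster), ntree = 500)
## apply the classifier to every pixel of the image
rf_raster <- predict(image, rf)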
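
To see what the proximity values look like, here is a minimal sketch on simulated data, assuming the randomForest package is installed. The band names, the seed and the sample size of 60 are made up for illustration. When randomForest() is called without a response it runs in unsupervised mode, and the proximity matrix has one row and one column per sampled pixel.

```r
library(randomForest)

set.seed(7)  # arbitrary seed, only to make the sketch repeatable

# 60 made-up 2-band "pixels" in two well-separated groups
toy <- rbind(matrix(rnorm(60, mean = 0,  sd = 0.5), ncol = 2),
             matrix(rnorm(60, mean = 10, sd = 0.5), ncol = 2))
colnames(toy) <- c("band1", "band2")

# No response given, so the forest is grown in unsupervised mode
prox <- randomForest(toy, ntree = 500, proximity = TRUE)$proximity

dim(prox)    # one row and one column per sampled pixel
range(prox)  # proximities are fractions between 0 and 1
```

Each entry is the fraction of trees in which two pixels end up in the same terminal node, so k-means on the rows of this matrix groups pixels that the forest treats alike.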


The three classifications are stacked into one layerstack and plotted for comparison.

class_stack <- stack(kmeans_raster, clara_raster, rf_raster)
names(class_stack) <- c("kmeans", "clara", "randomForest")
plot(class_stack)


Comparing the three classifications:

Looking at the different classifications, we notice that the kmeans and clara classifications show only minor differences, whereas the randomForest classification shows a different picture.
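
Note that cluster numbers are arbitrary: cluster 1 of one method does not have to correspond to cluster 1 of another. One way to go beyond a visual comparison is a cross-tabulation of the labels. The sketch below uses two made-up label vectors; with the rasters above, such vectors could for example be taken from getValues() on the individual layers. The agreement score is only a rough illustration, not an established accuracy measure.

```r
# Made-up cluster labels for the same 8 pixels from two classifications
labels_a <- c(1, 1, 1, 2, 2, 3, 3, 3)
labels_b <- c(3, 3, 3, 1, 1, 2, 2, 1)

# Rows: clusters of A, columns: clusters of B
ct <- table(A = labels_a, B = labels_b)
ct

# Share of pixels that fall into the best-matching cluster of B
# for each cluster of A
agreement <- sum(apply(ct, 1, max)) / sum(ct)
agreement
```

A table with one dominant entry per row indicates that the two methods found largely the same groups under different numbers.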

