How To Cut A Dendrogram In Python

In hierarchical clustering, a dendrogram is a tree-like diagram that records the order in which clusters are merged. Each node represents a cluster, and the height of a merge shows the distance between the two clusters being joined. It helps you visualize the relationships between clusters and choose an appropriate number of clusters for your data. In this tutorial, we will learn how to cut a dendrogram in Python using the SciPy library.

Step 1: Install the required libraries

First, install numpy, scipy, and matplotlib if you haven’t already. You can do this using pip, for example with the command pip install numpy scipy matplotlib.
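If you are unsure whether the packages are already present, a quick check along these lines can help. This is only a sketch: the `required` list simply mirrors the three libraries named above, and `importlib.util.find_spec` comes from the Python standard library.

```python
import importlib.util

# Package names taken from this tutorial's requirements
required = ["numpy", "scipy", "matplotlib"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```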

Step 2: Import the necessary libraries

Next, you need to import the required libraries:
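A minimal set of imports for the steps that follow might look like this. The aliases `np` and `plt` are common conventions rather than requirements.

```python
import numpy as np
import matplotlib.pyplot as plt

# linkage builds the hierarchy, dendrogram draws it, fcluster cuts it
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
```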

Step 3: Generate some sample data

For this tutorial, we will generate some random data points:
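One way to generate such data is sketched below. The array name `X`, the seed 42, and the choice of 20 two-dimensional points are assumptions made here for illustration; any numeric array with one row per observation would work.

```python
import numpy as np

# Seeded generator so the same points are produced on every run
rng = np.random.default_rng(42)

# 20 points drawn uniformly from the unit square, one row per point
X = rng.random((20, 2))

print(X.shape)  # (20, 2)
```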

Step 4: Perform hierarchical clustering

Now, we need to perform hierarchical clustering on the data using the linkage function from scipy:
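A minimal sketch of this step, assuming the sample data is a seeded random array named `X` (an illustrative choice) and storing the result in `Z`:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Illustrative sample data: 20 random points in 2D
rng = np.random.default_rng(42)
X = rng.random((20, 2))

# Each row of Z records one merge: the two cluster indices joined,
# the distance between them, and the size of the newly formed cluster
Z = linkage(X, method="ward")

print(Z.shape)  # (19, 4): n - 1 merges for n = 20 points
```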

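Here, we have used Ward’s method as the linkage method. At each step, it merges the pair of clusters that gives the smallest increase in total within-cluster variance. The linkage function returns a linkage matrix, called Z in the rest of this tutorial, that describes every merge.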

Step 5: Plot the dendrogram

Before cutting the dendrogram, let’s plot it first to visualize the hierarchical clustering:
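The plotting step could look roughly like this. The non-interactive `Agg` backend and the output file name `dendrogram.png` are assumptions made so that the sketch runs without a display; in an interactive session you could drop the backend line and call `plt.show()` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Illustrative sample data and Ward linkage, as in the earlier steps
rng = np.random.default_rng(42)
X = rng.random((20, 2))
Z = linkage(X, method="ward")

fig, ax = plt.subplots(figsize=(8, 4))
info = dendrogram(Z, ax=ax)  # returns a dict describing the drawn tree
ax.set_title("Dendrogram")
ax.set_xlabel("Data point index")
ax.set_ylabel("Distance")
fig.savefig("dendrogram.png")  # hypothetical output file name
```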

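This code will generate the dendrogram for the given data. The leaves along the horizontal axis are the individual data points, and the vertical axis shows the distance at which clusters are merged.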

Step 6: Cut the dendrogram

You can cut the dendrogram at a specific distance or at a specific number of clusters. In this example, we will cut the dendrogram at a maximum distance of 1.5.
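Both kinds of cut can be sketched with SciPy's `fcluster`. The variable names `max_d`, `labels`, and `labels_k`, the sample data, and the target of 3 clusters in the second call are illustrative assumptions. The number of clusters produced by the distance cut depends on the data, so it is worth reading a sensible threshold off the dendrogram first.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Illustrative sample data and Ward linkage, as in the earlier steps
rng = np.random.default_rng(42)
X = rng.random((20, 2))
Z = linkage(X, method="ward")

# Cut at a maximum distance: merges above max_d are undone
max_d = 1.5
labels = fcluster(Z, t=max_d, criterion="distance")

# Alternative: ask for at most a given number of clusters (3 is arbitrary)
labels_k = fcluster(Z, t=3, criterion="maxclust")

print(labels)                    # one integer label per data point, starting at 1
print(len(np.unique(labels)))    # number of clusters from the distance cut
```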

Here, the fcluster function takes the linkage matrix Z, the maximum distance, and the cutting criterion as input arguments. It returns an array with one cluster label per data point.

Step 7: Visualize the clustered data

After cutting the dendrogram, we can visualize the clustered data:
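A scatter plot colored by cluster label is one straightforward way to do this. As before, the sample data, the `Agg` backend, the `viridis` colormap, and the file name `clusters.png` are assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Illustrative sample data, Ward linkage, and a distance cut at 1.5
rng = np.random.default_rng(42)
X = rng.random((20, 2))
Z = linkage(X, method="ward")
labels = fcluster(Z, t=1.5, criterion="distance")

fig, ax = plt.subplots(figsize=(6, 6))
# c=labels maps each cluster label to a color from the colormap
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis")
ax.set_title("Clusters after cutting the dendrogram")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
fig.savefig("clusters.png")  # hypothetical output file name
```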

This will show the data points with different colors representing different clusters.

Full code
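Putting the steps together, a complete script might look like the sketch below. The sample data, variable names, `Agg` backend, and output file names are illustrative assumptions. The dashed line drawn at the cut distance is an optional addition that makes the cut visible on the dendrogram.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Step 3: generate sample data (20 random 2D points, seeded for repeatability)
rng = np.random.default_rng(42)
X = rng.random((20, 2))

# Step 4: hierarchical clustering with Ward's method
Z = linkage(X, method="ward")

# Step 5: plot the dendrogram, marking the distance at which it will be cut
max_d = 1.5
fig1, ax1 = plt.subplots(figsize=(8, 4))
dendrogram(Z, ax=ax1, color_threshold=max_d)
ax1.axhline(y=max_d, color="gray", linestyle="--")
ax1.set_title("Dendrogram")
ax1.set_xlabel("Data point index")
ax1.set_ylabel("Distance")
fig1.savefig("dendrogram.png")

# Step 6: cut the dendrogram at the chosen maximum distance
labels = fcluster(Z, t=max_d, criterion="distance")
print("Cluster labels:", labels)
print("Number of clusters:", len(np.unique(labels)))

# Step 7: visualize the clustered data, one color per cluster
fig2, ax2 = plt.subplots(figsize=(6, 6))
ax2.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis")
ax2.set_title("Clusters after cutting the dendrogram")
ax2.set_xlabel("Feature 1")
ax2.set_ylabel("Feature 2")
fig2.savefig("clusters.png")
```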

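Output

Running the steps above produces two figures: the dendrogram of the sample data, and a scatter plot of the points colored by cluster label. The number of clusters you get depends on the random sample and on the distance at which the dendrogram is cut.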

Conclusion

In this tutorial, we have learned how to cut a dendrogram in Python using the SciPy library. Cutting a dendrogram turns the full hierarchy into a flat set of clusters, either at a chosen distance or for a chosen number of clusters. By following these steps, you can visualize and analyze the hierarchical clustering of your data more effectively.