Comparison of Clustering Performance for both CPU and GPU

A Benchmark: K-Means Algorithms for Scikit-Learn and TensorFlow-GPU

When you want to understand a phenomenon more deeply, one approach is to put its instances into groups, which makes them easier to comprehend. As an example, you might want to group movies by title, while someone else might group them by genre. How you decide to group them guides what you learn about them.

In machine learning, dividing data points into a certain number of groups is called clustering. These data points have no initial labels, so clustering is the task of grouping unlabeled data points in a way that makes them more meaningful.

There are three broad kinds of clustering: Hierarchical Clustering (Divisive, Agglomerative), Partitional Clustering (Centroid, Model-Based, Graph-Theoretic, Spectral), and Bayesian Clustering (Decision-Based, Nonparametric). In this post, we will be using centroid-based partitional clustering.

For a deeper look at clustering techniques, please visit the MIT-edu notes.

Throughout this post, the aim is to compare the clustering performance of the Scikit-Learn (random, k-means++) and TensorFlow-GPU (k-means++, Tunnel k-means) algorithms in terms of their execution times, and to present the results in a comparison matrix alongside the corresponding system specs. Since the content of the data is not the focus of this benchmark study, the required data sets are randomly generated.

K-means Clustering

The purpose of k-means clustering is to partition n observations into k non-overlapping clusters, each represented by a centroid. Data points are expected to be as homogeneous as possible within a cluster and as heterogeneous as possible across clusters. For more theoretical explanations, please kindly visit this webpage.
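
Formally, given observations $x_1, \dots, x_n$, k-means seeks clusters $C_1, \dots, C_k$ with centroids $\mu_1, \dots, \mu_k$ that minimize the within-cluster sum of squares (the quantity scikit-learn reports as inertia):

$$\min_{C_1, \dots, C_k} \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$$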

Given an initial set of center points, the algorithm iterates over the following two steps (a minimal sketch follows the list):

  1. Clusters are updated so that each data point belongs to the cluster whose centroid is nearest to it.
  2. The center points are recomputed as the average of all data points belonging to a cluster.
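
The following minimal NumPy sketch illustrates these two steps. It is for intuition only (and memory-hungry on large data), not the optimized implementations benchmarked later; the function name and defaults are illustrative.

import numpy as np

def kmeans_sketch(X, k, n_iter=10, seed=12345):
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen observations as the initial centroids.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Step 1: assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: recompute each centroid as the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels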

Throughout this post, three different environments (two CPU-powered systems and one GPU-powered system) will be tested. The system specs are listed below.

First CPU-Powered System Specs:

Processor: Intel(R) Core(TM) i7-8850H
Processor Speed: 2.60 GHz
Physical Cores: 6
Logical Processors: 12
Physical Memory (RAM): 64 GB

Second CPU-Powered System Specs:

Processor: Intel(R) Core(TM) i9-10900
Processor Speed: 3.70 GHz
Physical Cores: 10
Logical Processors: 20
Physical Memory (RAM): 256 GB (8 x 32GB DDR4 3000 MHz)

GPU-Powered System Specs:

Processor: Intel(R) Core(TM) i9-10900
Processor Speed: 3.70 GHz
Graphics Processors (GPU): 2 x NVIDIA GeForce® RTX 2080 Ti (352-bit, 11 GB GDDR6)

1. CPU-based K-means Clustering

The Central Processing Unit (CPU) is the core component of a computer, where most of the processing and computation takes place.

For the coding part, we will be using the Python programming language (version 3.7). Both PyCharm and Jupyter Notebook can be used to run Python scripts. For ease of use, Anaconda can be installed, since it bundles the required IDEs and essential packages.

As an initial step, we will install the packages required for the CPU implementation.

pip install psutil
pip install numpy
pip install matplotlib
pip install scikit-learn

After installing psutil, numpy, matplotlib, and scikit-learn, we import the packages in order to benefit from their features.

import platform
import time
from datetime import datetime

import psutil as ps
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

Before starting to cluster, the CPU information and current usage percentages can be checked with the following code snippet.

print("Physical cores:", ps.cpu_count(logical=False))
print("Logical cores:", ps.cpu_count(logical=True),"\n")

print("CPU Usage Percentages:", "\n")
for i, percentage in enumerate(ps.cpu_percent(percpu=True)):
    print(f"CPU {i}: {percentage}%")
    
print("\n")
print(f"Current CPU Usage in Total: {ps.cpu_percent()}%")
[Figure: First CPU System Info]
[Figure: Second CPU System Info]

Two types of k-means initialization are available within the KMeans() function through its init parameter: “random” and “k-means++”. Below, init='random', which selects k observations at random as the initial centroids, will be tested first.

The “n_clusters” parameter sets the number of clusters the algorithm will split the data into. Assigning “n_clusters=100” tells the algorithm to distribute all data points among the 100 most appropriate centroids. To control the randomness across runs, the “random_state” parameter should be set to a fixed integer; otherwise, a different random initialization is chosen on every run, which makes the outputs harder to compare with previous ones.

kmeansRnd = KMeans(n_clusters=100, init='random',random_state=12345)

After setting the cluster parameters, a random data set with 5 columns and 500K rows is generated.

CPU_Test_1 = np.random.normal(size=(500000,5))

Now we have randomly generated data with 5 columns and 500K rows. With the “fit_predict” method, the cluster centers are computed and a cluster index is returned for each data point by feeding in our data, “CPU_Test_1”.

kmeansRnd.fit_predict(CPU_Test_1)
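
After fitting, the learned assignments and centers can be inspected through scikit-learn's standard attributes, for example:

print(kmeansRnd.labels_[:10])            # cluster index of the first ten points
print(kmeansRnd.cluster_centers_.shape)  # (100, 5): one centroid per cluster
print(kmeansRnd.inertia_)                # within-cluster sum of squares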

The samples and the cluster centers are visualized using Python's “matplotlib” package. The resulting plots are shown with each system's results in the upcoming sections, depicting the data points and the clusters.

plt.scatter(CPU_Test_1[:, 0], CPU_Test_1[:, 1], s=1)  # data points (blue)
plt.scatter(kmeansRnd.cluster_centers_[:, 0],
            kmeansRnd.cluster_centers_[:, 1], s=5, c="r")  # cluster centers (red)
plt.title("K-means (Random) with Scikit Learn")
plt.show()

Execution times of the algorithms can be tracked by wrapping the clustering code with the following lines.

start_time = time.time()
# ... clustering code block 
end_time = time.time()
print(end_time - start_time)
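
Putting the pieces together, a timed run of the scikit-learn experiment might look like the following sketch (same parameters and data as above):

start_time = time.time()
kmeansRnd = KMeans(n_clusters=100, init='random', random_state=12345)
kmeansRnd.fit_predict(CPU_Test_1)
end_time = time.time()
print('K-means (random) execution time in seconds: {}'.format(end_time - start_time))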

1.1. K-means (random) Clustering with Scikit-Learn

Scikit-Learn's K-means with random initialization chooses the initial cluster centers at random, which can result in a higher within-cluster variance compared to the k-means++ algorithm.

First CPU-Powered System Results:

Algorithm Execution Time: 2505.26 seconds
Clusters Visualization: Blue Dots (Data Points), Red Dots (Clusters)

[Figure: Scikit Learn, K-means (random) Clustering with 100 clusters]

Second CPU-Powered System Results:

Algorithm Execution Time: 1395.10 seconds
Clusters Visualization: Blue Dots (Data Points), Red Dots (Clusters)

[Figure: Scikit Learn, K-means (random) Clustering with 100 clusters]

1.2. K-means (kmeans++) Clustering with Scikit-Learn

Scikit-Learn's K-means with k-means++ initialization chooses the initial cluster centers in a more principled way, which can lead to faster convergence afterward.
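
In scikit-learn, this initialization is selected with init='k-means++' (the default), e.g. KMeans(n_clusters=100, init='k-means++', random_state=12345). For intuition, k-means++ seeding picks each new center with probability proportional to the squared distance from the nearest already-chosen center (D² sampling). A minimal NumPy sketch of the seeding, for illustration only; scikit-learn's actual implementation is far more optimized:

import numpy as np

def kmeans_pp_init(X, k, seed=12345):
    rng = np.random.default_rng(seed)
    # First center: one uniformly random observation.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        # Sample the next center proportionally to those squared distances.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)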

First CPU-Powered System Results:

Algorithm Execution Time: 2603.74 seconds
Clusters Visualization: Blue Dots (Data Points), Red Dots (Clusters)

[Figure: Scikit Learn, K-means (kmeans++) Clustering with 100 clusters]

Second CPU-Powered System Results:

Algorithm Execution Time: 1384.73 seconds
Clusters Visualization: Blue Dots (Data Points), Red Dots (Clusters)

[Figure: Scikit Learn, K-means (kmeans++) Clustering with 100 clusters]

2. GPU-based Clustering

The TensorFlow library is built for massive volumes of numerical computation. It supports both CPU and GPU, depending on the version installed in your environment. If you wish to enable your GPU(s), the version to install is TensorFlow-GPU.

pip install tensorflow-gpu

To check GPU availability with the TensorFlow library, the following code snippet can be used.

import tensorflow as tf
# Lists the GPUs visible to TensorFlow; an empty list means no GPU is available.
tf.config.list_physical_devices('GPU')

Output:

[Output: list of physical GPU devices]

Beyond the GPU information above, more detailed device information, including the CPU(s), can be printed with the script below.

from tensorflow.python.client import device_lib as dev_lib
print(dev_lib.list_local_devices())

Output:

[Output: local device list, including CPU and GPU devices]

2.1. KMeansTF (kmeans++) Clustering with TensorFlow-GPU

The TensorFlow k-means implementation mitigates the negative effects of randomly generated initial centroids by iteratively optimizing the within-cluster variance.

The GPU-enabled TensorFlow k-means algorithm can be used by installing the kmeanstf package.

pip install kmeanstf

After installing the required package, the following code runs in the GPU-based environment. Results are listed after the code.

from kmeanstf import KMeansTF
start_time = time.time()
kmeanstf = KMeansTF(n_clusters=100, random_state=12345)
GPU_Test = np.random.normal(size=(500000,5))
kmeanstf.fit(GPU_Test)
end_time = time.time()
print('Kmeans++ execution time in seconds: {}'.format(end_time - start_time))
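
The kmeanstf package aims to mirror scikit-learn's KMeans interface; assuming that compatibility, the fitted centers can be plotted with the same matplotlib snippet used earlier (the cluster_centers_ attribute below is that assumption):

plt.scatter(GPU_Test[:, 0], GPU_Test[:, 1], s=1)  # data points (blue)
plt.scatter(kmeanstf.cluster_centers_[:, 0],
            kmeanstf.cluster_centers_[:, 1], s=5, c="r")  # cluster centers (red)
plt.title("KMeansTF (kmeans++) with TensorFlow-GPU")
plt.show()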

GPU-Powered System Results:

Algorithm Execution Time: 219.18 seconds
Clusters Visualization: Blue Dots (Data Points), Red Dots (Clusters)

[Figure: KMeansTF (kmeans++) Clustering with 100 clusters]

2.2. TunnelKMeansTF Clustering with TensorFlow-GPU

The tunnel k-means algorithm can perform non-local shifts of centroids between clusters to find more suitable centroids for the data points.

from kmeanstf import TunnelKMeansTF

start_time = time.time()
kmeansTunnel = TunnelKMeansTF(n_clusters=100, random_state=12345)
GPU_Test = np.random.normal(size=(500000,5))
kmeansTunnel.fit(GPU_Test)
end_time = time.time()
print('Tunnel Kmeans execution time in seconds: {}'.format(end_time - start_time))

GPU-Powered System Results:

Algorithm Execution Time: 107.38 seconds
Clusters Visualization: Blue Dots (Data Points), Red Dots (Clusters)

[Figure: TunnelKMeansTF Clustering with 100 clusters]

3. Final Comparison Matrix

To give a clearer picture of the algorithms' outputs, a comparison matrix is generated.

System Features vs Algorithm Run-Times

Algorithm                         | System                   | Execution Time (s)
----------------------------------|--------------------------|-------------------
Scikit-Learn K-means (random)     | First CPU (i7-8850H)     | 2505.26
Scikit-Learn K-means (k-means++)  | First CPU (i7-8850H)     | 2603.74
Scikit-Learn K-means (random)     | Second CPU (i9-10900)    | 1395.10
Scikit-Learn K-means (k-means++)  | Second CPU (i9-10900)    | 1384.73
KMeansTF (k-means++)              | GPU (2 x RTX 2080 Ti)    | 219.18
TunnelKMeansTF                    | GPU (2 x RTX 2080 Ti)    | 107.38

As we can observe in the comparison matrix, the GPU-powered system has a distinct performance advantage in terms of execution times.
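
Using the execution times reported above, rough speedups can be computed directly; for example, comparing the faster (second) CPU system against the GPU system:

# Execution times (seconds) reported in the sections above.
cpu_kmeanspp = 1384.73   # Scikit-Learn k-means++ on the second CPU system
gpu_kmeanspp = 219.18    # KMeansTF (kmeans++) on the GPU system
gpu_tunnel = 107.38      # TunnelKMeansTF on the GPU system

print(f"GPU k-means++ speedup vs CPU: {cpu_kmeanspp / gpu_kmeanspp:.1f}x")     # ~6.3x
print(f"GPU tunnel k-means speedup vs CPU: {cpu_kmeanspp / gpu_tunnel:.1f}x")  # ~12.9x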

The full implementation code and Jupyter Notebook are available on my GitHub.

Pınar Ersoy / Data Scientist