When do we NOT use Euclidean Distance?

Jed Lee
6 min read · Nov 17, 2022

Understanding Euclidean distance through its antipode.

Personal Photo from my recent trip to Santa Monica in June 2022

Content of this Article

  1. Introduction
  2. Importance of Distance Measures
  3. When do we NOT use Euclidean Distance?
  4. Not Scale-Invariant
  5. Curse of Dimensionality
  6. Sparsity
  7. Text Data
  8. Working with Categorical Variables
  9. Conclusion

Introduction

Euclidean Distance is the most common distance measure used to determine the distance between 2 points.

The most intuitive way to understand Euclidean Distance is simply to imagine 2 points right in front of you and try to determine the distance between them. You would probably draw a line connecting the 2 points and measure the length of that line. That is Euclidean Distance!
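
In code, that straight line is the square root of the sum of squared coordinate differences. Here is a minimal NumPy sketch with made-up points:

```python
import numpy as np

# Two hypothetical points in 2-D space
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean distance: square root of the sum of squared coordinate differences
dist = np.sqrt(np.sum((a - b) ** 2))
print(dist)                   # 5.0
print(np.linalg.norm(a - b))  # same result via the L2 norm
```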

To a layman, it is simply called distance; you do not even need to tag on the word “Euclidean”. This distance is so widely used that it is the default metric the scikit-learn library uses for K-Nearest Neighbours.
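
As a quick illustration, here is a short scikit-learn snippet (the toy data is made up): KNeighborsClassifier defaults to metric='minkowski' with p=2, which is exactly the Euclidean distance.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data: 4 points in 2-D with binary labels
X = [[0, 0], [0, 1], [5, 5], [5, 6]]
y = [0, 0, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # no metric specified
knn.fit(X, y)

# The default is metric='minkowski' with p=2, i.e. Euclidean distance
print(knn.metric, knn.p)       # minkowski 2
print(knn.predict([[4, 5]]))   # [1]
```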

Importance of Distance Measures

So, why are we so concerned with finding the distance between 2 points? Determining the distance between 2 data points can be very telling of the relationship they share. A relatively short or close distance could imply that the 2 data points are similar, and vice versa. Many machine learning algorithms, whether supervised or unsupervised, make use of distance measures.

Photo by John Cameron on Unsplash

To give a simple example, imagine you want to investigate whether there are any relationships between different shoppers in a grocery store. Based on their purchasing habits, you can probably identify whether a shopper is vegetarian, whether they are buying for a big or small family, and so on.

When do we NOT use Euclidean Distance?

Now, let us get to the main content of this article — When do we NOT use Euclidean Distance?

I will explore the various limitations and scenarios where using Euclidean Distance might not be desirable.

Not Scale-Invariant

One of the key limitations of Euclidean distance is that it is not scale-invariant, which means that the computed distances can be skewed by the units of the features. If you are working with feet and inches, or pounds and kilograms, you have to put the features on the same scale first; otherwise the feature with the larger numeric range dominates the distance. Fortunately, there is a very simple fix: standardise/normalise your data.
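
Below is a minimal sketch of the problem and the fix (the height/weight numbers are made up): weight in grams dominates the raw distances until the features are standardised with scikit-learn's StandardScaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical people: [height in metres, weight in grams]
X = np.array([
    [1.80, 75_000.0],
    [1.60, 74_000.0],
    [1.79, 60_000.0],
])

# Raw distances: the weight column (large numbers) dominates, height barely matters
print(np.linalg.norm(X[0] - X[1]))  # ~1000, driven almost entirely by weight
print(np.linalg.norm(X[0] - X[2]))  # ~15000

# After standardising each feature to zero mean / unit variance,
# both features contribute on a comparable scale
X_std = StandardScaler().fit_transform(X)
print(np.linalg.norm(X_std[0] - X_std[1]))
print(np.linalg.norm(X_std[0] - X_std[2]))
```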

Curse of Dimensionality

Another significant limitation of Euclidean distance is that it suffers from the curse of dimensionality. As the number of dimensions in your vectors grows, the Euclidean distance between two such vectors becomes less and less meaningful.

A great summary of non-intuitive results in higher dimensions comes from “A Few Useful Things to Know about Machine Learning” by Pedro Domingos from the University of Washington. Too many dimensions can cause every observation in your data set to appear equidistant from all the others.
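
A small simulation makes this concrete (random uniform data, with point counts and dimensions chosen arbitrarily): as dimensionality grows, the gap between the smallest and largest pairwise distance shrinks relative to the smallest distance, so every point starts to look roughly equidistant from every other.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

for d in [2, 10, 100, 1000]:
    X = rng.random((200, d))              # 200 random points in d dimensions
    dists = pdist(X, metric="euclidean")  # all pairwise Euclidean distances
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"dim={d:5d}  relative contrast={contrast:.2f}")
# The relative contrast shrinks as d grows: distances concentrate around a
# common value and lose their ability to discriminate between points.
```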

Do check out this discussion on StackExchange that elaborates on the various efforts to resolve the curse of dimensionality.

In short, no distance metric is impervious to the curse of dimensionality. I would recommend reducing the dimensionality of your data using Principal Component Analysis (PCA), Backward Feature Elimination, or similar techniques.
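
Here is a minimal sketch of the PCA route, using synthetic data whose variance is concentrated in a low-dimensional latent space; how many components to keep is a judgment call in practice.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: 500 samples in 100 dimensions, but most of the
# variance actually lives in a 5-dimensional latent space
latent = rng.normal(size=(500, 5))
W = rng.normal(size=(5, 100))
X = latent @ W + 0.1 * rng.normal(size=(500, 100))

# Keep enough principal components to explain 95% of the variance,
# then compute Euclidean distances in the reduced space instead
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # roughly (500, 5)
```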

Sparsity

Euclidean distance also does not work as well when the data is sparse. As a rule of thumb, for Euclidean distance to be usable, the vectors should be non-zero in roughly 75% of their attributes. The same applies to other norm-induced distance measures such as Manhattan, Minkowski, and Chebyshev.

A common solution is to use a measure such as Cosine Similarity instead. On some data it works very well, because its numerator (the dot product) only picks up attributes where both vectors are non-zero, and zero entries do not inflate the vector norms.

A quick clarification on the terminology here: we compare Euclidean Distance with Cosine Similarity, not Cosine Distance.

Cosine Distance = 1 - Cosine Similarity

The intuition behind this is that if 2 vectors point in exactly the same direction, the similarity is 1 (angle = 0°) and thus the distance is 0 (1 − 1 = 0).
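
A small sketch (with made-up sparse vectors) of both points: identical directions give similarity 1 and hence distance 0, while attributes where either vector is zero contribute nothing to the dot product.

```python
import numpy as np

def cosine_similarity(u, v):
    # dot product divided by the product of the vector norms
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([3.0, 0.0, 0.0, 1.0, 0.0])
b = np.array([6.0, 0.0, 0.0, 2.0, 0.0])  # same direction as a, twice the magnitude
c = np.array([0.0, 1.0, 4.0, 0.0, 0.0])  # shares no non-zero attributes with a

print(cosine_similarity(a, b))      # 1.0 -> cosine distance 0
print(1 - cosine_similarity(a, b))  # 0.0
print(cosine_similarity(a, c))      # 0.0 -> completely dissimilar
```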

Contrary to many claims, using Cosine Similarity does not resolve the curse of dimensionality. Euclidean distance is commonly used on dense, continuous variables, where every dimension matters and even a 25-dimensional space can be very challenging. Cosine Similarity is mostly used on very sparse, discrete domains such as text, where most dimensions are zero or close to zero. This is why some argue that Cosine Similarity works well in high dimensions.

Text Data

That brings me to another important domain where other distance measures are preferred over Euclidean distance.

When it comes to text data, Cosine and Jaccard Similarity are the 2 most commonly used text similarity measures. Euclidean distance depends on the vectors' magnitudes, whereas Cosine Similarity depends on the angle between the vectors. The angle is more resilient to variations in occurrence counts between terms that are semantically similar, whereas the magnitude of a vector is influenced by occurrence counts and the heterogeneity of word neighbourhoods. Two text documents that are far apart by Euclidean distance can therefore still be close to each other in terms of their context.

Let me give you an example! Going back to the grocery store analogy I gave above. Let’s say you want to compare users for product recommendations:

Shopper A bought 1x milk, 1x toilet paper and 1x vegetable.
Shopper B bought 100x milk, 100x toilet paper and 100x vegetables.
Shopper C bought 1x milk, 1x meat and 1x fruit.

By Cosine Similarity, Shopper A and Shopper B are more similar, since they buy the same mix of items. By Euclidean distance, Shopper C is closer to Shopper A, as the short sketch below verifies.
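
Here is that comparison in code (item order: milk, toilet paper, vegetables, meat, fruit; quantities as above).

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Columns: milk, toilet paper, vegetables, meat, fruit
A = np.array([1, 1, 1, 0, 0], dtype=float)
B = np.array([100, 100, 100, 0, 0], dtype=float)
C = np.array([1, 0, 0, 1, 1], dtype=float)

print(cosine_similarity(A, B))  # 1.0   -> A and B buy the same mix of items
print(cosine_similarity(A, C))  # ~0.33
print(np.linalg.norm(A - B))    # ~171.5 -> yet A and B are far apart by Euclidean distance
print(np.linalg.norm(A - C))    # 2.0    -> A and C are close by Euclidean distance
```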

Comparing Cosine and Jaccard Similarity: Jaccard Similarity only considers the unique set of words in each sentence, while Cosine Similarity works on the full term-count vectors. This means that if you repeat a specific word in one sentence several times, Cosine Similarity changes but Jaccard Similarity does not. Hence, Jaccard Similarity is suitable for cases where duplication does not matter, whereas Cosine Similarity is suitable for cases where duplication does matter when analysing text similarity. Here is a more in-depth Medium article that further explores the differences between the 2.
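
A small sketch of that difference, using two made-up sentences; scikit-learn's CountVectorizer builds the term-count vectors, and the Jaccard score is computed over the unique word sets.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def jaccard_similarity(s1, s2):
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / len(w1 | w2)

s1 = "the milk is fresh"
s2 = "the milk milk milk milk is fresh"  # same words, one term repeated

# Jaccard works on the unique word sets, so repetition changes nothing
print(jaccard_similarity(s1, s2))  # 1.0

# Cosine works on the term-count vectors, so repetition shifts the angle
counts = CountVectorizer().fit_transform([s1, s2]).toarray().astype(float)
print(cosine_similarity(counts[0], counts[1]))  # ~0.80, i.e. below 1.0
```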

Working with Categorical Variables

Many real-world questions and problems involve mixed data. For example, if we study shoppers at a grocery store, the number of items and the total order amount are continuous variables, but we might also be interested in whether they drove there, or whether they are shopping alone or with company. Those are categorical variables.

Unfortunately, Euclidean distance does not make sense when your data is categorical. It is unable to measure any relationship between categorical variables, because the categories have no inherent ordering or magnitude.

If you are considering encoding your categorical variables, that might not work either. If you label-encode blue as 1, orange as 2 and green as 3, you are essentially implying an ordering of the categories that does not exist. You may suggest one-hot encoding the categorical variables instead, but then you will end up dealing with sparse, high-dimensional data.
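
A quick sketch of both issues (the colours and codes are made up for illustration): label encoding fabricates an ordering, while one-hot encoding removes it but makes every pair of distinct categories equally far apart in a sparser space.

```python
import numpy as np

# Label-encoding a purely nominal feature: blue=1, orange=2, green=3
blue, orange, green = 1.0, 2.0, 3.0

# Euclidean distance now claims blue is "closer" to orange than to green,
# even though there is no meaningful ordering between the colours
print(abs(blue - orange))  # 1.0
print(abs(blue - green))   # 2.0

# One-hot encoding removes the fake ordering, but every pair of distinct
# colours becomes equally far apart and the data becomes sparse
one_hot = {
    "blue":   np.array([1, 0, 0]),
    "orange": np.array([0, 1, 0]),
    "green":  np.array([0, 0, 1]),
}
print(np.linalg.norm(one_hot["blue"] - one_hot["orange"]))  # ~1.41
print(np.linalg.norm(one_hot["blue"] - one_hot["green"]))   # ~1.41
```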

Hence, there is another set of distance measures we can use. They are the Hamming Distance and Gower’s Distance.

For purely categorical variables, Hamming Distance is preferred. Below is a very intuitive visual showing how Hamming Distance works.

From Sankhasubhramondal’s Medium Article
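
In code, Hamming Distance is simply the count (or fraction) of positions at which two categorical vectors disagree; here is a minimal sketch with made-up shopper attributes.

```python
# Hypothetical categorical profiles: [payment method, transport, diet]
shopper_a = ["card", "car",  "vegetarian"]
shopper_b = ["card", "bike", "vegetarian"]
shopper_c = ["cash", "bike", "omnivore"]

def hamming(u, v):
    # number of positions where the two vectors disagree
    return sum(x != y for x, y in zip(u, v))

print(hamming(shopper_a, shopper_b))  # 1 -> differ only in transport
print(hamming(shopper_a, shopper_c))  # 3 -> differ in every attribute
```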

For Mixed Variables, Gower’s Distance, also known as Gower’s Dissimilarity, is preferred. Gower’s distance is computed as the average of partial dissimilarities across individuals, where each partial dissimilarity ranges from 0 to 1. How a partial dissimilarity is computed depends on the type of the variable being evaluated. In effect, a suitable standardisation is applied to each feature, and the distance between two individuals is the average of all feature-wise distances. For a more in-depth exploration of Gower’s Distance, please check out this Medium article by Thomas Filaire. A Python implementation of Gower’s Distance can be found here.
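
Here is a minimal, hand-rolled sketch of the idea for one numeric and one categorical feature (the data and column names are made up; the linked implementation is the better option in practice): the numeric column uses a range-normalised absolute difference, the categorical column a simple 0/1 mismatch, and the two are averaged.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed data: order amount (numeric) and transport (categorical)
df = pd.DataFrame({
    "order_amount": [12.0, 80.0, 15.0],
    "transport":    ["car", "car", "bike"],
})

def gower_distance(row_i, row_j, ranges):
    parts = []
    for col in row_i.index:
        if col in ranges:  # numeric: range-normalised absolute difference, in [0, 1]
            parts.append(abs(row_i[col] - row_j[col]) / ranges[col])
        else:              # categorical: 0 if the categories match, 1 otherwise
            parts.append(0.0 if row_i[col] == row_j[col] else 1.0)
    return np.mean(parts)  # average of the partial dissimilarities

ranges = {"order_amount": df["order_amount"].max() - df["order_amount"].min()}
print(gower_distance(df.iloc[0], df.iloc[1], ranges))  # 0.5   (far apart in amount, same transport)
print(gower_distance(df.iloc[0], df.iloc[2], ranges))  # ~0.52 (close in amount, different transport)
```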

Conclusion

In conclusion, while I have barely scratched the surface of these distance measures, I hope I have helped you better understand their various applications. Do keep in mind that I did not cover every distance measure, such as Manhattan, Minkowski, Chebyshev, Haversine, etc. There are multiple articles online that will satisfy your curiosity.

Despite the various “attacks” on Euclidean distance, it is still the most widely used and preferred distance measure simply because of how intuitive and interpretable it is.

Thanks so much for reading my article!!! Feel free to connect with me on LinkedIn. Cheers!


Jed Lee

Passionate about AI & NLP. Based in Singapore. Currently a Data Scientist at PatSnap.