[High-Level Overview] Principal Component Analysis

Jed Lee
Feb 28, 2022 · 5 min read


Transforming a large dataset of many variables into a smaller one

Diagram illustrating PCA’s Dimensions. Image by Author

Content of this Article

  1. Brief Introduction to Principal Component Analysis
  2. What is PCA and how do we use it?
  3. What are the benefits of PCA?
  4. The 2 Main Applications of PCA
  5. Limitations of PCA and When do we NOT use PCA?
  6. Conclusion

Introduction

Before we unpack what Principal Component Analysis, or PCA for short, actually is, let us answer why you would ever encounter PCA in the first place.

PCA is one of the first few tools you will put in your data science toolbox. It is one of the most commonly used unsupervised machine learning algorithms, meaning it is used when we do not have a label or target for each observation in our dataset.

Take a step further and you will come across a modification of PCA called Robust Principal Component Analysis (RPCA). I will not be diving too deeply into RPCA, but here is a YouTube video by Steve Brunton, who does a wonderful job explaining RPCA and the essential math behind it.

What is PCA and how do we use it?

Every dataset is different, and PCA usually comes into the picture when you have a dataset with many different variables!

The idea of PCA is simple: reduce the number of variables in a dataset while preserving as much information as possible.

Let us take a look at this dataset about Dogs~!

Dataset of Dog’s Characteristics. Image by Author

This dataset shows you the different physical characteristics of various dog breeds. There are many different variables like Body Length, Weight, Average Life Span, Bark Loudness, etc. The list goes on…

So the question you might ask is, “There are too many variables to consider… Is there a way I can just look at the most important variables?”

Translating that into technical terms, that would be “reducing the dimension of your feature space,” which is also called “dimensionality reduction.”

Then you might be wondering, “Can we just remove variables like that?”

Admittedly, there is a trade-off that we are making here. Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity.

Just imagine you have to work with 10, 20, or even 50 different variables. You cannot possibly reason about so many variables at once! Not all variables are equally relevant either!

Take note that it is important to apply business/common sense when removing variables too!
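To make this concrete, here is a minimal sketch of dimensionality reduction with scikit-learn, using a small made-up table of dog measurements (the column names and values are purely illustrative):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# A tiny, made-up dog dataset: 6 breeds, 4 numeric characteristics.
dogs = pd.DataFrame({
    "body_length_cm":   [95, 40, 70, 110, 55, 80],
    "weight_kg":        [30,  8, 22,  50, 12, 28],
    "avg_life_span":    [11, 14, 12,   9, 13, 11],
    "bark_loudness_db": [85, 70, 80,  90, 75, 82],
})

# Scale first, so no single variable dominates because of its units.
X = StandardScaler().fit_transform(dogs)

# Keep only 2 principal components instead of the original 4 variables.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (6, 2): fewer columns, same rows
print(pca.explained_variance_ratio_.sum())  # fraction of the variance retained
```

The reduced table has only two columns, yet the explained variance ratio tells you how much of the original information those two columns still carry.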

What are the benefits of PCA?

  1. Captures the most “Important” Variables/Features. By The Law of Parsimony, or Occam’s razor, the simplest explanation of an event or observation is the preferred explanation. By reducing the dimension of your feature space, you have fewer relationships between variables to consider and you are less likely to overfit your model.
  2. “De-noise” your data and reduce redundancy. PCA identifies the components that explain the greatest amount of variance, so it captures the most significant signal in the data while omitting the components that mostly reflect noise.
  3. Better Visualization of your Data. It makes your life so much easier to visualize a plot on a 2- or 3-dimensional plane (see the sketch after this list).
  4. Better Data Storage & Faster Computation. PCA can be used to compress information so that data is stored and transmitted more efficiently. Think beyond conventional data points! You can even use PCA to compress images without losing too much quality, or apply it in signal processing.
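To make benefits 2 and 3 a little more concrete, here is a minimal sketch using scikit-learn and matplotlib on synthetic data (the numbers are purely illustrative): the explained variance ratio tells you how much of the total variance each component retains, and with only two components the data can be plotted on a plane.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 100 observations of 10 features driven by 2 underlying signals.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(100, 10))

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_scaled)
X_2d = pca.transform(X_scaled)

# Benefit 2: how much of the total variance (signal) each component explains.
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {ratio:.1%} of the variance")

# Benefit 3: with only two components, the data can be plotted on a plane.
plt.scatter(X_2d[:, 0], X_2d[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Data projected onto the first two principal components")
plt.show()
```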

The 2 Main Applications of PCA

1. Feature Elimination

This needs no further introduction. You eliminate variables/features that are not as significant. The advantages of feature elimination methods include simplicity and maintaining the interpretability of your variables.
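As a rough illustration (one simple way among many, and not PCA itself), the sketch below eliminates one column out of any pair of highly correlated columns; the column names and the 0.9 threshold are arbitrary choices for the example:

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column out of each pair whose absolute correlation exceeds the threshold."""
    corr = df.corr().abs()
    # Look only at the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Made-up dog data: weight_lb is just weight_kg in different units,
# so it carries no extra information and gets eliminated.
dogs = pd.DataFrame({
    "body_length_cm": [95, 60, 100, 70, 110, 55],
    "weight_kg":      [30,  8,  22, 50,  12, 28],
    "weight_lb":      [66, 18,  49, 110, 26, 62],
})
print(drop_highly_correlated(dogs).columns.tolist())  # ['body_length_cm', 'weight_kg']
```

Note that a method like this keeps the surviving columns exactly as they were, which is why feature elimination preserves interpretability.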

2. Feature Extraction

This is the main course of PCA itself. Let's take the above Dog Dataset as an example. Say we have their Body Length, Weight, Body Height, Body Width, Body Mass Index, etc. Some of these variables may be just combinations of other attributes!

Some of these variables are extraneous and can be eliminated, because the information they give us can already be derived from other variables. This is where PCA comes in: it helps us identify cases where certain variables are not really independent but are (approximately) linear combinations of variables that we already have.

This helps us shrink the dimensionality of our data without losing much information.
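To see the “linear combination” idea in code, here is a minimal sketch: each principal component scikit-learn returns is a weighted combination of the original (scaled) variables, and those weights are stored in pca.components_. The dog-style columns below are synthetic and only for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic dog-style data: body_mass_index is derived from the other measurements.
rng = np.random.default_rng(42)
length = rng.normal(80, 15, size=200)
weight = 0.4 * length + rng.normal(0, 3, size=200)
dogs = pd.DataFrame({
    "body_length_cm": length,
    "weight_kg": weight,
    "body_mass_index": weight / (length / 100) ** 2,
    "avg_life_span": rng.normal(12, 2, size=200),
})

X = StandardScaler().fit_transform(dogs)
pca = PCA().fit(X)

# Each row of components_ holds the weights of one principal component,
# i.e. how strongly each original (scaled) variable contributes to it.
loadings = pd.DataFrame(
    pca.components_,
    columns=dogs.columns,
    index=[f"PC{i + 1}" for i in range(len(dogs.columns))],
)
print(loadings.round(2))
print(pca.explained_variance_ratio_.round(2))  # most of the variance sits in the first components
```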

What are the limitations of PCA and when do we NOT use PCA?

As there are pros, there are cons.

  1. PCA is a Black Box. PCA models create cryptic black boxes that are very challenging to explain. The principal components derived from PCA are mathematical constructs and are difficult to interpret in logical or business terms. PCA is therefore frequently abandoned as a method whenever you have to present your results to an outside audience such as management, regulators, etc.
  2. PCA eliminates information. Not every large dataset with many features requires PCA treatment. Sometimes your top 2 or 3 principal components may capture 90% of the variance/information, but that also means you are forsaking the remaining 10%. Depending on the context, that 10% may matter to different extents.
  3. PCA can only work with Continuous variables. PCA works by finding directions that maximize variance, which is essentially a sum of squared deviations, and the concept of squared deviations breaks down when you have binary or categorical variables.
  4. PCA performs poorly if features are weakly correlated. If the features are not very correlated, the eigenvalues of the leading principal components will be lower and more evenly spread. While you can still use PCA when your features are largely uncorrelated, your scree plot will not show a clear elbow.
  5. PCA is not robust against outliers. The PCA algorithm will be biased in datasets with strong outliers. It is recommended to treat the outliers before performing PCA.
  6. PCA is sensitive to unscaled data. PCA is a dimensionality reduction technique based on variance. If features are unscaled, those with higher magnitudes will have higher variance, and PCA may end up giving them more importance. It is recommended to scale the data appropriately before performing PCA (see the sketch after this list).
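As a rough illustration of point 6, the sketch below uses synthetic data in which one feature is recorded in much larger units; without scaling, PCA attributes almost all the variance to that feature, while after standardising both features contribute equally:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Two independent, equally informative features, one recorded in much larger units.
small_units = rng.normal(0, 1, size=(500, 1))          # e.g. metres
large_units = rng.normal(0, 1, size=(500, 1)) * 1000   # e.g. grams
X = np.hstack([small_units, large_units])

# Without scaling, the first component is dominated by the large-unit feature.
print(PCA(n_components=1).fit(X).explained_variance_ratio_)         # ~[1.0]

# After standardising, both features contribute on an equal footing.
X_scaled = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_scaled).explained_variance_ratio_)  # ~[0.5]
```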

In Conclusion…

I have simply given a High-Level Overview of what Principal Component Analysis is. I believe a simple search online can easily retrieve resources that dive deeper into the math and how exactly the PCA algorithm works.

To sum it up, PCA is a great tool to transform a large dataset of many variables into a smaller one through dimensionality reduction, with the intention that the lower-dimensional space still captures as much of the dynamics of the original space.

Most importantly, it is imperative to apply business/common sense when selecting features instead of leaving that entirely to PCA.

I hope you enjoyed this article! Do check out the second part of this article, where I run through some code showing how to use PCA in Python with 2 different Python libraries.
