Content Summary — All About Data Science

Jed Lee
3 min read · Feb 3, 2022

Wordcloud of ‘Data’ in different languages

As I pondered what cover photo to use for this article, I did not want to go with a generic tech-looking image, a photo full of numbers and visualizations, or anything else you can easily find on Google.

I decided to create this simple word cloud myself instead. To me, working with Data is akin to learning a new language: it is a big mix, and often a mess, of information cluttered together, and most importantly, it is a form of communication.

My passion for Data Science first came from my interest in Math, the little joy I got from calculating the right probability and using it in the real world. In the spirit of writing this article during the Lunar New Year, let me give an example from a very popular game played during Lunar New Year, called Blackjack.

Blackjack, also known as Twenty-one.

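Ideally, you would want to hit 21 with your first two cards. With a single 52-card deck, the probability of being dealt a blackjack (an ace plus a ten-valued card) is about 4.83%, which works out to roughly 1 in 21 hands. The game actually takes its other name, Twenty-one, from the target hand total of 21, so the odds lining up this way is a fun coincidence!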
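For anyone who wants to check the 4.83% figure, here is a minimal Python sketch. It assumes a single, freshly shuffled 52-card deck and counts only the first two cards dealt to one player; the function name `natural_blackjack_probability` is my own illustrative choice, not something from a library.

```python
from fractions import Fraction


def natural_blackjack_probability(decks: int = 1) -> Fraction:
    """Probability that the first two cards are an ace plus a ten-valued card.

    Assumption: cards are drawn without replacement from `decks` standard
    52-card decks shuffled together.
    """
    total_cards = 52 * decks
    aces = 4 * decks
    ten_valued = 16 * decks  # 10, J, Q, K in each of the four suits
    # Two orderings: ace then ten-valued card, or ten-valued card then ace.
    return 2 * Fraction(aces, total_cards) * Fraction(ten_valued, total_cards - 1)


p = natural_blackjack_probability()
print(f"{float(p):.2%}")                        # 4.83%
print(f"about 1 in {float(1 / p):.1f} hands")   # about 1 in 20.7 hands
```

Using `Fraction` keeps the arithmetic exact (32/663 for one deck), so the rounding only happens when printing. Passing a larger `decks` value shows how the odds shift slightly when more decks are in play.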

These interesting applications of numbers and probabilities in my daily decision making fascinated me. They motivated me to pursue my current major, Business Analytics, at the National University of Singapore, which in turn sparked my passion for Data Science and Analytics.

I am writing these articles, firstly, to consolidate what I have learned during my undergraduate studies and internships, and secondly, to simplify Data Science concepts and put them in context. I do like to write in general, and it warms the cockles of my heart when I am able to tell a good story.

This list is heavily under construction and definitely not exhaustive. I sincerely hope that this will be helpful to you one way or another.

Here is the Content Summary:

Data Science & Statistics

  1. Hypothesis Testing
  2. Normal Distribution
  3. Central Limit Theorem
  4. Sampling
  5. Different Types of Errors
  6. Linear Regression and its Assumptions
  7. Bias-Variance Trade-Off
  8. Confusion Matrix
  9. ROC Curve
  10. Correlation and Covariance
  11. A/B Testing
  12. P-Value/Significance Value
  13. [Code in Python] Working with Outliers
  14. [Code in Python] Treating Outliers & Missing Data — Using scipy, sklearn.impute Library

Data Analysis & Data Visualization

  1. Data Cleaning
  2. Exploratory Data Analysis
  3. Python or R
  4. Business Intelligence Tools (PowerBI vs Looker vs Tableau)
  5. Web UI Tools (Dash vs Streamlit)
  6. Charts
  7. Time Series
  8. Forecasting
  9. Granger Causality

Machine Learning

  1. What is Machine Learning?
  2. Supervised vs Unsupervised Learning
  3. Reinforcement Learning
  4. Linear Regression
  5. Logistic Regression
  6. Decision Tree
  7. SVM
  8. Naive Bayes
  9. kNN
  10. K-Means
  11. Random Forest
  12. Gradient Boosting Algorithms
  13. [High-Level Overview] Principal Component Analysis
  14. [Code in Python] Principal Component Analysis — Using sklearn & pca Library

Miscellaneous

  1. Interview Questions

I really appreciate any feedback you have for me, so do drop a comment if there is a specific request or topic you would like to see.

If you find my content helpful, please feel free to bookmark this page! I will keep editing and updating it with every article I write. ❤

Feel free to drop me a message and connect with me on LinkedIn as well. I am more than happy to share more!

Written by Jed Lee

Passionate about AI & NLP. Based in Singapore. Currently a Data Scientist at PatSnap.
