Why You Are Learning Data Science Wrong — & How To Change Your Learning Habits

Bernard Shaw once said: “If you teach a man anything, he will never learn”. Learning is an active process: we learn by doing, simple as that.

Jun 01, 2023

Over the last four months, I have amassed over 5k followers on my tech Instagram. The question I get asked more than any other, BY FAR, is “how can I learn data science?”

So I decided to dedicate this article to answering just that.

The reality is, if you are just starting to learn data science, there’s a good chance that you are learning it wrong. I don’t mean to say that you shouldn’t spend time doing Python tutorials, data science bootcamps, or watching YouTube videos covering the foundations of data science. It IS important to learn the fundamentals & you should! However, if you do not apply these concepts in tandem with your learning, you are selling yourself short in your journey to learn data science efficiently & effectively.

The biggest tip that I can offer for learning data science is this: if you want to master the principles & concepts you study, do something with them!

Apply every concept you learn. Make mistakes. Encounter errors.

If you don’t do this, you will quickly forget what you learned in the last tutorial you went through or the last Data Science for Beginners book you read. Only knowledge that is used sticks in your mind.

In my experience, this is the most effective way to learn data science techniques.

Why Is My Way Of Learning Data Science Wrong??

Humans forget at an astonishing rate.

While I was taking courses in data science at UC Berkeley, I often found myself reading a few chapters on statistical modeling or linear regression, quickly forgetting the concepts, and having to reread them over & over again before my next exam.
However, once I applied these concepts, for example by predicting house prices using linear regression, I understood them so much better. Concepts you have put to work like this are much harder to forget.

Far too many people watch tutorial after tutorial, begin a new “Data Science for Beginners” course every other week, or buy every book they possibly can on data science.

🤔Why doesn’t this work? By studying this way, you are cutting your learning potential in half, or even more. Studies have shown that people retain far more information when they put what they’re learning & reading into practice.

💥👩🏼‍💻So let’s talk about ACTION

— With a practical data science example

Let's say you're learning about linear regression, a fundamental technique in data science for predicting numerical values.

Here's a practical example of how you could apply this concept in a small-scale project:

1. Define your problem — Choose a real-world scenario where linear regression can be useful, such as predicting the prices of used cars based on their mileage.

2. Gather data — Collect a dataset of used cars that includes variables like mileage, brand, age, and price. You can find such datasets online or create one yourself by collecting data from local car listings.

3. Exploratory data analysis (EDA) — Understand the relationship between your variables by performing EDA on the data you found or created. Create visualizations such as scatter plots to understand the correlation between mileage & price, for example. Also, identify any outliers or missing values that need to be addressed.

4. Data preprocessing — Clean the data by handling missing values & outliers. Convert categorical variables like car brands into numerical representations using techniques like one-hot encoding.

5. Split the data — Split your dataset into training and testing sets. The training set will be used to build the linear regression model, and the testing set will evaluate its performance.

6. Training your model — Apply linear regression to the training data. Fit the model by estimating the coefficients that minimize the difference between the predicted and actual prices based on mileage.

7. Evaluating — Use the testing set to assess how well the model predicts the prices of unseen cars. Calculate evaluation metrics like mean squared error or R-squared to measure its performance.

8. Refine & improve — Analyze the results and identify areas for improvement. Experiment with different feature selections, consider feature engineering techniques (ex. polynomial features), or explore other regression models to enhance the predictive accuracy.

9. Make predictions — Once you're satisfied with the model's performance, you can make predictions on new, unseen data. For example, if you encounter a used car with a given mileage, you can use your trained linear regression model to estimate its price.
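
Steps 3 and 4 are where a lot of the hands-on learning happens, so here is a minimal sketch of what they might look like in pandas. The tiny table is made up purely for illustration: the column names follow the example above, but every value is invented, and filling gaps with the median is just one reasonable choice among several.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real used-car dataset (all values invented)
cars = pd.DataFrame({
    "mileage": [42000, 88000, np.nan, 15000, 120000, 64000],
    "brand": ["Toyota", "Ford", "Toyota", "BMW", "Ford", "BMW"],
    "age": [3, 7, 5, 1, 10, 4],
    "price": [18500, 9200, 14000, 31000, 5400, 21000],
})

# Step 3 (EDA): count missing values and check how mileage relates to price
missing_counts = cars.isna().sum()
mileage_price_corr = cars["mileage"].corr(cars["price"])
print(missing_counts)
print("Correlation between mileage and price:", round(mileage_price_corr, 3))

# Step 4 (preprocessing): fill the missing mileage with the median mileage
cars_clean = cars.copy()
cars_clean["mileage"] = cars_clean["mileage"].fillna(cars_clean["mileage"].median())

# Step 4 (preprocessing): one-hot encode the categorical 'brand' column
cars_encoded = pd.get_dummies(cars_clean, columns=["brand"])
print(cars_encoded.head())
```

If you have matplotlib installed, `cars.plot.scatter(x="mileage", y="price")` should give you the scatter plot mentioned in step 3. Try swapping the median for the mean, or dropping the incomplete row instead, and see how the numbers change.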

💡This practical application of linear regression will help solidify your understanding of the technique & its underlying concepts💡
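
To make steps 8 and 9 a bit more concrete, here is a rough sketch that compares a straight-line model against one with a squared mileage term, then prices a new car. The data is synthetic (generated so that price drops along a curve), and the names `line_model` and `poly_model` are only illustrative, so treat this as a starting point and not a recipe.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data: price falls off along a curve as mileage grows (invented numbers)
rng = np.random.default_rng(42)
mileage = rng.uniform(5_000, 150_000, size=200)
price = 30_000 * np.exp(-mileage / 60_000) + rng.normal(0, 500, size=200)

# Work in units of 10,000 miles so the squared term stays numerically well-behaved
X = (mileage / 10_000).reshape(-1, 1)
y = price

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: a straight line
line_model = LinearRegression().fit(X_train, y_train)

# Refinement (step 8): add a squared mileage term so the model can follow a curve
poly_model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
poly_model.fit(X_train, y_train)

r2_line = r2_score(y_test, line_model.predict(X_test))
r2_poly = r2_score(y_test, poly_model.predict(X_test))
print("R-squared, straight line:", round(r2_line, 3))
print("R-squared, with squared term:", round(r2_poly, 3))

# Prediction (step 9): estimate the price of a new car with 60,000 miles on it
new_car = np.array([[60_000 / 10_000]])
predicted_price = poly_model.predict(new_car)[0]
print("Estimated price:", round(predicted_price, 2))
```

Because the fake prices were generated from a curve, the squared term should help here. On a real dataset it may not, which is exactly why you compare both models on the testing set.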


🔑Remember, the key is to actively engage with the concepts you learn & apply them in real-world scenarios. By doing so, you'll develop a deeper understanding of data science techniques and improve your ability to solve problems using data.
So, take this example as a starting point, adapt it to other data science techniques like classification or EDA, & continue exploring and applying these concepts in your own specific learning journey.

🦾What does Data Science in action look like?

Personal projects from scratch with meaningful datasets
Kaggle — Kaggle is a great way to get hands-on experience with Data Science, especially for beginners. Kaggle provides many datasets and projects that you can learn from or you can join a Kaggle competition!
Web scraping your own dataset, cleaning, & adding additional columns or features
Coming up with your own hypothesis & testing it to answer a business question you want to figure out (check out my hypothesis testing blog post!)
Papers With Code — my personal favorite! But this is designed for more advanced Data Science learners. Choose a machine learning paper & implement your own model based on the research. This article explains this concept very well!
Consulting, Freelance Work, Pro-bono — Offer your data science skills to local businesses, startups, or non-profit organizations as a consultant or freelancer. This gives you the opportunity to work on real-world projects with real clients, allowing you to understand their specific needs, gather requirements, & deliver data-driven solutions!

🔓Free Bonus Content

Enjoy this free, exclusive bonus content (code example of linear regression application)

Here is a practical code block for you to practice with for the linear regression example above!

Assume we have a dataset named 'used_cars_dataset.csv' with two columns: 'mileage' (independent variable) and 'price' (dependent variable)
We load the dataset using pandas & prepare the data by splitting it into input (X) & output (y) variables. We then split the data into training & testing sets using the train_test_split function
Next, we will train a linear regression model using the training data. Our model is then used to make predictions on the testing set (p.s., never let your model see the testing set during training!)
Finally, we evaluate the model's performance using mean squared error (MSE) & R-squared (R2) metrics & print our results✅

# Import libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
data = pd.read_csv('used_cars_dataset.csv')

# Prepare the data
# Independent variable (scikit-learn expects a 2D array, hence the reshape)
X = data['mileage'].values.reshape(-1, 1)
# Dependent variable
y = data['price'].values

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train linear regression model
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = regressor.predict(X_test)

# Model evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print('Mean Squared Error:', mse)
print('R-squared:', r2)

Example output:

Mean Squared Error: 12567892.456789
R-squared: 0.7823

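So, the "Mean Squared Error" here represents the average squared difference between the predicted prices (y_pred) and the actual prices (y_test). Because the errors are squared, the number looks huge; its square root (about 3,545 in this example) is easier to read, since it is in the same units as price. The "R-squared" value indicates the proportion of the variance in the dependent variable that can be explained by the independent variable. A value of 1 represents a perfect fit and 0 means the model does no better than always predicting the average price; on a testing set it can even drop below 0 if the model does worse than that. In this example, the R2 value is 0.7823, indicating a relatively good fit of the linear regression model to the data.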
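
If these two metrics still feel abstract, it can help to compute them by hand once. The sketch below does that with NumPy on five invented prices (not output from the model above) and checks the results against scikit-learn's own functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Invented prices for five cars: what they sold for vs. what a model predicted
y_true = np.array([18500.0, 9200.0, 14000.0, 31000.0, 5400.0])
y_hat = np.array([17000.0, 10500.0, 15200.0, 27500.0, 6900.0])

# MSE: the average of the squared prediction errors
mse_manual = np.mean((y_true - y_hat) ** 2)

# R-squared: 1 minus (squared error of the model / squared error of always guessing the mean)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# RMSE puts the error back into the same units as price
rmse_manual = np.sqrt(mse_manual)

print("MSE:", mse_manual, "| RMSE:", round(rmse_manual, 2), "| R-squared:", round(r2_manual, 4))

# The hand-rolled numbers should agree with scikit-learn's
assert np.isclose(mse_manual, mean_squared_error(y_true, y_hat))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```

Try making the predictions deliberately terrible (for example, set every value in `y_hat` to 0) and watch R-squared go negative.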

😎TLDR;

Summary

Learning data science requires more than just passive consumption of tutorials & courses
Applying the concepts and principles learned is crucial for retaining knowledge
Practical projects, such as personal projects, Kaggle competitions, & web scraping your own data to play around with, help solidify understanding
Coming up with hypotheses & testing them in real-world scenarios enhances learning
Engaging in consulting or freelance work provides hands-on experience with data science — & can be rewarding and fun!
For more advanced learners, implementing machine learning papers from Papers With Code offers advanced learning opportunities & looks great on a resume!

That’s all for this week! Give me a shout if you have any feedback, stories, or insights to share with me. Other than that, I’ll see you next week!

Happy learning,

👋Ashley

Ashley's Bulletin