Topics | Tools | Learning Plan | Status/Comment |
Programming | Common data structures (data types, lists, dictionaries, sets, tuples), writing functions, logic, control flow, searching and sorting algorithms, object-oriented programming, and working with external libraries. | Solve a lot of problems on HackerRank (beginner-friendly) and LeetCode (easy or medium-level questions). Build games like rock-paper-scissors, spin a yarn, hangman, a dice rolling simulator, and tic-tac-toe. Build simple web apps like a YouTube video downloader, website blocker, music player, or plagiarism checker. | |
SQL | SQL scripting: querying databases using joins, aggregations, and subqueries. Here’s a course on SQL and Databases on the freeCodeCamp YouTube channel. Intro to SQL and Advanced SQL on Kaggle. freeCodeCamp now has a free interactive SQL course. | Data extraction from websites and API endpoints: write Python scripts for extracting data from webpages that allow scraping, like soundcloud.com. Store the extracted data in a CSV file or a SQL database. | |
Git | Comfort using the terminal, version control in Git, and using GitHub. Guide for Git and GitHub [free]: complete these tutorials and labs to develop a firm grip on version control. It will also help you contribute to open-source projects. Here’s a Git and GitHub crash course on the freeCodeCamp YouTube channel. | Deploy these projects on GitHub Pages or simply host the code on GitHub so that you learn to use Git. | |
Python | learnpython.org [free]: a resource for beginners that covers all the basic programming topics from scratch. You get an interactive shell to practice those topics side by side. | Python Course by freeCodeCamp on YouTube [free]: a 5-hour course that you can follow to practice the basic concepts. Intermediate Python [free]: another free course by Patrick featured on freecodecamp.org. | |
Kaggle | Kaggle [free]: a free, interactive guide to learning Python. It is a short tutorial covering all the important topics for data science. | ||
Python Certifications | Python certifications on freeCodeCamp [free] – freeCodeCamp offers several certifications based on Python, such as scientific computing, data analysis, and machine learning. | ||
Data Collection and Wrangling (Cleaning) | freeCodeCamp course on learning NumPy, Pandas, matplotlib, and seaborn [free]. Practical tutorial on data manipulation with NumPy and Pandas in Python from HackerEarth. Kaggle pandas tutorial [free]: a short, hands-on tutorial that walks you through commonly used data manipulation skills. Data Cleaning course by Kaggle. Coursera course on Introduction to Data Science in Python: the first course in the Applied Data Science with Python Specialization. | Collect data from a website or API of your choice that is open for public consumption, then transform the data from the different sources and store it in an aggregated file or database table. Example APIs include TMDB, Quandl, and the Twitter API. Pick any publicly available dataset and define a set of questions that you’d want to pursue after looking at the dataset and the domain. Wrangle the data with Pandas and NumPy to answer those questions. | |
Exploratory Data Analysis, Business Acumen, and Storytelling | The next stratum to master is data analysis and storytelling. Drawing insights from the data and then communicating them to management in simple terms and visualizations is the core responsibility of a Data Analyst. The storytelling part requires you to be proficient with data visualization and to have excellent communication skills. Exploratory data analysis: defining questions, handling missing values and outliers, formatting, filtering, and univariate and multivariate analysis. Data visualization: plotting data using libraries like matplotlib, seaborn, and plotly. Know how to choose the right chart to communicate the findings from the data. Developing dashboards: many analysts only use Excel or a specialized tool like Power BI or Tableau to build dashboards that summarise and aggregate data to help management make decisions. Business acumen: practice asking the right questions, ones that actually target the business metrics. Practice writing clear and concise reports, blogs, and presentations. | Learn data analysis with Python in this free course on the freeCodeCamp YouTube channel. Data Analysis with Python, by IBM on Coursera: the course covers wrangling, exploratory analysis, and simple model development using Python. Data Visualization, by Kaggle: another interactive course that lets you practice all the commonly used plots. Build product sense and business acumen with these books: Measure What Matters, Decode and Conquer, Cracking the PM Interview. Run an exploratory analysis on a movies dataset to find the formula for creating profitable movies (use it as inspiration), or use datasets from healthcare, finance, the WHO, past censuses, e-commerce, and so on. Build dashboards (Jupyter notebooks, Excel, Tableau) using the resources provided above. | |
Data Engineering | Data engineering underpins the R&D teams by making clean data accessible to research engineers and scientists at big data-driven firms. It is a field in itself, and you may decide to skip this part if you want to focus on just the statistical algorithm side of the problems. The responsibilities of a data engineer comprise building an efficient data architecture, streamlining data processing, and maintaining large-scale data systems. Engineers use Shell (CLI), SQL, and Python/Scala to create ETL pipelines, automate file system tasks, and optimize database operations for high performance. Another crucial skill is implementing these data architectures, which demands proficiency with cloud service providers like AWS, Google Cloud Platform, Microsoft Azure, and others. | Data Engineering Nanodegree by Udacity: as far as a compiled list of resources is concerned, I have not come across a better-structured course on data engineering that covers all the major concepts from scratch. Data Engineering, Big Data, and Machine Learning on GCP Specialization: a specialization offered by Google on Coursera that walks you through all the major APIs and services offered by GCP to build a complete data solution. AWS Certified Machine Learning (300 USD): a proctored exam offered by AWS that adds some weight to your profile (it doesn’t guarantee anything, though) and requires a decent understanding of AWS services and ML. Professional Data Engineer: a certification offered by GCP. This is also a proctored exam, and it assesses your ability to design data processing systems, deploy machine learning models in a production environment, and ensure solution quality and automation. | |
Applied Statistics and Mathematics | Statistical methods are a central part of data science. Almost all data science interviews predominantly focus on descriptive and inferential statistics. People often start coding machine learning algorithms without a clear understanding of the underlying statistical and mathematical methods that explain how those algorithms work. This, of course, isn’t the best way to go about it. Descriptive statistics: being able to summarise the data is powerful, though not always sufficient. Learn about estimates of location (mean, median, mode, weighted statistics, trimmed statistics) and variability to describe the data. Inferential statistics: designing hypothesis tests and A/B tests, defining business metrics, and analyzing the collected data and experiment results using confidence intervals, p-values, and alpha values. Linear algebra and single- and multi-variable calculus, to understand loss functions, gradients, and optimizers in machine learning. | Learn college-level statistics in this free 8-hour course on the freeCodeCamp YouTube channel. [Book] Practical Statistics for Data Scientists (highly recommended): a thorough guide to all the important statistical methods, along with clean and concise applications and examples. [Book] Naked Statistics: a non-technical but detailed guide to understanding the impact of statistics on routine events, sports, recommendation systems, and many more instances. An 8-hour university-level statistics course: a foundation course to help you start thinking statistically. Intro to Descriptive Statistics, offered by Udacity: video lectures explaining widely used measures of location and variability (standard deviation, variance, median absolute deviation). Inferential Statistics, Udacity: video lectures that teach you to draw conclusions from data that might not be immediately obvious. It focuses on developing hypotheses and using common tests such as t-tests, ANOVA, and regression. And here’s a guide to statistics for data science to help you get started down the right path. Solve the exercises provided in the courses above and then go through a number of public datasets where you can apply these statistical concepts. Ask questions like “Is there sufficient evidence to conclude that the mean age of mothers giving birth in Boston is over 25 years of age at the 0.05 level of significance?” Try to design and run small experiments with your peers, groups, or classes by asking them to interact with an app or answer a question. Once you have collected a good amount of data over a period of time, run statistical methods on it. This might be very hard to pull off but should be very interesting. Analyze stock prices and cryptocurrencies, and design hypotheses around the average return or any other metric. Determine whether you can reject the null hypothesis or fail to do so using critical values. | |
Machine Learning and AI | After grilling yourself and going through all the major aforementioned concepts, you should now be ready to get started with the fancy ML algorithms. There are three major types of learning. Supervised learning: includes regression and classification problems. Study simple linear regression, multiple regression, polynomial regression, naive Bayes, logistic regression, KNNs, tree models, and ensemble models. Learn about evaluation metrics. Unsupervised learning: clustering and dimensionality reduction are the two widely used applications of unsupervised learning. Dive deep into PCA, K-means clustering, hierarchical clustering, and Gaussian mixtures. Reinforcement learning (can skip*): helps you build self-rewarding systems. Learn to optimize rewards, use the TF-Agents library, create Deep Q-networks, and so on. | Here’s a free full course on machine learning in Python with scikit-learn on the freeCodeCamp YouTube channel. [Book] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition: one of my all-time favorite books on machine learning. It doesn’t only cover the theoretical mathematical derivations, but also showcases the implementation of algorithms through examples. You should solve the exercises given at the end of each chapter. Machine Learning Course by Andrew Ng: the go-to course for anyone trying to learn machine learning. Hands down! Introduction to Machine Learning: an interactive course by Kaggle. Intro to Game AI and Reinforcement Learning: another interactive Kaggle course, this one on reinforcement learning. | |
Deep Learning | deeplearning.ai | ||
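
The Applied Statistics row above suggests asking whether a mean exceeds a threshold at the 0.05 level and deciding with critical values. Here is a minimal sketch of that kind of one-sided test using only the Python standard library. It assumes a normal approximation (a z-test) rather than the t-test the courses cover, and the function name `one_sided_z_test` and the `ages` list are hypothetical, made up for illustration rather than taken from any real dataset.

```python
import math
import statistics
from statistics import NormalDist


def one_sided_z_test(sample, null_mean, alpha=0.05):
    """Test H0: mean <= null_mean against H1: mean > null_mean.

    Assumption: the sampling distribution of the mean is treated as normal.
    With a small sample, a t-test is the more usual choice.
    """
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error of the mean, from the sample standard deviation.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Critical value for a one-sided test at significance level alpha.
    critical_value = NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(z)
    return {
        "mean": sample_mean,
        "z": z,
        "critical_value": critical_value,
        "p_value": p_value,
        "reject_null": z > critical_value,
    }


# Hypothetical ages, invented for this sketch (not real Boston data).
ages = [27, 31, 24, 29, 33, 26, 30, 28, 35, 25,
        32, 29, 27, 31, 30, 28, 34, 26, 29, 30]

result = one_sided_z_test(ages, null_mean=25)
print(result)
```

The null hypothesis is rejected only when the test statistic exceeds the critical value, which is equivalent to the p-value falling below alpha.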

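The SQL row above lists joins, aggregations, and subqueries, and suggests storing extracted data in a SQL database. The sketch below combines all three in one query, using Python's built-in `sqlite3` module with an in-memory database. The `artists` and `tracks` tables and every value in them are hypothetical, invented for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema, for illustration only.
conn.executescript("""
CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tracks (
    id INTEGER PRIMARY KEY,
    artist_id INTEGER REFERENCES artists(id),
    title TEXT,
    plays INTEGER
);
""")

# Parameterized inserts keep values separate from the SQL text.
conn.executemany(
    "INSERT INTO artists VALUES (?, ?)",
    [(1, "Artist A"), (2, "Artist B")],
)
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?, ?)",
    [(1, 1, "Song 1", 100), (2, 1, "Song 2", 300), (3, 2, "Song 3", 50)],
)

# Join tracks to artists, aggregate per artist, and keep only artists whose
# total plays exceed the average plays per track (computed by a subquery).
query = """
SELECT a.name, COUNT(t.id) AS track_count, SUM(t.plays) AS total_plays
FROM artists AS a
JOIN tracks AS t ON t.artist_id = a.id
GROUP BY a.name
HAVING SUM(t.plays) > (SELECT AVG(plays) FROM tracks)
ORDER BY total_plays DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Swapping `":memory:"` for a file path would persist the data, which is one way to store scraped results between runs.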
