Digital Technology Architecture: Data Science

Tuesday, August 30

Begining Your Data Science Journey

There are tons of data science resources but we often get confused on which resources to follow. I am sharing some steps I followed to learn data science on my own as a beginner. You can also check the links at the end of the article your learning and getting hands-on experience in Data Science.

Programming Language is must to start with Data Science

Whether you are a programmer or new to programming the first step while starting the Data Science Journey is to know programming language. Python is the most preferred coding language and is adopted by most Data Scientists. It is easy to understand, versatile, and Python supports various in-built libraries such as Numpy, Pandas, MatplotLib, Seaborn, Scipy, and many more. The 2nd preferred language for data science is R. Both Python and R learning resources are freely available on internet.

Learning SQL is important when you are working with data

Most programmers are expert on SQL and have worked with 1 or more databases. Structured Query Language (SQL) is used for extracting and communicating with large databases. When you are working with tons of data it is important to know how SQL is used to store & query data. You should have a good understanding of normalization, writing nested queries, group-by, performing join operations, etc., on the data and extract in raw format. This data is then processed using Python , R or any other library.

Cleaning Data is an important step of data processing

When a Data Scientist starts work on a project he has to deal with raw data which is not clean and can't be used for meaningful operations. One has to learn which libraries to use for cleaning the data set, removing unwanted values, formatting data to required format, handling missing values and purging unwanted data. It can be achieved by using some inbuilt python libraries like Pandas and Numpy.
When the data volume is small we can use MS Excel to process the data but Excel has limitations of volume and NOSQL and RDBMS database are used for storing volume data.

Data Analysis is performed on cleansed data

Exploratory data analysis is the essential part when talking about data science. The data scientist has many tasks, including finding data patterns, analyzing data, finding the appropriate trends in the data and obtaining valuable insights, etc., from them with the help of various graphical and statistical methods, including:

A) Data Analysis using Pandas and Numpy
B) Data Manipulation
C) Data Visualization

You can learn basics of Exploratory Data Analysis from this blog posted by Prasad Patil

What is Exploratory Data Analysis?

Learning Machine Learning

According to Google, “Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.”

Here is the list of commonly used machine learning algorithms. These algorithms can be applied to almost any data problem:

Linear Regression
Logistic Regression
Decision Tree
SVM
Naive Bayes
kNN
K-Means
Random Forest
Dimensionality Reduction Algorithms
Gradient Boosting algorithms

Some Useful Links

FreeCodeCamp Website : https://www.freecodecamp.org/
Kaggle Website : https://www.kaggle.com/learn/pandas
AnalyticsVidhya Website : https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/
Deep Learning Website: https://www.deeplearning.ai/
Google Digital garage : https://learndigital.withgoogle.com/digitalgarage/course/what-is-data-science

Blog Privacy Policy

This blog does not share personal information with third parties nor do we store any information about your visit to this blog other than to analyze and optimize your content and reading experience through the use of cookies. You can turn off the use of cookies at anytime by changing your specific browser settings. We are not responsible for republished content from this blog on other blogs or websites without our permission. This privacy policy is subject to change without notice and was last updated on 07, Aug 2022. If you have any questions feel free to contact me directly on email : projectincharge@yahoo.com.

Digital Technology Architecture