Tuesday, August 30

Begining Your Data Science Journey

https://cdn.analyticsvidhya.com/wp-content/uploads/2017/09/machine-.png

There are tons of data science resources but we often get confused on which resources to follow. I am sharing some steps I followed to learn data science on my own as a beginner. You can also check the links at the end of the article your learning and getting hands-on experience in Data Science.

Programming Language is must to start with Data Science

Whether you are a programmer or new to programming the first step while starting the Data Science Journey is to know programming language. Python is the most preferred coding language and is adopted by most Data Scientists. It is easy to understand, versatile, and Python supports various in-built libraries such as Numpy, Pandas, MatplotLib, Seaborn, Scipy, and many more. The 2nd preferred language for data science is R. Both Python and R learning resources are freely available on internet.

Learning SQL is important when you are working with data

Most programmers are expert on SQL and have worked with 1 or more databases. Structured Query Language (SQL) is used for extracting and communicating with large databases. When you are working with tons of data it is important to know how SQL is used to store & query data. You should have a good understanding of normalization, writing nested queries, group-by, performing join operations, etc., on the data and extract in raw format. This data is then processed using Python , R or any other library.

Cleaning Data is an important step of data processing

When a Data Scientist starts work on a project he has to deal with raw data which is not clean and can't be used for meaningful operations. One has to learn which libraries to use for cleaning the data set, removing unwanted values, formatting data to required format, handling missing values and purging unwanted data. It can be achieved by using some inbuilt python libraries like Pandas and Numpy.
When the data volume is small we can use MS Excel to process the data but Excel has limitations of volume and NOSQL and RDBMS database are used for storing volume data.

 Data Analysis is performed on cleansed data

Exploratory data analysis is the essential part when talking about data science. The data scientist has many tasks, including finding data patterns, analyzing data, finding the appropriate trends in the data and obtaining valuable insights, etc., from them with the help of various graphical and statistical methods, including:

A) Data Analysis using Pandas and Numpy
B) Data Manipulation
C) Data Visualization

You can learn basics of Exploratory Data Analysis from this blog posted by Prasad Patil  

What is Exploratory Data Analysis?

Learning Machine Learning 

According to Google, “Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.”

Here is the list of commonly used machine learning algorithms. These algorithms can be applied to almost any data problem:

  1. Linear Regression
  2. Logistic Regression
  3. Decision Tree
  4. SVM
  5. Naive Bayes
  6. kNN
  7. K-Means
  8. Random Forest
  9. Dimensionality Reduction Algorithms
  10. Gradient Boosting algorithms

Some Useful Links

 

No comments:

Post a Comment

MUSTREAD : How can you use Index Funds to help create wealth? HDFC MF Weekend Bytes

https://www.hdfcfund.com/knowledge-stack/mf-vault/weekend-bytes/how-can-you-use-index-funds-help-create-wealth?utm_source=Netcore&...