Why is Data Science or Datalogy important?
Have you ever sat in a meeting where different points of view were discussed but nothing moved forward, because no one had the data to substantiate their argument and convince the team to act? The right way for an enterprise to make a decision is to analyze the data and base the decision on it. Data analytics helps us analyze data and make informed decisions.
Data science is a combination of:
- Statistics / Mathematics skills
- Coding skills
- Domain Knowledge / Business Knowledge
Data is about numbers, and when you work with numbers you use statistical and mathematical concepts. Coding skills are required because the data you work with is often hard to access, broken, messy, or missing values, and code lets you handle these problems in a repeatable way. Finally, domain knowledge and business thinking are as essential as statistics and coding. Without business knowledge, you cannot evaluate whether your data makes a difference.
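As a small illustration of how code deals with messy data, the sketch below parses a few hypothetical sales records, some with missing or malformed values, and fills the gaps with the median of the valid ones. The records, the field names, and the choice of median imputation are assumptions made for this example, not a recommended recipe.

```python
import statistics

# Hypothetical raw records: amounts arrive as text, some missing or malformed.
RAW_ROWS = [
    {"region": "north", "amount": "120.5"},
    {"region": "south", "amount": ""},        # missing value
    {"region": "east", "amount": "n/a"},      # broken value
    {"region": "west", "amount": " 80 "},     # stray whitespace
    {"region": "north", "amount": "100"},
]


def parse_amount(text):
    """Return the amount as a float, or None if it cannot be parsed."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def clean_rows(rows):
    """Parse every amount and replace unusable ones with the median."""
    parsed = [parse_amount(row["amount"]) for row in rows]
    valid = [value for value in parsed if value is not None]
    if not valid:
        raise ValueError("no valid amounts to impute from")
    fill = statistics.median(valid)
    return [
        {"region": row["region"], "amount": fill if value is None else value}
        for row, value in zip(rows, parsed)
    ]


cleaned = clean_rows(RAW_ROWS)
for row in cleaned:
    print(row["region"], row["amount"])
```

Whether filling gaps with a median is appropriate depends on the business context, which is where domain knowledge comes in.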
For a data scientist, useful languages include SQL, Python, Bash, and R.
SQL is a simple query language. It is well structured and easy to read.
Python is also easy to read and easy to learn, but it is much more complex than SQL. Python is better for some data tasks and SQL is better for others: typically, SQL suits filtering and aggregating tables in a database, while Python suits general-purpose scripting, data cleaning, and modeling.
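To make the comparison concrete, the sketch below computes the same per-region total twice: once with a SQL query (run through Python's built-in sqlite3 module against an in-memory database) and once with a plain Python loop. The table name, column names, and order data are hypothetical and exist only for this illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical order data used only for this illustration.
ORDERS = [
    ("north", 120.5),
    ("south", 75.0),
    ("north", 100.0),
    ("south", 25.0),
    ("east", 60.0),
]

# SQL version: load the rows into an in-memory SQLite database and aggregate.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (region TEXT, amount REAL)")
connection.executemany("INSERT INTO orders VALUES (?, ?)", ORDERS)
sql_totals = dict(
    connection.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    ).fetchall()
)
connection.close()

# Plain Python version: the same aggregation written as an explicit loop.
python_totals = defaultdict(float)
for region, amount in ORDERS:
    python_totals[region] += amount

print(sql_totals)
print(dict(python_totals))
```

The SQL query states what result is wanted in a single line, while the Python loop spells out how to compute it. That extra control is what makes Python more flexible, and also more complex, for tasks that go beyond querying.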
What is the origin of Data Science?
Over the years, data science has become an integral part of many industries and applications, such as agriculture, marketing optimization, risk management, fraud detection, marketing analytics, and public policy. Using data preparation, statistics, predictive modeling, and machine learning, data science tries to resolve issues within individual sectors and in the economy at large.
Data science emphasizes general methods that are applied in the same way irrespective of the domain. This differs from traditional statistics, which tends to focus on solutions that are specific to particular sectors or domains.
The traditional methods provide each sector with solutions tailored to each problem rather than applying a standard solution.
Today, data science has far-reaching implications in many fields of academic and applied research, such as machine translation, speech recognition, and the digital economy on the one hand, and healthcare, social science, and medical informatics on the other.
It affects the growth and development of brands by providing intelligence about consumers and campaigns through techniques such as data mining and data analysis.
The history of data science can be traced to 1960. Peter Naur, a Danish computer science pioneer and Turing Award winner, disliked the term "computer science" and suggested it be called "datalogy" or "data science".
In 1974, Naur published Concise Survey of Computer Methods, in which he used the term data science in his survey of contemporary data processing methods.
These methods were then used in a number of applications. Twenty-two years later, in 1996, the members of the International Federation of Classification Societies met in Kobe for their biennial conference, where the term data science appeared for the first time in a conference title: Data Science, Classification, and Related Methods. In 1997, C. F. Jeff Wu gave an inaugural lecture on the topic, in which he spoke about statistics being a form of data science.
Later, in 2001, William S. Cleveland introduced data science as an independent discipline. His article, Data Science: An Action Plan for Expanding the Technical Areas of Statistics, which incorporated advances in computing with data, was published in the International Statistical Review. In it, Cleveland names six areas that he thought formed the base of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.
The Committee on Data for Science and Technology of the International Council for Science started publishing the Data Science Journal in 2002. The journal focuses on topics related to data science, such as the description of data systems, their publication on the internet, applications, and legal issues. Columbia University also began publishing the Journal of Data Science, devoted to the application of statistical methods and qualitative research. It gave data workers a platform to share their opinions and exchange ideas about the use and benefits of data science.
In 2005, the National Science Board published Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century.
This report defined data scientists as the information and computer scientists, database and software programmers, disciplinary experts, curators and expert annotators, and librarians who are crucial to the successful management of a digital data collection.
The primary activity of a data scientist is to conduct creative inquiry and analysis so that organizations across all sectors can use data properly and effectively.