Monday, June 18

Beginners - How to get started with Python (2018)

Here are steps to get started with Python.

  1. Go to automatetheboringstuff.com and download the free pdf book Automate The Boring Stuff written by Al Sweigart.
  2. The book is not boring at all and is one of the best for getting started. If you prefer video lessons, Udemy.com has a paid video tutorial for the same book. Support the author by purchasing the print & ebook bundle from No Starch Press or separately on Amazon.
  3. There are many flavors of Python and you can download any one of them. If you have no preference, download Python from https://www.anaconda.com.
  4. Python has a built-in IDE called IDLE; just type IDLE in search once you have installed Python. The best IDE for Python, in my view, is Spyder, which ships with Anaconda; you can use either it or IDLE.
  5. Once Python is installed, if you have programming experience you can get started directly by looking at the demo examples in the Tools/demo folder of your Python installation.
  6. The last 2 images below show the beer bottle demo code and its execution.
  7. This is all you need to get started. For any issues you can drop an email or check solutions on stackoverflow.com/python
Python default IDLE IDE

Sample code with Python installation

Demo code beer.py

Execution of Beer demo example
Execution of beer.py
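
If you want something to type into IDLE right away, here is a minimal sketch in the spirit of the beer bottle demo. It is my own short version, not the beer.py file that ships with Python, and the function names bottles and verses are made up for this example.

```python
def bottles(n):
    """Return the phrase for n bottles, handling the singular and zero cases."""
    if n == 0:
        return "no more bottles of beer"
    if n == 1:
        return "1 bottle of beer"
    return f"{n} bottles of beer"


def verses(start):
    """Yield one verse per bottle, counting down from start to 1."""
    for n in range(start, 0, -1):
        yield (
            f"{bottles(n)} on the wall, {bottles(n)}.\n"
            f"Take one down, pass it around, {bottles(n - 1)} on the wall."
        )


if __name__ == "__main__":
    # Print a short song starting from 3 bottles.
    for verse in verses(3):
        print(verse)
        print()
```

Save it as a .py file, open it in IDLE and press F5 to run it. Change the 3 to any other number to get a longer song.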

Saturday, June 9

Machine Learning with R

R is a powerful language used widely for data analysis and statistical computing, and it has many provisions for implementing machine learning algorithms quickly and simply. Created in the early 1990s, R is gaining prominence today because of its simplicity.

  • R is data analysis software: Data scientists can use R for statistical analysis, data visualization, and predictive modeling.
  • R is a programming language: R is an object-oriented language and provides objects, operators, and functions that allow users to explore, model, and visualize data.
  • R is a de facto environment for statistical analysis: Standard statistical methods are easy to implement in R, and since much of the cutting-edge research in statistics and predictive modeling is done in R, newly developed techniques are often available in R first.
  • R is an open-source software project: R’s open interfaces allow it to integrate with other applications and systems.
  • R community: The R project leadership has grown to include more than 20 leading statisticians and computer scientists from around the world, and thousands of contributors have created add-on packages. 
Take an example to understand the power of the R language in today's collaborative open source world. The picture below shows a heat map of a volcano, produced by an R program of 15 lines that uses multiple R libraries developed by the community.







And here are the lines of R code that produced the above visualization:




I will post more about R and why it is my (actually, every data scientist's) favorite tool for data analysis. Meanwhile, I am sharing a small tutorial for beginners to R that is available at https://www.analyticsvidhya.com

A Complete Tutorial to learn Data Science in R from Scratch

Tuesday, May 29

How to determine the value of your company's big data?

Every company has Big Data

Today’s companies generate an enormous amount of data, including traditional structured data as well as unstructured data. The massive amounts of information companies collect can become a valuable new asset if used strategically. Players seeking additional organic revenue streams should consider tapping their data trove to power a new information services growth engine. Enterprises can capitalize on insights derived from this data to make better decisions, evaluate risk, and understand the market. A huge amount of data about your company is also being generated on mobile devices and social media.

Companies that are sitting on large amounts of customer data, including insurance carriers, retailers, transportation companies and communications providers, have a unique opportunity to make this type of information services play. According to a study by IDC, the amount of data that companies are accumulating is growing at 50 percent per year, which means it more than doubles every two years (1.5 × 1.5 = 2.25). Many organizations are rich in data but poor in insight. Data requires collection, mining and, finally, analysis before its true value to the enterprise can be realized. Overall US demand for information services had exceeded $600 billion by 2016. As data-driven insights become an increasingly critical competitive differentiator, companies will use them to drive and optimize business decisions across industries. In the past, this market was largely limited to traditional market research and data specialists, but today virtually any company with a large customer database can potentially become a serious player in the new information game. So the question is: how does an enterprise find out which data is valuable and which is not?

 

URL - statista.com - Worldwide Big Data/Business Analytics Revenue

 

What data to target for maximum insight?                       

So how does a company identify which data is valuable, and what value can be derived from it, even before investing in a Big Data program? Research organizations such as Accenture Labs have defined an Information Value Pyramid to illustrate various information service strategies. The pyramid has three levels: raw data, insights and transactions. The potential value and profitability of an information services business depends in large part on the condition of the data the enterprise owns. The base of the pyramid features raw, less differentiated and thus less valuable data. Moving up the pyramid creates larger revenue opportunities, although these tend to be more difficult to execute.

One approach to valuing your Big Data is to create a custom Big Data Value Matrix. A Value Matrix classifies the different data sets using a standard set of parameters and evaluates the reference value of each data set in the context of the company and its business goals. Factors can be defined to capture the type of raw data, its potential use, who will consume the data once it is processed, the effort and cost of processing it, and the potential insights that can be derived from the unprocessed data. A weight is assigned to each of these factors, and the matrix is used to prioritize the various big data categories within the company. This valuation becomes the input to the company's big data program. Rather than processing every piece of data the company generates, data valuation helps companies understand their data, define a big data strategy and roadmap, and set realistic expectations for the outcome of their big data processing.
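
The weighted scoring behind such a Value Matrix can be sketched in a few lines of Python. Treat this as an illustration only: the factor names, weights, data categories and 1-to-5 scores below are hypothetical placeholders, not values from Accenture or from any real company. Processing cost is scored as "ease", so a higher number means the data is cheaper to process.

```python
# Hypothetical factors and weights (they sum to 1.0); tune these to your business goals.
WEIGHTS = {
    "potential_use": 0.30,
    "consumer_demand": 0.25,
    "insight_potential": 0.30,
    "processing_ease": 0.15,  # higher score = less effort and cost to process
}

# Hypothetical data categories, each scored from 1 (low) to 5 (high) per factor.
DATA_SETS = {
    "web_clickstream": {
        "potential_use": 4, "consumer_demand": 3,
        "insight_potential": 4, "processing_ease": 2,
    },
    "call_center_logs": {
        "potential_use": 2, "consumer_demand": 2,
        "insight_potential": 3, "processing_ease": 3,
    },
    "vehicle_telemetry": {
        "potential_use": 5, "consumer_demand": 4,
        "insight_potential": 5, "processing_ease": 3,
    },
}


def value_score(scores, weights=WEIGHTS):
    """Weighted sum of the factor scores for one data set."""
    return sum(weights[factor] * scores[factor] for factor in weights)


def prioritize(data_sets, weights=WEIGHTS):
    """Return (name, score) pairs sorted from most to least valuable."""
    ranked = [
        (name, round(value_score(scores, weights), 2))
        for name, scores in data_sets.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in prioritize(DATA_SETS):
        print(f"{name:20s} {score:.2f}")
```

The ranked list is what would feed the big data roadmap: the categories at the top are the first candidates for investment, and changing the weights shows how sensitive the ranking is to your business priorities.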
In the auto industry, since many vehicles now feature GPS and telematics systems, some car manufacturers have been able to collect and monetize a wealth of data on customer driving habits. General Motors Co.’s OnStar telematics system, for example, not only provides vehicle security, information and diagnostics services to drivers, it also captures telemetry data. OnStar and GMAC Insurance partnered to create an opt-in program that uses the telemetry data to offer lower insurance premiums to customers who drive fewer miles. Thanks to the program, consumers can save significantly on car insurance, which boosts GM’s customer satisfaction performance. This, in turn, helps GM attract new OnStar paying customers.

Every company is working on some initiative to exploit its data, and big investments are being made without a clear picture of the outcome or benefits. The company that understands its Big Data will be able to target the right data, use the insight strategically and derive maximum value from its investments.


Understanding Generative AI and Generative AI Platform leaders

We are hearing a lot about the power of Generative AI. Generative AI is a vertical of AI that holds the power to #Create content, artwork, code...