Saturday, June 9

Machine Learning with R

R is a powerful language widely used for data analysis and statistical computing, and it has many packages that make it fast and simple to implement machine learning algorithms. Created in the early 1990s, R is gaining prominence today because of its simplicity.

  • R is data analysis software: Data scientists can use R for statistical analysis, data visualization, and predictive modeling.
  • R is a programming language: R is an object-oriented language & provides objects, operators, and functions that allow users to explore, model, and visualize data.
  • R is the de facto environment for statistical analysis: Standard statistical methods are easy to implement in R, and since much of the cutting-edge research in statistics and predictive modeling is done in R, newly developed techniques are often available in R first.
  • R is an open-source software project: R’s open interfaces allow it to integrate with other applications and systems.
  • R community: The R project leadership has grown to include more than 20 leading statisticians and computer scientists from around the world, and thousands of contributors have created add-on packages. 
Take an example to understand the power of the R language in today's collaborative open-source world. The picture below shows a heat map of a volcano, produced by an R program of about 15 lines that uses multiple R libraries developed by the community.







And here are the lines of  R code that produced the above visualization :-
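The original code listing did not survive in this copy of the post. As a hedged sketch, a base-R heat map of the built-in `volcano` dataset (the classic example this kind of post usually shows) can be produced in a few lines; this is based on the `filled.contour()` help example and is not necessarily the exact code from the original post:

```r
# Heat map of the built-in Maunga Whau (volcano) elevation data.
# Illustrative sketch only, adapted from the filled.contour() help page.
filled.contour(volcano,
               color.palette = terrain.colors,
               plot.title = title(main = "Maunga Whau Volcano"),
               key.title = title(main = "Height\n(m)"))
```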




I will post more about R and why it is my (and indeed many data scientists') favorite tool for data analysis. Meanwhile, I am sharing a small tutorial for R beginners that is available at https://www.analyticsvidhya.com

A Complete Tutorial to learn Data Science in R from Scratch

Tuesday, May 29

How to determine the value of your company's big data?

Every company has Big Data

Today’s companies are generating enormous amounts of data, including traditional, structured data as well as unstructured data. The massive amounts of information companies collect today can become a valuable new asset if used strategically. Players seeking additional organic revenue streams should consider tapping their data trove to power a new information services growth engine. Enterprises can capitalize on insights derived from this data to make better decisions, evaluate risk, and understand the market. There is also a huge amount of data being generated about your company on mobile devices and social media.
Companies that are sitting on large amounts of customer data, including insurance carriers, retailers, transportation companies, and communications providers, have a unique opportunity to make this type of information services play. According to a study by IDC, the amount of data that companies are accumulating is growing at 50 percent per year, more than doubling every two years. Many organizations are rich in data but poor in insight: data requires collection, mining, and finally analysis before the enterprise can realize its true value. Overall US demand for information services had exceeded $600 billion by 2016. As data-driven insights become an increasingly critical competitive differentiator, companies will use them to drive and optimize business decisions across industries. In the past, this market was largely limited to traditional market research and data specialists, but today virtually any company with a large customer database can potentially become a serious player in the new information game. So the question is: how does an enterprise determine which of its data is valuable and which is not?

 

Source: statista.com - Worldwide Big Data/Business Analytics Revenue

 

What data to target for maximum insight?                       

So how does a company identify which data is valuable, and what value can be derived from it, even before investing in a Big Data program? Research organizations like Accenture Labs have defined an Information Value Pyramid to illustrate various information service strategies. The pyramid has three levels: raw data, insights, and transactions. The potential value and profitability of an information services business depend in large part on the condition of the data the enterprise owns. The base of the pyramid holds raw, less differentiated and thus less valuable data; moving up the pyramid creates larger revenue opportunities, although these tend to be more difficult to execute.

One approach to valuing your Big Data is to create a custom Big Data Value Matrix. The Value Matrix classifies the different data sets using a standard set of parameters and evaluates the reference value of each data set in the context of the company and its business goals. Factors can be defined to classify the type of raw data, its potential uses, who its consumers will be once it is processed, the effort and cost of processing it, and the insights that can be derived from it. A weight is assigned to each factor, and the matrix is used to prioritize the various big data categories within the company; this valuation becomes the input to the company's big data program. Rather than processing every piece of data the company generates, data valuation helps companies understand their data, define a big data strategy and roadmap, and set realistic expectations for the outcome of their big data processing.
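A Value Matrix like the one described above can be sketched as a simple weighted scoring model. The factor names, weights, and ratings below are hypothetical; each company would define its own:

```python
# Hypothetical Big Data Value Matrix: score each data set on weighted factors.
# Factor names, weights, and ratings are illustrative, not a standard model.

FACTORS = {                      # weight of each valuation factor (sums to 1.0)
    "insight_potential": 0.4,
    "processing_cost":   0.2,    # scored inversely: cheap to process = high score
    "consumer_reach":    0.25,
    "differentiation":   0.15,
}

def value_score(ratings):
    """Weighted score (0-10) for one data set, given its per-factor ratings."""
    return sum(FACTORS[f] * ratings[f] for f in FACTORS)

datasets = {
    "telematics": {"insight_potential": 9, "processing_cost": 5,
                   "consumer_reach": 8, "differentiation": 9},
    "web_logs":   {"insight_potential": 6, "processing_cost": 8,
                   "consumer_reach": 5, "differentiation": 3},
}

# Rank data sets so the big data program targets the most valuable first.
ranking = sorted(datasets, key=lambda d: value_score(datasets[d]), reverse=True)
print(ranking)  # → ['telematics', 'web_logs']
```

Sorting the scores gives a prioritized roadmap: the highest-scoring data sets are the first candidates for the big data program.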
In the auto industry, since many vehicles now feature GPS and telematics systems, some car manufacturers have been able to collect and monetize a wealth of data on customer driving habits. General Motors Co.’s OnStar telematics system, for example, not only provides vehicle security, information, and diagnostics services to drivers, it also captures telemetry data. OnStar and GMAC Insurance partnered to create an opt-in program that uses the telemetry data to offer lower insurance premiums to customers who drive fewer miles. Thanks to the program, consumers can save significantly on car insurance, which boosts GM’s customer satisfaction performance. This, in turn, helps GM attract new paying OnStar customers.

Every company is working on some initiative to exploit its data, and big investments are being made without a clear picture of the outcome or benefits of the investment. The company that understands its Big Data will be able to target the right data, use the insights strategically, and derive maximum value from its investments.


Friday, March 30

Could blockchain have prevented the CBSE Data Leak in India?

Could blockchain have prevented the "CBSE Exam Paper Leak in India"?
Definitely yes! I will explain how it would work. The picture below shows how 3 sets of exam papers, set by 3 different people, are put on a blockchain and move through the 'Exam Paper Distribution Process'. The question papers are accessible only to authorized users (those who are given access). No one else can access the blockchain and leak the papers, and if authorized personnel try to leak an exam paper, they will be identified and caught. The process is transparent, secure, and has an audit trail, which gives everybody confidence.





Blockchain was invented by Satoshi Nakamoto in a paper he published in 2008. Let me explain blockchain as I understand it.

  1. Blockchain is a software program that runs on a network of computers (for now, this definition is good enough for newbies).
  2. If a document is put on a blockchain, authorized users can update the document, but no one can edit or delete existing data. It is like a notebook in which you can write, and the notebook records the date and time of your writing; if you make a mistake and want to correct it, you cannot erase it. You have to add a new entry saying the previous entry was a mistake and should be ignored. In short, the history of every transaction can be viewed at any point in time, which makes the document trustworthy (of course, there is security that ensures the document cannot be tampered with, so don't worry about that for now).
  3. The next aspect of blockchain is that every authorized user gets a copy of the blockchain record/ledger every time it is updated. So if someone tampers with his own copy of the document, it is of no use, because every other user will know what he has done; this is the second level of trust.
  4. Blockchain maintains all documents forever, and no one, not even a hacker, can delete a document. So at any point in time, any authorized user can view the document.
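The append-only, tamper-evident ledger idea above can be sketched in a few lines of Python. This is an illustration of the hash-chaining principle, not a real blockchain implementation (there is no network, consensus, or cryptographic signing here):

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(block):
    # The hash covers the payload and the previous block's hash,
    # chaining every block to the one before it.
    payload = json.dumps({"data": block["data"], "prev": block["prev"]},
                         sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append(chain, data):
    """Add a new block; old blocks are never edited or deleted."""
    prev = chain[-1]["hash"] if chain else GENESIS
    block = {"data": data, "prev": prev}
    block["hash"] = block_hash(block)
    chain.append(block)

def verify(chain):
    """Any edit to an earlier block breaks every hash after it."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block["hash"] != block_hash(block):
            return False
        prev = block["hash"]
    return True

chain = []
append(chain, "Paper set A submitted")
append(chain, "Paper set A approved")
print(verify(chain))           # True: the ledger is consistent

chain[0]["data"] = "tampered"  # someone edits an old entry...
print(verify(chain))           # False: tampering is immediately detected
```

Because every block's hash depends on the previous block's hash, rewriting history silently is impossible; this is what makes the recorded trail trustworthy.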

Now take our example of exam papers and see how blockchain and a well-defined process can prevent a paper leak -

  1. Let's assume the paper setters, CBSE officials at the center and in the states, and all school principals are authorized users of the blockchain.
  2. Assume 3 paper setters create 3 sets of the Mathematics exam paper and put them on the blockchain.
  3. The blockchain records the transaction and informs all users that the papers have been submitted; each user gets a copy of the papers, encrypted with a password.
  4. At this stage only CBSE officials have the password to view the papers and approve them for use in the exam. School principals come to know the papers are set but cannot access them until the password is given to them.
  5. On the day of the exam, CBSE decides which of the 3 sets will be used, and 1 hour prior to the exam they share the password with the school principals. Up to this point, if a paper leaks it can only have come from a paper setter, who will get caught; and a leak is unlikely anyway, since no setter knows which paper will be used for the exam.
  6. Once the principals get the password, it is their responsibility to print the papers and to update the blockchain as soon as they do, so everybody in CBSE knows when the papers were printed and how many copies were made. The principal is then responsible for distributing the papers in the classroom (with the help of staff who are not carrying mobile devices).
  7. If any student is absent and a question paper is unused, the school updates the blockchain that x number of papers are unused and are being sealed and kept in the principal's custody.
  8. From beginning to end, each step and each transaction is recorded on the blockchain, and all users are informed immediately over the internet. A leak is possible only if the principal and his staff cause it, as no one else has a hard copy of the papers, not even CBSE.
  9. This is a simplified explanation of the process. An actual implementation can add more customization and security, for example using GPS or RFID tags to track the real-time movement of exam papers from the moment they are printed until they reach the classrooms, plus other security measures for the last leg of delivery.
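The steps above amount to a permissioned audit trail: only authorized users may record events, and every event is kept forever. A hedged sketch (the user IDs and events are hypothetical, and a real system would combine this with the hash-chaining and encryption described earlier):

```python
# Hypothetical permissioned audit trail for the exam-paper process.
# User IDs and event strings are illustrative only.

AUTHORIZED = {"setter_1", "setter_2", "setter_3", "cbse_official", "principal_x"}

audit_trail = []  # append-only: the process never removes or edits entries

def record(user, event):
    """Record an event on behalf of an authorized user; reject everyone else."""
    if user not in AUTHORIZED:
        raise PermissionError(f"{user} is not an authorized user")
    audit_trail.append({"user": user, "event": event})

record("setter_1", "submitted encrypted paper set A")
record("cbse_official", "approved set A; password shared 1 hour before exam")
record("principal_x", "printed 120 copies of set A")
record("principal_x", "sealed 3 unused copies in custody")

# Every authorized user can inspect the full history at any time.
for entry in audit_trail:
    print(entry["user"], "->", entry["event"])
```

Because the trail is append-only and visible to all participants, any leak can be traced to the last authorized user who handled the papers.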
I don't have detailed knowledge of the system education institutes use to deliver examination papers, and there could be minor changes in the process. What is required is the transparency, security, and trust that blockchain or a similar solution can deliver, so that students have faith in the system.

Diamond merchants face the challenge of fake or over-valued diamonds, and they use blockchain to trace the journey of each diamond from the mine to the stores and all the way to the final buyer. Every diamond sold by De Beers has a complete audit trail of every person who handled it, and that transparency in the process builds trust. Blockchain promises trust in your business process.

Tuesday, March 27

Does my company need a BPM software to implement business process management?

Please answer these questions with yes or no:
  1. Does your company implement continuous business process improvement ?
  2. Do you think there is scope to improve company's productivity, collaboration, efficiency & QoS?
  3. Do you think there is scope to improve customer satisfaction?
  4. Does your company implement innovative changes to the way you do business?
  5. Does your company spend too much time in modifying software to adhere to changing process?
  6. Does your company face challenges in adherence to the defined process? 
  7. Does your company want to give your employees simplified process management and to remove redundancy in existing processes?
  8. Do you think your company can leverage big data to make smart decisions?
  9. Does your management expect a dashboard view of your business process management, including insight into performance across departments?
  10. Do you think there is scope to improve collaboration across distributed groups and partners ?
  11. Does your company want to reduce the dependence on software engineers to manage your business process? 
  12. Do you think a change in process should be tested in a simulated environment before implementing the change?
  13. Are your competitors already using BPM ?
                                                                           

If the answer to all these questions is yes, then your company needs a state-of-the-art BPM product to automate business process management. In the next post, we will take an example of a business process and see how you can achieve the above goals.

Friday, March 9

What you should know about Artificial Intelligence (AI)? Why are some people worried about AI?

When people talk about how Artificial Intelligence (AI) is going to revolutionize the game, they fail to tell you that AI is almost as old as computers. The idea goes back even further: in the 1840s, Ada Lovelace tried to write a program to do something normally done by people, and that is the simple definition of AI. Consider Garry Kasparov, the chess Grandmaster who was defeated by IBM’s Deep Blue computer in the 90s. Garry's reaction to the defeat was that he realized he would have performed much better if he had access to the same chess programs that Deep Blue had. Deep Blue had huge storage and was fed data on thousands of detailed chess games, from which it was able to anticipate its opponent's next moves and plan its own; coupled with its raw computing power, Garry was no match for it!

https://digitaltechnologyarchitecture.blogspot.in/


Over the years, as computers and related technologies have evolved, we have seen AI capabilities evolve with them. When you buy a laptop from Amazon and the Amazon web page suggests relevant laptop accessories for your product, that is actually a low-level AI program that has processed your purchase data in real time to smartly predict what accessories you may require. A few other examples of AI implementations that you might already be using are Apple's Siri, Google's Assistant and Now, Microsoft's Cortana, and IBM's Watson. We will discuss narrow AI vs. general AI and strong AI vs. weak AI at a later stage; for the moment, you should know that these are just advanced concepts in AI implementation.
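The accessory-suggestion idea can be illustrated with a tiny co-occurrence recommender. This is a hedged sketch of the general technique, not Amazon's actual system; the order history is made up:

```python
from collections import Counter

# Hypothetical order history: items bought together in past orders.
orders = [
    ["laptop", "laptop_bag", "mouse"],
    ["laptop", "mouse"],
    ["laptop", "usb_hub"],
    ["phone", "phone_case"],
]

def recommend(item, k=2):
    """Suggest the k accessories most often bought alongside `item`."""
    co = Counter()
    for order in orders:
        if item in order:
            co.update(i for i in order if i != item)
    return [accessory for accessory, _ in co.most_common(k)]

print(recommend("laptop"))  # → ['mouse', 'laptop_bag']
```

Real recommenders use far richer signals (browsing history, ratings, similar users), but counting what is bought together is the intuition behind "customers also bought".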
So why has there been a sudden surge in the implementation of AI over the last 4-5 years? As the info-graphic by digitalintelligencetoday.com shows, research on developing smart programs has been going on since the 1950s, and we have seen working examples over the years. Over the last 10 years, the cost of storage has dropped drastically, the cost of internet access has fallen, and internet speeds have improved. Technologies like Big Data, social media, mobiles, sensors, cloud, and Hadoop are enabling the creation and consumption of huge amounts of data. This huge amount of data created every second, and the technologies that make it possible to store and process it, are helping AI evolve and become smarter.

Don't let some hi-fi techie tell you that AI is a newborn that is going to make your job obsolete. Like any other successful technology, AI has been in use for years, and it will help drive data-driven intelligent automation to make our lives simpler. Way back, computers were supposed to take away the jobs of millions of workers; though some jobs became obsolete, computers actually created many more new jobs and improved and simplified the way we do business. Today no one seems to blame computers for rendering them jobless, and in a few years people will accept AI (which is nothing but a smarter computer program) as an invisible entity that is an integral part of their life.


AI - It's about machines/computers learning from data from everywhere

Learning from mistakes is human nature, but humans have finite memory and processing power compared to a computer. Since the mid-1950s, research has been done on machine learning: machines can access data storage and learn from the data to continuously correct their mistakes and improve. The latest advances include self-driving cars, IBM Watson (a computer that can beat humans at Jeopardy), and real-time machine translation that seems quite like the universal translator in Star Trek.
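"Learning from mistakes" has a very literal meaning in machine learning. A toy sketch (the data and learning rate are made up): the program repeatedly measures its error on stored examples and nudges its model to shrink that error, a bare-bones form of gradient descent.

```python
# Minimal "learning from mistakes": fit y = w * x by repeatedly nudging w
# to reduce the error on stored data (a toy gradient-descent sketch).

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x

w = 0.0       # the model starts out knowing nothing
lr = 0.05     # learning rate: how big each correction is

for _ in range(200):                 # revisit the stored data many times
    for x, y in data:
        error = w * x - y            # the "mistake" on this example
        w -= lr * error * x          # correct the model in proportion to it

print(round(w, 2))  # → 2.0, learned purely from the data
```

The same loop, scaled up to millions of parameters and examples, is essentially how modern machine learning systems improve with more data.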

My favorite AI info-graphic is the one below; it was created by www.digitalintelligencetoday.com



Next post -  So where is AI being used by today's businesses?





Understanding Generative AI and Generative AI Platform leaders

We are hearing a lot about the power of Generative AI. Generative AI is a branch of AI that holds the power to #Create content, artwork, code...