Saturday, June 9

Machine Learning with R

R is a powerful language used widely for data analysis and statistical computing, and it has many packages for implementing machine learning algorithms quickly and simply. First released in 1993, R is gaining prominence today because of its simplicity.

  • R is data analysis software: Data scientists can use R for statistical analysis, data visualization, and predictive modeling.
  • R is a programming language: R is an object-oriented language and provides objects, operators, and functions that allow users to explore, model, and visualize data.
  • R is the de facto environment for statistical analysis: Standard statistical methods are easy to implement in R, and since much of the cutting-edge research in statistics and predictive modeling is done in R, newly developed techniques are often available in R first.
  • R is an open-source software project: R’s open interfaces allow it to integrate with other applications and systems.
  • R community: The R project leadership has grown to include more than 20 leading statisticians and computer scientists from around the world, and thousands of contributors have created add-on packages. 
Here is an example of the power of R in today's collaborative open-source world. The picture below shows a heat map of a volcano, produced by a 15-line R program that uses multiple R libraries developed by the community.







And here are the lines of R code that produced the above visualization:




I will post more about R and why it is my (and many other data scientists') favorite tool for data analysis. Meanwhile, here is a short tutorial for R beginners, available at https://www.analyticsvidhya.com:

A Complete Tutorial to learn Data Science in R from Scratch

Tuesday, May 29

How to determine the value of your company's big data?

Every company has Big Data

Today's companies generate an enormous amount of data, both traditional structured data and unstructured data. The massive amounts of information companies collect today can become a valuable new asset if used strategically. Players seeking additional organic revenue streams should consider tapping their data trove to power a new information services growth engine. Enterprises can capitalize on insights derived from this data to make better decisions, evaluate risk, and understand the market. A huge amount of data is also being generated about your company on mobile devices and social media.

Companies that are sitting on large amounts of customer data (including insurance carriers, retailers, transportation companies and communications providers) have a unique opportunity to make this kind of information services play. According to a study by IDC, the amount of data that companies are accumulating is growing at 50 percent per year, or more than doubling every two years. Many organizations are rich in data but poor in insight. Data requires collection, mining and, finally, analysis before an enterprise can realize its true value. Overall US demand for information services had exceeded $600 billion by 2016. As data-driven insights become an increasingly critical competitive differentiator, companies across industries will use them to drive and optimize business decisions. In the past this market was largely limited to traditional market research firms and data specialists, but today virtually any company with a large customer database can potentially become a serious player in the new information game. So the question is: how does an enterprise find out which data is valuable and which is not?
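The IDC figure is easy to check with compound growth, assuming the 50 percent annual rate stays constant:

```latex
\begin{aligned}
V_t &= V_0 (1 + r)^t, \qquad r = 0.5 \\
\frac{V_2}{V_0} &= (1.5)^2 = 2.25 > 2 \\
T_{\text{double}} &= \frac{\ln 2}{\ln 1.5} \approx \frac{0.693}{0.405} \approx 1.71 \text{ years}
\end{aligned}
```

So at 50 percent a year, the data volume doubles in well under two years.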

 

Source: statista.com, Worldwide Big Data/Business Analytics Revenue

 

What data to target for maximum insight?                       

So how does a company identify which data is valuable, and what value can be derived from it, even before investing in a Big Data program? Research groups such as Accenture Labs have defined an Information Value Pyramid to illustrate various information service strategies. The pyramid has three levels: raw data, insights and transactions. The potential value and profitability of an information services business depends in large part on the condition of the data the enterprise owns. The base of the pyramid holds raw, less differentiated and thus less valuable data. Moving up the pyramid creates larger revenue opportunities, although these tend to be more difficult to execute.

One approach to valuing your Big Data is to create a custom Big Data Value Matrix. A Value Matrix classifies the company's data sets using a standard set of parameters and evaluates the value of each data set in the context of the company and its business goals. Factors can be defined for the type of raw data, the potential uses of the data, who will consume it once it is processed, the effort and cost of processing it, and the potential insights that can be derived from it. A weight is assigned to each factor, the matrix is used to prioritize the big data categories within the company, and this valuation becomes the input to the company's big data program. Rather than processing every piece of data the company generates, data valuation helps companies understand their data, define a big data strategy and roadmap, and set realistic expectations for the outcome of their big data processing.
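
A Value Matrix of this kind boils down to a weighted score per data set. The sketch below assumes four hypothetical factors, weights that sum to 1.0 and a 1 to 5 scoring scale. The data set names and scores are made up for illustration, and a real matrix would use whatever factors and weights your business agrees on.

```python
from dataclasses import dataclass, field

# Hypothetical factor weights (must sum to 1.0); tune them to your business goals.
WEIGHTS = {
    "insight_potential": 0.35,   # insights that can be derived from the data
    "consumer_demand": 0.25,     # who will consume the processed data
    "data_quality": 0.20,        # condition of the raw data
    "processing_ease": 0.20,     # inverse of effort/cost: 5 = cheap to process
}

@dataclass
class DataSet:
    name: str
    scores: dict = field(default_factory=dict)  # factor -> score on a 1..5 scale

def value_score(ds, weights=WEIGHTS):
    """Weighted sum of one data set's factor scores."""
    missing = set(weights) - set(ds.scores)
    if missing:
        raise ValueError(f"{ds.name} has no score for {sorted(missing)}")
    return sum(weights[f] * ds.scores[f] for f in weights)

def prioritize(datasets, weights=WEIGHTS):
    """Return (name, score) pairs, most valuable data set first."""
    ranked = [(ds.name, round(value_score(ds, weights), 2)) for ds in datasets]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    catalog = [
        DataSet("telematics", {"insight_potential": 5, "consumer_demand": 4,
                               "data_quality": 4, "processing_ease": 2}),
        DataSet("web_logs", {"insight_potential": 3, "consumer_demand": 3,
                             "data_quality": 3, "processing_ease": 4}),
        DataSet("call_center_audio", {"insight_potential": 4, "consumer_demand": 2,
                                      "data_quality": 2, "processing_ease": 1}),
    ]
    for name, score in prioritize(catalog):
        print(f"{name:20s} {score}")
```

Keeping the weights in one place makes it easy to re-rank the whole catalog when business priorities change.
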
In the auto industry, since many vehicles now feature GPS and telematics systems, some car manufacturers have been able to collect and monetize a wealth of data on customer driving habits. General Motors Co.'s OnStar telematics system, for example, not only provides vehicle security, information and diagnostics services to drivers, it also captures telemetry data. OnStar and GMAC Insurance partnered to create an opt-in program that uses the telemetry data to offer lower insurance premiums to customers who drive fewer miles. Thanks to the program, consumers can save significantly on car insurance, which boosts GM's customer satisfaction performance. This, in turn, helps GM attract new paying OnStar customers.

Every company is working on some initiative to exploit its data, and big investments are being made without a clear picture of the outcome or benefits of the investment. The company that understands its Big Data will be able to target the right data, use the insights strategically and derive maximum value from its investments.


Friday, March 30

Could blockchain have prevented the CBSE Data Leak in India?

Could blockchain have prevented the "CBSE Exam Paper Leak in India"?
Yes, I believe it could. I will explain how it would work, using a picture that shows 3 sets of exam papers, set by 3 people, being put on a blockchain and moving through the 'Exam Paper Distribution Process'. The question papers are accessible only to authorized users (those who are given access). Unauthorized users cannot read the papers, and any authorized person who tries to leak one can be identified from the records and caught. The process is transparent and secure, and its audit trail gives everybody confidence.





Blockchain was introduced by Satoshi Nakamoto in the Bitcoin paper published in 2008. Let me explain blockchain as I understand it.

  1. Blockchain is a software program that runs on a network of computers (for now this definition is good enough for newbies).
  2. If a document is put on a blockchain, the blockchain allows AUTHORIZED USERS to add entries to the document but does not allow them to EDIT or DELETE any data. It's like a notebook that records the date and time of everything you write: if you make a mistake, it will not let you correct it in place, so you have to add a new entry saying the previous entry was a mistake and should be ignored. In short, the history of every transaction can be viewed at any time, which makes the document trustworthy (of course, there is also security that ensures the document cannot be tampered with, so don't worry about that for now).
  3. The next aspect of blockchain is that every authorized user gets a copy of the blockchain record (the ledger) every time it is updated. So if someone tampers with his own copy of the document it is of no use, because every other user holds the correct copy and will know what was changed. This is the second level of trust.
  4. A blockchain keeps all documents forever, and no one is allowed to delete any of them; even a hacker would have to alter the copies held by the other participants to do so. So at any point in time, any authorized user can view the document.
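
The notebook analogy can be sketched as a toy append-only ledger in Python. Each entry stores the SHA-256 hash of the previous entry, so changing any past entry breaks the chain and `verify()` notices. The `Ledger` class is a hypothetical illustration: real blockchains add digital signatures, consensus between nodes and replication, none of which is modelled here.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

class Ledger:
    """Toy append-only ledger: entries can be added and read, never edited or deleted."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        # Hash a canonical JSON encoding so the same body always gives the same hash
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, author, text):
        body = {
            "index": len(self.entries),
            "time": time.time(),            # when the entry was written
            "author": author,
            "text": text,
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        """True if no entry has been altered since it was written."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

if __name__ == "__main__":
    ledger = Ledger()
    ledger.append("clerk", "Received 100 exam papers")
    # A mistake is not edited; a correcting entry is appended instead
    ledger.append("clerk", "Correction: entry 0 should read 120 exam papers")
    print(ledger.verify())                                  # True
    ledger.entries[0]["text"] = "Received 90 exam papers"   # simulated tampering
    print(ledger.verify())                                  # False
```

Note that the class offers no update or delete method at all: the only way to "fix" history is to append a correction, exactly as in the notebook example.
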

Now take our example of exam papers and see how blockchain and a well-defined process can prevent a paper leak:

  1. Let's assume paper setters, CBSE officials at the center and in the states, and all school principals are authorized users of the blockchain.
  2. Assume 3 paper setters create 3 sets of the Mathematics exam paper and put them on the blockchain.
  3. The blockchain will record the transaction and inform all users that the papers have been submitted, and each user will get a copy of the papers encrypted with a password.
  4. At this stage only CBSE officials have the password to view the papers and approve them for use in the exam. School principals know the papers are set but cannot access them until the password is given to them.
  5. On the day of the exam, CBSE will decide which of the 3 sets is to be used, and 1 hour before the exam they will share the password with school principals. Up to this point, a leak could only come from a paper setter, who would get caught; and the chance is small, since no setter knows which paper will be used.
  6. Once principals get the password, it is their responsibility to print the papers and update the blockchain as soon as they do, so everybody in CBSE knows when the papers were printed and how many copies were printed. The principal is then responsible for distributing the papers in the classrooms (with the help of staff who are not carrying mobile devices).
  7. If any student is absent and a question paper is unused, the school updates the blockchain to record that x papers are unused and have been sealed and kept in the principal's custody.
  8. From beginning to end, each step and each transaction is recorded on the blockchain, and all users on the blockchain are immediately informed over the internet. A leak is then only possible if the principal and his staff do it, since no one else has a hard copy of the papers, not even CBSE.
  9. This is a simple explanation of the process. An actual implementation could have more customization and security; for example, GPS or RFID tags could track the real-time movement of exam papers from the time they are printed until they reach the classrooms, and many other measures could be added to secure the last leg of delivery.
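
The access rules in the steps above can be modelled as a small role-based workflow in which every attempted action, allowed or denied, lands in an audit log. This is a simplified sketch: the role names, action names and the `ExamPaperProcess` class are hypothetical, the papers are held in plain text rather than encrypted, and the log is a Python list rather than a distributed ledger.

```python
from datetime import datetime, timezone

# Hypothetical mapping of roles to permitted actions, modelled on the steps above.
PERMISSIONS = {
    "setter": {"submit_paper"},
    "cbse_official": {"view_paper", "select_paper", "release_key"},
    "principal": {"view_paper", "record_print", "record_unused"},
}

class ExamPaperProcess:
    """Toy model: every attempted action, allowed or not, is written to an audit log."""

    def __init__(self):
        self.log = []              # append-only audit trail shared with all users
        self._papers = {}          # set id -> paper text (would be encrypted in reality)
        self.selected = None       # set chosen by CBSE on exam day
        self.key_released = False  # has the password been shared with principals?

    def _allowed(self, role, action, set_id):
        if action not in PERMISSIONS.get(role, set()):
            return False
        if action == "release_key":
            return self.selected is not None
        if action == "view_paper" and role == "principal":
            # Principals see only the chosen set, and only after the key is released
            return self.key_released and set_id == self.selected
        return True

    def act(self, user, role, action, set_id=None, text=None, count=None):
        allowed = self._allowed(role, action, set_id)
        # Record the attempt first; the paper text itself never goes into the log
        self.log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user, "role": role, "action": action,
            "set_id": set_id, "count": count, "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} ({role}) may not {action} now")
        if action == "submit_paper":
            self._papers[set_id] = text
        elif action == "select_paper":
            self.selected = set_id
        elif action == "release_key":
            self.key_released = True
        elif action == "view_paper":
            return self._papers[set_id]
        return None

if __name__ == "__main__":
    exam = ExamPaperProcess()
    for i, set_id in enumerate("ABC"):
        exam.act(f"setter{i + 1}", "setter", "submit_paper", set_id=set_id, text=f"Maths set {set_id}")
    try:
        exam.act("principal1", "principal", "view_paper", set_id="B")  # too early
    except PermissionError as err:
        print("denied:", err)
    exam.act("official1", "cbse_official", "select_paper", set_id="B")
    exam.act("official1", "cbse_official", "release_key")
    print(exam.act("principal1", "principal", "view_paper", set_id="B"))
    exam.act("principal1", "principal", "record_print", set_id="B", count=120)
    print(len(exam.log), "entries in the audit log")
```

Logging denied attempts as well as successful ones is what makes step 8 work: anyone who tries to get at a paper early leaves a trace.
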
I don't know the details of the system used by education boards to deliver examination papers, so the actual process may differ slightly. What is required is the transparency, security and trust that blockchain or a similar solution can deliver, so that students have faith in the system. Diamond merchants face the challenge of fake or overvalued diamonds, and some are using blockchain to trace the journey of each diamond from the mine to the store and on to the final buyer. De Beers, for example, has been developing a blockchain platform that records every party who handled a diamond, and this transparency builds trust. Blockchain promises trust in your business process.

Tuesday, March 27

Does my company need a BPM software to implement business process management?

Please answer these questions with yes or no:
  1. Does your company implement continuous business process improvement?
  2. Do you think there is scope to improve your company's productivity, collaboration, efficiency & QoS?
  3. Do you think there is scope to improve customer satisfaction?
  4. Does your company implement innovative changes to the way you do business?
  5. Does your company spend too much time modifying software to adhere to changing processes?
  6. Does your company face challenges in adhering to the defined process?
  7. Does your company want to give employees simplified process management and remove redundancy in existing processes?
  8. Do you think your company can leverage big data to make smart decisions?
  9. Does your management expect a dashboard view of business process management, including insight into performance across departments?
  10. Do you think there is scope to improve collaboration across distributed groups and partners?
  11. Does your company want to reduce its dependence on software engineers to manage your business process?
  12. Do you think a change in a process should be tested in a simulated environment before it is implemented?
  13. Are your competitors already using BPM?
                                                                           

If the answer to all these questions is yes, then your company needs a state-of-the-art BPM product to automate business process management. In the next post we will take an example of a business process and see how you can achieve the above goals.

Friday, March 9

What should you know about Artificial Intelligence (AI)? Why are some people worried about AI?

When people talk about how Artificial Intelligence (AI) is going to change the game, they fail to tell you that the idea of AI is almost as old as computing itself. In 1843 Ada Lovelace, often called the first programmer, published a program for Charles Babbage's Analytical Engine and speculated that such a machine might one day do more than calculate, for example compose music. Getting a machine to do something that is normally done by people is the simplest definition of AI. Consider Garry Kasparov, the chess Grandmaster who was defeated by IBM's Deep Blue computer in 1997. His reaction to the defeat was that he would have performed much better if he had had access to the same chess programs that Deep Blue had. Deep Blue had huge storage and was fed thousands of detailed records of chess games, from which it could anticipate its opponent's next moves and plan its own; coupled with its computing power, this meant Garry was no match for it!

https://digitaltechnologyarchitecture.blogspot.in/


Over the years, as computers and related technologies have evolved, AI capabilities have evolved with them. When you buy a laptop on Amazon and the web page suggests relevant laptop accessories, that is actually a simple AI program at work: it has processed purchase data in real time to predict which accessories you may need. A few other examples of AI that you might already be using are Apple's Siri, Google Assistant and Google Now, Microsoft Cortana and IBM's Watson. We will discuss narrow and general AI, and strong and weak AI, at a later stage; for the moment you should know that these are just more advanced concepts in AI.
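
The "customers who bought this also bought" idea can be illustrated with simple co-occurrence counting over past orders. This is a minimal sketch with hypothetical product names and orders; production recommenders at large retailers are far more sophisticated, but the principle of learning from purchase data is the same.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_co_occurrence(orders):
    """Count how often each pair of products appears in the same order."""
    co = defaultdict(Counter)
    for order in orders:
        # sorted(set(...)) ignores duplicates within an order and keeps results deterministic
        for a, b in combinations(sorted(set(order)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(co, product, k=3):
    """Products most often bought together with `product`, most frequent first."""
    return [item for item, _ in co[product].most_common(k)]

if __name__ == "__main__":
    # Hypothetical order history
    orders = [
        ["laptop", "mouse", "laptop_bag"],
        ["laptop", "mouse"],
        ["laptop", "usb_hub"],
        ["phone", "phone_case"],
    ]
    co = build_co_occurrence(orders)
    print(recommend(co, "laptop"))
```

Because the counts are updated with every new order, the suggestions improve as more purchase data arrives.
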
So why has there been a sudden surge in AI implementations over the last four or five years? As the infographic by digitalintelligencetoday.com shows, research on developing smart programs has been going on since the 1950s, and we have seen working examples over the years. Over the last 10 years the cost of storage has dropped drastically, internet access has become cheaper and internet speeds have improved. Technologies such as Big Data, social media, mobile, sensors, cloud and Hadoop are enabling the creation and consumption of huge amounts of data. This data, created every second, together with the technologies that make it possible to store and process it, is helping AI evolve and become smarter. Don't let some hi-fi techie tell you that AI is a newborn that is going to make your job obsolete. Like other successful technologies, AI has been in use for years, and it will help drive data-driven intelligent automation that makes our lives simpler. Computers were once supposed to take away the jobs of millions of workers; some jobs did become obsolete, but computers created many more new jobs and improved and simplified the way we do business. Today no one seems to blame computers for making them jobless, and in a few years people will accept AI (which is nothing but a smarter computer program) as an invisible entity that is an integral part of their lives.


AI - It's about Machines/Computers Learning from Data from Everywhere

Learning from mistakes is human nature, but humans have finite memory and processing power compared to a computer. Since the mid-1950s, research has been done on machine learning, in which machines use stored data to continuously correct their mistakes and improve. The latest advances include self-driving cars, IBM Watson (a computer that can beat humans at Jeopardy!) and real-time machine translation that seems quite like the universal translator in Star Trek.
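
Learning by correcting mistakes is exactly what the classic perceptron algorithm does: it adjusts its weights only when it gets a training example wrong. The sketch below is a minimal illustration trained on a toy data set (the logical AND of two inputs); it is not how systems like Watson work, just the simplest example of the idea.

```python
def predict(w, b, x):
    """Return +1 or -1 depending on which side of the line w.x + b = 0 the point x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning: adjust the weights only when a prediction is wrong."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:                 # y is the correct label, +1 or -1
            if predict(w, b, x) != y:
                # Move the decision boundary toward the correct answer
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:                    # a full pass with no errors: done
            break
    return w, b

if __name__ == "__main__":
    and_gate = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
    w, b = train_perceptron(and_gate)
    print(w, b, [predict(w, b, x) for x, _ in and_gate])
```

For data that a straight line can separate, as here, the perceptron is guaranteed to stop making mistakes after a finite number of corrections.
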

My favorite AI infographic is the one below; it is owned and created by www.digitalintelligencetoday.com.



Next post -  So where is AI being used by today's businesses?





Thursday, February 15

How Blockchain could have prevented the 'Great Indian 11000 Crore PNB Bank Fraud' ?

Blockchain is a distributed digital ledger that enables and records the secure transfer of data and documents over a public or private peer-to-peer network. It allows secure management of a shared ledger: transactions are verified and stored across the network without a central governing authority. A blockchain can run on a public, open network or on a private network that requires explicit permission to read and write.

The best example of how #Blockchain can prevent fraud is the 14 Feb 2018 news report about the Great Indian INR 11000 Crore PNB Bank Fraud.

  1. A businessman, #NiravModi, allegedly bribed a couple of PNB (Punjab National Bank) officials and managed to get a fake Letter of Undertaking (LOU) from PNB without providing any collateral to the bank (providing collateral is standard practice).
  2. #NiravModi then allegedly used the FAKE LOU to fool several more banks and businesses. The LOU effectively made PNB his guarantor: if Nirav could not pay his creditors, PNB would be responsible for paying them, to the tune of INR 11000 Crore or more. Holy Flying Cow!
  3. The fraud was not detected for years because the LOU was issued but never recorded in the bank's books, so the bank was not even aware of it (apart from the people involved in the fraud).
  4. What is also surprising is that none of the business associates or banks cross-checked with Punjab National Bank for 7 years to verify that the LOU was authentic.

The core issue in the PNB fraud is poor implementation of the BPM process:

Before we discuss blockchain, let's make one thing clear: the main issue in the PNB scam is the poor definition and implementation of the business process for LOUs. If the business process management software does not implement review and approval tasks for a critical process like issuing an LOU, this calls for an immediate review of the BPM system of PNB (and other public sector banks), as there could be issues in the implementation of other critical processes too. The other issue is that the banking process defined by RBI does not seem to include a task of cross-checking with the issuing bank to verify the authenticity of an LOU or similar documents issued by a bank. As a matter of fact, if one bank official can forge a document and the banking system has no process to validate its authenticity, then tomorrow an outsider could also forge the document and the bank would not be able to identify the fraud! PNB needs a serious software process audit immediately, and it would be ideal if other banks also audited the software implementation of their business process management, sought expert guidance on how to fix or improve it, and kept an audit trail that can help trace any anomaly or attempted fraud.
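
The two missing controls described above, a review and approval step and a cross-check with the issuing bank, can be sketched as a maker-checker workflow plus a register that receiving banks query. The `IssuingBank` class, its method names and the sample data are hypothetical; the sketch only illustrates the control logic, not how banks actually exchange messages.

```python
class IssuingBank:
    """Toy maker-checker workflow for LOUs, with a register other banks can query."""

    def __init__(self, name, approvers):
        self.name = name
        self.approvers = set(approvers)   # staff allowed to approve LOUs
        self._drafts = {}                 # lou_id -> draft awaiting approval
        self.register = {}                # lou_id -> approved LOU record

    def draft(self, lou_id, maker, beneficiary, amount):
        """Step 1 (maker): an officer prepares an LOU; it is not valid yet."""
        self._drafts[lou_id] = {"maker": maker, "beneficiary": beneficiary, "amount": amount}

    def approve(self, lou_id, checker):
        """Step 2 (checker): a different, authorized person must approve it."""
        draft = self._drafts[lou_id]
        if checker not in self.approvers:
            raise PermissionError(f"{checker} is not an authorized approver")
        if checker == draft["maker"]:
            raise PermissionError("maker and checker must be different people")
        self.register[lou_id] = dict(draft, checker=checker, issuer=self.name)
        del self._drafts[lou_id]

    def confirm(self, lou_id, beneficiary, amount):
        """What a receiving bank would check before extending credit against an LOU."""
        record = self.register.get(lou_id)
        return (record is not None
                and record["beneficiary"] == beneficiary
                and record["amount"] == amount)

if __name__ == "__main__":
    bank = IssuingBank("issuer_bank", approvers=["manager1", "manager2"])
    bank.draft("LOU-1", maker="officer1", beneficiary="firm_x", amount=500)
    bank.approve("LOU-1", checker="manager1")
    print(bank.confirm("LOU-1", "firm_x", 500))      # genuine, approved LOU
    print(bank.confirm("LOU-FAKE", "firm_x", 500))   # never recorded by the issuer
```

An LOU handed over without ever being recorded and approved simply does not appear in the register, so a receiving bank that performs the cross-check rejects it.
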


So how could blockchain prevent a similar fraud? Before we begin, let me remind you not to confuse blockchain with Bitcoin or any other cryptocurrency: Bitcoin is just one use case of blockchain technology. The following image shows the typical steps in how #blockchain works.

                                       
                                                                        
The Great Indian 11000 Crore PNB Bank Fraud is in reality a very basic type of fraud! It needs no more cunning than a 5th-grade kid who hides his mark sheet from his parents when he gets poor grades. The bank officer who gave the forged LOU to #NiravModi did not record in the bank's books that he had issued it. As there was no record of the LOU in the bank's computers, no one in the bank was able to detect the fraud for years. The LOU was allegedly used by #NiravModi to commit further frauds, details of which are not yet available in the media. It is a huge scam because if #NiravModi (assuming the fraud is proved) does not honor his creditors, PNB will be left holding the sack, worth INR 11000 Crore! The fraud succeeded because the banking process had no control to stop a corrupt employee from issuing an LOU, and the bank had not defined a process for other banks to validate the authenticity of the LOUs it issued.

At a business process level, one would call this a very poor implementation of a business process. No letter of credit issued by any bank should be valid unless it is cross-verified with the issuing bank, yet here the letter was used for many years without any creditor ever bothering to check its authenticity with PNB! Frankly, I can't believe this fraud actually happened, but I guess there are many fools in the business world who don't even bother to check whether a bank guarantee is authentic! So how could we build a software system that can prevent such fraud, irrespective of how many foolish bankers are involved in the process?

How could blockchain have prevented the PNB Bank 11000 Crore Fraud?

  1. In a blockchain world, all the steps in the 'Letter Of Credit' process would have been recorded in a blockchain ledger database.
  2. A notification of each step in the 'Letter Of Credit' process would have gone to all approving bank officials, making it impossible for any employee, junior or senior, to issue a letter of credit without the knowledge of those officials.
  3. Even after the 'Letter Of Credit' is issued, when the customer shares it with any bank or business entity, they would be able to view its process trail on the blockchain. This trail is reliable because a blockchain ledger is like a database that only allows inserts and does not allow updates or deletes.
  4. Since a 'Letter Of Credit' entry cannot be deleted from the blockchain transaction ledger, it is not possible for anyone to HIDE or UPDATE any information without the knowledge of the approving bank authorities. This means the LOU could not have been issued secretly if a system similar to blockchain had been implemented.
  5. When the LOU is shared with another bank or business entity, the guarantor bank (PNB in this case) would be notified as the blockchain transaction gets updated. This would ensure that the same LOU is not shared with multiple banks or business entities to commit fraud.
  6. The following figure shows the transactions in a sample business process flow:


    1. A bank officer initiates an LOU, creating a transaction in the blockchain
    2. The approving authorities are automatically notified by the blockchain system
    3. The approval transaction is inserted in the same block and becomes an immutable entry in the database
    4. The issuing bank keeps getting informed when the LOU is submitted to another bank to get credit, or to a business associate, so there is an immutable record of the LOU's life-cycle that only authorized personnel can access
    5. For the life-cycle of the LOU, the entire chain of transactions is attached to the LOU, so all concerned people can see its history and authenticate it
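
The life-cycle in the figure can be sketched as a shared, append-only event log in which every step is pushed to all interested parties, and presenting the same LOU to a second bank raises an alert for everyone. `LouChannel` and all names in it are hypothetical, and the notification "inbox" is just an in-memory list standing in for the network.

```python
from collections import defaultdict

class LouChannel:
    """Toy shared event log for one LOU, notifying every member of each step."""

    def __init__(self, lou_id, issuer, approvers):
        self.lou_id = lou_id
        self.approvers = set(approvers)
        self.members = {issuer, *self.approvers}  # who is told about every step
        self.events = []                          # append-only history of the LOU
        self.inbox = defaultdict(list)            # member -> notifications received
        self.approved = False
        self.presented_to = []

    def _publish(self, step, actor, detail=""):
        event = {"lou": self.lou_id, "step": step, "actor": actor, "detail": detail}
        self.events.append(event)
        for member in self.members:
            self.inbox[member].append(event)

    def initiate(self, officer, amount):
        self._publish("initiated", officer, f"amount={amount}")

    def approve(self, approver):
        if approver not in self.approvers:
            raise PermissionError(f"{approver} cannot approve {self.lou_id}")
        self.approved = True
        self._publish("approved", approver)

    def present(self, customer, receiving_bank):
        if not self.approved:
            raise ValueError(f"{self.lou_id} has no approval on record; reject it")
        self.members.add(receiving_bank)          # receiving bank joins the channel
        self.presented_to.append(receiving_bank)
        self._publish("presented", customer, receiving_bank)
        if len(self.presented_to) > 1:
            self._publish("alert", "system", f"LOU presented to {self.presented_to}")

if __name__ == "__main__":
    ch = LouChannel("LOU-7", issuer="issuer_bank", approvers=["manager1"])
    ch.initiate("officer1", 1000)
    ch.approve("manager1")
    ch.present("customer1", "bank_x")
    ch.present("customer1", "bank_y")             # second presentation raises an alert
    for event in ch.inbox["issuer_bank"]:
        print(event["step"], event["actor"], event["detail"])
```

An LOU with no approval event is rejected when presented, and a repeat presentation is broadcast to the issuer and to every bank that has already seen the LOU.
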
  • In summary, blockchain, or a similar software design that creates an immutable log of a bank process flow (for example, an LOU), ensures that a log of all activities or transactions is kept in a secure ledger database throughout the life of the document and even after its validity expires. The immutable log builds a trust relationship between partner entities, and it also speeds up the business process, since all entities receive a copy of the entire transaction log as it is updated in real time, thus providing transparency.
  • For those who did not follow the above example, imagine a tamper-proof paper register (from which pages cannot be removed!) in which all banking transactions must be recorded in sequential order using a permanent marker pen (so entries cannot be erased). A copy of the register is sent to each supervising bank official (this sounds redundant, but it is just an example). Since a copy of the register reaches every official within seconds of a transaction, there is no way an official will not know about it. When the borrower submits the LOU to another bank, a copy of the register is again sent to the issuing bank's officials and also to the receiving bank's officials, so all authorities get a copy of the updated register every time a new entry is made. (This is only an analogy; in reality an entry is made in a 'write-only' database for every transaction, from which data can be neither deleted nor updated.)
  • The rule of the game is to build trust between participating parties: each transaction is recorded in a 'write-only', immutable database, and participating parties get a copy of the transaction log every time a new transaction happens. So at any point in time every participant has the latest transaction ledger. There is some redundancy in the process, because the ledger is sent to all participants after every transaction, but it helps ensure trust: no single person manages a central database, there is no chance of manipulating data without the knowledge of the other parties, and, as mentioned earlier, every participant has its own immutable, tamper-proof copy of the database.
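
The idea that every participant holds its own copy of the ledger can be illustrated by comparing a fingerprint (a running SHA-256 hash) of each copy: a participant whose copy disagrees with the majority has been tampered with. This is a deliberately simplified sketch with hypothetical participant names; real blockchains reach agreement through consensus protocols and signed blocks rather than a simple majority vote over hashes.

```python
import hashlib
from collections import Counter

def chain_hash(entries):
    """Fold every entry into one running SHA-256 fingerprint of the whole log."""
    h = "0" * 64
    for entry in entries:
        h = hashlib.sha256((h + entry).encode()).hexdigest()
    return h

class Network:
    """Hypothetical network in which each participant keeps its own copy of the log."""

    def __init__(self, participants):
        self.copies = {p: [] for p in participants}

    def broadcast(self, entry):
        # Every new transaction is appended to every participant's copy
        for log in self.copies.values():
            log.append(entry)

    def suspicious_copies(self):
        """Participants whose copy does not match the fingerprint most copies share."""
        heads = {p: chain_hash(log) for p, log in self.copies.items()}
        majority, _ = Counter(heads.values()).most_common(1)[0]
        return sorted(p for p, h in heads.items() if h != majority)

if __name__ == "__main__":
    net = Network(["issuer_bank", "bank_x", "bank_y", "regulator"])
    net.broadcast("LOU-7 initiated by officer1")
    net.broadcast("LOU-7 approved by manager1")
    print(net.suspicious_copies())                          # []
    net.copies["bank_y"][1] = "LOU-7 approved by nobody"    # tamper with one copy
    print(net.suspicious_copies())                          # ['bank_y']
```

This is the "redundancy buys trust" trade-off in miniature: storing the same log many times is wasteful, but it means no single copy can be quietly rewritten.
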



Wednesday, February 14

How could Sensors, Social Media, Call Data, Intelligence Data & Big Data help prevent terror attacks on Indian Borders?

As we hear the news of another terror attack in Jammu & Kashmir, I wonder what kind of software-driven intelligence the Indian Army uses to predict and prevent terror attacks. I was in a discussion with an army officer from the intelligence department about how these technologies are used by telecom, bank and insurance companies to prevent fraud, and by power and oil companies to predict and prevent outages, and the officer was amazed at how industry verticals use big data, sensors, predictive analytics and artificial intelligence.

Software architects are trained to identify scope for innovation by combining existing infrastructure with the latest technologies. We can sit with a bottle of beer or a cup of coffee and tell you an incredible story that may seem fantastic and surreal but is quite easy to implement. The officer promised me that he would discuss the case studies with his boss and hopefully set up a meeting with him. What could be better than being invited to share ideas with the army intelligence team and do my bit for the country? I am going to post some use cases that could be used to build a Border Security solution using sensors, predictive intelligence and historical data. I am not familiar with the technologies used by the border security forces, so I am going to assume certain facts.


What I think can be achieved in a short time is a Multi Dimensional Intelligence Dashboard (we love buzzwords!) that can give various insights into potential security breaches on the border, along with real-time alerts.

The Intelligence Dashboard can give insights such as:


1) Which locations along the border is the enemy likely to try to sneak in at, on a particular day and time?

2) What is the probability of an intrusion at various locations on the border, based on human movements on both sides of the border?

3) What is the probability of an intrusion based on weather conditions and the day/time of year?

4) What kind of intrusion strategy can be expected on the next D-day?

5) Which support groups could be working inside the border to support the enemy intrusion, and how can their movements be tracked?

6) What is the correlation between social media, telecommunication and messaging application activity and enemy activity?

7) How can events that indicate potential enemy activity in the near future be detected automatically?

The following sample illustration shows different data sets being collected and processed in real time, with predictive analytics performed on data at rest. The data sets are collated and analysed to derive insight into potential incidents, which is displayed on a user-friendly dashboard.
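
As a very first baseline for questions 2 and 3, a dashboard could estimate the probability of an intrusion for each border sector and weather condition from historical records. The sketch below assumes a hypothetical history of (sector, condition, intrusion) records and applies Laplace smoothing so that rarely seen combinations do not get a probability of exactly 0 or 1; a real system would combine many more signals with proper predictive models.

```python
from collections import Counter

def intrusion_rates(history, alpha=1.0):
    """Estimate P(intrusion | sector, condition) from past observation days.

    history: iterable of (sector, condition, intrusion_happened) records.
    alpha:   Laplace smoothing constant; keeps estimates away from exactly 0 or 1.
    """
    days, hits = Counter(), Counter()
    for sector, condition, happened in history:
        days[(sector, condition)] += 1
        hits[(sector, condition)] += int(happened)
    return {key: (hits[key] + alpha) / (days[key] + 2 * alpha) for key in days}

def top_risks(rates, k=3):
    """The k (sector, condition) pairs with the highest estimated probability."""
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:k]

if __name__ == "__main__":
    # Entirely hypothetical records: (sector, weather, was there an intrusion?)
    history = [
        ("S1", "fog", True), ("S1", "fog", True), ("S1", "fog", False),
        ("S1", "clear", False), ("S1", "clear", False),
        ("S2", "fog", False), ("S2", "clear", False),
    ]
    for (sector, weather), p in top_risks(intrusion_rates(history)):
        print(f"{sector} {weather}: {p:.2f}")
```

A baseline like this is mainly useful for ranking sectors on the dashboard and as a yardstick that more sophisticated models must beat.
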





