Modern businesses are collecting data at an unprecedented scale, and they are realizing the benefits of processing this data to generate insights. With the current hype surrounding data science and the bountiful returns being promised, advanced analytics and machine learning also run the risk of being misunderstood. Businesses should know a few basic things about data science that determine the success or failure of any data science initiative.
Data Science workflow
The flowchart below shows the implementation steps of a typical data science project:
One of the most overlooked facts is that, typically, 80% of the effort goes into getting the right data and pre-processing it. The pie chart below breaks down the time taken to build a data science solution:
Some key focus areas to achieve success in a data science initiative are:
1. Data Collection: Data collection is usually done by the IT team, whose usual focus is on standard KPIs, privacy, security and minimizing cost. In most cases, the IT team is not the end user of the data and misses the bigger picture: why the data is being collected and which data needs to be captured. As a result, the data captured may be wrong or incomplete, which makes it hard to use later for insights and business decisions.
2. Identifying the business problem: “Well begun is half done” is true for data science initiatives too. A data science initiative needs to solve a single, well-defined problem. Once the problem has been defined, the business needs to define the frequency and granularity of the output.
3. Right data in the right format: Businesses generate a lot of data, and it is important to know which data to use for a given business problem. If you collect too much data, the cost of collection and storage may strain your finances. If you collect too little, the business may not be able to generate the right insights. It is therefore important to know which data needs to be collected.
4. Integrating data science output in the business process: One of the major obstacles to a successful data science initiative is getting the output into the correct form. Data science outputs are often used by business teams that have little or no knowledge of the underlying algorithms and assumptions. Business teams cannot be expected to reach out to the data scientists every time a decision needs to be made. It is up to the data scientists to provide the model output in a format, and through an application, that lets business teams use it in their decision-making processes.
5. Measuring return on investment (RoI): Businesses want to quantify the benefits of data science in dollar terms. It is, however, difficult to assign a dollar figure to a model’s decisions. Measuring the impact of a data science initiative is complex and requires a well-planned approach that covers both tangible and intangible benefits.
6. Selling Data Science: The data science team has a supporting role, providing decision makers with the right insights when they need them. It is therefore important to highlight how the team is driving success for the organization. Data science is a difficult concept to understand, and decision makers generally do not care which model is being used. The benefits should be quantified in a language that business leaders understand, usually dollars saved or additional revenue generated. The data science team needs to plan how it is going to communicate these benefits.
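To make the dollar framing in points 5 and 6 concrete, here is a minimal Python sketch of the standard RoI formula, (benefit - cost) / cost. The class name, field names and all figures are hypothetical and used only for illustration. The sketch covers tangible benefits only, since intangible benefits need a separate, agreed-upon way of being valued.

```python
from dataclasses import dataclass


@dataclass
class InitiativeEstimate:
    """Hypothetical yearly figures for one data science initiative."""
    cost_saved: float      # dollars saved per year (tangible benefit)
    added_revenue: float   # additional revenue per year (tangible benefit)
    total_cost: float      # people, tooling, data collection and storage


def simple_roi(est: InitiativeEstimate) -> float:
    """Return RoI as a fraction: (benefit - cost) / cost."""
    if est.total_cost <= 0:
        raise ValueError("total_cost must be positive")
    benefit = est.cost_saved + est.added_revenue
    return (benefit - est.total_cost) / est.total_cost


# Illustrative numbers only, not taken from any real project.
example = InitiativeEstimate(cost_saved=120_000, added_revenue=80_000, total_cost=100_000)
print(f"RoI: {simple_roi(example):.0%}")  # prints "RoI: 100%"
```

A single headline number like this is easy for business leaders to read, but it is only as reliable as the estimates of savings and revenue that feed it, so those estimates should be agreed with the business before the initiative starts.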
We have covered the major challenges that are commonly observed across organizations. This does not mean that a business will face only these challenges. It is the responsibility of the data science team to communicate the challenges and to keep the decision makers' expectations realistic.