Data Science

Challenges in Data Science success

Modern businesses are collecting data at an unprecedented scale, and they are realizing the benefits of processing this data and generating insights. With the current hype surrounding data science and the bountiful returns being promised, advanced analytics and machine learning also run the risk of being misunderstood. Businesses should know a few basic things about data science that determine the success or failure of any data science initiative.

Data Science workflow

The flowchart below shows the steps of a typical data science project implementation:

One of the most overlooked facts is that typically 80% of the effort goes into getting the right data and pre-processing it. The pie chart below breaks down the time taken to build a data science solution:

Some key focus areas to achieve success in a data science initiative are:

1. Data Collection: Data collection is usually done by the IT team. Their usual focus is on standard KPIs, privacy, security and minimizing cost. In most cases, the IT team is not the end user of the data and misses the bigger picture: why the data is being collected and which data needs to be captured. They may not be capturing the right or complete data, leading to challenges in utilizing it later for insights and business decisions.

2. Identifying the business problem: “Well begun is half done” is true for data science initiatives too. A data science project needs to solve a single, well-defined problem. Once the problem has been defined, the business needs to define the frequency and granularity of the output.

3. Right data in right format: Businesses generate a lot of data. It is important to know the right data to use for a business problem. If you collect too much data, the cost of data collection and storage may put a strain on your finances. Collect too little data, and the business may not be able to generate the right insights. It is therefore important to know which data needs to be collected.

4. Integrating data science output in the business process: One of the major obstacles to a successful data science initiative is getting the output in the correct form. Often, data science outputs are used by business teams that have little or no knowledge of the underlying algorithms and assumptions. Business teams cannot be expected to reach out to the data scientists every time a decision needs to be made. It is up to the data scientists to provide the model output in a format, and through an application, that enables business teams to use that output in their decision-making processes.

5. Measuring return on investment (RoI): Businesses want to quantify the benefits of data science in dollar terms. It is, however, difficult to assign a dollar figure to a model’s decisions. Measuring the impact of a data science initiative is complex and requires a well-planned approach to measuring the benefits, both tangible and intangible.

6. Selling Data Science: The data science team has a supporting role, providing the decision makers with the right insights when they need them. It is therefore important to highlight how the team is driving success for the organization. Data science is a difficult concept to understand, and decision makers generally do not care which data science model is being used. The team needs to quantify the benefits in a language that business leaders understand, which is usually dollars saved or additional revenue generated, and to plan how it is going to communicate those benefits.
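To make the dollar framing in points 5 and 6 concrete, here is a minimal sketch of an RoI calculation in Python. It assumes the common definition RoI = (benefit - cost) / cost. The function name `roi` and all the figures are hypothetical, chosen only for illustration, and the sketch covers tangible benefits only.

```python
def roi(benefit: float, cost: float) -> float:
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


# Hypothetical figures, for illustration only.
dollars_saved = 120_000.0       # e.g. cost reductions attributed to the model
additional_revenue = 80_000.0   # e.g. extra sales attributed to the model
project_cost = 150_000.0        # team, tooling, and infrastructure

project_roi = roi(dollars_saved + additional_revenue, project_cost)
print(f"RoI: {project_roi:.1%}")
```

The arithmetic is the easy part. Deciding how much of the savings or revenue can fairly be attributed to the data science initiative is where the planning effort goes.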

Note that we have covered only the major challenges that are commonly observed across organizations. This does not mean that a business is going to face only these challenges. It is the responsibility of the data science team to communicate the challenges, manage the expectations of the decision makers and keep them realistic.

Data Science: From 0 to 1

Before we get into ‘How to implement data science solutions’, let us formalize the definition of data science:

Data Science is all about using structured and unstructured data to identify trends, validate hypotheses, and predict future outcomes. Communicating the results through charts and graphs, known as data visualization, is also part of it.

It is now well established that data-driven insights can provide businesses with the tools and methods to predict future trends, events, and behaviours. For example, banks could use data science to predict which customers are more likely to require a home loan or to miss a credit card payment. Similarly, CPG brands can use their past advertisement data to measure the sales generated by their marketing spend, and optimize the marketing budget to achieve maximum return on investment (RoI). You can check a few more case studies here.

With all the buzz around data science & machine learning these days, it is very easy to go overboard before you and your organization are ready to unlock the true potential of data science. Here is a checklist of items to work through before embarking upon your data science journey.

1. Define the problem: Identify the problems you want to solve using data science. You should know that data science is not a magic wand that will make all your problems disappear. Data can answer almost all questions, but for that to happen, you need to ask the right questions.

2. Data science cannot replace choice: One of the reasons data science can never be 100% machine-driven is that decision making is always a selection between choices. Data science can at best rank-order those choices; making the selection today will always involve an element of philosophy, reason, experience, and future plans.

3. Estimate return on investment (RoI): Estimate the benefits that you expect to draw from data science. It is not always easy to express the benefits of data science in terms of dollars. You need to identify the right metrics to measure the benefits.

4. Build a roadmap: Once you understand the benefits of data science, develop a clear roadmap of how you want to achieve those goals: whether you want to build an in-house team, or you want your data science tasks to be completely outsourced. You could also choose a combination of the two.

5. Siloed or Integrated: You also need to define whether your organization is going to pursue a siloed approach to implementing data science or view it as an organization-wide effort. Both approaches have their pros and cons, and you need to make a rational decision about which works best for your organization.

6. Data Governance: Define clear data governance rules, including data ownership, architecture, policies, data quality, rules for resolving data-related issues and policies for data management. Your business might have access to private data, which needs to be protected. Proper data security, confidentiality and access rules need to be defined.

7. Graphical Analysis: A lot can be inferred by looking at simple one-dimensional graphs and cross-tabulations. Simple graphs can help you spot anomalies or identify trends. In the age of big data, do not ignore the power of simpler methods.

8. Identify experts: Not everyone is adept at handling and interpreting data. Identify people within your organization who are more comfortable than others in handling data. Data science is always contextual, and the more experienced a person is within your organization, the better they can contextualize data science.

9. Engage Data Science Consultants: While it is a nice idea to use in-house expertise, asking for external help can reduce your turnaround time. Even if you have sufficient expertise, having an external perspective sometimes helps you see things differently.

10. Test, Learn & Modify: Any successful strategy needs to evolve with time. The same is true for data science. You should not expect to use the same techniques or the same model over and over again. The problems need to be revisited over time to get the best return out of your data science exercise.
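As an illustration of the simple methods mentioned in point 7, here is a minimal sketch of a one-dimensional count chart and a cross-tabulation using only the Python standard library. The records, the field meanings, and the helper names `text_bars` and `crosstab` are all hypothetical; in practice you would likely use a spreadsheet or a data analysis library instead.

```python
from collections import Counter

# Hypothetical records: (customer segment, missed a payment?)
records = [
    ("salaried", "no"), ("salaried", "no"), ("salaried", "yes"),
    ("self-employed", "yes"), ("self-employed", "no"), ("self-employed", "yes"),
]


def text_bars(values):
    """One-dimensional view: count each value and draw a text bar per value."""
    counts = Counter(values)
    return {value: "#" * n for value, n in sorted(counts.items())}


def crosstab(rows):
    """Two-dimensional view: count how often each (row, column) pair occurs."""
    counts = Counter(rows)
    row_keys = sorted({r for r, _ in rows})
    col_keys = sorted({c for _, c in rows})
    return {r: {c: counts[(r, c)] for c in col_keys} for r in row_keys}


bars = text_bars(segment for segment, _ in records)
table = crosstab(records)

for value, bar in bars.items():
    print(f"{value:15s} {bar}")
for segment, cells in table.items():
    print(f"{segment:15s} {cells}")
```

Even on a toy table like this, a cross-tabulation shows at a glance whether one segment behaves differently from another, which is often the first question worth asking before any modelling.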

Today, most businesses view data science as an essential capability. The reduced cost of data storage and drastically improved computation infrastructure are changing business models. The businesses that are able to unlock data-driven insights are going to win.