Customer Analytics

Unit Economics, CAC & LTV estimations for profitability

The used car marketplace startup wanted to understand the direct revenues and costs associated with the basic units of their business model, i.e. the buyers and sellers of used cars. As a startup, they needed to know whether their business model was sustainable and whether it needed tweaks. They also wanted to identify their most and least profitable segments, and use this analysis to focus on the right customers and move towards profitability.

To-the-point cash flow analysis

The customer insights study focused on identifying the key units (in this case, buyers and sellers) and measuring the business performance metrics associated with them.

Calculated metrics of retention, transactions and fees at unit level

Measured the customer acquisition cost (CAC) for retained vs new customers

Performed latent class segmentation & identified customer groups

The study helped the startup identify the value of the offers they could make to their customers, both buyers and sellers, and which segments would turn profitable over their lifetime.

Customer insights for profitability & cash management

The startup had raised significant capital in their first round of funding, focused on onboarding as many buyers and sellers as possible to their marketplace. Having reached the desired growth levels, they turned their focus towards sustainability and scaling up the business model. The direct revenues and costs of each business unit were top of mind in driving profitability.

Our client needed more than a unit economics study to get the answers they were looking for: not just a measurement of retention, new customer acquisition, number of transactions and fees, but an understanding of customer acquisition costs, customer lifetime value, and which segments were profitable and which were driving costs. Our integrated customer analytics study gave them all that. And more!

Highlight opportunities. Expose gaps

Unit economics looks at the direct revenues and costs associated with the basic units of a business model.
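As a minimal sketch of the unit-level arithmetic (not the study's actual model), CAC can be taken as acquisition spend per new customer, and LTV as the expected, discounted contribution margin over a fixed horizon under a constant retention rate. The function names, the constant-retention assumption and all figures below are hypothetical.

```python
def cac(acquisition_spend: float, new_customers: int) -> float:
    """Customer acquisition cost: acquisition spend per newly acquired customer."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return acquisition_spend / new_customers


def ltv(margin_per_period: float, retention: float, periods: int,
        discount_rate: float = 0.0) -> float:
    """Expected lifetime value over a finite horizon.

    Assumes a constant per-period retention rate, so a customer acquired in
    period 0 is still active in period t with probability retention ** t,
    and discounts period t's margin by (1 + discount_rate) ** t.
    """
    return sum(
        margin_per_period * retention ** t / (1 + discount_rate) ** t
        for t in range(periods)
    )


# Hypothetical seller segment: 50,000 spent to acquire 200 sellers, a monthly
# contribution margin of 120 per seller, 80% monthly retention, a 24-month
# horizon and a 1% monthly discount rate.
seller_cac = cac(acquisition_spend=50_000, new_customers=200)
seller_ltv = ltv(margin_per_period=120, retention=0.8, periods=24, discount_rate=0.01)
print(f"CAC={seller_cac:.0f} LTV={seller_ltv:.0f} LTV:CAC={seller_ltv / seller_cac:.2f}")
```

Comparing LTV with CAC segment by segment is one way to separate profitable segments from money drainers: a ratio below 1 means a segment does not pay back its acquisition cost within the horizon.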

Combining unit economics with customer acquisition cost (CAC) and segmentation, we were able to answer some key questions for the startup:

a. How sustainable was their customer acquisition strategy?

b. Which buyer and seller segments were profitable, and which were money drainers?

c. Was the revenue model being pursued working, and was any realignment required?

The strategy and leadership team, armed with the insights from the study, recognised some key challenges in the business model they were following.

Interpretable probability of default (PD) scorecard for vehicle loan customers

The banking client was using a rule-based solution to predict the probability of default (PD) of their vehicle loan customers, scoring customers on their loan repayment behaviour. They wanted to use predictive scorecards to understand the factors driving default, draw transparent interpretations from the model, and find out whether they could predict PD with higher accuracy than the rule-based practice.

Data & advanced analytics for improved accuracy

We used a predictive, binary classification model to estimate the probability that a customer would fail to repay the loan

Established the definition of default to align with customer behaviour

78% lift in top 3 deciles

AUC of 0.81 and KS statistic of 0.41 on the validation data
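The validation metrics quoted above can all be computed from predicted PDs and observed default flags. The sketch below uses only the Python standard library: AUC as the probability that a defaulter is ranked above a non-defaulter, the KS statistic as the largest gap between the two cumulative distributions, and the share of defaulters captured in the top k deciles (one common way to report decile performance). The function names and the toy data are illustrative assumptions, not the bank's validation set.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random defaulter (label 1) is scored above a random
    non-defaulter (label 0); tied scores count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Kolmogorov-Smirnov statistic: the largest gap between the cumulative share
    of defaulters and of non-defaulters as the cut-off moves down the ranking."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    cum_pos = cum_neg = 0
    best = 0.0
    for i, (score, y) in enumerate(ranked):
        cum_pos += y
        cum_neg += 1 - y
        # Only evaluate at score boundaries so tied scores share one cut-off.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best


def capture_top_deciles(scores: Sequence[float], labels: Sequence[int], k: int = 3) -> float:
    """Share of all defaulters that fall in the top k deciles by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = len(ranked) * k // 10
    return sum(y for _, y in ranked[:cutoff]) / sum(labels)


# Toy validation data: predicted PDs and observed default flags (1 = defaulted).
pd_scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
defaulted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(f"AUC={auc(pd_scores, defaulted):.3f} "
      f"KS={ks_statistic(pd_scores, defaulted):.3f} "
      f"top-3-decile capture={capture_top_deciles(pd_scores, defaulted):.3f}")
# → AUC=0.952 KS=0.857 top-3-decile capture=0.667
```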

The client integrated the output of the predictive model with their business rules and was able to reduce default rates.

Manual & Opaque process

The client was assessing default risk using rules and credit expert judgement. They had invested in data infrastructure and database management systems and were ready for advanced analytics solutions for their operational processes. Vehicle loan default was high on their agenda, owing to higher-than-acceptable default rates and no clear understanding of how to address them.

The primary requirement for a predictive, model-based approach to probability of default (PD) prediction was a transparent, automated solution that would integrate with their existing IT infrastructure. They also wanted a better understanding of the attributes leading to higher default propensity. We proposed a logistic regression based PD scorecard, as it aligns well with their need for transparency.

Tailored, flexible and transparent solution

A logistic regression based probability of default (PD) scorecard provides a standalone solution for default prediction.

With API based integration to the databases, an internal tool has been developed for the bank to generate default propensity scores, with:

a. A balanced solution with speed and accuracy

b. Fully transparent solution with attribute level weight estimations towards default probabilities

c. Customized and flexible solution, with source code provided to enable the client to make future improvements

Although logistic regression based PD scorecards are one of the earliest methods, they remain among the most successful and widely used, owing to their transparency.
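To show how attribute-level weights can become scorecard points, the sketch below applies the widely used "points to double the odds" (PDO) scaling to a logistic regression PD model. The base score of 600 at 50:1 good:bad odds, the PDO of 20, the attribute names and the coefficients are all hypothetical, not the bank's parameters.

```python
import math
from typing import Dict, Tuple


def pdo_scaling(base_score: float = 600.0, base_odds: float = 50.0,
                pdo: float = 20.0) -> Tuple[float, float]:
    """Return (factor, offset) so that score = offset + factor * ln(good:bad odds).

    With these defaults a score of 600 corresponds to good:bad odds of 50:1,
    and every additional 20 points doubles those odds.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score_applicant(intercept: float, coefs: Dict[str, float],
                    x: Dict[str, float]) -> Tuple[float, float, Dict[str, float]]:
    """Score one applicant with a logistic PD model.

    The model's log-odds of default are intercept + sum(coef * x), so the
    log-odds of being good are the negative of that. Returns (PD, total
    score, points contributed by each attribute).
    """
    factor, offset = pdo_scaling()
    logit_default = intercept + sum(coefs[name] * x[name] for name in coefs)
    pd = 1.0 / (1.0 + math.exp(-logit_default))
    attribute_points = {name: -factor * coefs[name] * x[name] for name in coefs}
    total = offset - factor * intercept + sum(attribute_points.values())
    return pd, total, attribute_points


# Hypothetical fitted model and applicant.
INTERCEPT = -3.2
COEFS = {"missed_payments_12m": 0.55, "loan_to_value": 1.1, "years_with_bank": -0.08}
applicant = {"missed_payments_12m": 2, "loan_to_value": 0.85, "years_with_bank": 4}

pd, score, points = score_applicant(INTERCEPT, COEFS, applicant)
print(f"PD={pd:.3f} score={score:.0f}")
for name, pts in points.items():
    print(f"{name:>20}: {pts:+.1f} points")
```

Because each attribute's points are a rescaled coefficient times its value, a reviewer can read off why one applicant scores lower than another.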

Shopper occasions scorecard to segment trips and share of occasions

The retail client is a multi-brand retailer that offers shoppers the convenience of buying everyday needs across 50+ stores. They wanted to understand shoppers' purchase baskets to identify shopping occasions, segment shoppers by trip type, and learn about shopper attitudes and behaviour. They also wanted to develop a scoring system to categorize shopper occasions and generate insights on store experience, promotions, store layout and assortment.

What is in the cart?

The retail client used correspondence analysis to identify shopper attitudes and segment shopping trips into different occasions.

Correspondence analysis for shopper occasion segmentation

Measured the share of shopper occasions by segment

Identified the motivational drivers and need states behind these occasions
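Correspondence analysis starts from a cross-tabulation, here a hypothetical table of trips (rows: occasions) by product category (columns). The sketch below computes the first step of the technique, the matrix of standardized residuals and the total inertia (chi-square divided by the grand total). Correspondence analysis then takes a singular value decomposition of that matrix to produce the low-dimensional map; that step is omitted here to keep the example dependency-free. The table values, labels and function names are made up for illustration, and every row and column is assumed to have a non-zero total.

```python
import math
from typing import List

Table = List[List[float]]


def standardized_residuals(table: Table) -> Table:
    """(p_ij - r_i * c_j) / sqrt(r_i * c_j), where p_ij is the cell's share of the
    grand total and r_i, c_j are the row and column masses."""
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(col) / n for col in zip(*table)]
    return [
        [
            (table[i][j] / n - row_mass[i] * col_mass[j]) / math.sqrt(row_mass[i] * col_mass[j])
            for j in range(len(col_mass))
        ]
        for i in range(len(row_mass))
    ]


def total_inertia(table: Table) -> float:
    """Sum of squared standardized residuals, equal to chi-square / n."""
    return sum(s * s for row in standardized_residuals(table) for s in row)


# Hypothetical trips-by-category counts.
occasions = ["quick top-up", "weekly stock-up", "party"]
categories = ["fresh", "staples", "snacks & drinks"]
trips = [
    [120, 30, 40],
    [90, 210, 60],
    [10, 20, 150],
]

print(f"total inertia = {total_inertia(trips):.3f}")
for name, row in zip(occasions, standardized_residuals(trips)):
    # A large positive residual means the category is over-represented in that occasion.
    top = max(range(len(categories)), key=lambda j: row[j])
    print(f"{name}: over-indexes on {categories[top]}")
```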

The retailer is vulnerable to losing shoppers for six out of the twenty shopper occasions owing to competition, assortment and convenience.

Challenges from ecommerce & competitors

The client is a major retailer with 50+ stores across different regions. They have been facing increased competition from ecommerce players, brick-and-mortar chains and mom-and-pop stores. They believe a major part of their challenge is driven by their limited understanding of why shoppers visit their stores and of the attitudes that drive shopper purchase behaviour. They also want to understand their competitive positioning and benchmark shopper occasions against competitors.

The first objective is to identify shopper occasions based on the product basket. Using these occasions, they want to rank occasions on different metrics to measure attractiveness and challenges. They also want to develop a scoring algorithm to classify, benchmark and monitor shopper baskets. Finally, they want to measure the retailer’s share of each occasion type and identify its strengths and weaknesses.

Integrated approach for shopper delight

The client improved their understanding of shopper occasions and of the attitudes and drivers behind shopper behaviour.

We then developed an algorithm to rank shoppers based on their purchase baskets. This allowed the retailer to benchmark itself against the competition and review its:

a. Assortments and merchandise

b. Promotion strategy

c. Store layouts

d. Store experience
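One simple way to build a scoring algorithm that classifies baskets into occasions is to compare each basket's category mix with an occasion profile (for example, a segment centroid) and assign the closest one. The sketch below uses cosine similarity and then computes the share of baskets per occasion; the occasion profiles, category names and the choice of cosine similarity are assumptions for illustration, not the client's actual algorithm.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Mix = Dict[str, float]  # category -> spend (or share of spend)

# Hypothetical occasion profiles: typical share of basket spend by category.
OCCASIONS: Dict[str, Mix] = {
    "quick top-up": {"dairy": 0.5, "bakery": 0.4, "snacks": 0.1},
    "weekly stock-up": {"staples": 0.5, "household": 0.3, "dairy": 0.2},
    "party": {"snacks": 0.6, "beverages": 0.4},
}


def cosine(a: Mix, b: Mix) -> float:
    """Cosine similarity between two sparse category vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_basket(basket: Mix, occasions: Dict[str, Mix] = OCCASIONS) -> Tuple[str, Dict[str, float]]:
    """Score a basket against every occasion profile and return the best match."""
    scores = {name: cosine(basket, profile) for name, profile in occasions.items()}
    return max(scores, key=scores.get), scores


def share_of_occasions(baskets: List[Mix], occasions: Dict[str, Mix] = OCCASIONS) -> Dict[str, float]:
    """Share of baskets assigned to each occasion."""
    counts = Counter(classify_basket(b, occasions)[0] for b in baskets)
    return {name: counts[name] / len(baskets) for name in occasions}


# Hypothetical baskets (spend by category).
baskets = [
    {"snacks": 120, "beverages": 90},
    {"staples": 300, "household": 150, "dairy": 60},
    {"dairy": 40, "bakery": 25},
]
print(share_of_occasions(baskets))
```

The same scores can be tracked over time or compared with competitor baskets, which is what makes them usable for benchmarking and monitoring.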

The study also helped them identify their strengths and weaknesses against competitors. They were able to identify new opportunities, prioritise specific shopper occasions and pinpoint their vulnerability in 6 of the 20 shopper occasions.

Driver analysis for Net Promoter Score (NPS) for an elderly care startup

The elderly healthcare startup currently operates in three metros in India. They plan to ramp up operations and launch their services in 12 more cities, which requires investing in technology to create scalable systems. Their existing net promoter score (NPS) methodology was very manual: staff collected surveys from clients, compiled the responses in an Excel sheet, and then computed the NPS and reported it by different demographics.

Drivers of Net Promoter Score (NPS)

Net Promoter Score is a simple measure. To generate insights on what is driving the NPS, the client turned to us for a quantitative analysis of the questions and the scores.

Created a composite NPS by combining patient and attendant responses

Ease of booking a caregiver was the single biggest driver of NPS

What-if simulator to test scenarios and their impact on NPS

With the help of principal component analysis (PCA), we identified the key drivers of NPS and what the client needs to do to boost it.

Aggregating scores and identifying drivers

When buying a product or using a service, prospects often turn to friends and family for recommendations, and research and common experience suggest that people talk about their negative experiences more often than their positive ones. The purpose of NPS is to provide a benchmark of how likely a customer is to recommend a provider's products or services.

The elderly care startup client was already using NPS as a benchmark for their service quality. As they ramped up operations, they wanted two things from the NPS measurement process: first, to generate scores and reports at multiple reporting levels and hierarchies; second, to identify the drivers of NPS and use the results to focus on the right drivers.

What-if simulator for NPS

NPS was being used by the client to evaluate their quality of service and measure the loyalty of existing customers.

One challenge in the client's business was identifying the customer. The service was generally used by an elderly patient, but payments were usually made by the patient's children or another relative, so it was difficult to identify the promoter with a high degree of confidence.

We therefore recommended using two questionnaires – one focused on the attendant and the other one on the patient.

Natural language text analysis was used to identify the drivers, and the scores were mapped against each driver. A composite score was developed by combining the attendant and patient responses.
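NPS itself is the percentage of promoters (scores of 9 or 10 on a 0 to 10 scale) minus the percentage of detractors (0 to 6). The sketch below computes it separately for patient and attendant responses and blends them with a weighted average. The 50/50 weighting, the function names and the sample scores are assumptions for illustration; the exact composite used for the client is not described here.

```python
from typing import Sequence


def nps(scores: Sequence[int]) -> float:
    """Net Promoter Score on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("no responses")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def composite_nps(patient_scores: Sequence[int], attendant_scores: Sequence[int],
                  patient_weight: float = 0.5) -> float:
    """Weighted blend of patient and attendant NPS (the weighting is an assumption)."""
    return patient_weight * nps(patient_scores) + (1 - patient_weight) * nps(attendant_scores)


# Hypothetical survey responses.
patients = [10, 10, 9, 5]
attendants = [10, 9, 8, 7, 6, 0]
print(nps(patients), nps(attendants), composite_nps(patients, attendants))
# → 50.0 0.0 25.0
```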

Optimize caregiver schedules based on staff & customer constraints

The client is an elderly healthcare provider. Their business model is to keep caregivers with various specialities on their payroll and assign them to customer requirements on a weekly basis. Since they operate in three large cities in India, one of their major challenges is assigning caregivers while accounting for commute time, ease of transport and customer preferences around gender, age and a few other attributes. The client is looking to personalize assignments and improve CX.

Data driven healthcare

The client used attribute based matching of caregivers and patients to improve caregiver attendance and job satisfaction, as well as customer experience.

Reduced no-show cases by 16% and improved daily attendance by 6%

Increased duration of care by 2.3 days per customer on average

Increase in repeat bookings by 2.3% over a 3-month pilot

The client has been able to solve a major operational issue, with visible business benefits, improved satisfaction and more word-of-mouth promotion.

Inefficient and inconsistent assignment process.

The elderly healthcare client was grappling with many operational issues around caregiver assignment, frequent no-shows and absence from work. They were investing significant time and money in training these caregivers, and the availability of trained staff was a problem. Absenteeism also hurt customer satisfaction, owing to unreliability and other issues with the caregivers.

The root cause of the operational issues was that caregivers were assigned to jobs that did not suit them, typically because the assignment location was difficult or inconvenient to reach. Commute time, and consequently commute cost, was acting as a trigger for no-shows and absence from work. Additionally, customers usually have specific requests related to the caregiver.

Personalized scheduling for better customer experience

The client set up a system to capture customer requests, along with specific instructions about the caregiver's experience, gender, age and a few other attributes, at the time of booking.

Similarly, they developed a system to capture caregivers' concerns, such as preferred areas, time slots and a few other factors.

With both these data sources, we developed a personalization algorithm that scores caregivers against each booking and recommends the preferred caregivers for it.
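A scoring-based matcher of this kind can be sketched as hard constraints (the caregiver is free in the slot and can reach the area) followed by a weighted sum of compatibility checks and a commute penalty, then a ranking per booking. The attributes follow those mentioned above, but the weights, the penalty, the class and function names and the sample caregivers are hypothetical; the client's actual scoring system is not specified.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical weights for each compatibility check.
WEIGHTS = {
    "gender": 3.0,
    "age": 1.0,
    "experience": 2.0,
    "preferred_area": 2.0,
    "per_commute_minute": 0.05,  # penalty per minute of commute
}


@dataclass
class Caregiver:
    name: str
    gender: str
    age: int
    years_experience: float
    preferred_areas: Set[str]
    available_slots: Set[str]
    commute_minutes: Dict[str, int]  # area -> typical commute time


@dataclass
class Booking:
    area: str
    slot: str
    preferred_gender: Optional[str] = None
    max_age: Optional[int] = None
    min_experience: float = 0.0


def match_score(c: Caregiver, b: Booking) -> Optional[float]:
    """Score a caregiver for a booking; None if a hard constraint fails."""
    if b.slot not in c.available_slots or b.area not in c.commute_minutes:
        return None
    score = 0.0
    if b.preferred_gender is None or c.gender == b.preferred_gender:
        score += WEIGHTS["gender"]
    if b.max_age is None or c.age <= b.max_age:
        score += WEIGHTS["age"]
    if c.years_experience >= b.min_experience:
        score += WEIGHTS["experience"]
    if b.area in c.preferred_areas:
        score += WEIGHTS["preferred_area"]
    score -= WEIGHTS["per_commute_minute"] * c.commute_minutes[b.area]
    return score


def recommend(caregivers: List[Caregiver], b: Booking, top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank eligible caregivers for one booking, best match first."""
    scored = []
    for c in caregivers:
        s = match_score(c, b)
        if s is not None:
            scored.append((c.name, s))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical caregivers and booking.
CAREGIVERS = [
    Caregiver("Asha", "female", 34, 5, {"north"}, {"mon_am"}, {"north": 20, "south": 45}),
    Caregiver("Ravi", "male", 41, 8, {"south"}, {"mon_am", "tue_pm"}, {"north": 50, "south": 10}),
    Caregiver("Meena", "female", 29, 2, {"north"}, {"tue_pm"}, {"north": 15}),
]
booking = Booking(area="north", slot="mon_am", preferred_gender="female", min_experience=3)
print(recommend(CAREGIVERS, booking))
```

Ranking each booking independently is greedy; scheduling a whole week would typically add an assignment step so that one caregiver is not recommended for overlapping bookings.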

The output was used to evaluate caregiver performance and customer satisfaction levels. Both were found to improve, giving the client a template for improving business outcomes.

Service request analysis to identify false alarms for field team visits

The telco receives a large number of service requests for its broadband services across locations. A good percentage (~12%) of these requests are false alarms that can be resolved without field staff visiting the customer location. The telco wanted to use predictive analytics to identify such cases, so that it could reduce costs and allocate the field team more efficiently to problems that require their intervention. This would reduce the time to service customers.

Improving time to service with machine learning

The telco wanted a solution that would identify the false flags with a high degree of confidence – and help in reducing cost and time to service genuine requests.

The solution correctly predicts 85% of false flags

Reduced the time to service genuine requests by 4 hours

The staff is able to service 2% more requests a month

The solution allowed the telco to improve operational efficiency and serve customers faster and at a reduced cost.

High request volume. Insufficient trained staff.

The telco receives hundreds of requests relating to network and modem issues. A noticeable percentage of these complaints are false alarms that can be resolved without a visit by the field team. Separating genuine complaints from false alarms reduces staff visits, with the added benefit that the time saved on false alarm visits can be allocated to genuine complaints.

The primary concern for the telco was to implement a solution that predicts false alarms with a high degree of confidence; they did not want to trade off customer satisfaction against operational costs. The need for the solution was driven primarily by customers’ concerns about ‘time to service’. We proposed a supervised learning approach to solve their problem and ensure reliable predictions.

How to identify the real ones?

XGBoost is a supervised, ensemble-based learning approach (gradient-boosted decision trees) widely used for classification problems.

With the solution, we have been able to achieve a prediction accuracy of 85% on false alarm cases. This has enabled the telco to reap benefits like:

a. Reduced cost on operations by reducing visits for false alarm cases

b. The time saved on unproductive visits is now used to address genuine cases, reducing ‘time to service’

c. The number of requests serviced per month has improved by 2%
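Because the telco did not want to trade customer satisfaction for cost, the decision threshold on the model's false-alarm probability matters as much as the model itself. The sketch below picks the lowest threshold whose precision on a validation set still meets a target, so that only requests the model is confident about are resolved without a visit. It works on any classifier's predicted probabilities, XGBoost included; the 0.9 precision target, the function name and the toy validation data are assumptions.

```python
from typing import Optional, Sequence, Tuple


def threshold_for_precision(
    probs: Sequence[float],
    is_false_alarm: Sequence[int],
    target_precision: float = 0.9,
) -> Optional[Tuple[float, float, float]]:
    """Lowest probability cut-off whose validation precision meets the target.

    probs are predicted probabilities that a request is a false alarm;
    is_false_alarm holds the observed outcome (1 = false alarm). Returns
    (threshold, precision, recall), or None if no cut-off reaches the target.
    """
    total_false_alarms = sum(is_false_alarm)
    best = None
    for t in sorted(set(probs), reverse=True):
        flagged = [y for p, y in zip(probs, is_false_alarm) if p >= t]
        hits = sum(flagged)
        precision = hits / len(flagged)
        if precision >= target_precision:
            # Lower thresholds flag more requests, so keep the lowest one that qualifies.
            best = (t, precision, hits / total_false_alarms)
    return best


# Hypothetical validation scores and outcomes.
probs = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
outcome = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
print(threshold_for_precision(probs, outcome, target_precision=0.9))
# → (0.85, 1.0, 0.6)
```

Raising the precision target lowers recall: fewer false alarms are caught, but fewer genuine faults are wrongly left without a visit.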

The telco has achieved a better bottom line by saving on unnecessary costs.

Predictive lead scoring to improve sales effectiveness for loan applicants

The financial services client was running an always-on campaign to reach people in need of loans. They wanted to build a solution that would help them identify the prospects most likely to respond positively to their loan product offerings. They also wanted to identify the right customer segments to target and to prioritize leads for better conversions, and they were looking to integrate channel recommendations into the solution.

Rank prospects against a scale

Predictive lead scoring assigns a higher score to prospects that are more committed and interested compared to those that show minimal interest.

2.1x increased conversion of qualified leads to opportunities

23% improved return on investment (RoI) on lead generation campaigns

Prospect segmentation with better understanding of buyer clusters

The client tested the lead scoring model with a team of 10 people. The number of conversions doubled, with loan disbursements increasing by USD 6 Mn.

High lead volume. Low lead quality

The client was using an additive lead scoring methodology, assigning specific points for each milestone in the sales process. The leads with the highest scores were considered the most engaged and receptive to loan offers. However, as operations scaled and lead volumes grew, the process soon became complicated.

The biggest problem for the client was that they had no clear picture of their target prospects or of why conversion was higher for certain clusters. The explicit attributes were largely known, but there was little understanding of the implicit and negative attributes behind commitment. They were looking for a more advanced and accurate lead scoring solution, and predictive lead scoring met all their requirements.

An indicator of purchase readiness

Predictive lead scoring allows the client to identify prospects most likely to convert and those on the edge of buying.

With logistic regression based lead scoring, the client was able to generate multiple benefits, including:

a. A more accurate & scientific process of scoring prospects

b. Understanding of implicit & negative factors impacting lead scores

c. Better understanding of target groups by combining the 1st party and 3rd party data
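The scores themselves come from a logistic regression fitted on past leads, with conversion as the outcome. The sketch below fits one with plain batch gradient descent so that it has no dependencies; in practice a statistics or machine learning library would normally be used, usually with regularization. The features, lead IDs and training rows are hypothetical, and the toy data is deliberately tiny.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit weights [bias, w1, ..., wk] by batch gradient descent on the mean log-loss."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def lead_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted conversion probability for one lead."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical history: [opened_email, requested_callback, existing_customer] -> converted
X = [[1, 1, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]
w = fit_logistic(X, y)

# Rank new (hypothetical) leads from most to least likely to convert.
new_leads = {"L-101": [0, 1, 0], "L-102": [1, 0, 0], "L-103": [1, 1, 1]}
for lead_id, x in sorted(new_leads.items(), key=lambda kv: lead_score(w, kv[1]), reverse=True):
    print(lead_id, round(lead_score(w, x), 3))
```

Unlike additive points, the fitted weights are estimated from outcomes, so a factor that lowers conversion shows up as a negative weight rather than having to be guessed.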

Predictive lead scoring allows our client to identify the prospects that show a higher level of interest and fit the loan product portfolio.

Churn prediction of prepaid subscribers for a telco

Customer churn is one of the major problems of the telecom industry. Acquiring new customers is several times more expensive than retaining existing ones, so most telecom companies run an always-on churn prevention and subscriber retention program. For this telco client, we used a random forest based churn prediction model. The study focused on identifying the triggers of churn and addressing them through an effective retention strategy.

A persistent problem for the telecom client

Demographic, transactional and behavioural data were used to visualize the entire subscriber base and identify the causes of churn

A prediction accuracy of 84% on the hold-out sample

Usage attributes (e.g. number of calls per month) are the most important factors

78% of churners captured in the first three deciles, reducing the cost of retention campaigns

The telco was able to predict when a subscriber was most likely to churn, contact them with the right offer, and improve customer lifetime and average revenue per user (ARPU).

Churn is acquisition wasted and revenue lost

Customer retention is the key to growth. Given how much more it costs to acquire a new customer than to retain an existing one, the telco client is focused on churn prediction and building an effective retention strategy. The first step is an effective churn prediction model that identifies at-risk subscribers early enough. Retention also improves lifetime value and ARPU.

The challenge was to combine the vast amount of subscriber transaction data with demographic and behavioural data. Once the master data was created, the next challenge was to ensure that data quality was good enough to build a prediction model. Lastly, the right definition of churn had to be established, including how to distinguish a win-back from a lost customer.
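Establishing the churn definition is itself a small piece of logic. The sketch below labels a prepaid subscriber from their recharge dates: churned if there has been no recharge for a full inactivity window before the reference date, a win-back if they are active now but had such a gap earlier, and active otherwise. The 60-day window, the function name and the dates are assumptions for illustration; the telco's actual definition is not stated here.

```python
from datetime import date, timedelta
from typing import Sequence


def churn_label(recharge_dates: Sequence[date], as_of: date, window_days: int = 60) -> str:
    """Label a prepaid subscriber as 'churned', 'win-back' or 'active'.

    churned : no recharge in the last window_days before as_of
    win-back: currently active, but had an earlier gap of at least window_days
    active  : currently active with no such gap
    """
    if not recharge_dates:
        raise ValueError("subscriber has no recharge history")
    dates = sorted(recharge_dates)
    window = timedelta(days=window_days)
    if as_of - dates[-1] >= window:
        return "churned"
    if any(later - earlier >= window for earlier, later in zip(dates, dates[1:])):
        return "win-back"
    return "active"


as_of = date(2024, 6, 30)
print(churn_label([date(2024, 1, 5), date(2024, 2, 3), date(2024, 3, 1)], as_of))   # churned
print(churn_label([date(2024, 1, 5), date(2024, 4, 20), date(2024, 6, 10)], as_of))  # win-back
print(churn_label([date(2024, 5, 1), date(2024, 6, 1), date(2024, 6, 25)], as_of))   # active
```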

A bird in hand is worth two in the bush

A random forest based churn prediction model enabled the client to test a machine learning approach to predicting churn and identify its pros and cons compared with a statistical modeling approach.

One of the major benefits is the higher accuracy achieved with random forest based prediction. A useful churn prediction model enables the telco to:

a. Identify subscribers who are most likely to churn and retain them through effective campaigns

b. Cover a larger percentage of ‘at risk’ subscribers with fewer calls, improving campaign effectiveness

The telco has identified key factors driving churn and is using the right promotions to cross-sell and upsell in specific geographies and to specific demographics.

Cross Sell scorecard for revenue growth and better conversion

The electronics brand has 40+ stores across the country. They were not making use of their CRM data and were at a loss as to how to leverage their customer base to drive revenue growth. We presented them with a customized cross sell solution, accounting for the type of product bought, that would allow them to grow revenues at a lower customer acquisition cost and with improved loyalty.

Targeted campaigns with better conversion rate

We developed a cross sell scorecard for customers of three kitchen appliances: microwaves, refrigerators and mixers. A pilot campaign generated the following outcomes:

Model accuracy of ~84% on validation dataset

Average uplift of 2.3x on conversions across 2 campaigns

The solution allowed the retailer to focus on customers who were most likely to purchase and boost loyalty and engagement

Right offer for each customer

The electronics store chain was spending on acquisition campaigns with poor response rates. The team was reaching out to all customers with a rule-based approach, leading to inefficiencies, lower conversions and customer frustration with irrelevant offers. They were unable to leverage the large customer base at hand and were losing valuable business to competitors.

They needed a data driven approach to cross sell campaigns: a way to identify the right customers and to check whether they were reaching them with the right product offer. A preliminary analysis of their call data made it clear that they had good, contactable data on their customers. All that was required was to find the right customers to reach out to, with the right product offer. A cross sell solution was precisely what they needed!
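Finding "the right product offer" for each customer can be expressed as picking, per customer, the product with the highest cross sell propensity among those the customer does not already own, and contacting only customers whose best propensity clears a cut-off. The propensities, the 0.3 cut-off, the function name and the customer IDs below are hypothetical; in the study the propensities would come from the cross sell scorecard.

```python
from typing import Dict, Optional, Set, Tuple


def next_best_offer(
    propensity: Dict[str, float],
    owned: Set[str],
    min_propensity: float = 0.3,
) -> Optional[Tuple[str, float]]:
    """Highest-propensity product the customer does not already own, or None if
    no such product clears the contact cut-off."""
    candidates = {product: p for product, p in propensity.items() if product not in owned}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return (best, candidates[best]) if candidates[best] >= min_propensity else None


# Hypothetical scorecard outputs: (propensity by product, products already owned).
customers = {
    "C001": ({"microwave": 0.62, "refrigerator": 0.15, "mixer": 0.40}, {"refrigerator"}),
    "C002": ({"microwave": 0.55, "refrigerator": 0.10, "mixer": 0.35}, {"microwave"}),
    "C003": ({"microwave": 0.12, "refrigerator": 0.08, "mixer": 0.20}, set()),
}
for customer_id, (scores, owned) in customers.items():
    print(customer_id, next_best_offer(scores, owned))
# → C001 ('microwave', 0.62)
#   C002 ('mixer', 0.35)
#   C003 None
```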

Optimized contact and offer strategy using cross sell implementation

Cross sell is one of the most widely used solutions in customer analytics, and it is very powerful.

With the right customers, right offer and right touchpoints, cross sell can boost conversions up to 4-5%. Cross sell drives multiple business benefits:

a. Boost in revenue with lower customer acquisition cost

b. Higher life time value, improved loyalty and better engagement with customers

c. Proactive retention, and minimizing customers switching to competition

Cross sell implementation enabled our client to move towards a more customer centric approach by optimizing and improving the overall product offering

Scorecard to predict willingness to renew and minimize churn

The client, a top-5 insurer in India, was spending heavily on customer acquisition and achieving healthy growth. However, renewal lapse rates were high: fewer than 60% of customers renewed in the first year, falling to ~12% by the end of the fifth year. We suggested developing a propensity model to predict which customers were most likely to churn, and reaching out to them with personalized renewal offers.

Improved Retention. Reduced operational cost

Demographic, transactions and behavioural data were combined to build the scorecard. Responses to different contact channels (SMS, IVR, Calls) were accounted for.

Prediction accuracy of 78% on validation dataset

Retention uplift of ~1.7x with same $ spend

Better efficiencies on contact frequency and mode of communication

The model was used to generate the retention campaign base every 2 weeks, covering customers within +/- 30 days of their due date
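The fortnightly campaign base amounts to a date filter plus a ranking. The sketch below selects policies whose due date falls within 30 days either side of the run date and orders them by predicted lapse probability, highest first. The policy records, field names and probabilities are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Sequence


@dataclass
class Policy:
    policy_id: str
    due_date: date
    lapse_probability: float  # output of the propensity model


def campaign_base(policies: Sequence[Policy], run_date: date, window_days: int = 30) -> List[Policy]:
    """Policies due within +/- window_days of run_date, highest lapse risk first."""
    window = timedelta(days=window_days)
    eligible = [p for p in policies if abs(p.due_date - run_date) <= window]
    return sorted(eligible, key=lambda p: p.lapse_probability, reverse=True)


# Hypothetical policy book.
book = [
    Policy("P1", date(2024, 7, 20), 0.35),
    Policy("P2", date(2024, 6, 10), 0.72),
    Policy("P3", date(2024, 9, 15), 0.90),  # due too far ahead for this run
    Policy("P4", date(2024, 7, 31), 0.10),
]
print([p.policy_id for p in campaign_base(book, run_date=date(2024, 7, 1))])
# → ['P2', 'P1', 'P4']
```

With a window of 30 days either side and a run every 14 days, each policy appears in four or five consecutive runs, so a contact-frequency rule would normally sit alongside this selection.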

High lapse rate. Reactive outreach

The insurer faced stiff competition in a very difficult market, India. They were spending heavily on customer acquisition and achieving healthy growth rates. However, the renewal funnel was challenging: fewer than 60% of customers renewed in the first year, dwindling to ~12% by the end of the fifth year. The high customer acquisition cost meant they needed to extend the lifetime value (LTV) of their customers.

They were looking to reduce lapse rates and turned to analytics. Through customer surveys, they discovered that about 60% of their customers would shop around for a new product at renewal time. They needed a proactive approach to renewals and an efficient way to identify and retain customers. We proposed a propensity model based approach coupled with personalized retention campaigns to achieve their goals.

Proactive approach to renewals yields better results

There are multiple algorithms for propensity models, and no single algorithm is the best.

We went ahead with a logistic regression based approach for the reasons listed below:

a. Quick turnaround time

b. Interpretable for business users

c. Drill down analysis to understand impact of the features on lapse probabilities

The model results needed to be integrated into an effective renewal campaign. Equipped with the scorecard built by the Xtage Labs team, the client was able to reach the right customers through the right channels, with personalized offers to boost renewal rates.