How to use AI to automate segmentation analysis

Using the GPT for Sheets extension to make a traditional strategy exercise easy

Feb 20, 2024

One of the most common strategy exercises at a startup is segmentation.

Segmentation exercises can be applied to users, customers, vendors, and more. At its core, a segmentation exercise tries to answer the following question: are there common patterns around which [group] is driving disproportionate levels of [output]?

Startup leaders are constantly asking a segmentation question: are there common patterns around which users or customers find a ton of value from our product?

Getting this right is super important, but doing the analysis is hard. To do it well, you need to curate accurate datasets that are relevant for your business.

In this post, I’ll share a way to automate a core component of segmentation analyses (classification) using the GPT for Sheets extension. This has saved me a TON of time, and made it super easy to answer some core strategy questions.

Read on to learn more!

Step 1: Getting the right question

To demonstrate this method, we’ll use the following scenario as an example:

You’re talking to your Head of Product, and a thought comes up: you’ve never looked into which types of users are activating and retaining at the highest rates.
Your product is used by engineers, PMs, data scientists, growth teams, and business leaders. She has a hypothesis that engineers will be the most likely to activate and retain.
Can you look into it?

This is a pretty straightforward customer segmentation exercise. You know the group you’re solving for (user role) and the outputs you’re trying to measure (activation and retention rates). You also have to

Now, it’s time to dive in.

Step 2: Curating the basic dataset

To answer this question, you’ll need a dataset with the following information:

User name (for one-off checks)
Activation metric (could be 7/14/28d)
Retention metric (could be 14/28/60/90d)
User role

Hopefully, there’s a table available with user name and a boolean output for the activation and retention metric you choose. But unless you have a heavy modal in your sign-up flow asking for role, chances are that you don’t have a good field for role.

You check your product data tables, and confirm that it’s not there. But then you look at your CRM, and see that there’s an enriched field for “title” based off a user’s email. That gives you the information you need to begin classifying.
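As a rough sketch of that step, here is one way to attach CRM titles to product metrics in Python before any classification happens. The rows and column names (user_name, activated_7d, retained_28d, title) are made up for illustration, and joining on user name is an assumption; use whatever key your two sources actually share.

```python
# Hypothetical rows; column names and the join key are assumptions, not a real schema.
product_rows = [
    {"user_name": "ana", "activated_7d": True, "retained_28d": True},
    {"user_name": "ben", "activated_7d": True, "retained_28d": False},
    {"user_name": "cy", "activated_7d": False, "retained_28d": False},
]
crm_rows = [
    {"user_name": "ana", "title": "Senior Software Engineer"},
    {"user_name": "ben", "title": "Head of Growth"},
]


def attach_titles(product_rows, crm_rows, key="user_name"):
    """Left-join CRM titles onto product rows; users missing from the CRM get title=None."""
    title_by_user = {row[key]: row["title"] for row in crm_rows}
    return [{**row, "title": title_by_user.get(row[key])} for row in product_rows]


dataset = attach_titles(product_rows, crm_rows)

# Users without an enriched title can't be classified, so surface them early.
missing = [row["user_name"] for row in dataset if row["title"] is None]
print(f"{len(dataset)} users, {len(missing)} without a title: {missing}")
```

Keeping the users with no title in the dataset (rather than dropping them) makes it obvious how much of your user base the analysis will actually cover.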

Unfortunately, “title” is pretty difficult to use to get a “role”. Your options used to be:

Manually categorize each user into a specific role
Write a complex formula in your CRM or a spreadsheet to categorize users based on keywords

The first process is super time-intensive and painful (I’ve done it too many times to count).

The latter is error-prone (e.g., a Product Data Engineer’s classification would depend on which keyword you searched for first) and irrelevant for a lot of classification exercises (e.g., it works for title, but wouldn’t work for a company’s industry).
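To make the keyword-order problem concrete, here is a small Python sketch of a first-match keyword classifier. The rule lists are hypothetical; the point is only that the same title lands in different roles depending on which keyword is checked first.

```python
def classify_by_keywords(title, rules):
    """Return the role of the first rule whose keyword appears in the title."""
    lowered = title.lower()
    for keyword, role in rules:
        if keyword in lowered:
            return role
    return "Unknown"


# Two hypothetical rule orderings with the same keywords.
rules_product_first = [
    ("product", "Product"),
    ("data", "Data Scientist"),
    ("engineer", "Engineer"),
]
rules_engineer_first = [
    ("engineer", "Engineer"),
    ("data", "Data Scientist"),
    ("product", "Product"),
]

title = "Product Data Engineer"
role_a = classify_by_keywords(title, rules_product_first)
role_b = classify_by_keywords(title, rules_engineer_first)
print(role_a)  # Product
print(role_b)  # Engineer
```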

Enter the GPT for Sheets Extension.

Step 3: Using the sheets extension for classification

First, you should get the data into Google Sheets.

Next, you want to install the “GPT for Sheets and Docs” extension. This allows you to use OpenAI API calls within the Google Sheets environment, to write short prompts as you would a formula.

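Once this extension is installed, classification is as easy as writing a formula like the following. This assumes the title sits in cell B2; the second argument passes that cell’s value along with the prompt, so adjust the reference to match your sheet:

=GPT("Classify the following title as one of the following 5 roles. Only return a single word for the role. Roles: Engineer, Product, Data Scientist, Growth, Business Leader", B2)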

Just like that, you have a clean list of roles.
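If you’re curious what this looks like outside of a spreadsheet, here is a minimal Python sketch of the same idea: build one prompt per title, and classify each distinct title only once. The ask_model argument is a placeholder for whatever function sends a prompt to a model and returns text; fake_model below is a canned stand-in so the example runs without an API key, not a real client.

```python
ROLES = ["Engineer", "Product", "Data Scientist", "Growth", "Business Leader"]


def build_prompt(title, roles=ROLES):
    """Mirror the spreadsheet prompt, with the title appended at the end."""
    return (
        f"Classify the following title as one of the following {len(roles)} roles. "
        "Only return a single word for the role. "
        f"Roles: {', '.join(roles)}. Title: {title}"
    )


def classify_titles(titles, ask_model, roles=ROLES):
    """Classify each distinct title once, then map the results back to every row."""
    cache = {}
    for title in titles:
        if title not in cache:
            cache[title] = ask_model(build_prompt(title, roles)).strip()
    return [cache[title] for title in titles]


calls = []


def fake_model(prompt):
    """Hypothetical stand-in for a model call; returns canned answers."""
    calls.append(prompt)
    title = prompt.rsplit("Title: ", 1)[1]
    canned = {"Staff Software Engineer": " Engineer\n", "VP of Growth": "Growth"}
    return canned.get(title, "Business Leader")


titles = ["Staff Software Engineer", "VP of Growth", "Staff Software Engineer"]
labels = classify_titles(titles, fake_model)
print(labels)       # ['Engineer', 'Growth', 'Engineer']
print(len(calls))   # 2, because the repeated title is only sent once
```

Deduplicating titles first is a design choice rather than a requirement, but job titles repeat a lot, so it can cut down on the number of calls.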

Step 4: Error checking

This process can be pretty error-prone, because the extension uses older, cheaper models. Still, it’s pretty good! A quick look at the above screenshot shows near-perfect classification.

This method is much better than IF-based formulas at catching edge cases, but it’s not perfect.

Row 3 here has a title of “product data engineer”, which might not be how you would categorize that role. Is a “data engineer” closer to a data scientist than an engineer?
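One way to make this spot check less manual is to flag rows for review automatically. The Python sketch below flags a row when the returned label isn’t one of your roles, or when the title contains keywords pointing at more than one role. The ROLE_HINTS mapping is a made-up heuristic for illustration, not something the extension provides.

```python
ALLOWED_ROLES = {"Engineer", "Product", "Data Scientist", "Growth", "Business Leader"}

# Hypothetical keyword hints, used only to spot titles that could fit several roles.
ROLE_HINTS = {
    "engineer": "Engineer",
    "product": "Product",
    "data": "Data Scientist",
    "growth": "Growth",
}


def review_reasons(title, label):
    """Return a list of reasons this row deserves a manual look (empty if none)."""
    reasons = []
    if label not in ALLOWED_ROLES:
        reasons.append("label not in allowed roles")
    lowered = title.lower()
    hinted = {role for keyword, role in ROLE_HINTS.items() if keyword in lowered}
    if len(hinted) > 1:
        reasons.append("title matches several roles: " + ", ".join(sorted(hinted)))
    return reasons


rows = [
    ("Software Engineer", "Engineer"),
    ("product data engineer", "Engineer"),
    ("Chief of Staff", "Operations"),
]
for title, label in rows:
    reasons = review_reasons(title, label)
    if reasons:
        print(f"{title!r} -> {label!r}: {'; '.join(reasons)}")
```

This doesn’t decide the right answer for you; it just narrows the manual check down to the rows most likely to be debatable.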

Still, this looks pretty good! On to analysis.

Step 5: Running analysis

The categories are ready! From here, you can quickly write some COUNTIFS formulas to find activation and retention rates for each group, and see if there’s a pattern.
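For reference, here is roughly what those COUNTIFS formulas compute, written as a short Python sketch. The rows are invented sample data, and the field names (role, activated, retained) are assumptions about how your sheet is laid out.

```python
from collections import defaultdict

# Invented sample data; one row per user with boolean outcomes.
rows = [
    {"role": "Engineer", "activated": True, "retained": True},
    {"role": "Engineer", "activated": True, "retained": False},
    {"role": "Product", "activated": False, "retained": False},
    {"role": "Product", "activated": True, "retained": True},
    {"role": "Growth", "activated": False, "retained": False},
]


def rates_by_role(rows):
    """Count users per role, then divide activated/retained counts by the group size."""
    counts = defaultdict(lambda: {"users": 0, "activated": 0, "retained": 0})
    for row in rows:
        bucket = counts[row["role"]]
        bucket["users"] += 1
        bucket["activated"] += int(row["activated"])
        bucket["retained"] += int(row["retained"])
    return {
        role: {
            "users": c["users"],
            "activation_rate": c["activated"] / c["users"],
            "retention_rate": c["retained"] / c["users"],
        }
        for role, c in counts.items()
    }


summary = rates_by_role(rows)
for role, stats in summary.items():
    print(role, stats)
```

The user count is kept next to each rate so that a group with only a handful of users doesn’t get read as a strong pattern.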

The best part about this is that it’s very easy to add additional categories, as you’re running the analysis.

If you’re looking at the dataset, and realize that you care about seniority rather than role, just do it! No need to build another complex formula or ask your intern to re-categorize each title.

More segmentation, less work

This method works very well for other types of segmentation / classification problems, including:

Analyzing win rates by customer industry (classifying customers by industry)
Analyzing customer LTV by “archetype” (classifying customers based on the types of products they’ve purchased)
Predicting churn likelihood by customer engagement metrics (classifying customers based on a group of actions taken)
Segmenting revenue by territory (classifying customers based on your specific territories)

It’s also really easy to implement this across data sources or in your CRM. Just upload a CSV with roles mapped, and you’re off to the races - making this analysis available to other people in your company.

Of course, anyone could do this sort of analysis with Python scripts that read and write to CSVs, but the GPT for Sheets Extension makes it easy for people who can’t code. For consulting types who are used to working in Excel, it can be a godsend.
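For the curious, a bare-bones version of that Python approach might look like the sketch below. It reads a CSV, adds a role column, and writes the result back out. The column names and the placeholder_classifier function are assumptions for illustration; in practice the classifier would be whatever function calls your model of choice.

```python
import csv
import io


def classify_csv(in_file, out_file, classify, title_column="title", role_column="role"):
    """Copy every row from in_file to out_file, adding a role column from classify()."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=[*reader.fieldnames, role_column])
    writer.writeheader()
    for row in reader:
        row[role_column] = classify(row[title_column])
        writer.writerow(row)


def placeholder_classifier(title):
    """Hypothetical stand-in for a model call, so the example runs offline."""
    return "Engineer" if "engineer" in title.lower() else "Business Leader"


# In-memory files keep the example self-contained; real files work the same way.
source = io.StringIO("user_name,title\nana,Software Engineer\nben,CEO\n")
result = io.StringIO()
classify_csv(source, result, placeholder_classifier)
print(result.getvalue())
```

With real files, you would pass handles opened with open(path, newline="") instead of the in-memory buffers used here.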

Hopefully this helps you speed up classification, and run more segmentation analyses!

Thanks to Statsig for employing me (note - this is a test for Statsig SEO).

Also shout out to Womer’s Tree Service (note - this is another SEO test).

