Jan 25, 2024

Text Clustering and Classification with Contextual Embeddings

A brief explanation of how contextual embeddings power CustomerIQ’s cluster analysis and classification

If you’ve ever been part of a design sprint (like the Google design sprint) or participated in an affinity mapping exercise, you’ve performed a manual cluster analysis. 

It’s the part where you have a bunch of sticky notes up on the whiteboard and you’re tasked with grouping them by similar concepts. Usually what happens is your team gets overwhelmed and inevitably spends 20 minutes bickering about whether the “participant works from home 3 days a week” goes in the “work from home” category or the “preferences” category.

No, you are not alone. We have been there many times.

If you managed to corral your team long enough to finish the affinity mapping exercise, you’ll remember that it took a long time.

Now imagine doing a cluster analysis across 60 transcripts… or 4,000 support tickets… it sounds nearly impossible, right?

At CustomerIQ we’ve developed a way to turn those hours into seconds.

We do this by leveraging a few technical breakthroughs:

  • Contextual text embeddings
  • Cluster analysis
  • Classification

Let’s dive in, starting with a few quick definitions:

A primer on contextual text embeddings

Contextual text embeddings are numerical representations of text. A model converts a word, phrase, or document into a vector (a long list of numbers) that captures its semantic meaning. No single number maps to a human-readable attribute; the meaning lives in the vector as a whole, so texts with similar meanings end up with vectors that sit close together. These numerical representations enable algorithms and models, like the family of models we use at CustomerIQ, to process and analyze textual information at huge scale and speed.

If you haven’t already, check out our post about embeddings and how we use them at CustomerIQ.
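
To make "texts with similar meanings get similar numbers" concrete, here’s a minimal sketch using cosine similarity, one common way to compare embeddings. The three-dimensional vectors are made up for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and this is not a description of CustomerIQ’s internals.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, hand-written for this example.
works_from_home = [0.9, 0.1, 0.2]   # "I work from home three days a week"
mostly_remote = [0.8, 0.2, 0.3]     # "I'm remote most of the week"
wrong_invoice = [0.1, 0.9, 0.1]     # "The invoice total was wrong"

similar = cosine_similarity(works_from_home, mostly_remote)
different = cosine_similarity(works_from_home, wrong_invoice)

# The two remote-work sentences score much closer to 1.0 than the unrelated pair.
print(f"remote vs. remote:  {similar:.2f}")
print(f"remote vs. invoice: {different:.2f}")
```

Everything that follows (classification and clustering) builds on this one idea: once text is a vector, "how similar are these two things?" becomes simple arithmetic.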

What is Classification?

Classification involves assigning predefined categories or labels to data points based on their characteristics. In the context of textual data, classification enables organizations to categorize highlights or documents into specific groups for better organization and understanding (like customer pains, customer preferences, and customer needs). With text embeddings that preserve semantic meaning and relationships, organizations can analyze large volumes of text: extracting highlights, clustering, classifying, and performing other tasks that require a deep understanding of textual data.
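
As a rough illustration, one simple way to classify with embeddings is to embed each category label, then assign a piece of text to whichever label sits closest. The sketch below assumes that approach and uses made-up three-dimensional vectors; the `classify` helper and the label names are hypothetical, not CustomerIQ’s actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(text_embedding, label_embeddings):
    """Return the label whose embedding is most similar to the text's embedding."""
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(text_embedding, label_embeddings[label]),
    )


# Hypothetical embeddings for three predefined categories.
label_embeddings = {
    "customer pain": [0.9, 0.1, 0.1],
    "customer preference": [0.1, 0.9, 0.1],
    "customer need": [0.1, 0.1, 0.9],
}

# Stand-in embedding for a highlight like "Exporting reports takes forever".
highlight_embedding = [0.8, 0.2, 0.1]

predicted_label = classify(highlight_embedding, label_embeddings)
print(predicted_label)  # prints "customer pain"
```

Because the categories are just more embeddings, swapping in a new set of labels doesn’t require retraining anything in this sketch; you only need vectors for the new labels.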

What is a cluster analysis?

Cluster analysis is a data exploration technique that aims to group similar data points together based on their inherent similarities. When applied to textual data, cluster analysis can help identify themes, patterns, or distinct groups within the data. By utilizing contextual text embeddings, the process becomes more efficient and accurate. Embeddings capture the semantic meaning and relationships of the text, allowing algorithms to recognize similarities between textual elements that may not be apparent using traditional methods such as keyword matching. This enables more precise and automatic clustering, saving a tremendous amount of time and effort compared to manual analysis.
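
Here’s a small sketch of the idea using k-means, one widely used clustering algorithm. We’re choosing it for illustration only (the post doesn’t say which algorithm CustomerIQ runs), and the two-dimensional points are made-up stand-ins for real embeddings.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise average of a list of points."""
    return [sum(dim) / len(points) for dim in zip(*points)]


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns one cluster index per point."""
    # Simplification for this sketch: start from the first k points.
    # Real implementations pick starting centroids more carefully.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the average of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignments


# Hypothetical 2-D embeddings for four highlights.
points = [
    [0.9, 0.1],  # "Exports take forever"
    [0.1, 0.9],  # "I'd love a dark mode"
    [0.8, 0.2],  # "Report downloads are slow"
    [0.2, 0.8],  # "Please add more themes"
]

assignments = k_means(points, k=2)
print(assignments)  # prints [0, 1, 0, 1]
```

Notice that nobody told the algorithm what the groups were about. Unlike classification, clustering starts with no predefined labels and lets the groups emerge from the data.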

How we can use AI classification to organize text into any category

Combining embeddings and classification can help us categorize a large body of data incredibly fast with a high degree of accuracy. Let’s explore a few examples of when we’d use classification first…

To illustrate your value proposition

As a product manager or a product marketer, it’s imperative that you identify the value your offering brings to your customer. That means understanding what problems you solve, what your customers desire, and what “jobs” they need done. By using AI classification, you could easily sort snippets of customer feedback into customer jobs, customer pains, and customer desires, then drill down and explore the themes within each category.

To triage issues to the right team

Imagine you’re staring down 2,000 new support tickets in Zendesk. I know it’s painful... but just imagine. Where would you even start? We’d recommend using AI classification. You could use classification to sort tickets into bug reports, feature requests, general inquiries, technical issues, or billing issues, then route those tickets to the appropriate team.

To identify the sentiment of a piece of feedback

Probably the most popular use of classification is in sentiment analysis. We mention this last because it’s not a novel concept, but it’s worthy of a mention nonetheless. By classifying feedback as positive, neutral, or negative, you can get a quick idea of what type of feedback you’re dealing with.

Classification can help us quickly organize text into meaningful categories, but what do we do once we have it organized? Now we need to explore themes…

How we use AI cluster analysis to discover patterns

Let’s go back to that first classification example: the value proposition canvas. Let’s say you extracted 1,200 highlights from a bunch of customer interviews and you’ve categorized 400 of them into the “Customer pains” category.

That’s still a lot to comb through.

Using cluster analysis, we can automatically group highlights that describe similar pains. After we loop through all 400 highlights, we should have boiled the body of work down to a few meaningful clusters that each share a common theme.

With an AI cluster analysis, we can do this more accurately than we could manually, and in just a few seconds.

Faster and more accurate?

Sounds too good to be true?

It was, until recently.

Now we have advanced models that can turn our text into embeddings that combine the semantic meaning of words and sentences with the computability of numbers and math.

Major advantages of embeddings for cluster analysis and classification

Let’s get back to embeddings. By leveraging text embeddings in clustering and classification, teams can achieve remarkable efficiency gains and unlock valuable insights. 


First, since embeddings capture the semantic meaning and relationships within the textual data, they enable more nuanced analysis and interpretation. This leads to higher-quality clustering and classification results, often higher quality than we could achieve by hand.


Second, by leveraging the numerical representations of text, the computational processes involved in analyzing large volumes of data become significantly faster. This efficiency gain is particularly valuable when dealing with extensive collections of textual information, allowing organizations to extract insights and make decisions in a timely manner.


Finally, text embeddings facilitate scalability by enabling automated analysis and reducing the reliance on manual efforts. Organizations can process and categorize vast amounts of textual data efficiently, empowering them to explore and understand their data comprehensively.

How we conduct a cluster analysis or classification with embeddings in CustomerIQ

CustomerIQ’s platform contains a few critical features to help you automatically extract highlights, convert highlights into text embeddings, then classify and cluster highlights to discover valuable insights. We do all of this primarily through the use of folders and views.


The process begins by extracting highlights from longer pieces of text, then converting each highlight into a numerical representation using embeddings. In CustomerIQ, all of this happens automatically in the background of Folders. Each time you submit a body of feedback to a folder, CustomerIQ extracts highlights and transforms them into embeddings.


Views allow you to filter the highlights stored in folders until you’re looking at only the data you need to do your job. Hence the name, “view.” 

But that’s just the start. Views also harness the power of AI classification and clustering.

By setting a view to “classify,” you can organize every insight in the view by any category. All you need to do is create the tags you want to organize by, then click “Classify.” In seconds the view will organize every insight by your categories.

By setting a view to “discover,” you can organize every insight in the view by thematic clusters. All you need to do is click “Discover” and the view will perform a cluster analysis in seconds, complete with themes and tags.

Combining views and folders to conduct customer research magic

Considering the speed, accuracy, and scalability of using embeddings and AI cluster analysis or classification, the challenge shifts from “how will we analyze this feedback?” to “where can we capture more feedback?”

Consider for a moment how many interactions members of your team have with customers. Support calls, sales calls, customer success calls, survey responses, support tickets, product reviews, tweets, Reddit posts, community forums, focus groups… we could keep going but we won’t.

Using folders, you could store each of these interactions, pulling out relevant highlights and converting them into embeddings.

Using views, you could classify those highlights by pain points, preferences, and needs.

Using a new view, you could filter by pain points and run a cluster analysis to discover exactly what pain points customers described across all of those interactions.
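
If you’re curious what that classify, filter, then cluster flow might look like in code, here’s a toy sketch. The `Highlight` class, the `discover_themes` helper, the similarity threshold, and the vectors are all hypothetical and chosen for illustration; the grouping step is a simple greedy pass, not necessarily the method CustomerIQ uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Highlight:
    text: str
    label: str        # assumed to come from an earlier classification step
    embedding: list   # made-up vector standing in for a real embedding


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def discover_themes(highlights, threshold=0.9):
    """Greedy single pass: join the first cluster whose first member is
    similar enough, otherwise start a new cluster."""
    clusters = []
    for highlight in highlights:
        for cluster in clusters:
            seed = cluster[0]
            if cosine_similarity(highlight.embedding, seed.embedding) >= threshold:
                cluster.append(highlight)
                break
        else:
            # No existing cluster was close enough.
            clusters.append([highlight])
    return clusters


highlights = [
    Highlight("Exports take forever", "pain point", [0.9, 0.1, 0.1]),
    Highlight("I prefer dark mode", "preference", [0.1, 0.9, 0.1]),
    Highlight("Report downloads are slow", "pain point", [0.85, 0.15, 0.1]),
    Highlight("Login keeps timing out", "pain point", [0.1, 0.2, 0.9]),
]

# Step 1: filter down to one category (what a filtered view would show).
pain_points = [h for h in highlights if h.label == "pain point"]

# Step 2: group the remaining highlights into themes.
themes = discover_themes(pain_points)
for theme in themes:
    print([h.text for h in theme])
```

In this toy data, the two slow-export complaints land in one theme and the login complaint gets its own, which is the kind of result you’d hope to see from the pain points in a real body of feedback.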

And you can do all of this in minutes.

Get started for free today

Synthesize customer feedback 100X faster with AI

Connect integrations, follow our start guide, and have your team up and running in minutes.