# Content Classification

### About this export

| Field | Value |
| --- | --- |
| **content_type** | lesson |
| **platform** | contentstack-academy |
| **source_url** | https://www.contentstack.com/academy/courses/lytics-essentials/content-classification |
| **course_slug** | lytics-essentials |
| **lesson_slug** | content-classification |
| **markdown_file_url** | /academy/md/courses/lytics-essentials/content-classification.md |
| **generated_at** | 2026-04-28T06:55:47.920Z |

> Part of **[Lytics Essentials](https://www.contentstack.com/academy/courses/lytics-essentials)** on Contentstack Academy. **Academy MD v3** — structured for retrieval; no quiz or assessment keys.

<!-- ai_metadata: {"lesson_id":"12","type":"video","duration_seconds":172,"video_url":"https://cdn.jwplayer.com/previews/ptmRlHzG","thumbnail_url":"https://cdn.jwplayer.com/v2/media/ptmRlHzG/poster.jpg?width=720","topics":["Content","Classification"]} -->

#### Video details

#### At a glance

- **Title:** Content Classification
- **Duration:** 2m 52s
- **Media link:** https://cdn.jwplayer.com/previews/ptmRlHzG
- **Publish date (unix):** 1751521283

#### Streaming renditions

- application/vnd.apple.mpegurl
- audio/mp4 · AAC Audio · 113848 bps
- video/mp4 · 180p · 138661 bps
- video/mp4 · 270p · 150735 bps
- video/mp4 · 360p · 163904 bps
- video/mp4 · 406p · 174657 bps
- video/mp4 · 540p · 198807 bps
- video/mp4 · 720p · 235869 bps

#### Timed text tracks (delivery)

- **thumbnails:** `https://cdn.jwplayer.com/strips/ptmRlHzG-120.vtt`

#### Transcript

Classification is likely the first of several content-related features you'll be exposed to while working with Lytics. In order to understand behavior, we must first understand the content that is being interacted with. Classification does just that. Lytics proactively scrapes content that your users are consuming and then leverages a mix of natural language processing and image analysis tools to break down each document, or URL, into a set of topics that reflect the actual consumable content. These topics are then used to create individual affinity scores on a user, make recommendations, etc. The actual application of affinities and recommendations will be covered in another module, so let's focus on ensuring classification is healthy and setting us up for success.

Upon visiting the Classification section, you'll be asked which domains we are allowed to process content from. This is very important as a first step to ensure unrelated content and topics do not make their way into your corpus. Simply click the Edit button and add the domains relevant to your brand. Adding or removing a domain is as simple as clicking Content Settings and then adjusting the settings here by either adding one or removing.

As we go back to the Classification Overview, at the top you'll get the Classification Activity, which is simply a breakdown of how much content we're classifying and on what interval. Initially, you'll probably see large spikes as we classify all of your content, and then as things progress you'll see fewer and fewer classifications as we only update content that changes or is stale. Moving to Document Health, we help you understand which content can't be classified. Here you'll see things such as 404 errors or 500s.

Finally, at the bottom you can either reclassify or manually classify content. This is handy in the case that you fixed, say, a 404 error above, or perhaps you've just added one new document and you want to classify it immediately. Likewise, it's a handy tool to go and just see how a document itself is being classified or might be classified as a test. Simply put in a URL, hit Get Details, and we'll run it through the same natural language processes and image analysis that we would on a full classification. You'll see what's returned is an overview of the data that we would collect, the title, the descriptions, URLs, topics, etc. Here, if you wanted to add this to your corpus, you could just simply hit Complete Classification or put in a different URL to try the classification again.

Be sure to visit learn.lytics.com. Here you'll find the various ways that we work with existing taxonomies, have CMSs with built-in integrations, and so on.

#### Subtitles (WebVTT)

```webvtt
WEBVTT

1
00:00:00.000 --> 00:00:08.000
Classification is likely the first of several content-related features you'll be exposed to while working with Lytics.

2
00:00:08.000 --> 00:00:13.000
In order to understand behavior, we must first understand the content that is being interacted with.

3
00:00:13.000 --> 00:00:16.000
Classification does just that.

4
00:00:16.000 --> 00:00:22.000
Lytics proactively scrapes content that your users are consuming and then leverages a mix of natural language processing

5
00:00:22.000 --> 00:00:26.000
and image analysis tools to break down each document, or URL,

6
00:00:26.000 --> 00:00:30.000
into a set of topics that reflect the actual consumable content.

7
00:00:30.000 --> 00:00:38.000
These topics are then used to create individual affinity scores on a user, make recommendations, etc.

8
00:00:38.000 --> 00:00:42.000
The actual application of affinities and recommendations will be covered in another module,

9
00:00:42.000 --> 00:00:49.000
so let's focus on ensuring classification is healthy and setting us up for success.

10
00:00:49.000 --> 00:00:56.000
Upon visiting the Classification section, you'll be asked which domains we are allowed to process content from.

11
00:00:56.000 --> 00:01:02.000
This is very important as a first step to ensure unrelated content and topics do not make their way into your corpus.

12
00:01:02.000 --> 00:01:07.000
Simply click the Edit button and add the domains relevant to your brand.

13
00:01:07.000 --> 00:01:11.000
Adding or removing a domain is as simple as clicking Content Settings

14
00:01:11.000 --> 00:01:23.000
and then adjusting the settings here by either adding one or removing.

15
00:01:23.000 --> 00:01:29.000
As we go back to the Classification Overview, at the top you'll get the Classification Activity,

16
00:01:29.000 --> 00:01:33.000
which is simply a breakdown of how much content we're classifying and on what interval.

17
00:01:33.000 --> 00:01:37.000
Initially, you'll probably see large spikes as we classify all of your content,

18
00:01:37.000 --> 00:01:45.000
and then as things progress you'll see fewer and fewer classifications as we only update content that changes or is stale.

19
00:01:45.000 --> 00:01:49.000
Moving to Document Health, we help you understand which content can't be classified.

20
00:01:49.000 --> 00:01:53.000
Here you'll see things such as 404 errors or 500s.

21
00:01:53.000 --> 00:01:58.000
Finally, at the bottom you can either reclassify or manually classify content.

22
00:01:58.000 --> 00:02:01.000
This is handy in the case that you fixed, say, a 404 error above,

23
00:02:01.000 --> 00:02:05.000
or perhaps you've just added one new document and you want to classify it immediately.

24
00:02:05.000 --> 00:02:11.000
Likewise, it's a handy tool to go and just see how a document itself is being classified or might be classified as a test.

25
00:02:11.000 --> 00:02:16.000
Simply put in a URL, hit Get Details,

26
00:02:16.000 --> 00:02:22.000
and we'll run it through the same natural language processes and image analysis that we would on a full classification.

27
00:02:22.000 --> 00:02:25.000
You'll see what's returned is an overview of the data that we would collect,

28
00:02:25.000 --> 00:02:33.000
the title, the descriptions, URLs, topics, etc.

29
00:02:33.000 --> 00:02:41.000
Here, if you wanted to add this to your corpus, you could just simply hit Complete Classification or put in a different URL to try the classification again.

30
00:02:41.000 --> 00:02:43.000
Be sure to visit learn.lytics.com.

31
00:02:43.000 --> 00:02:52.000
Here you'll find the various ways that we work with existing taxonomies, have CMSs with built-in integrations, and so on.

```

```transcript
<!-- PLACEHOLDER: replace with real transcript before publish if cues were auto-derived from WebVTT -->
[00:00] Classification is likely the first of several content-related features you'll be exposed to while working with Lytics.
[00:08] In order to understand behavior, we must first understand the content that is being interacted with.
[00:13] Classification does just that.
[00:16] Lytics proactively scrapes content that your users are consuming and then leverages a mix of natural language processing
[00:22] and image analysis tools to break down each document, or URL,
[00:26] into a set of topics that reflect the actual consumable content.
[00:30] These topics are then used to create individual affinity scores on a user, make recommendations, etc.
[00:38] The actual application of affinities and recommendations will be covered in another module,
[00:42] so let's focus on ensuring classification is healthy and setting us up for success.
[00:49] Upon visiting the Classification section, you'll be asked which domains we are allowed to process content from.
[00:56] This is very important as a first step to ensure unrelated content and topics do not make their way into your corpus.
[01:02] Simply click the Edit button and add the domains relevant to your brand.
[01:07] Adding or removing a domain is as simple as clicking Content Settings
[01:11] and then adjusting the settings here by either adding one or removing.
[01:23] As we go back to the Classification Overview, at the top you'll get the Classification Activity,
[01:29] which is simply a breakdown of how much content we're classifying and on what interval.
[01:33] Initially, you'll probably see large spikes as we classify all of your content,
[01:37] and then as things progress you'll see fewer and fewer classifications as we only update content that changes or is stale.
[01:45] Moving to Document Health, we help you understand which content can't be classified.
[01:49] Here you'll see things such as 404 errors or 500s.
[01:53] Finally, at the bottom you can either reclassify or manually classify content.
[01:58] This is handy in the case that you fixed, say, a 404 error above,
[02:01] or perhaps you've just added one new document and you want to classify it immediately.
[02:05] Likewise, it's a handy tool to go and just see how a document itself is being classified or might be classified as a test.
[02:11] Simply put in a URL, hit Get Details,
[02:16] and we'll run it through the same natural language processes and image analysis that we would on a full classification.
[02:22] You'll see what's returned is an overview of the data that we would collect,
[02:25] the title, the descriptions, URLs, topics, etc.
[02:33] Here, if you wanted to add this to your corpus, you could just simply hit Complete Classification or put in a different URL to try the classification again.
[02:41] Be sure to visit learn.lytics.com.
[02:43] Here you'll find the various ways that we work with existing taxonomies, have CMSs with built-in integrations, and so on.
```

#### Lesson text

Learn how Content Classification enables Lytics to understand what your customers find most interesting and relevant.

## Content Classification

### Overview

**Note:** On January 10, 2023, we upgraded our UI with a new, refreshed interface. All of the underlying functionality is the same, but you will notice that things look a little different from this Academy guide. The most notable change is that the navigation menu has moved from the top of the app to the left side. We appreciate your patience as we work on updating our Academy.

## What will I learn?

*   What is Content Classification?
*   Why is it important?
*   How can I use the Classification Dashboard?

In this guide, we'll introduce you to our out-of-the-box content classification service. Understanding how your brand's content is being consumed enables Lytics to understand what your audience finds most interesting and relevant.

### Why is classification important?

#### What is "content hygiene" anyway?

For a moment, let's think of your brand's content like a closet.

All of us have clothes, perhaps far more than we actually need. Besides clothes, we often store shoes, purses or bags, and a smattering of other items in our closets. Many of us find it difficult to keep our closets organized. As long as we look decent when we walk out the door, no one has to know that our closet is a dumping ground.

Likewise, for many companies, their content coming from a CMS or other sources is often a mess. But as long as the public-facing assets look good (website, blog, etc.), no one has to know how organized (or not) their content system is.

Why does this matter? Clients often come to Lytics with their content in a disorganized state. Lytics ingests and classifies all their content. Down the line, when the customer is ready to execute a recommendation use case, they are frustrated and confused about why their content collections include broken links and random topics unrelated to their brand, while the content they care about is missing. How does this happen? Garbage in, garbage out.

In order to execute powerful content recommendations, you must start by making sure your account has proper **content hygiene**, which entails:

*   Accurate metadata such as title, description, images, etc.
*   Healthy HTTP status codes - is the content accessible?
*   Proper domain and path settings to ensure the right content is classified
*   And much more...
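
As a rough illustration of the metadata item in the list above, the sketch below checks a page's HTML for a title, a description, and an image before you rely on it for recommendations. It is not how Lytics inspects pages: the `hygiene_issues` helper, the `MetadataParser` class, and the choice of the `description` and `og:image` meta tags are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and <meta> name/property values from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Meta tags are keyed by either name="..." or property="...".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content"):
                self.meta[key.lower()] = attrs["content"].strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def hygiene_issues(html: str) -> list[str]:
    """Return a list of missing metadata items (hypothetical checks)."""
    parser = MetadataParser()
    parser.feed(html)
    issues = []
    if not parser.title.strip():
        issues.append("missing title")
    if not parser.meta.get("description"):
        issues.append("missing description")
    if not parser.meta.get("og:image"):  # assumed image tag for this sketch
        issues.append("missing image")
    return issues


page = (
    "<html><head><title>Spring Sale</title>"
    '<meta name="description" content="Deals on trail shoes">'
    "</head></html>"
)
print(hygiene_issues(page))
```

Running a check like this across your own pages before classification can flag thin metadata early, whichever tags your site actually uses.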

The good news of this story? In Lytics, you have a **dashboard** dedicated to helping you **understand how Lytics is classifying your content** and what steps you can take to **improve content hygiene**.

### Core Concepts

Before we check out the classification dashboard in Lytics, let's review a few key terms.

*   **Classification -** the task of assigning one or more categories or topics to a document. This is also referred to as “Enrichment” in some contexts.
*   **Documents -** a document is a single piece of content, usually corresponding to a URL on the customer’s website, an email, ad, or other content types.
*   **Natural Language Processing (NLP) -** A programmatic approach to analyzing large amounts of language data.
*   **Keywords -** Extracted verbatim from text; a less sophisticated approach to topic analysis.
*   **Topics -** Topics are extracted using NLP and are able to make inferences about themes or core subject matter of the content. See [how Topics are different than Keywords](https://learn.lytics.com/documentation/product/features/content-affinity-engine/topic-extraction#how-topics-are-different-than-keywords).
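
To make the Keywords definition above concrete, here is a toy sketch of verbatim keyword extraction: it only counts words that literally appear in the text. It is not the Lytics implementation, and the `extract_keywords` function and the small `STOPWORDS` set are hypothetical. Topic extraction, by contrast, relies on NLP to infer themes that may never appear word for word, which a counter like this cannot do.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "for", "and", "a", "an", "of", "to", "in", "is"}


def extract_keywords(text: str, top_n: int = 3) -> list[str]:
    """Return the most frequent non-stopword words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


text = "Trail shoes grip the trail. Running shoes for trail running."
print(extract_keywords(text, top_n=1))
```

Every keyword returned is a word from the input; a theme such as "outdoor fitness" would never surface because it is not present verbatim.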

In other guides, we'll cover Lytics Affinity Engine, Content Collections, and Recommendations in more detail.

#### Match the term to its definition.

Topics

Extracted using NLP. Able to make inferences

Keywords

Extracted verbatim from text

Classification

A programmatic approach to analyzing large amounts of language data

Natural Language Processing (NLP)

Task of assigning one or more categories or topics to a document

## Using the Classification Dashboard

### Classification Dashboard

The Content Classification dashboard provides visibility into how your content is being scraped, indexed, and classified by the Lytics Affinity Engine. You can find the **Classification** section under the **Content** tab in the UI.

![content-classification.png](https://images.contentstack.io/v3/assets/bltebc53cfaf0dd6403/blt87da047c556c765c/68662bed65a2190e92d19c9f/content-classification.png)

We'll walk you through each of the modules:

#### Domain and Path Settings

These settings are essential to make sure Lytics classifies the right content that will be used in your marketing initiatives. Note, only **account admins** can adjust the Content Settings to add or remove approved domains and ignored paths. Admins can also make sure Lytics observes your **robots.txt directives** for content enrichment.
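
The effect of these settings can be pictured as a filter applied to each URL before it is classified. The sketch below is an illustration only, not Lytics code: `ALLOWED_DOMAINS`, `IGNORED_PATHS`, and `should_classify` are hypothetical names, and the robots.txt check uses Python's standard `urllib.robotparser` rather than whatever Lytics uses internally.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical settings; in Lytics these are managed by an account admin.
ALLOWED_DOMAINS = {"www.example.com", "blog.example.com"}
IGNORED_PATHS = ("/cart", "/account")


def should_classify(url: str, robots_txt: str = "") -> bool:
    """Return True if the URL passes the domain, path, and robots.txt checks."""
    parts = urlparse(url)
    if parts.hostname not in ALLOWED_DOMAINS:
        return False
    for prefix in IGNORED_PATHS:
        # Match the ignored path itself or anything nested beneath it.
        if parts.path == prefix or parts.path.startswith(prefix + "/"):
            return False
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    return bool(robots.can_fetch("*", url))


rules = "User-agent: *\nDisallow: /private"
print(should_classify("https://www.example.com/blog/post", rules))
print(should_classify("https://www.example.com/private/notes", rules))
```

Keeping the domain list tight is what prevents unrelated topics from entering the corpus; the ignored paths and robots.txt rules then trim pages within an approved domain.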

#### Classification Activity

Shows the number of documents that have been classified for your account in the last week. You can also adjust the chart to see the distribution throughout history to help you understand if and when you are hitting the **default monthly quota of 20,000 URLs** for your account.
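
If you keep your own log of classification dates, a quick tally like the one below can show which months reached the quota. This is a sketch under assumptions: the guide states the 20,000 URL default, but counting by calendar month and the `monthly_usage` and `months_at_quota` helpers are choices made for this example.

```python
from collections import Counter
from datetime import date

MONTHLY_QUOTA = 20_000  # default monthly quota stated in this guide


def monthly_usage(classified_on: list[date]) -> Counter:
    """Count classified documents per (year, month)."""
    return Counter((d.year, d.month) for d in classified_on)


def months_at_quota(classified_on: list[date], quota: int = MONTHLY_QUOTA) -> list[tuple[int, int]]:
    """Return the (year, month) pairs whose count reached the quota."""
    usage = monthly_usage(classified_on)
    return sorted(month for month, count in usage.items() if count >= quota)


# Small made-up log; a low quota is passed in so the example stays readable.
log = [date(2024, 1, 5)] * 3 + [date(2024, 2, 1)]
print(monthly_usage(log))
print(months_at_quota(log, quota=3))
```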

#### Document Health

Surfaces a list of content that Lytics is unable to classify due to its “unhealthy” state. Lytics defines document health based on HTTP status codes. 

*   Status codes 200-399 are considered “healthy”
*   Status codes 400+ are considered “unhealthy”
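
The two ranges above translate directly into a small check. The `document_health` function name and the `"unknown"` result for codes below 200 are assumptions for this sketch; the guide only defines the healthy and unhealthy ranges.

```python
def document_health(status_code: int) -> str:
    """Map an HTTP status code to the health labels described above."""
    if 200 <= status_code <= 399:
        return "healthy"
    if status_code >= 400:
        return "unhealthy"
    # Codes below 200 are not covered by the guide; flagged separately here.
    return "unknown"


for code in (200, 301, 404, 500):
    print(code, document_health(code))
```

A check like this can be run against your own URLs to spot 404s and 500s before they show up in Document Health.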

**Note:** See the [documentation](https://learn.lytics.com/documentation/product/features/content-affinity-engine/content-classification) for more details and screenshots.

**What is the default monthly classification quota?**

A. 10,000 URLs

B. 20,000 URLs

C. 50,000 URLs

**Which account setting(s) should your Lytics admin set to ensure Lytics classifies the right content from your website? Select all that apply.**

A. Allowlist and Blocklist for Domains & Paths

B. Robots.txt directives

C. Custom properties

D. Schema promoted fields

### Manual Classification

The Manual Classification section allows you to preview how a single document will be classified by Lytics. You can use this to resolve any issues with how your page is set up before it’s added to the Lytics content corpus. 

Once a piece of content has been added to the corpus, its topics then become available for use in personalization such as recommendations or content affinity. 

See the GIF below for a glimpse of how the Manual Classification tool works.

![manual\_content\_classification\_preview.gif](https://images.contentstack.io/v3/assets/bltebc53cfaf0dd6403/blt4731eea2b5842e90/68662c4df44b17642078c3d3/manual_content_classification_preview.gif)

**Until a document has been classified, its topics are not available for use in personalization such as recommendations or affinity-based audiences.**

A. True

B. False

### Find a Document

Just like you're able to "Find a User" within the Audiences section, you can **Find a Document** within the Content section of the Lytics UI.

You may need to search for a specific document that has recently been classified to verify the description, topics, or other metadata. Typically, you'll use this when trying to debug an issue or ensure that a Content Collection is ready to be used for a recommendation use case.

![content-classification.png](https://images.contentstack.io/v3/assets/bltebc53cfaf0dd6403/blt87da047c556c765c/68662bed65a2190e92d19c9f/content-classification.png)

If you make adjustments to any of your documents, such as updating a blog post or refreshing a product landing page, you can request Lytics to manually re-classify the document. This will ensure the Lytics content corpus has the most up-to-date information to serve in any of your content or product recommendations.

**You can request Lytics to manually re-classify a document via the UI.**

A. True

B. False

## Next Steps

### More Resources

*   [Content Classification documentation](https://learn.lytics.com/documentation/product/features/content-affinity-engine/content-classification)
*   [Content Affinity Engine documentation](https://learn.lytics.com/documentation/product/features/content-affinity-engine/content-affinity-engine-introduction)

### Use Cases

*   [Deliver Targeted Content](https://learn.lytics.com/use-cases/deliver-targeted-content)
*   [Keep Visitors Engaged with Content Recommendation Experiences](https://learn.lytics.com/use-cases/keep-visitors-engaged-with-content-recommendation-experiences)
*   [Promote Relevant Content to Users based on their Interests](https://learn.lytics.com/use-cases/promote-relevant-content-to-users-based-on-their-interests)

#### Key takeaways

- Connect **Content Classification** back to your stack configuration before moving to the next module.
- Capture one concrete artifact (screenshot, Postman call, or code snippet) that proves the step works in your environment.
- Re-read the delivery versus management boundary for anything you changed in the entry model.

## Supplement for indexing

### Content summary

Content Classification. Learn how Content Classification enables Lytics to understand what your customers find most interesting and relevant. Content Classification Overview Note: On January 10, 2023, we upgraded our UI with a new, refreshed interface. All of the underlying functionality is the same, but you will notice that things look a little different from this Academy guide. The most notable change is that the navigation menu has moved from the top of the app to the left side. We appreciate your patience as we work on updating our Academy. What will I learn? What is Content Classification? Why is it important? How can I use the Classification Dashboard? In this guide, we'll introduce you to our out-of-the-box

### Retrieval tags

- Content
- Classification
- lytics-essentials
- lesson 12
- Content Classification
- lytics-essentials lesson

### Indexing notes

Index this lesson as a primary chunk tagged with lesson_id "12" and topics: [Content, Classification].
Parent course slug: lytics-essentials. Use asset_references URLs as thumbnail hints in search results when present.
Never surface LMS quiz content or assessment answers from this file.

### Asset references

| Label | URL |
| --- | --- |
| Video thumbnail: Content Classification | `https://cdn.jwplayer.com/v2/media/ptmRlHzG/poster.jpg?width=720` |
| content-classification.png | `https://images.contentstack.io/v3/assets/bltebc53cfaf0dd6403/blt87da047c556c765c/68662bed65a2190e92d19c9f/content-classification.png` |
| manual\_content\_classification\_preview.gif | `https://images.contentstack.io/v3/assets/bltebc53cfaf0dd6403/blt4731eea2b5842e90/68662c4df44b17642078c3d3/manual_content_classification_preview.gif` |

### External links

| Label | URL |
| --- | --- |
| Contentstack Academy home | `https://www.contentstack.com/academy/` |
| Training instance setup | `https://www.contentstack.com/academy/training-instance` |
| Academy playground (GitHub) | `https://github.com/contentstack/contentstack-academy-playground` |
| Contentstack documentation | `https://www.contentstack.com/docs/` |
| how Topics are different than Keywords | `https://learn.lytics.com/documentation/product/features/content-affinity-engine/topic-extraction#how-topics-are-different-than-keywords` |
| content-classification.png | `https://images.contentstack.io/v3/assets/bltebc53cfaf0dd6403/blt87da047c556c765c/68662bed65a2190e92d19c9f/content-classification.png` |
| documentation | `https://learn.lytics.com/documentation/product/features/content-affinity-engine/content-classification` |
| manual\_content\_classification\_preview.gif | `https://images.contentstack.io/v3/assets/bltebc53cfaf0dd6403/blt4731eea2b5842e90/68662c4df44b17642078c3d3/manual_content_classification_preview.gif` |
| Content Affinity Engine documentation | `https://learn.lytics.com/documentation/product/features/content-affinity-engine/content-affinity-engine-introduction` |
| Deliver Targeted Content | `https://learn.lytics.com/use-cases/deliver-targeted-content` |
| Keep Visitors Engaged with Content Recommendation Experiences | `https://learn.lytics.com/use-cases/keep-visitors-engaged-with-content-recommendation-experiences` |
| Promote Relevant Content to Users based on their Interests | `https://learn.lytics.com/use-cases/promote-relevant-content-to-users-based-on-their-interests` |