# Interest Scores & Classification

### About this export

| Field | Value |
| --- | --- |
| **content_type** | lesson |
| **platform** | contentstack-academy |
| **source_url** | https://www.contentstack.com/academy/courses/data-insights-data-ingestion-profile-construction/data-insights-course-3--interest-scores-classification |
| **course_slug** | data-insights-data-ingestion-profile-construction |
| **lesson_slug** | data-insights-course-3--interest-scores-classification |
| **markdown_file_url** | /academy/md/courses/data-insights-data-ingestion-profile-construction/data-insights-course-3--interest-scores-classification.md |
| **generated_at** | 2026-04-28T06:55:44.158Z |

> Part of **[Data Ingestion & Profile Construction](https://www.contentstack.com/academy/courses/data-insights-data-ingestion-profile-construction)** on Contentstack Academy. **Academy MD v3** — structured for retrieval; no quiz or assessment keys.

<!-- ai_metadata: {"lesson_id":"12","type":"video","duration_seconds":340,"video_url":"https://cdn.jwplayer.com/previews/iRhAPCRQ","thumbnail_url":"https://cdn.jwplayer.com/v2/media/iRhAPCRQ/poster.jpg?width=720","topics":["Interest","Scores","Classification"]} -->

#### Video details

#### At a glance

- **Title:** 20-data-insights-interest-scores-classification
- **Duration:** 5m 40s
- **Media link:** https://cdn.jwplayer.com/previews/iRhAPCRQ
- **Publish date (unix):** 1752879751

#### Streaming renditions

- application/vnd.apple.mpegurl (HLS)
- audio/mp4 · AAC audio · 113,718 bps
- video/mp4 · 180p · 138,386 bps
- video/mp4 · 270p · 155,796 bps
- video/mp4 · 360p · 172,959 bps
- video/mp4 · 406p · 183,666 bps
- video/mp4 · 540p · 221,294 bps
- video/mp4 · 720p · 279,070 bps

#### Timed text tracks (delivery)

- **thumbnails:** `https://cdn.jwplayer.com/strips/iRhAPCRQ-120.vtt`

#### Transcript

But we covered our behavioral scores as well, right? Momentum and propensity are really important behavioral scores that help fuel and empower these models. The other one, maybe the most important thing, certainly one of the most important things in the story of Lytics coming together with Contentstack, is our interest scores. You saw them, I think, in the very first conversation, where we go back to Petsy and turn on our trusty Chrome extension. At the bottom of the Chrome extension you'll see a set of interest scores. For most customers, these just come out of the box. We'll talk about how they actually work and what we're doing to classify the associated content, but ultimately what it allows you to do as a customer is this: as I browse, you'll see in real time (it might be subtle, because I've used this profile a lot) my scores change with every single piece of content I interact with. So as I go here and look at pet carriers (this is a sandbox demo account), you'll see those scores, even if it's a little hard to see at this scale, being recalculated every single time an event comes into the pipeline. That is super useful for helping you understand anonymous users, which is most users in the case of marketing. That's where a lot of other CDPs fall short: they don't talk about the importance of anonymous traffic and how their product helps with anonymous use cases. The pitch is always around "we help you get your data together and build known profiles," but at the end of the day, it's pretty easy to match two people when you have their email address and everything about them; it's much more difficult when you only have the information they give you across different browser sessions.
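The per-event recalculation described here can be sketched as a simple running blend. Everything in this sketch is illustrative: the `DECAY` constant, the function name, and the topic labels are assumptions for the sake of the example, not the product's actual scoring math.

```python
from collections import defaultdict

DECAY = 0.9  # hypothetical: older interactions count for less over time

def update_interest_scores(profile_scores: dict, event_topics: dict) -> dict:
    """Blend the topics of a newly viewed document into a profile's
    per-topic interest scores, one incoming event at a time."""
    # Decay every existing score, then mix in the new event's topic relevance.
    updated = defaultdict(float, {t: s * DECAY for t, s in profile_scores.items()})
    for topic, relevance in event_topics.items():
        updated[topic] += (1 - DECAY) * relevance
    return dict(updated)

# One page view: the visited URL was classified as mostly about carriers.
profile = {"cats": 0.6, "carriers": 0.2}
event = {"carriers": 1.0, "cats": 0.4}
profile = update_interest_scores(profile, event)
```

The decayed blend means recent page views move the score more than stale ones, which matches the "recalculated every time an event comes into the pipeline" behavior shown in the demo.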
To start with how all of this works: these values on the profile are called interest scores. If I go to the raw details for this particular user, what's actually happening is that for every single one of the topics (we'll talk about topics in a second) there is a score of how much or how little I am interested. This works the exact same way all of our other scores do. So not only do you understand what I'm interested in, you can target the people that have an above-average interest in a specific topic on your website. That granularity is super important; what you see in the bar is essentially that data, visualized. Today, how this works is: every time a user visits a site of yours that's been tagged, we get a URL. Our system goes out, scrapes that URL, and runs it through a series of systems doing NLP, image analysis, and so on, to ultimately uncover the topics associated with a particular document. For instance, if I close this to give us some more screen real estate and go into our Petsy sandbox account and look at documents with images, you can see what it's doing automatically. This is with no configuration from the user other than saying, "you're okay to classify this domain." We go out, look at the content, pull it in, and understand what topics it's about, what images are on it, all of this information, so that every time a user visits that particular URL, we know what the content is about. Then we can start to build scores for how much the topics on that document align with all the other interests we've seen from that user. Essentially: we go out, we scrape the content, we turn it into topics.
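The scrape-and-classify step can be sketched as follows. This is a toy stand-in under stated assumptions: a real pipeline fetches the URL and runs proper NLP and image models, while `TOPIC_KEYWORDS` and `classify_document` here are hypothetical names classifying an already-fetched HTML body by keyword frequency.

```python
import re
from collections import Counter

# Hypothetical topic taxonomy; a real system would learn topics, not list them.
TOPIC_KEYWORDS = {
    "carriers": {"carrier", "crate", "travel"},
    "cats": {"cat", "kitten", "feline"},
}

def classify_document(html_body: str) -> dict:
    """Return a topic -> relevance (0..1) map for one document."""
    # Strip tags, lowercase, and count words as a crude stand-in for NLP.
    text = re.sub(r"<[^>]+>", " ", html_body).lower()
    words = Counter(re.findall(r"[a-z]+", text))
    scores = {}
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = sum(words[k] for k in keywords)
        scores[topic] = min(1.0, hits / 3)  # crude normalization cap
    return scores

topics = classify_document(
    "<h1>Elegant Paws Cat Carrier</h1><p>A carrier your cat will love.</p>"
)
```

The output map is what a page-view event would then carry into the scoring step: one relevance value per topic for the visited URL.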
Then, every time the user sees that content, we understand what it's about, we can associate it with their scores, and we can update them in real time. That process is what we call classifying the documents. For instance, take this Elegant Paws Cat Carrier (again, make-believe content). This is the thing I wanted to touch on. One of the things that is really cool, in my biased opinion about Lytics, is that the exact same identity resolution model, the way we handle profiles and that graph, works the same way for content. Out of the box we build a user table, and that's how we're able to associate all the different profiles we've matched together. We also have a content table. For every document, we go out, analyze it, collect information about it, and essentially build document profiles inside of Lytics. On screen they're not as pretty, because we haven't spent as much time showcasing this particular view. But you can see that for a piece of content, this URL we just clicked on, you get the hashed value (which is how we associate it with the user), the number of users that have seen it, the entire body of the document, the topics, the header image, the primary image, the meta tags we can pull in, all of this meta-information around not just the content but the document that users ultimately interact with.
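A document profile like the one described can be modeled as a record keyed by a hash of its URL. The field names and the SHA-256 truncation below are assumptions for illustration, not the product's actual schema; the point is only that page-view events and the content table can join on a stable hashed URL.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class DocumentProfile:
    """Illustrative shape of one row in a 'content table' of documents."""
    url: str
    topics: dict                       # topic -> relevance from classification
    body: str = ""                     # full extracted text of the document
    primary_image: str = ""
    meta: dict = field(default_factory=dict)  # scraped meta tags
    seen_by_users: int = 0             # how many profiles have viewed it

    @property
    def doc_id(self) -> str:
        # Hashed URL: the stable key that joins page-view events to this row.
        return hashlib.sha256(self.url.encode()).hexdigest()[:16]

doc = DocumentProfile(
    url="https://petsy.example/elegant-paws-cat-carrier",  # hypothetical URL
    topics={"carriers": 0.9, "cats": 0.7},
)
```

Because the key is derived from the URL alone, any event that carries the URL can be attributed to the same document profile without a lookup table, mirroring how user profiles are stitched by stable identifiers.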

#### Subtitles (WebVTT)

```webvtt
WEBVTT

1
00:00:00.000 --> 00:00:18.400
But we covered our behavioral scores as well, right?

2
00:00:18.400 --> 00:00:22.560
The momentum and propensity, those are really, really important behavioral statistics, behavioral

3
00:00:22.560 --> 00:00:25.840
scores that help fuel and empower these models.

4
00:00:25.840 --> 00:00:29.800
The other one that is maybe the most important thing, certainly one of the most important

5
00:00:29.800 --> 00:00:35.720
things in the kind of Lytics coming together with Contentstack story is our interest scores.

6
00:00:35.720 --> 00:00:45.440
So you saw them, I think, in the very first conversation where we go back to Petsy, turn

7
00:00:45.440 --> 00:00:53.920
on our trusty Chrome extension.

8
00:00:53.920 --> 00:00:59.560
So at the bottom of the Chrome extension, you'll see a set of interest scores.

9
00:00:59.560 --> 00:01:02.320
These, for most customers, just come out of the box.

10
00:01:02.320 --> 00:01:05.920
So we'll talk about how they actually work, what we're doing to classify the content associated,

11
00:01:05.920 --> 00:01:11.320
but ultimately what it allows you to do as a customer is as I browse, you'll see in real

12
00:01:11.320 --> 00:01:14.400
time and it might be kind of like subtle because I've used this profile a lot, but you'll see

13
00:01:14.400 --> 00:01:18.760
my scores change with every single piece of content that I interact with.

14
00:01:18.760 --> 00:01:23.920
So as I go here and I look at pet carriers, obviously, and this is like a sandbox demo

15
00:01:23.920 --> 00:01:27.440
account, but you'll see those scores, like again, it's a little bit kind of like hard

16
00:01:27.440 --> 00:01:32.320
to see in the bigger scale, but they're being recalculated every single time that an event

17
00:01:32.320 --> 00:01:38.640
comes into the pipeline, which is super, super useful in one, helping you understand anonymous

18
00:01:38.640 --> 00:01:42.000
users, which is most users in the case of marketing.

19
00:01:42.000 --> 00:01:45.880
That's where a lot of other CDPs fail and fall short as they don't talk about the importance

20
00:01:45.880 --> 00:01:50.760
of anonymous and how that particular product helps with the anonymous use cases.

21
00:01:50.760 --> 00:01:51.760
It's always around.

22
00:01:51.760 --> 00:01:55.280
We help you get your data together and build known profiles and it will, at the end of

23
00:01:55.280 --> 00:01:58.920
the day, it's pretty easy to match two people that, you know, if you have their email address

24
00:01:58.920 --> 00:02:03.760
and everything about them, it's much more difficult when you only have the information

25
00:02:03.760 --> 00:02:10.400
that they give you in sort of like different browser sessions, essentially.

26
00:02:10.400 --> 00:02:17.280
So kind of to just start at how all of this works, these values on the profile are called

27
00:02:17.280 --> 00:02:18.960
interest scores.

28
00:02:18.960 --> 00:02:26.480
So if I go to the raw details for this particular user and find.

29
00:02:26.480 --> 00:02:30.520
So the thing that's actually happening is for every single one of the topics, which

30
00:02:30.520 --> 00:02:35.240
we'll talk about topics here in a second, there is a score of how much or how little

31
00:02:35.240 --> 00:02:36.480
I am interested.

32
00:02:36.480 --> 00:02:39.160
This works the exact same way that all of our other scores do.

33
00:02:39.160 --> 00:02:42.560
So not only do you understand what I'm interested in, you can target the people that have an

34
00:02:42.560 --> 00:02:45.960
above average interest in a specific topic on your website.

35
00:02:45.960 --> 00:02:48.000
So that granularity is super important.

36
00:02:48.000 --> 00:02:52.400
But what you see in the bar is essentially this data getting digitized.

37
00:02:52.400 --> 00:02:58.200
Today how this works is every time that a user visits your site that's been tagged,

38
00:02:58.200 --> 00:02:59.680
we get a URL.

39
00:02:59.680 --> 00:03:04.860
Our system actually goes out, scrapes that URL and runs it through a series of different

40
00:03:04.860 --> 00:03:11.760
systems to do NLP, image analysis, some of those kind of things to ultimately uncover

41
00:03:11.760 --> 00:03:15.300
the topics that are associated with a particular document.

42
00:03:15.300 --> 00:03:22.540
So for instance, if I close this, give us some more screen real estate.

43
00:03:22.540 --> 00:03:28.500
If I go into our Petsy sandbox account real quick and just go to like documents with images,

44
00:03:28.500 --> 00:03:30.700
what it's actually doing automatically.

45
00:03:30.700 --> 00:03:34.900
So this is with no configuration from the user other than saying, you're okay to classify

46
00:03:34.900 --> 00:03:36.480
this domain.

47
00:03:36.480 --> 00:03:39.540
We're going to go out, we're going to look at the content, we're going to pull that content

48
00:03:39.540 --> 00:03:43.780
in, we're going to understand what topics it's about, what images are on it, all of

49
00:03:43.780 --> 00:03:45.860
this information.

50
00:03:45.860 --> 00:03:51.100
So that every time that a user visits that particular URL, we know what the content is

51
00:03:51.100 --> 00:03:52.100
about.

52
00:03:52.100 --> 00:03:57.620
And then we can start to build scores for how much the topics on that particular document

53
00:03:57.620 --> 00:04:01.140
align with all of the other interests that we've seen from that particular user.

54
00:04:01.140 --> 00:04:05.780
So it's essentially, we go out, we scrape the content, we turn it into topics.

55
00:04:05.780 --> 00:04:09.940
And then every time that the user then sees that content, we understand what it's about,

56
00:04:09.940 --> 00:04:13.060
we can associate that with their scores and we can update them in real time.

57
00:04:13.060 --> 00:04:15.980
So we're actually going through what we call classifying the documents.

58
00:04:15.980 --> 00:04:22.100
So if I go into, for instance, this like Elegant Paws Cat Carrier, again, make-believe content.

59
00:04:22.100 --> 00:04:25.100
And this is the thing that I wanted to touch on.

60
00:04:25.100 --> 00:04:29.100
One of the things that is really cool, and my biased opinion about Lytics is that the

61
00:04:29.100 --> 00:04:34.540
same exact identity resolution model, the way that we handle profiles and that graph

62
00:04:34.540 --> 00:04:38.820
and all of that kind of stuff, works in the exact same way for content.

63
00:04:38.820 --> 00:04:42.540
So we essentially out of the box build a user table, and that's how we're able to associate

64
00:04:42.540 --> 00:04:44.940
all the different profiles we've matched together.

65
00:04:44.940 --> 00:04:46.660
We also have a content table.

66
00:04:46.660 --> 00:04:51.980
So for every document, we go out, we analyze that document, we collect the information

67
00:04:51.980 --> 00:04:57.760
about that document, and we essentially build document profiles inside of Lytics.

68
00:04:57.760 --> 00:05:01.340
So on the screen, they're not as pretty, because we haven't spent as much time sort of like

69
00:05:01.340 --> 00:05:04.040
showcasing this particular thing.

70
00:05:04.040 --> 00:05:09.940
But you can see that like for a piece of content, this URL that we just clicked on, you understand

71
00:05:09.940 --> 00:05:14.420
the hashed value, which is how we associate it with the user, the number of users that

72
00:05:14.420 --> 00:05:20.220
have seen it, the different information, the entire body of that document, the topics,

73
00:05:20.220 --> 00:05:24.580
the header image, the primary image, we can pull in meta tags, whether it's fail, like

74
00:05:24.580 --> 00:05:30.260
all of this meta information around not just the content, but this sort of document that

75
00:05:30.260 --> 00:05:31.820
ultimately users interact with.

76
00:05:39.940 --> 00:05:40.940
Thank you.

77
00:05:40.940 --> 00:05:41.940
Thank you.

```


#### Key takeaways

- Connect **Interest Scores & Classification** back to your stack configuration before moving to the next module.
- Capture one concrete artifact (screenshot, Postman call, or code snippet) that proves the step works in your environment.
- Re-read the delivery versus management boundary for anything you changed in the entry model.

## Supplement for indexing

### Content summary

How per-topic interest scores are computed on a profile and recalculated in real time as events arrive, and how automatic document classification (scraping, NLP, image analysis) builds document profiles in a content table. Part of Data Ingestion & Profile Construction (data-insights-data-ingestion-profile-construction).

### Retrieval tags

- Interest
- Scores
- Classification
- data-insights-data-ingestion-profile-construction
- lesson 12
- Interest Scores & Classification
- data-insights-data-ingestion-profile-construction lesson

### Indexing notes

Index this lesson as a primary chunk tagged with lesson_id "12" and topics: [Interest, Scores, Classification].
Parent course slug: data-insights-data-ingestion-profile-construction. Use asset_references URLs as thumbnail hints in search results when present.
Never surface LMS quiz content or assessment answers from this file.

### Asset references

| Label | URL |
| --- | --- |
| Video thumbnail: Interest Scores & Classification | `https://cdn.jwplayer.com/v2/media/iRhAPCRQ/poster.jpg?width=720` |

### External links

| Label | URL |
| --- | --- |
| Contentstack Academy home | `https://www.contentstack.com/academy/` |
| Training instance setup | `https://www.contentstack.com/academy/training-instance` |
| Academy playground (GitHub) | `https://github.com/contentstack/contentstack-academy-playground` |
| Contentstack documentation | `https://www.contentstack.com/docs/` |
