# Building Lookalike Models

### About this export

| Field | Value |
| --- | --- |
| **content_type** | lesson |
| **platform** | contentstack-academy |
| **source_url** | https://www.contentstack.com/academy/courses/data-insights-data-ingestion-profile-construction/data-insights-course-3--building-lookalike-models |
| **course_slug** | data-insights-data-ingestion-profile-construction |
| **lesson_slug** | data-insights-course-3--building-lookalike-models |
| **markdown_file_url** | /academy/md/courses/data-insights-data-ingestion-profile-construction/data-insights-course-3--building-lookalike-models.md |
| **generated_at** | 2026-04-28T06:55:44.157Z |

> Part of **[Data Ingestion & Profile Construction](https://www.contentstack.com/academy/courses/data-insights-data-ingestion-profile-construction)** on Contentstack Academy. **Academy MD v3** — structured for retrieval; no quiz or assessment keys.

<!-- ai_metadata: {"lesson_id":"11","type":"video","duration_seconds":224,"video_url":"https://cdn.jwplayer.com/previews/qZ02gNuc","thumbnail_url":"https://cdn.jwplayer.com/v2/media/qZ02gNuc/poster.jpg?width=720","topics":["Building","Lookalike","Models"]} -->

#### Video details

#### At a glance

- **Title:** 19-data-insights-lookalike-models
- **Duration:** 3m 44s
- **Media link:** https://cdn.jwplayer.com/previews/qZ02gNuc
- **Publish date (unix):** 1752878955

#### Streaming renditions

- application/vnd.apple.mpegurl
- audio/mp4 · AAC Audio · 113880 bps
- video/mp4 · 180p · 132922 bps
- video/mp4 · 270p · 144948 bps
- video/mp4 · 360p · 153786 bps
- video/mp4 · 406p · 159929 bps
- video/mp4 · 540p · 180737 bps
- video/mp4 · 720p · 211742 bps
- video/mp4 · 1080p · 295901 bps

#### Timed text tracks (delivery)

- **thumbnails:** `https://cdn.jwplayer.com/strips/qZ02gNuc-120.vtt`

#### Transcript

Lytics has a pretty powerful lookalike modeling feature. You essentially go into the UI, and it makes things super easy for marketers and less technical folks. If you have a data science team, they're probably already doing scoring, it probably lives in your warehouse, and you probably don't want to bank on lookalike models alone. But for somebody who wants to fill gaps in some of their data, or get a better understanding easily in a few clicks, that's what a lookalike model allows you to do.

I think one of the best use cases is this: say I'm a brand and I use 6sense. For those that don't know, 6sense allows you to understand where your business traffic comes from. When I go to a website, they can, for a portion of the audience, associate the visit with a company: okay, Mark works at Contentstack, Contentstack has X number of employees, here's their annual revenue, and all this information. The downside to tools like 6sense is that they only effectively analyze 10, 15-ish percent of traffic. So for that 10% of users, you have a really good understanding of whether this is a highly qualified lead. Even for Contentstack as a company, when people go to the website, you want to understand: do they work for and represent a high-value company that looks like a good Contentstack customer? You can do that with a tool like 6sense for about 10% of the audience, but you lose the other 90%, in that you don't really know where they're coming from or what they could do.

So what a lookalike model allows you to do is take a target audience, say the 10% of users that you know, where you've built an audience that says, okay, here are my highly qualified users, here are the people that work for a company that's big enough, that has the right sort of focus, or whatever it may be. Then I want to compare that to the other 90% of the audience and see where their behaviors overlap. I don't necessarily know specifically where they work or for what companies, but I can understand whether this 90% of the audience behaves like, acts like, looks at and clicks on the same things as that 10% of highly qualified leads.

Effectively, if you have a portion of your audience that you don't understand and a portion that you do understand, you can go in here, create a new lookalike model, and choose your source audience. If I go to one that's already configured, for instance this one that we just did, it is going to essentially compare the audience with no 6sense data to the audience that already has 6sense data. From there, it's going to put a score on how much they look alike, based on their behavior, their interest scores, their behavioral scores, and all the information that we're ultimately putting on these profiles, to tell you how much or how little they actually look like a particular set of users.

That data, too, is represented by scores on the profile. We can come back and dig deeper into lookalike models, but there are particular marketing use cases that are super useful. The net result, though, is ultimately a score on the profile. You'll be able to see that Mark Hayden has a score for this particular model, and it's a 76 out of 100 or a 10 out of 100. Then you can go build an audience of people that look more like highly qualified leads than less qualified ones. A lot of marketing tools also have similar functionality; when you get to ad tech, they'll have their own lookalike modeling. But we find this is really useful when you're trying to explore your data.

So that's it for this week. I hope you found this video helpful. If you did, please like and subscribe. If you want to see more videos like this, please subscribe to our YouTube channel. And if you have any questions about the product, or any other topics that you'd like to see covered, please leave them in the comments below. I'll see you in the next one. Bye.
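The comparison described in the transcript can be pictured with a small sketch. This is only an illustration under stated assumptions, not the product's actual algorithm: the feature layout, the profile ids, and the choice of cosine similarity against the known audience's average behavior are all hypothetical. It shows how profiles without firmographic data could receive a 0 to 100 score for how closely their behavior resembles a known, highly qualified audience.

```python
from math import sqrt


def centroid(rows):
    """Average each feature column across the known (target) audience."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookalike_scores(target_rows, source_profiles):
    """Score each unknown profile from 0 to 100 by similarity to the target audience."""
    center = centroid(target_rows)
    return {
        profile_id: round(100 * max(0.0, cosine(features, center)))
        for profile_id, features in source_profiles.items()
    }


# Hypothetical behavioral features per profile:
# [pricing-page views, documentation views, blog views]
target = [[5, 3, 0], [4, 4, 1], [6, 2, 0]]           # known, highly qualified leads
source = {"anon-1": [5, 3, 0], "anon-2": [0, 0, 7]}  # profiles with no firmographic data

scores = lookalike_scores(target, source)
print(scores)
```

Here `anon-1` browses much like the known audience and scores high, while `anon-2` does not and scores low. A production model would typically use a trained classifier over many more signals; the sketch only conveys the idea of scoring by behavioral overlap.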

#### Subtitles (WebVTT)

```webvtt
WEBVTT

1
00:00:00.000 --> 00:00:22.000
Lytics has a pretty powerful lookalike modeling feature which allows you essentially to go

2
00:00:22.000 --> 00:00:26.520
into the UI, it makes it super easy for marketers and less technical folks.

3
00:00:26.520 --> 00:00:30.760
If you have a data science team, they're probably already doing scoring, it probably lives in

4
00:00:30.760 --> 00:00:33.720
your warehouse, you probably don't want to just bank on lookalike models.

5
00:00:33.720 --> 00:00:37.960
But for somebody that wants to kind of like fill gaps in some of their data or get a better

6
00:00:37.960 --> 00:00:42.000
understanding easily in a few clicks, what lookalike model allows you to do, and I think

7
00:00:42.000 --> 00:00:48.040
one of the best use cases is like, if I am a brand and I use 6sense, for those that

8
00:00:48.040 --> 00:00:52.360
don't know, 6sense allows you to understand where traffic, like what business traffic

9
00:00:52.360 --> 00:00:53.360
comes from.

10
00:00:53.600 --> 00:00:57.560
When I go to a website, they can, for a portion of my audience associated with, okay, Mark

11
00:00:57.560 --> 00:01:01.280
works at Contentstack, Contentstack has X number of employees and here's their annual

12
00:01:01.280 --> 00:01:03.560
revenue and all this information.

13
00:01:03.560 --> 00:01:10.720
The downside to tools like 6sense is that they only effectively analyze 10, 15-ish

14
00:01:10.720 --> 00:01:12.720
percent of traffic.

15
00:01:12.720 --> 00:01:16.840
So for that 10% of the users, you have a really good understanding of, is this a highly qualified

16
00:01:16.840 --> 00:01:17.840
lead, right?

17
00:01:17.840 --> 00:01:23.080
So even like Contentstack as a company, when people go to the website, do you want to understand,

18
00:01:23.080 --> 00:01:26.240
do they work for and represent a company that's a high value company that looks like

19
00:01:26.240 --> 00:01:28.160
a good Contentstack customer?

20
00:01:28.160 --> 00:01:32.760
You can do that with a tool like 6sense for about 10% of the audience, but you lose

21
00:01:32.760 --> 00:01:36.760
the other 90% and that you don't really know where they're coming from or what they could

22
00:01:36.760 --> 00:01:37.760
do.

23
00:01:37.760 --> 00:01:42.520
So what lookalike model allows you to do is take a target audience, say the 10% of the

24
00:01:42.520 --> 00:01:46.640
users that you know that you've built an audience that, okay, here's my highly qualified users.

25
00:01:46.640 --> 00:01:49.560
Here's the people that work for a company that's big enough that has the right sort

26
00:01:49.560 --> 00:01:52.040
of focuses or whatever it may be.

27
00:01:52.040 --> 00:01:56.560
And then I want to compare that to the other 90% of the audience and see where their behaviors

28
00:01:56.560 --> 00:01:57.560
overlap.

29
00:01:57.560 --> 00:02:03.760
So I don't necessarily know specifically where they work or what companies, but I can understand

30
00:02:03.760 --> 00:02:07.840
that this 90% of the audience, whether they behave like, they act like, they're looking

31
00:02:07.840 --> 00:02:12.400
at the things, they're clicking on the same things, that sort of like 10% of highly qualified

32
00:02:12.400 --> 00:02:14.040
leads looks at.

33
00:02:14.040 --> 00:02:19.440
Effectively, if you have a portion of your audience that you don't understand and a portion

34
00:02:19.440 --> 00:02:23.200
of your audience that you do understand, you can go in here, you can create a new lookalike

35
00:02:23.200 --> 00:02:26.040
model, you can choose your source audience.

36
00:02:26.040 --> 00:02:30.800
So if I go to one that's already configured, for instance, this one that we just did, it

37
00:02:30.800 --> 00:02:36.280
is going to essentially compare the no 6sense data to the all audience that it already

38
00:02:36.280 --> 00:02:37.840
has 6sense data.

39
00:02:37.840 --> 00:02:42.160
And then from there, it's going to essentially put a score on how much they look like based

40
00:02:42.160 --> 00:02:45.640
on their behavior, based on their interest scores and their behavioral scores and all

41
00:02:45.640 --> 00:02:50.720
that information that we're ultimately putting on these profiles to tell you how much or

42
00:02:50.720 --> 00:02:55.560
how little they actually look like a particular set of users.

43
00:02:55.560 --> 00:02:59.600
That data too is represented by scores on the profile.

44
00:02:59.600 --> 00:03:04.640
So like, so we can come back and kind of like dig deeper into lookalike models.

45
00:03:04.640 --> 00:03:07.720
But there are particular like marketing use cases that are super useful.

46
00:03:07.720 --> 00:03:10.480
The net result, though, is ultimately a score on the profile.

47
00:03:10.640 --> 00:03:15.840
So you'll be able to see that Mark Hayden has a score for this particular model, and

48
00:03:15.840 --> 00:03:18.320
it's a 76 out of 100 or a 10 out of 100.

49
00:03:18.320 --> 00:03:22.840
So then you can go build an audience of people that look more like highly qualified leads

50
00:03:22.840 --> 00:03:25.120
than less than highly qualified leads.

51
00:03:25.120 --> 00:03:28.440
A lot of the marketing tools also have similar functionality, like when you get to the ad

52
00:03:28.440 --> 00:03:32.480
tech and whatnot, they'll have their own lookalike modeling and whatnot.

53
00:03:32.480 --> 00:03:35.560
But we find this is really useful when you're trying to explore your data.

54
00:03:40.840 --> 00:03:42.120
So that's it for this week.

55
00:03:42.120 --> 00:03:43.720
I hope you found this video helpful.

56
00:03:43.720 --> 00:03:45.520
If you did, please like and subscribe.

57
00:03:45.520 --> 00:03:48.840
And if you want to see more videos like this, please subscribe to our YouTube channel.

58
00:03:48.840 --> 00:03:52.760
And if you have any questions, or you have any questions about the product, or any other

59
00:03:52.760 --> 00:03:56.480
topics that you'd like to see covered in this video, please leave them in the comments

60
00:03:56.480 --> 00:03:57.480
below.

61
00:03:57.480 --> 00:03:58.480
And I'll see you in the next one.

62
00:03:58.480 --> 00:03:59.480
Bye.

```

```transcript
<!-- PLACEHOLDER: replace with real transcript before publish if cues were auto-derived from WebVTT -->
[00:00] Lytics has a pretty powerful lookalike modeling feature which allows you essentially to go
[00:22] into the UI, it makes it super easy for marketers and less technical folks.
[00:26] If you have a data science team, they're probably already doing scoring, it probably lives in
[00:30] your warehouse, you probably don't want to just bank on lookalike models.
[00:33] But for somebody that wants to kind of like fill gaps in some of their data or get a better
[00:37] understanding easily in a few clicks, what lookalike model allows you to do, and I think
[00:42] one of the best use cases is like, if I am a brand and I use 6sense, for those that
[00:48] don't know, 6sense allows you to understand where traffic, like what business traffic
[00:52] comes from.
[00:53] When I go to a website, they can, for a portion of my audience associated with, okay, Mark
[00:57] works at Contentstack, Contentstack has X number of employees and here's their annual
[01:01] revenue and all this information.
[01:03] The downside to tools like 6sense is that they only effectively analyze 10, 15-ish
[01:10] percent of traffic.
[01:12] So for that 10% of the users, you have a really good understanding of, is this a highly qualified
[01:16] lead, right?
[01:17] So even like Contentstack as a company, when people go to the website, do you want to understand,
[01:23] do they work for and represent a company that's a high value company that looks like
[01:26] a good Contentstack customer?
[01:28] You can do that with a tool like 6sense for about 10% of the audience, but you lose
[01:32] the other 90% and that you don't really know where they're coming from or what they could
[01:36] do.
[01:37] So what lookalike model allows you to do is take a target audience, say the 10% of the
[01:42] users that you know that you've built an audience that, okay, here's my highly qualified users.
[01:46] Here's the people that work for a company that's big enough that has the right sort
[01:49] of focuses or whatever it may be.
[01:52] And then I want to compare that to the other 90% of the audience and see where their behaviors
[01:56] overlap.
[01:57] So I don't necessarily know specifically where they work or what companies, but I can understand
[02:03] that this 90% of the audience, whether they behave like, they act like, they're looking
[02:07] at the things, they're clicking on the same things, that sort of like 10% of highly qualified
[02:12] leads looks at.
[02:14] Effectively, if you have a portion of your audience that you don't understand and a portion
[02:19] of your audience that you do understand, you can go in here, you can create a new lookalike
[02:23] model, you can choose your source audience.
[02:26] So if I go to one that's already configured, for instance, this one that we just did, it
[02:30] is going to essentially compare the no 6sense data to the all audience that it already
[02:36] has 6sense data.
[02:37] And then from there, it's going to essentially put a score on how much they look like based
[02:42] on their behavior, based on their interest scores and their behavioral scores and all
[02:45] that information that we're ultimately putting on these profiles to tell you how much or
[02:50] how little they actually look like a particular set of users.
[02:55] That data too is represented by scores on the profile.
[02:59] So like, so we can come back and kind of like dig deeper into lookalike models.
[03:04] But there are particular like marketing use cases that are super useful.
[03:07] The net result, though, is ultimately a score on the profile.
[03:10] So you'll be able to see that Mark Hayden has a score for this particular model, and
[03:15] it's a 76 out of 100 or a 10 out of 100.
[03:18] So then you can go build an audience of people that look more like highly qualified leads
[03:22] than less than highly qualified leads.
[03:25] A lot of the marketing tools also have similar functionality, like when you get to the ad
[03:28] tech and whatnot, they'll have their own lookalike modeling and whatnot.
[03:32] But we find this is really useful when you're trying to explore your data.
[03:40] So that's it for this week.
[03:42] I hope you found this video helpful.
[03:43] If you did, please like and subscribe.
[03:45] And if you want to see more videos like this, please subscribe to our YouTube channel.
[03:48] And if you have any questions, or you have any questions about the product, or any other
[03:52] topics that you'd like to see covered in this video, please leave them in the comments
[03:56] below.
```

#### Key takeaways

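- A lookalike model compares an audience you understand (for example, the roughly 10 to 15 percent of traffic that a tool like 6sense can identify as highly qualified leads) with the audience you do not, based on behavior and interest scores.
- The result is a score out of 100 on each profile for that model, which you can use to build audiences of people who look more like your highly qualified leads.
- The feature is aimed at marketers and less technical users exploring their data; if a data science team already maintains scoring in your warehouse, do not rely on lookalike models alone.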
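
The last step in the transcript, building an audience from the per-profile score, can be sketched as a simple threshold filter. The profile ids, the scores, and the cutoff of 70 below are assumptions chosen for illustration; the lesson does not prescribe a threshold, and in the product this selection is made through the audience builder UI rather than in code.

```python
def build_audience(scores, min_score=70):
    """Return profile ids whose lookalike score meets the cutoff, highest score first."""
    qualifying = [(pid, score) for pid, score in scores.items() if score >= min_score]
    qualifying.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, _ in qualifying]


# Hypothetical scores (out of 100) for one lookalike model.
profile_scores = {"profile-a": 76, "profile-b": 10, "profile-c": 88}

audience = build_audience(profile_scores, min_score=70)
print(audience)  # ['profile-c', 'profile-a']
```

Raising the cutoff trades reach for precision: fewer profiles qualify, but those that do behave more like the known highly qualified leads.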

## Supplement for indexing

### Content summary

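Building Lookalike Models, a lesson in Data Ingestion & Profile Construction (data-insights-data-ingestion-profile-construction). The video explains how a lookalike model compares an audience you understand, such as visitors identified by 6sense as highly qualified leads, with the rest of your audience, and writes a score out of 100 to each profile that you can use to build audiences.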

### Retrieval tags

- Building
- Lookalike
- Models
- data-insights-data-ingestion-profile-construction
- lesson 11
- Building Lookalike Models
- data-insights-data-ingestion-profile-construction lesson

### Indexing notes

Index this lesson as a primary chunk tagged with lesson_id "11" and topics: [Building, Lookalike, Models].
Parent course slug: data-insights-data-ingestion-profile-construction. Use asset_references URLs as thumbnail hints in search results when present.
Never surface LMS quiz content or assessment answers from this file.

### Asset references

| Label | URL |
| --- | --- |
| Video thumbnail: Building Lookalike Models | `https://cdn.jwplayer.com/v2/media/qZ02gNuc/poster.jpg?width=720` |

### External links

| Label | URL |
| --- | --- |
| Contentstack Academy home | `https://www.contentstack.com/academy/` |
| Training instance setup | `https://www.contentstack.com/academy/training-instance` |
| Academy playground (GitHub) | `https://github.com/contentstack/contentstack-academy-playground` |
| Contentstack documentation | `https://www.contentstack.com/docs/` |
