How we use AI to speed up manual penetration testing at Contentstack

Published: October 15, 2025

TL;DR
The Contentstack Security Team developed a small Burp Suite extension that lets any Burp request talk to a local LLM (or via MCP clients such as Claude Desktop). The extension proposes candidate inputs, mutates the current request and sends variants through BurpSuite so traffic, auth, scope and logging remain controlled.

Result: Our “think → craft → send → compare” loop on staging has dropped from minutes to seconds.

Contentstack is more than just a headless CMS today. It’s a composable DXP with Marketplace, Automation Hub, Brand Kit, Launch, Personalization and Lytics. That composability is powerful for builders, but it expands the attack surface.

Automation catches many classes of issues, but the most interesting bugs still show up when a human tests with domain context: auth edges, tenant boundaries, unusual encodings and quirky parsers. That’s why manual testing remains central to our security reviews.

Over one week, we streamlined one of the slowest manual loops — “think → craft input → send → compare” — by wiring a small AI helper into BurpSuite. This post describes the helper, how we use it and how it fits our process.

Why we built this

Manual payload generation and iteration are time-consuming. The process generally requires an engineer to:

Identify parameters (URL, body, JSON, cookie)
Try a handful of benign detection inputs
Send them one by one and repeat with alternate encodings
Move promising results into Repeater and dig deeper

Credit-metered cloud AI features are useful, but not always practical for continuous fuzzing on internal environments. We wanted a human-in-the-loop assistant that:

Lives inside Burp’s workflow and logger
Runs locally or via an org-approved MCP client
Never sends anything without an explicit tester action

What we built

Local LLM Assistant (open source, Montoya API) — a Burp extension that adds a “Local LLM” tab and a context-menu action.

Key features:

Seed from Burp: Right-click any request → “Use this request as seed.”
2 Generation Modes:
- Command mode: Leave the prompt blank, select vuln family (SQL/NoSQL/XSS/etc.), location (URL/BODY/JSON/COOKIE) and count. The model proposes candidate inputs.
- Prompt mode: Write a short instruction; the model returns a structured list.
Fire through Burp: The extension mutates the seeded request and sends variants in parallel through Burp’s HTTP stack (optionally adds each to Repeater).
Encoding variants: Optionally send both normal and encoded versions (URL/Base64/HTML).
Timing/observability: Logs generation time, send time and wall-clock total, so that we can see how the time is spent.
MCP bridge: A tiny HTTP bridge that lets MCP clients (e.g., Claude Desktop, VS Code MCP) fetch the seed, propose inputs and callback to send — without requiring a local model download.

IMPORTANT: All processes run on authorized staging targets. A tester always selects the seed and clicks Send. All traffic remains in Burp.

Architecture at a glance

Burp stays in the control plane: the extension mutates the selected seed and fires via Burp’s HTTP stack (shown in Logger/Repeater).
Logic runs locally: the extension builds variants and exposes a tiny bridge (/v1/seed, /v1/send).
Local model (Ollama or other approved runtime) or MCP client via the bridge supplies candidate inputs.
Generated variants flow back through Burp, so logging, auth and scope rules apply.

burp_suite_extension_manual_penetration_testing_contentstack.png

Why this matters: The LLM never talks directly to the target. Burp remains the network control plane, so logging, proxying, auth and scopes stay intact.

Results so far

From representative staging runs:

End-to-end (generate + send): ~11–20 seconds typical (versus ~3–4 minutes in the first prototype).
Biggest wins: Parallel sending + stricter structured output format.
Tester experience: Fewer context switches, faster iteration and clearer timing to spot slowness (model vs. network).

Note: Your mileage may vary depending on model, network and target.

Where AI helps — and where it doesn’t

Helps:

Suggesting a few benign detection inputs quickly
Covering boring variants and alternate encodings
Moving “plausible” cases into Repeater for human investigation

Doesn’t replace:

Understanding authorization flows, tenant isolation and data paths
Choosing safe, business-relevant checks
Turning a quirky response into a verified finding with evidence

We’re not trying to replace testers — we’re cutting keystrokes so testers can spend time on judgment and evidence-gathering.

Internal workflow

Seed the request: In Repeater/Proxy history → right-click → “Use this request as seed.” The extension surfaces parameter names by location (URL/BODY/JSON/COOKIE) and cookies.
Choose generation mode: Command mode for quick candidates; Prompt mode for targeted needs (e.g., “vary boolean and time-based checks”).
Send & observe: Parallel requests fire through Burp; each result shows status, size and duration. Optionally add variants to Repeater for manual follow-up.
Record findings: Log inputs, response evidence and timings to internal notes. If anything is interesting, write a reproducible proof and open a ticket.

Guardrails and governance

Human in the loop: No autonomous crawling — a tester selects the seed and clicks Send.
Authorized targets only: UI, defaults, and docs emphasize “lab/staging use only.”
Local model by default: Use a local model or route MCP calls through the local bridge with optional bearer tokens.
Observability: All traffic flows through BurpSuite and is visible in Logger.

What we didn’t automate (on purpose)

Assertions and finding classification (human judgment still required)
Advanced session/state management tricks beyond Burp’s normal flows
Intrusive traffic outside the explicit tester’s action

Roadmap

Planned improvements:

Auto-iterate: A loop that runs N rounds (or time-bounded): plan → send → observe → revise score deltas (status/latency/body/header), keeping top-K and tweaking strategy each round.
This is a human-started, scope-bound, benign probe whose execution stays in Burp while the model just proposes the next steps.
Playbooks: Repeatable sequences for common test families.
Assertions library: Non-destructive checks and diffing to assist triage.

Getting started

Repo: github.com/KaustubhRai/BurpSuite_LocalAI
Build the JAR or install the release JAR in Burp → Extensions → Add.
Local model path: Run Ollama (or an approved local runtime) and set Base URL/Model in the extension tab. Or enable MCP bridge and register the MCP server in your client (Claude Desktop/VS Code).
Right-click a request → Use as seed → choose mode → Send.
Watch Logger (Sources → Extensions) for traffic and timings.

Conclusion

This is an internal productivity tool for our security team — not a point-and-shoot scanner. Burp remains the single control plane: logging, proxying, auth and scoping stay exactly where they should. The extension speeds iteration so testers can focus on judgment, evidence and remediation.

About Contentstack

The Contentstack team comprises highly skilled professionals specializing in product marketing, customer acquisition and retention, and digital marketing strategy. With extensive experience holding senior positions at renowned technology companies across Fortune 500, mid-size, and start-up sectors, our team offers impactful solutions based on diverse backgrounds and extensive industry knowledge.

Contentstack is on a mission to deliver the world’s best digital experiences through a fusion of cutting-edge content management, customer data, personalization, and AI technology. Iconic brands, such as AirFrance KLM, ASICS, Burberry, Mattel, Mitsubishi, and Walmart, depend on the platform to rise above the noise in today's crowded digital markets and gain their competitive edge.

In January 2025, Contentstack proudly secured its first-ever position as a Visionary in the 2025 Gartner® Magic Quadrant™ for Digital Experience Platforms (DXP). Further solidifying its prominent standing, Contentstack was recognized as a Leader in the Forrester Research, Inc. March 2025 report, “The Forrester Wave™: Content Management Systems (CMS), Q1 2025.” Contentstack was the only pure headless provider named as a Leader in the report, which evaluated 13 top CMS providers on 19 criteria for current offering and strategy.

Follow Contentstack on LinkedIn.

Ready to reimagine possible?

Discover how Contentstack can help you gain an Experience Edge for your business.