
Augmented Reality Frameworks for an Enterprise Web-Based AR Application

Varia Makagonova, Sep 10, 2020


How do you create augmented reality?

In the process of building an Augmented Reality proof of concept in under 4 weeks (see details here), the team at Valtech evaluated a series of AR frameworks and software development kits (SDKs) that would enable them to rapidly pull in data from a headless CMS (Contentstack) and display it in an Augmented Reality interface on a phone or tablet web browser. Here is their quick research report.
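As a rough sketch of the "pull in data from a headless CMS" step: Contentstack exposes a REST Content Delivery API, and a request for a content type's entries can be built along these lines. The content type uid (`ar_product`) and environment name below are hypothetical placeholders, not values from the actual project.

```javascript
// Sketch: fetch entries from Contentstack's Content Delivery API.
// The content type uid ("ar_product") and environment are hypothetical.
const CDN_BASE = "https://cdn.contentstack.io/v3";

function buildEntriesUrl(contentTypeUid, environment) {
  return `${CDN_BASE}/content_types/${encodeURIComponent(contentTypeUid)}` +
         `/entries?environment=${encodeURIComponent(environment)}`;
}

async function fetchEntries(apiKey, deliveryToken, contentTypeUid, environment) {
  const res = await fetch(buildEntriesUrl(contentTypeUid, environment), {
    // Delivery API authenticates with a stack API key and a delivery token
    headers: { api_key: apiKey, access_token: deliveryToken },
  });
  if (!res.ok) throw new Error(`Contentstack request failed: ${res.status}`);
  return (await res.json()).entries; // array of entry objects
}
```

The returned entries can then be fed into whatever the AR layer renders on top of the camera view.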

For total beginners to AR (like me): an AR framework is an SDK that merges the digital world on-screen with the physical world in real life. AR frameworks generally bundle a few different technologies under the hood: a computer vision library that tracks markers, images, or objects in the camera feed; a lot of math to register points in the camera view to 3D space; and hooks into a graphics library that renders content on top of the camera view.
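To make that concrete, here is a minimal, untested sketch of a marker-based scene using AR.js with A-Frame: the framework handles camera access and marker pose, and the graphics library renders an entity registered to the marker. The script URLs are illustrative and version-specific.

```html
<!-- Minimal marker-based AR scene: AR.js tracks the sample "hiro" marker
     and A-Frame renders a box on top of it. Script URLs are illustrative. -->
<script src="https://aframe.io/releases/1.0.4/aframe.min.js"></script>
<script src="https://raw.githack.com/AR-js-org/AR.js/master/aframe/build/aframe-ar.js"></script>

<a-scene embedded arjs>
  <a-marker preset="hiro">
    <a-box position="0 0.5 0" material="color: #4cc3d9;"></a-box>
  </a-marker>
  <a-entity camera></a-entity>
</a-scene>
```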

Which software is best for our web-based Augmented Reality use case?

The key considerations for the research were:

  • Speed. The goal was to create a working prototype as fast as possible. Once we were successfully displaying content and had completed an MVP, we could continue testing more advanced methods of object detection and tracking:
    • Training custom models
    • Identifying and distinguishing objects without explicit markers
    • Potentially using OCR as a way to identify product names
    • More of a wow-factor
  • Tracking method. The team was agnostic on whether to work with marker tracking or image tracking, willing to use whichever was most feasible for the use case.
  • Object tracking. Since the team was not trying to place objects on a real-world plane (like a floor), they realized they might not need all the features of a native iOS or Android AR library (aside from marker tracking)
  • Content display. That said, the framework needed to allow for content to be displayed in a cool and engaging way, even if we didn’t achieve fancy detection methods in 3 weeks
    • Something more dynamic than just billboarded text on video
    • Maybe some subtle animation touches to emphasize the 3D experience (e.g. very light Perlin movement in z plane)
  • Platform. The preference was for a web-based build (not requiring an app installation)
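The "subtle animation touches" mentioned above need not involve true Perlin noise; a cheap stand-in is a small time-based drift applied to an entity's z position each frame. The amplitude and speed values below are arbitrary, and the layered sines are only an approximation of Perlin-style smoothness.

```javascript
// Cheap "floaty" depth wobble: a small z offset as a function of time (ms).
// Not true Perlin noise -- layered sines give a similar soft drift.
function zWobble(timeMs, amplitude = 0.02, speed = 0.001) {
  const t = timeMs * speed;
  return amplitude * (Math.sin(t) + 0.5 * Math.sin(2.3 * t + 1.7));
}
```

Calling this in a per-frame tick and adding the result to an entity's base z position gives the gentle in-and-out motion that emphasizes the 3D effect.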

Comparing the available AR Frameworks: Marker tracking, object tracking, and platform-readiness

Here's an overview of our AR / ML library research notes:


AR.js

  • Built on the open-source ARToolKit tracking library
  • Cross-browser & lightweight
    Probably the least-effort way to get started
  • Offers both marker & image tracking. Image tracking uses NFT markers.
  • Platforms: Web (works with Three.js or A-Frame.js)

Zappar WebAR

  • Has SDK for Three.js.
  • SDK seems free; content creation tools are paid
  • Image tracking only
  • Platforms: Web (Three.js / A-Frame / vanilla JS); Unity; C++


ARKit

  • Not web-based
  • Image tracking is straightforward, but can’t distinguish between two similar labels with different text
  • Offers both marker & image tracking
  • Platforms: iOS


argon.js

  • Uses Vuforia*
  • Has a complex absolute coordinate system that must be translated into graphics coordinates. No GitHub updates since 2017.
  • Offers both marker & image tracking
  • Platforms: Works in Argon4 browser

WebXR

  • Primarily for interacting with specialized AR/VR hardware (headsets, etc.)

  • Primarily an AR content publishing tool to create 3D scenes

Google MediaPipe (KNIFT)

  • Uses template images to match objects in different orientations (allows for perspective distortion).
  • Marker and image tracking: yes, sort of... even better. KNIFT is an advanced machine learning model that does NFT (Natural Feature Tracking), i.e. image tracking, the same as AR.js does, but much better and faster. It doesn't have explicit fiducial marker tracking, but markers are high-contrast simplified images, so it would handle them well, too.
  • Platforms: Just Android so far, doesn't seem to have been ported to iOS or Web yet

Google Vision API - product search

  • Create a set of product images, match a reference image to find the closest match in the set.
  • Cloud-based, so it may not be fast enough to work in real time
  • Image classification
  • Platforms: Mobile / web

Google AutoML (also an option for video-based object tracking)

  • Train your own models to classify images according to custom labels
  • Image classification
  • Platforms: Any


ml5.js

  • Friendly ML library for the web. Experimented with some samples that used pre-trained models for object detection. Was able to identify “bottles” and track their position.
  • Object detection
  • Platforms: Web
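Pre-trained detectors of this kind (for example the COCO-SSD model, which has a JavaScript port) typically return an array of predictions shaped roughly like `{ class: "bottle", score: 0.87, bbox: [x, y, w, h] }`. Assuming that shape, a small helper can filter the raw predictions down to confident matches for one product class:

```javascript
// Filter detector predictions down to one class above a confidence threshold.
// The prediction shape ({ class, score, bbox }) follows COCO-SSD-style output.
function filterDetections(predictions, className, minScore = 0.6) {
  return predictions.filter(
    (p) => p.class === className && p.score >= minScore
  );
}
```

The surviving bounding boxes can then drive where AR content is anchored on screen.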


p5.xr

  • AR add-on for p5.js. Uses WebXR.
  • Platforms: Seems geared towards VR / Cardboard

* Vuforia is a commercial SDK for image and object tracking. Its tracking technology is widely used in AR apps and games, but it is now rivaled by modern computer vision APIs, such as Google's.

Graphics Library Research

Under the hood, browsers usually use WebGL to render 3D to a <canvas> element, but there are several popular graphics libraries that make writing WebGL code easier. Here's what we found in our graphics library research:


Three.js

  • WebGL framework in JavaScript. Full control over creating graphics objects, etc., but requires more manual work.
  • Examples: Github Repo
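A minimal, untested sketch of what that "manual work" looks like with Three.js: even a spinning cube requires explicitly creating a scene, a camera, a renderer, and an animation loop. The CDN URL and version are illustrative.

```html
<!-- The "manual work": a spinning cube needs a scene, camera, renderer,
     and an explicit animation loop. Script URL/version are illustrative. -->
<script src="https://unpkg.com/three@0.128.0/build/three.min.js"></script>
<script>
  const scene = new THREE.Scene();
  const camera = new THREE.PerspectiveCamera(75, innerWidth / innerHeight, 0.1, 1000);
  camera.position.z = 3;

  const renderer = new THREE.WebGLRenderer();
  renderer.setSize(innerWidth, innerHeight);
  document.body.appendChild(renderer.domElement);

  const cube = new THREE.Mesh(
    new THREE.BoxGeometry(1, 1, 1),
    new THREE.MeshNormalMaterial()
  );
  scene.add(cube);

  (function animate() {
    requestAnimationFrame(animate);
    cube.rotation.y += 0.01;
    renderer.render(scene, camera);
  })();
</script>
```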


A-Frame

  • HTML wrapper for Three.js that integrates an entity-component system for composability, as well as a visual 3D inspector. Built on HTML / the DOM.
  • Easy to create custom components with actions that happen in a lifecycle (on component attach, on every frame, etc.)
  • Examples: Github Repo
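The component lifecycle mentioned above can be sketched like this: `init` runs once when the component attaches, and `tick` runs every rendered frame. The gentle z drift here is purely illustrative, and the script URL/version are placeholders.

```html
<!-- A custom A-Frame component with lifecycle hooks: init runs once on
     attach, tick runs every frame. The z drift values are illustrative. -->
<script src="https://aframe.io/releases/1.0.4/aframe.min.js"></script>
<script>
  AFRAME.registerComponent('float-z', {
    schema: { amplitude: { default: 0.02 } },
    init: function () {
      // runs once, when the component attaches to its entity
      this.baseZ = this.el.object3D.position.z;
    },
    tick: function (time) {
      // runs on every rendered frame
      this.el.object3D.position.z =
        this.baseZ + this.data.amplitude * Math.sin(time / 1000);
    }
  });
</script>

<a-scene>
  <a-box float-z position="0 1 -3"></a-box>
</a-scene>
```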


PlayCanvas

  • WebGL framework with a Unity-like editor
  • Could be convenient for quickly throwing together complex scenes. You can link out to a scene to be displayed on top of a marker, or program a scene manually. On the other hand, if you use the editor and publish a scene, it is potentially harder to visualize, edit, collaborate on, and see what's going on in the code.
  • Slightly unclear how easy it is to dynamically generate scenes based on incoming data / how to instantiate a scene with parameters
  • Examples: Github Repo

Recommendations for this project

Here is what we decided to go with for our AR demo.

  • Start with AR.js (another option was Zappar) + A-Frame.js for a basic working prototype
  • In the longer term, explore options for advanced object recognition and tracking

Read more about determining the best way to do marker tracking; narrowing down the use case and developing the interaction design; and content modeling for AR in our full coverage of week one of development.

