Atlassian’s Inference Engine, our self-hosted AI inference service

Powering Enterprise-scale AI

As Atlassian’s AI capabilities continued to scale rapidly across multiple products, a pressing challenge emerged: how do we deliver world-class AI-powered solutions to millions of users without compromising on latency, flexibility, or operational control?

The answer: Atlassian’s Inference Engine, our custom-built, self-hosted AI inference platform that now powers production LLMs, search models, content moderation models, and more across the Atlassian Cloud.

With it, we’ve achieved:

Why We Built It

Off-the-shelf solutions got us off the ground, but as usage scaled and product teams demanded more control, bottlenecks became unavoidable:

So we built a solution from first principles. The Atlassian Inference Engine isn’t a proof of concept; it’s foundational to how we ship AI at Atlassian.

Technical Challenges & Early Concerns

Building our own inference platform wasn’t a decision we took lightly. We knew we’d be taking ownership of everything from optimization and performance to uptime and cost control, and we had real concerns going in.

Some of the early challenges we faced and questions we asked ourselves included:

Architecture Overview

Atlassian’s Inference Engine is built on a foundation of open-source technologies, with internal systems layered in to support enterprise-grade AI workloads. This initiative laid the groundwork for AI workloads on KITT (Atlassian’s internal Kubernetes platform).

Under the hood, Atlassian’s Inference Engine runs on a modern, cloud-native stack:

This stack allows us to fully automate cluster provisioning, model rollouts, and versioning, and gives us fine-grained control over GPU spend, rollout safety, and operational observability without being locked into a proprietary platform.

We designed Atlassian’s Inference Engine to scale our impact. That meant investing early in automation and in clean interfaces between systems. This mindset has helped us support dozens of models in parallel, optimize for cost and performance, and move fast without compromising reliability.
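To give a sense of the automation this design enables, here is a minimal sketch, using the official Kubernetes Python client, of the kind of check deployment tooling can run to confirm a model rollout has converged. The deployment and namespace names are illustrative placeholders, not our actual resources.

```python
# Illustrative only: check whether a model rollout on a Kubernetes cluster has
# converged, i.e. every desired replica is ready. Names are hypothetical.
from kubernetes import client, config


def rollout_converged(deployment: str, namespace: str = "inference") -> bool:
    """Return True once every desired replica of a model deployment is ready."""
    config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
    apps = client.AppsV1Api()
    dep = apps.read_namespaced_deployment(deployment, namespace)
    desired = dep.spec.replicas or 0
    ready = dep.status.ready_replicas or 0
    return desired > 0 and ready == desired


if __name__ == "__main__":
    print(rollout_converged("example-llm-serving"))
```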

Deployment Model: GitOps for Reliability and Traceability

We use a Git-based deployment model for everything from system configurations to model deployments. Each environment is managed via declarative manifests stored in Git, with ArgoCD continuously reconciling the desired state.

This approach gives us:

Our Git deployment approach also reduces human error and simplifies collaboration: proposed changes are raised through PRs, validated with CI, and rolled out safely through Git merges. This model gives our team confidence as we scale inference across regions, models, and versions.
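As an illustration of the CI step, the sketch below validates a hypothetical declarative model manifest before merge. The schema shown (model, version, gpu, replicas) is invented for this example and is not the actual manifest format we use.

```python
# Hypothetical CI check for declarative model manifests (schema is illustrative).
import sys

import yaml

REQUIRED_FIELDS = {"model", "version", "gpu", "replicas"}


def validate_manifest(path: str) -> list[str]:
    """Return a list of human-readable errors; an empty list means the manifest passes."""
    with open(path) as f:
        manifest = yaml.safe_load(f) or {}
    errors = [f"missing field: {field}" for field in sorted(REQUIRED_FIELDS - manifest.keys())]
    if isinstance(manifest.get("replicas"), int) and manifest["replicas"] < 1:
        errors.append("replicas must be >= 1")
    return errors


if __name__ == "__main__":
    problems = [err for path in sys.argv[1:] for err in validate_manifest(path)]
    print("\n".join(problems) or "all manifests valid")
    sys.exit(1 if problems else 0)
```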

Optimization Stack

For model inference and optimization, our go-to setup is NVIDIA Triton Inference Server; for GenAI models, we use TensorRT-LLM and vLLM as backends. These tools have let us achieve significant latency reductions at scale with open-source tooling, while retaining the flexibility to deploy a wide array of AI solutions. Here are some key optimization benefits:
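To make the GenAI serving path concrete, here is a minimal standalone vLLM sketch in the style of the backends we run behind Triton. The model name and sampling parameters are placeholders chosen for illustration, not our production configuration.

```python
# Minimal standalone vLLM sketch; in production this style of backend sits
# behind Triton rather than being called directly like this.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small open model, purely for demonstration
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["Summarize the benefits of self-hosted inference."], params)
for output in outputs:
    print(output.outputs[0].text)
```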

This high-level diagram outlines the structure of Atlassian’s Inference Engine, showing the split between the Control Plane, where models are prepared and optimized, and the Data Plane, where inference happens at scale.

Control Plane – Preparing the Models

Once model artifacts are uploaded to our internal registry:

We’re also experimenting with benchmarking different compilation strategies and broader container support.
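A toy version of such a benchmark might compare p95 latency across two deployed variants of the same model. The endpoints and payload below are hypothetical; real benchmarks also track throughput, token counts, and GPU utilization.

```python
# Toy benchmark comparing p95 latency across two hypothetical model variants.
import statistics
import time

import requests

VARIANTS = {
    "tensorrt-llm": "http://localhost:8000/v2/models/example-trt/infer",
    "vllm": "http://localhost:8001/v2/models/example-vllm/infer",
}
PAYLOAD = {
    "inputs": [
        {"name": "text_input", "shape": [1, 1], "datatype": "BYTES", "data": ["hello"]}
    ]
}


def p95_latency_ms(url: str, runs: int = 50) -> float:
    """Send `runs` requests and return the 95th-percentile latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        requests.post(url, json=PAYLOAD, timeout=30).raise_for_status()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.quantiles(samples, n=20)[18]  # 19 cut points; index 18 is p95


for name, url in VARIANTS.items():
    print(f"{name}: p95 = {p95_latency_ms(url):.1f} ms")
```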

Data Plane – Serving Inference at Scale

This setup lets us run resilient inference, tightly optimized for cost and performance.
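For a sense of what the data plane looks like from a caller’s perspective, here is a minimal Python client sketch against a Triton HTTP endpoint. The endpoint, model name, and tensor names are placeholders; production clients also add retries, timeouts, and authentication.

```python
# Minimal Triton HTTP client sketch; model and tensor names are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Example input tensor for a hypothetical embedding model.
batch = np.random.rand(1, 384).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

response = client.infer(
    model_name="example-embedding-model",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output__0")],
)
print(response.as_numpy("output__0").shape)
```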

Results

Today, Atlassian’s Inference Engine:

We monitor and alert on:

These metrics are surfaced through Prometheus, and we alert on autoscaling events, pod disruptions, deployment interruptions, and subnet saturation.
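As an illustration of how such metrics can be exported, here is a small sketch using the standard prometheus_client library. The metric names and labels are examples only, not our actual metric schema.

```python
# Illustrative Prometheus instrumentation: a request counter and latency
# histogram exposed on a /metrics endpoint for Prometheus to scrape.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Inference requests", ["model"])
LATENCY = Histogram("inference_latency_seconds", "End-to-end inference latency", ["model"])


def handle_request(model: str) -> None:
    REQUESTS.labels(model=model).inc()
    with LATENCY.labels(model=model).time():
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for the actual model call


if __name__ == "__main__":
    start_http_server(9100)  # metrics available at http://localhost:9100/metrics
    while True:
        handle_request("example-llm")
```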

What’s Next

Inference at Atlassian is an evolving discipline, and we’re just getting started. Looking ahead, our focus is on smarter scaling, deeper optimizations, and supporting an even broader range of AI use cases across our products.

We’re continuing to innovate and invest in:
