Sentis — ML inference in Unity
ONNX models in real time, Worker, a practical image classification example.
Sentis is a system built into Unity 6 for running machine learning in real time inside the game. It replaced Barracuda (deprecated since 2023).
What you can do with Sentis:
- Image classification from the camera (ResNet, MobileNet, YOLO).
- NPC AI based on neural networks (decision-making, dialog generation).
- Style transfer for post-process effects.
- Pose estimation for motion capture without sensors.
- Voice command recognition.
Installation and the basic idea
- Package Manager → install Sentis (com.unity.sentis). As of Unity 6.1 it’s officially GA.
- Import an .onnx file (ONNX is an open format for exporting models from PyTorch, TensorFlow, etc.) as a regular asset. Unity automatically recognizes and converts it.
- In code you use ModelLoader.Load(modelAsset) → Worker → worker.Schedule(input) → worker.PeekOutput().
A useful comparison is transformers.js from Hugging Face, which runs models in the browser via ONNX Runtime Web. Sentis is the same conceptually, but native through compute shaders, without WebAssembly overhead.
Under the hood, Sentis runs on GPU compute shaders, with a CPU backend as an option. Small models can reach 60+ FPS on consumer hardware. One Worker is one runtime for one model; you create it at load time.
A full example: image classification
Let’s take MobileNetV2 — a lightweight model that classifies a 224×224 image into one of 1000 categories (ImageNet).
Preparation
- Download mobilenetv2.onnx (for example, from the ONNX Model Zoo).
- Drag it into Assets/ and a ModelAsset will appear.
- Create a labels.txt with 1000 category lines (the ImageNet classes).
The script
using System.Collections.Generic;
using System.Linq;
using Unity.Sentis;
using UnityEngine;

public class ImageClassifier : MonoBehaviour
{
    [SerializeField] private ModelAsset modelAsset;
    [SerializeField] private TextAsset labels; // labels.txt
    [SerializeField] private Camera sourceCamera; // renders into cameraOutput
    [SerializeField] private RenderTexture cameraOutput;

    private Worker _worker;
    private Model _model;
    private string[] _classes;

    private void Awake() {
        _model = ModelLoader.Load(modelAsset);
        _worker = new Worker(_model, BackendType.GPUCompute);
        // One label per line; skip blank lines such as a trailing newline
        _classes = labels.text.Split('\n')
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .ToArray();
    }

    private void OnDestroy() {
        _worker?.Dispose();
    }

    public void Classify() {
        // 1. Convert the image (RenderTexture) → Tensor 1×3×224×224 (NCHW)
        using var input = TextureConverter.ToTensor(cameraOutput, width: 224, height: 224, channels: 3);
        // 2. Run the model
        _worker.Schedule(input);
        // 3. Read the output (1×1000, one score per class).
        //    ReadbackAndClone blocks until the GPU is done and returns a CPU copy we own.
        using var output = (_worker.PeekOutput() as Tensor<float>).ReadbackAndClone();
        // 4. Find the argmax
        int bestIdx = 0;
        float bestVal = output[0, 0];
        for (int i = 1; i < output.shape[1]; i++) {
            if (output[0, i] > bestVal) {
                bestVal = output[0, i];
                bestIdx = i;
            }
        }
        // The score is the raw model output; it is a probability only if the model ends in softmax
        string label = bestIdx < _classes.Length ? _classes[bestIdx] : bestIdx.ToString();
        Debug.Log($"Detected: {label} (score: {bestVal:F3})");
    }

    private void Update() {
        if (Input.GetKeyDown(KeyCode.Space)) {
            Classify();
        }
    }
}
When Space is pressed, the script grabs the image from the camera (cameraOutput is the
RenderTexture that sourceCamera renders into), runs it through the model, and prints the most
probable class.
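The number the script logs next to the label is the model's raw output. Whether that is already a probability depends on the export: many ImageNet classifiers end in raw logits rather than a softmax layer, so treat this as something to check for your file. The sketch below is plain C# with no Sentis dependency; `Scores`, `Softmax`, and `TopK` are illustrative names, and it assumes you have already copied the 1000 output values into a `float[]`.

```csharp
using System;
using System.Linq;

public static class Scores
{
    // Numerically stable softmax: subtract the max before exponentiating
    // so that large logits do not overflow.
    public static float[] Softmax(float[] logits)
    {
        float max = logits.Max();
        double[] exps = logits.Select(v => Math.Exp(v - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }

    // Indices of the k largest values, best first.
    public static int[] TopK(float[] values, int k)
    {
        return Enumerable.Range(0, values.Length)
            .OrderByDescending(i => values[i])
            .Take(k)
            .ToArray();
    }
}
```

Showing the top three to five classes with their probabilities is usually more informative than a single argmax, especially when the camera image is ambiguous.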
Backend selection — GPU vs CPU
Worker accepts a BackendType:
| Backend | When |
|---|---|
| BackendType.GPUCompute | Default. Compute shaders, fast on modern GPUs. |
| BackendType.GPUPixel | Fallback for GPUs without compute shader support (old WebGL). |
| BackendType.CPU | When there’s no GPU or you need deterministic output (for a replay system). |
Performance for MobileNetV2 224×224 on an NVIDIA GTX 1060:
- GPUCompute: ~5 ms
- CPU: ~50 ms
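The table can be turned into a small decision function. This is a sketch, not Sentis API: `Backend` is a stand-in enum that mirrors the `BackendType` names so the snippet compiles outside Unity, and `BackendPicker.Choose` is a hypothetical helper. In a project you would presumably feed it `SystemInfo.supportsComputeShaders` and a check such as `SystemInfo.graphicsDeviceType != GraphicsDeviceType.Null`, then map the result to the real `BackendType`.

```csharp
// Stand-in for Unity.Sentis.BackendType so the sketch runs without Unity.
public enum Backend { GPUCompute, GPUPixel, CPU }

public static class BackendPicker
{
    // Mirrors the table: CPU when there is no GPU or determinism is required,
    // GPUPixel when the GPU lacks compute shaders, GPUCompute otherwise.
    public static Backend Choose(bool hasGpu, bool supportsCompute, bool needDeterminism)
    {
        if (needDeterminism || !hasGpu) return Backend.CPU;
        return supportsCompute ? Backend.GPUCompute : Backend.GPUPixel;
    }
}
```

Picking the backend once at startup keeps the choice in one place, since a Worker is created for a specific backend.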
Tensor — the input/output format
Sentis works with the Tensor<float> type (or Tensor<int>). It has a shape — an array of
dimensions. For NCHW (the standard image format):
- N = batch size (usually 1 for real-time)
- C = channels (3 for RGB)
- H, W = height × width
// Create a tensor and fill it element by element
using var tensor = new Tensor<float>(new TensorShape(1, 3, 224, 224));
for (int i = 0; i < tensor.count; i++) {
    tensor[i] = Random.value;
}
// Or from a Texture / RenderTexture
using var tex = TextureConverter.ToTensor(rt);
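It helps to see how a flat index maps onto (n, c, h, w). The sketch below assumes a contiguous row-major NCHW buffer held in a plain `float[]`; `Nchw`, `Index`, and `Normalize` are illustrative names, not Sentis API. As a use of that indexing it applies per-channel normalization, which many ImageNet models expect (commonly mean 0.485, 0.456, 0.406 and std 0.229, 0.224, 0.225 for RGB in the 0..1 range). Whether your particular export needs this step is an assumption to verify against its documentation.

```csharp
using System;

public static class Nchw
{
    // Flat offset of element (n, c, h, w) in a row-major N×C×H×W buffer.
    public static int Index(int n, int c, int h, int w, int C, int H, int W)
        => ((n * C + c) * H + h) * W + w;

    // In-place per-channel normalization: x = (x - mean[c]) / std[c].
    public static void Normalize(float[] data, int N, int C, int H, int W,
                                 float[] mean, float[] std)
    {
        if (data.Length != N * C * H * W)
            throw new ArgumentException("data length does not match the shape");
        for (int n = 0; n < N; n++)
        for (int c = 0; c < C; c++)
        for (int h = 0; h < H; h++)
        for (int w = 0; w < W; w++)
        {
            int i = Index(n, c, h, w, C, H, W);
            data[i] = (data[i] - mean[c]) / std[c];
        }
    }
}
```

With a 1×3×224×224 input, `Nchw.Index(0, 1, 0, 0, 3, 224, 224)` is 50176, the first element of the green channel: each channel occupies one full 224×224 plane before the next begins.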
Important: Tensor implements IDisposable. If you forget .Dispose(), you get a GPU memory
leak.
Using it in game logic
Common patterns:
NPC decision-making
An NN model takes a “game state” vector and returns an action. A typical pipeline: train the policy with reinforcement learning in ML-Agents, export it to ONNX, and import it into the shipping build.
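As a rough illustration of the two ends of that pipeline, the sketch below packs a game state into a fixed-size vector and turns the model's output scores into an action. Everything in it is hypothetical: the features, the `NpcAction` set, and the greedy argmax policy are placeholders for whatever your trained model actually expects, and the inference call in between is omitted.

```csharp
using System;

public static class NpcPolicy
{
    // Hypothetical action set; the order must match the model's output order.
    public enum NpcAction { Idle, Patrol, Chase, Flee }

    // Packs game state into a fixed-size vector with every feature in [0, 1].
    public static float[] EncodeState(float health01, float distance,
                                      float maxDistance, bool playerVisible)
    {
        float Clamp01(float v) => Math.Max(0f, Math.Min(1f, v));
        return new[]
        {
            Clamp01(health01),
            Clamp01(distance / maxDistance),
            playerVisible ? 1f : 0f
        };
    }

    // Greedy policy: pick the action with the highest score.
    public static NpcAction PickAction(float[] scores)
    {
        if (scores.Length != Enum.GetValues(typeof(NpcAction)).Length)
            throw new ArgumentException("expected one score per action");
        int best = 0;
        for (int i = 1; i < scores.Length; i++)
            if (scores[i] > scores[best]) best = i;
        return (NpcAction)best;
    }
}
```

The feature order and scaling used here must be identical to what the model saw during training; a mismatch produces plausible-looking but meaningless actions.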
Style transfer on video
Apply an artistic style (e.g., Ghibli, oil painting) to the camera render in real time. Sentis runs a feed-forward style-transfer network on every frame.
Speech-to-text
Whisper or a similar model → converting microphone input into text for voice commands.
Image-based AI
Capturing an image from the camera (AR mode) → detecting objects via YOLO → placing virtual objects next to real ones.
For large models (several hundred MB), inference often costs >16.6 ms per frame — the game stutters. Solutions:
- Use a smaller model (MobileNet instead of ResNet).
- Run it once every N frames rather than every frame — for NPC decision-making, 5 FPS is enough.
- Quantize the model to int8 (if the ML framework supports it).
- Async inference: schedule the model, then read the result back asynchronously so the main thread never blocks waiting for the GPU.
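The "once every N frames" option needs very little code. A minimal sketch in plain C#, where `InferenceThrottle` is a hypothetical helper and the frame counter is passed in (in Unity that would typically be `Time.frameCount`):

```csharp
using System;

public sealed class InferenceThrottle
{
    private readonly int _interval;

    public InferenceThrottle(int everyNFrames)
    {
        if (everyNFrames < 1)
            throw new ArgumentOutOfRangeException(nameof(everyNFrames));
        _interval = everyNFrames;
    }

    // True on frames 0, N, 2N, ... and false on every frame in between.
    public bool ShouldRun(int frameCount) => frameCount % _interval == 0;
}
```

At 60 FPS, an interval of 12 gives 5 inferences per second, the rate suggested above for NPC decision-making. Between runs the game simply reuses the last result.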
Async inference
public async Awaitable ClassifyAsync(RenderTexture input) {
    using var tensor = TextureConverter.ToTensor(input, 224, 224, 3);
    _worker.Schedule(tensor);
    // PeekOutput returns a tensor owned by the worker, so it must not be disposed here
    var output = _worker.PeekOutput() as Tensor<float>;
    // Wait for the GPU to finish without blocking the main thread.
    // The returned CPU copy is ours, so we dispose it (API as in Sentis 2.x).
    using var cpu = await output.ReadbackAndCloneAsync();
    // Process the result
}
ReadbackAndCloneAsync waits for the GPU without blocking the main thread, so the game keeps rendering.
Where to get models
- ONNX Model Zoo — the official repo with ready-made models (classification, detection, NLP).
- Hugging Face: a huge library. Models are often published as PyTorch checkpoints, which you export to ONNX via torch.onnx.export().
- Unity AI Hub: Unity’s collection of models optimized for Sentis.
- Custom training: train in PyTorch/TensorFlow → export to ONNX → import into Unity.
Sentis limitations
- Not all operators are supported — some exotic layers (custom CUDA operations) don’t work.
- Dynamic shapes are limited — Sentis prefers a fixed input size (which is fine for most game ML models).
- Memory footprint — large models eat VRAM. Profile it.
- WebGL — works, but slowly (GPU compute shaders in the browser = WebGPU only).
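For the memory footprint point, a back-of-the-envelope estimate of weight storage is parameter count times bytes per parameter. The helper below is a hypothetical sketch of that arithmetic only; activations, intermediate buffers, and backend overhead come on top, so the profiler remains the source of truth.

```csharp
public static class ModelMemory
{
    // Weight storage in MiB: parameters × bytes per parameter.
    // Use 4 bytes for float32, 2 for float16, 1 for int8.
    public static double WeightMegabytes(long parameterCount, int bytesPerParameter)
        => parameterCount * (double)bytesPerParameter / (1024.0 * 1024.0);
}
```

MobileNetV2 has about 3.5 million parameters, so its float32 weights come to roughly 13 MB, while a model with several hundred million parameters runs into gigabytes.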
Comparison with alternatives
| | Sentis | ML-Agents | TorchSharp |
|---|---|---|---|
| Purpose | Production inference | Training agents | General .NET ML |
| Backend | GPU compute / CPU | PyTorch (training) | LibTorch native |
| Size | Light | Heavy | Heavy |
| Integration with Unity | Native | Native | Manual |
ML-Agents is used for training RL agents, Sentis for running the resulting model. These are different phases of the pipeline.