
Sentis — ML inference in Unity

ONNX models in real time, the Worker API, and a practical image classification example.

Sentis is Unity's package for running machine learning models in real time inside the game (available for Unity 6). It replaced Barracuda (deprecated since 2023).

What you can do with Sentis:

  • Image classification from the camera (ResNet, MobileNet, YOLO).
  • NPC AI based on neural networks (decision-making, dialog generation).
  • Style transfer for post-process effects.
  • Pose estimation for motion capture without sensors.
  • Voice command recognition.

Installation and the basic idea

  1. Package Manager → install Sentis (com.unity.sentis). As of Unity 6.1 it’s officially GA.
  2. Import an .onnx file (ONNX is an open format for exporting models from PyTorch, TensorFlow, etc.) as a regular asset. Unity automatically recognizes and converts it.
  3. In code, the flow is ModelLoader.Load(modelAsset) → new Worker(model, backendType) → worker.Schedule(input) → worker.PeekOutput().

For comparison, on the web: transformers.js from Hugging Face runs models in the browser via ONNX Runtime Web. Sentis is the same idea conceptually, but runs natively through compute shaders, without WebAssembly overhead.

In Unity: under the hood, Sentis uses GPU compute shaders (or, optionally, the CPU). Small models can run at 60+ FPS on consumer hardware. One Worker is one runtime for one model; create it once at load time and reuse it.

A full example: image classification

Let’s take MobileNetV2 — a lightweight model that classifies a 224×224 image into one of 1000 categories (ImageNet).

Preparation

  1. Download mobilenetv2.onnx (for example, from the ONNX Model Zoo).
  2. Drag it into Assets/ — a ModelAsset will appear.
  3. Create a labels.txt with the 1000 ImageNet class names, one per line, in the same order as the model's outputs.
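Since the script maps output indices straight to lines of labels.txt, it pays to parse the file defensively. Below is a minimal sketch in plain C#; LabelParser and its expectedCount parameter are hypothetical names, not part of Sentis. It accepts both \n and \r\n line endings, drops blank lines such as a trailing newline, and fails loudly if the count doesn't match the model's 1000 outputs.

```csharp
using System;
using System.Linq;

// Hypothetical helper: turns the contents of labels.txt into an array of class names.
public static class LabelParser
{
    public static string[] Parse(string text, int expectedCount = 1000)
    {
        // Split on both \r\n and \n, trim each line, and drop empty lines
        // (e.g., the trailing newline at the end of the file).
        string[] labels = text
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToArray();

        // A count mismatch usually means the file doesn't match the model's output order.
        if (labels.Length != expectedCount)
            throw new FormatException($"Expected {expectedCount} labels, got {labels.Length}.");

        return labels;
    }
}
```

In the script you could call LabelParser.Parse(labels.text) in Awake() instead of splitting inline.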

The script

using System.Collections.Generic;
using System.Linq;
using Unity.Sentis;
using UnityEngine;

public class ImageClassifier : MonoBehaviour
{
    [SerializeField] private ModelAsset modelAsset;
    [SerializeField] private TextAsset labels;        // labels.txt
    [SerializeField] private Camera sourceCamera;
    [SerializeField] private RenderTexture cameraOutput;

    private Worker _worker;
    private Model _model;
    private string[] _classes;

    private void Awake() {
        _model = ModelLoader.Load(modelAsset);
        _worker = new Worker(_model, BackendType.GPUCompute);
        // One label per line; skip empty lines (e.g., a trailing newline)
        _classes = labels.text.Split('\n').Select(s => s.Trim()).Where(s => s.Length > 0).ToArray();
    }

    private void OnDestroy() {
        _worker?.Dispose();
    }

    public void Classify() {
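        // 1. Convert the image (RenderTexture) → Tensor 1×3×224×224 (NCHW) with values in [0, 1].
        //    No mean/std normalization is applied here; check what your model expects.
        using var input = TextureConverter.ToTensor(cameraOutput, width: 224, height: 224, channels: 3);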

        // 2. Run the model
        _worker.Schedule(input);

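        // 3. Copy the output to the CPU (1×1000 raw scores, one per class). This blocks until the GPU is done.
        //    PeekOutput() returns a tensor owned by the worker; only the clone is ours to dispose.
        using var output = (_worker.PeekOutput() as Tensor<float>).ReadbackAndClone();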

        // 4. Find the argmax
        int bestIdx = 0;
        float bestVal = output[0];
        for (int i = 1; i < output.shape[1]; i++) {
            if (output[0, i] > bestVal) {
                bestVal = output[0, i];
                bestIdx = i;
            }
        }

        Debug.Log($"Detected: {_classes[bestIdx]} (score: {bestVal:F3})");  // raw model score, not a probability
    }

    private void Update() {
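        // Legacy Input Manager API. If Active Input Handling is set to the Input System package only,
        // use Keyboard.current.spaceKey.wasPressedThisFrame (UnityEngine.InputSystem) instead.
        if (Input.GetKeyDown(KeyCode.Space)) {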
            Classify();
        }
    }
}

When Space is pressed, the script grabs the image from the camera (cameraOutput is the RenderTexture that sourceCamera renders into), runs it through the model, and prints the most probable class.
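MobileNetV2 exports commonly output raw scores (logits) rather than probabilities; treat that as an assumption and check your model's card. If it holds, the value the script prints is not a probability. Here is a sketch of the usual post-processing in plain C# (ScoreUtils is a hypothetical helper): a numerically stable softmax and a top-k lookup.

```csharp
using System;
using System.Linq;

// Hypothetical helper for post-processing classifier outputs.
public static class ScoreUtils
{
    // Numerically stable softmax: subtracting the max before Exp avoids overflow.
    public static float[] Softmax(float[] logits)
    {
        float max = logits.Max();
        var result = new float[logits.Length];
        float sum = 0f;
        for (int i = 0; i < logits.Length; i++)
        {
            result[i] = MathF.Exp(logits[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.Length; i++)
            result[i] /= sum;
        return result;
    }

    // Indices of the k largest values, highest first (ties keep their original order).
    public static int[] TopK(float[] values, int k)
    {
        return Enumerable.Range(0, values.Length)
            .OrderByDescending(i => values[i])
            .Take(k)
            .ToArray();
    }
}
```

To use it, copy output[0, i] for i in 0..999 into a float[] after ReadbackAndClone(), run Softmax, and log _classes[idx] with its probability for each index returned by TopK(probs, 5).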

Backend selection — GPU vs CPU

Worker accepts a BackendType:

  • BackendType.GPUCompute: default. Compute shaders, fast on modern GPUs.
  • BackendType.GPUPixel: fallback for GPUs without compute shader support (old WebGL).
  • BackendType.CPU: when there's no GPU or you need deterministic output (for a replay system).

Performance for MobileNetV2 224×224 on an NVIDIA GTX 1060:

  • GPUCompute: ~5 ms
  • CPU: ~50 ms

Tensor — the input/output format

Sentis works with the Tensor<float> type (or Tensor<int>). It has a shape — an array of dimensions. For NCHW (the standard image format):

  • N = batch size (usually 1 for real-time)
  • C = channels (3 for RGB)
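  • H, W = height × width

// Create a tensor from an array (random values, just for illustration)
var shape = new TensorShape(1, 3, 224, 224);
var data = new float[shape.length];
for (int i = 0; i < data.Length; i++) {
    data[i] = Random.value;   // UnityEngine.Random
}
using var tensor = new Tensor<float>(shape, data);

// Or from a Texture / RenderTexture (rt): NCHW, values in [0, 1]
using var imageTensor = TextureConverter.ToTensor(rt);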

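Important: Tensor implements IDisposable. Dispose every tensor you create (a using declaration is the simplest way); if you forget .Dispose(), you leak GPU memory. The tensor returned by PeekOutput() is the exception: the worker owns it.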
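In NCHW each channel is a contiguous H×W plane, so element (n, c, h, w) sits at flat index ((n·C + c)·H + h)·W + w. A common use of this layout is per-channel normalization: many ImageNet classifiers expect (x − mean[c]) / std[c] with mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225), while TextureConverter.ToTensor produces values in [0, 1]. Whether your model needs this is an assumption to verify against its documentation. Below is a sketch on a plain float[]; NchwUtils is a hypothetical name.

```csharp
using System;

// Hypothetical helpers for row-major NCHW float buffers.
public static class NchwUtils
{
    // Flat offset of element (n, c, h, w).
    public static int Index(int n, int c, int h, int w, int channels, int height, int width)
        => ((n * channels + c) * height + h) * width + w;

    // Applies (x - mean[c]) / std[c] in place to every element of channel c.
    public static void NormalizePerChannel(float[] data, int batch, int channels, int height, int width,
                                           float[] mean, float[] std)
    {
        if (data.Length != batch * channels * height * width)
            throw new ArgumentException("Buffer size does not match the NCHW shape.");

        for (int n = 0; n < batch; n++)
            for (int c = 0; c < channels; c++)
                for (int h = 0; h < height; h++)
                    for (int w = 0; w < width; w++)
                    {
                        int i = Index(n, c, h, w, channels, height, width);
                        data[i] = (data[i] - mean[c]) / std[c];
                    }
    }
}
```

Normalizing the data array before it is wrapped in a Tensor<float> leaves the tensor creation itself unchanged. A per-element loop like this runs on the CPU, so measure its cost before doing it for every camera frame.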

Using it in game logic

Common patterns:

NPC decision-making

A neural network takes a “game state” vector and returns an action. Typical pipeline: train a policy with reinforcement learning in ML-Agents → export it to ONNX → import it into the shipping build.
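What such a model consumes and returns depends entirely on how it was trained, so the layout below is hypothetical. The sketch shows the two pieces of glue code around the model in plain C#: encoding game state into a fixed-length float vector (values clamped to [0, 1]) and greedily picking the action with the highest score. NpcAction, NpcBrain and the four observation slots are illustrative names, not ML-Agents or Sentis APIs.

```csharp
using System;

// Hypothetical discrete action set; the order must match the model's output layout.
public enum NpcAction { Idle = 0, Patrol = 1, Chase = 2, Flee = 3 }

public static class NpcBrain
{
    // Hypothetical observation layout: [health, distanceToPlayer, playerVisible, ammo], each in [0, 1].
    public static float[] EncodeState(float health, float maxHealth, float distance, float maxDistance,
                                      bool playerVisible, int ammo, int maxAmmo)
    {
        return new[]
        {
            Clamp01(health / maxHealth),
            Clamp01(distance / maxDistance),
            playerVisible ? 1f : 0f,
            Clamp01((float)ammo / maxAmmo),
        };
    }

    // Greedy choice: the action with the highest score (no exploration at inference time).
    public static NpcAction PickAction(float[] actionScores)
    {
        int best = 0;
        for (int i = 1; i < actionScores.Length; i++)
            if (actionScores[i] > actionScores[best]) best = i;
        return (NpcAction)best;
    }

    static float Clamp01(float x) => x < 0f ? 0f : (x > 1f ? 1f : x);
}
```

In Unity, the encoded vector would become a 1×4 input tensor (for example new Tensor<float>(new TensorShape(1, state.Length), state)), and the read-back scores would go into PickAction.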

Style transfer on video

Apply an artistic style (e.g., Ghibli, oil painting) to the camera render in real time. Sentis runs a feed-forward style-transfer network on every frame.

Speech-to-text

Whisper or a similar model converts microphone input into text for voice commands.

Image-based AI

Capturing an image from the camera (AR mode) → detecting objects via YOLO → placing virtual objects next to real ones.
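Detectors such as YOLO typically produce many overlapping candidate boxes, and depending on how the model was exported, non-max suppression (NMS) may be left to the application. Below is a greedy per-class NMS sketch in plain C#; Detection and Nms are hypothetical types, and decoding boxes from the raw output tensor is model-specific and not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Axis-aligned box (x1, y1) to (x2, y2) in pixels, plus the detector's confidence and class.
public struct Detection
{
    public float X1, Y1, X2, Y2, Score;
    public int ClassId;

    public Detection(float x1, float y1, float x2, float y2, float score, int classId)
    {
        X1 = x1; Y1 = y1; X2 = x2; Y2 = y2; Score = score; ClassId = classId;
    }

    public float Area => Math.Max(0f, X2 - X1) * Math.Max(0f, Y2 - Y1);
}

public static class Nms
{
    // Intersection over union of two boxes, in [0, 1].
    public static float IoU(Detection a, Detection b)
    {
        float ix = Math.Max(0f, Math.Min(a.X2, b.X2) - Math.Max(a.X1, b.X1));
        float iy = Math.Max(0f, Math.Min(a.Y2, b.Y2) - Math.Max(a.Y1, b.Y1));
        float inter = ix * iy;
        float union = a.Area + b.Area - inter;
        return union > 0f ? inter / union : 0f;
    }

    // Greedy per-class NMS: visit boxes from highest to lowest score and keep a box
    // unless it overlaps an already-kept box of the same class by more than iouThreshold.
    public static List<Detection> Run(IEnumerable<Detection> detections, float scoreThreshold, float iouThreshold)
    {
        var candidates = detections.Where(d => d.Score >= scoreThreshold)
                                   .OrderByDescending(d => d.Score)
                                   .ToList();
        var kept = new List<Detection>();
        foreach (var d in candidates)
        {
            bool suppressed = kept.Any(k => k.ClassId == d.ClassId && IoU(k, d) > iouThreshold);
            if (!suppressed) kept.Add(d);
        }
        return kept;
    }
}
```

The thresholds used when calling Run (for example 0.25 for score and 0.5 for IoU) are illustrative; tune them for your model.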

Performance budget matters

For large models (several hundred MB), inference often takes more than 16.6 ms, the whole frame budget at 60 FPS, and the game stutters. Solutions:

  • Use a smaller model (MobileNet instead of ResNet).
  • Run it once every N frames rather than every frame — for NPC decision-making, 5 FPS is enough.
  • Quantize the model to int8 (if the ML framework supports it).
  • Read results back asynchronously with ReadbackAndCloneAsync() (returns an Awaitable), so the main thread doesn't stall waiting for the GPU.
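The “every N frames” idea is simple arithmetic: at 60 FPS a frame lasts 1000 / 60 ≈ 16.7 ms, so running an NPC policy at 5 Hz means one inference roughly every 12 frames. A minimal time-based throttle in plain C# (InferenceThrottle is a hypothetical helper) keeps that rate regardless of the actual frame rate:

```csharp
using System;

// Hypothetical helper: decides on which frames to run inference so a model
// runs at roughly targetHz no matter how fast the game renders.
public sealed class InferenceThrottle
{
    private readonly double _interval;  // seconds between inference runs
    private double _nextRunTime;         // earliest time the next run is allowed

    public InferenceThrottle(double targetHz)
    {
        if (targetHz <= 0) throw new ArgumentOutOfRangeException(nameof(targetHz));
        _interval = 1.0 / targetHz;
        _nextRunTime = 0.0;
    }

    // Call once per frame with the current time in seconds.
    public bool ShouldRun(double now)
    {
        if (now < _nextRunTime) return false;
        // Schedule from "now" so a long hitch doesn't trigger a burst of catch-up runs.
        _nextRunTime = now + _interval;
        return true;
    }
}
```

In a MonoBehaviour you would create it once (new InferenceThrottle(5)) and, in Update(), schedule inference only when throttle.ShouldRun(Time.timeAsDouble) returns true.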

Async inference

public async Awaitable ClassifyAsync(RenderTexture input) {
    using var tensor = TextureConverter.ToTensor(input, 224, 224, 3);

    _worker.Schedule(tensor);

    // PeekOutput() returns a tensor owned by the worker: don't dispose it
    var output = _worker.PeekOutput() as Tensor<float>;

    // Wait for the GPU and copy the result to the CPU without blocking the main thread
    using var cpu = await output.ReadbackAndCloneAsync();

    // Process the result (e.g., the argmax over cpu[0, i], as in Classify())
}

ReadbackAndCloneAsync waits asynchronously instead of blocking the main thread, so the game keeps rendering while the GPU finishes inference.

Where to get models

  • ONNX Model Zoo — the official repo with ready-made models (classification, detection, NLP).
  • Hugging Face: a huge model library. Models are often published in PyTorch format; you export them to ONNX via torch.onnx.export().
  • Unity AI Hub — Unity’s collection of models optimized for Sentis.
  • Custom training: train in PyTorch/TensorFlow → export to ONNX → import into Unity.

Sentis limitations

  • Not all operators are supported — some exotic layers (custom CUDA operations) don’t work.
  • Dynamic shapes are limited — Sentis prefers a fixed input size (which is fine for most game ML models).
  • Memory footprint — large models eat VRAM. Profile it.
  • Web builds: work, but slowly. In the browser, GPU compute shaders require WebGPU; under WebGL you fall back to the GPUPixel or CPU backend.
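A fixed input size mostly matters when camera frames don't match the model's input. A common approach (an assumption here, not something Sentis does for you) is letterboxing: scale the image uniformly to fit the square input, pad the rest, and map results back using the same scale and padding. Below is a sketch of that bookkeeping in plain C#; Letterbox is a hypothetical type, and the actual resize and padding of the texture are not shown.

```csharp
using System;

// Hypothetical type: fits a source image into a square model input while keeping its aspect ratio.
public readonly struct Letterbox
{
    public readonly float Scale;                     // source pixels → model-input pixels
    public readonly int ScaledWidth, ScaledHeight;   // size of the image inside the input
    public readonly int PadX, PadY;                  // padding on the left / top

    public Letterbox(int srcWidth, int srcHeight, int inputSize)
    {
        Scale = Math.Min((float)inputSize / srcWidth, (float)inputSize / srcHeight);
        ScaledWidth = (int)Math.Round(srcWidth * Scale);
        ScaledHeight = (int)Math.Round(srcHeight * Scale);
        PadX = (inputSize - ScaledWidth) / 2;
        PadY = (inputSize - ScaledHeight) / 2;
    }

    // Maps a point from model-input coordinates back to source-image coordinates.
    public (float x, float y) ToSource(float x, float y) => ((x - PadX) / Scale, (y - PadY) / Scale);
}
```

For a 1920×1080 frame and a 640×640 input, the scale is 1/3, the image occupies 640×360, and 140 rows of padding go above and below it.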

Comparison with alternatives

  • Sentis: production inference. Backend: GPU compute / CPU. Size: light. Unity integration: native.
  • ML-Agents: training agents. Backend: PyTorch (training). Size: heavy. Unity integration: native.
  • TorchSharp: general .NET ML. Backend: LibTorch native. Size: heavy. Unity integration: manual.

ML-Agents is used to train RL agents; Sentis runs the resulting model. These are different phases of the same pipeline.