Job System and Burst — Parallelism and SIMD

IJob, IJobParallelFor, the Burst compiler, NativeArray — a working path to high performance.

Unity runs your script code (Update() and the other MonoBehaviour callbacks) on a single main thread. If Update() computes paths for a thousand NPCs, the frame rate drops. Job System + Burst is the official way to spread computation across all CPU cores and to get SIMD optimization from a dedicated native compiler.

What it is

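  • Job System: an API for describing parallel tasks as structs that implement IJob or IJobParallelFor. The core types live in the engine's Unity.Jobs namespace; the com.unity.jobs package only added extra job types, and in recent releases (including Unity 6) that functionality ships with com.unity.collections.
  • Burst: a package (com.unity.burst). A high-performance compiler that turns a subset of C# into SIMD-optimized native code via LLVM, ahead of time in player builds and just in time in the Editor. A ×5–×100 speedup is typical, though the actual gain depends heavily on the workload.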
  • Unity.Collections: NativeArray<T>, NativeList<T>, etc. GC-free containers that are passed between a job and the main thread.
  • Unity.Mathematics: float3, quaternion, math.*. Types that Burst optimizes best (use them instead of UnityEngine.Vector3/Quaternion).

Web

Web Workers + SharedArrayBuffer + WASM SIMD. The same 3 elements: parallelism + cross-thread data + low-level optimization.

Unity

Job System is threads. Burst is compilation to SIMD. NativeArray is shared memory without race conditions thanks to the safety system.

A basic IJob

The task is to compute result = sum(a × b) for two large arrays:

using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

[BurstCompile]
public struct DotProductJob : IJob
{
    [ReadOnly] public NativeArray<float> A;
    [ReadOnly] public NativeArray<float> B;
    public NativeArray<float> Result; // [0] — the result

    public void Execute() {
        float sum = 0f;
        for (int i = 0; i < A.Length; i++) {
            sum += A[i] * B[i];
        }
        Result[0] = sum;
    }
}

public class JobUser : MonoBehaviour
{
    private void Start() {
        var a = new NativeArray<float>(10_000_000, Allocator.TempJob);
        var b = new NativeArray<float>(10_000_000, Allocator.TempJob);
        var result = new NativeArray<float>(1, Allocator.TempJob);

        // ... fill a, b with values ...

        var job = new DotProductJob { A = a, B = b, Result = result };
        JobHandle handle = job.Schedule();
        handle.Complete(); // block the main thread until the job finishes

        Debug.Log($"Dot product: {result[0]}");

        a.Dispose();
        b.Dispose();
        result.Dispose();
    }
}

What Burst does:

  1. The [BurstCompile] attribute tells the compiler to “take this struct and generate optimized native code”.
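  2. Where it can, Burst vectorizes the for loop with SIMD instructions (4 or 8 floats per instruction). A floating-point reduction like this sum may only vectorize fully when the compiler is allowed to reorder the additions, for example with FloatMode.Fast.
  3. Without Burst the job runs as ordinary managed code, about 10× slower.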
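
Whether a loop vectorizes depends on what the compiler is allowed to assume. As a minimal sketch, the variant below relaxes floating-point ordering so that the sum can be split across SIMD lanes. FastDotProductJob is a hypothetical name. FloatMode.Fast is a real Burst option, but the size of the gain here is an assumption to verify in the Burst Inspector or the Profiler.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

// FloatMode.Fast lets the compiler reorder floating-point operations,
// so the rounding of the result may differ slightly from the strict version.
[BurstCompile(FloatMode = FloatMode.Fast)]
public struct FastDotProductJob : IJob
{
    [ReadOnly] public NativeArray<float> A;
    [ReadOnly] public NativeArray<float> B;
    public NativeArray<float> Result; // [0] holds the dot product

    public void Execute() {
        float sum = 0f;
        for (int i = 0; i < A.Length; i++) {
            sum += A[i] * B[i];
        }
        Result[0] = sum;
    }
}
```

The job is scheduled exactly like DotProductJob; only the attribute differs.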

Allocator strategies

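  • Allocator.Temp: for very short-lived allocations (within one frame). It is the fastest option, but a Temp allocation made on the main thread cannot be passed into a scheduled job.
  • Allocator.TempJob: for data that lives up to 4 frames (a job usually finishes earlier). You still call .Dispose() yourself.
  • Allocator.Persistent: for a NativeArray that outlives many jobs. You free it manually via .Dispose().

Safety: in the Editor, Unity reports a leak if you forget .Dispose().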
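
The three lifetimes can be sketched side by side. AllocatorExamples is a hypothetical component written only for illustration, and the array sizes are arbitrary.

```csharp
using Unity.Collections;
using UnityEngine;

// Hypothetical component that only demonstrates allocator lifetimes.
public class AllocatorExamples : MonoBehaviour
{
    private NativeArray<float> _persistent; // lives as long as the component

    private void Awake() {
        _persistent = new NativeArray<float>(1024, Allocator.Persistent);
    }

    private void Update() {
        // Temp: scratch space created and released inside this one call.
        var scratch = new NativeArray<float>(16, Allocator.Temp);
        scratch[0] = Time.deltaTime;
        _persistent[0] += scratch[0];
        scratch.Dispose();

        // TempJob: may be handed to a job; `using` disposes it when the scope ends.
        using (var jobInput = new NativeArray<float>(256, Allocator.TempJob)) {
            // ... schedule a job that reads jobInput and Complete() it here ...
        }
    }

    private void OnDestroy() {
        // IsCreated guards against disposing an array that was never allocated.
        if (_persistent.IsCreated) _persistent.Dispose();
    }
}
```

The `using` form is a convenience for containers that never leave the method. A container that must survive until a later frame needs an explicit Dispose() once its job has completed.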

IJobParallelFor — splitting work across cores

If the task is “compute something for each element independently”, use IJobParallelFor:

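using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;

[BurstCompile]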
public struct MoveBoidsJob : IJobParallelFor
{
    public NativeArray<float3> Positions;
    [ReadOnly] public NativeArray<float3> Velocities;
    public float DeltaTime;

    public void Execute(int i) {
        Positions[i] += Velocities[i] * DeltaTime;
    }
}

public class BoidsManager : MonoBehaviour
{
    private NativeArray<float3> _positions;
    private NativeArray<float3> _velocities;
    private const int Count = 100_000;

    private void Start() {
        _positions = new NativeArray<float3>(Count, Allocator.Persistent);
        _velocities = new NativeArray<float3>(Count, Allocator.Persistent);
        // ... initialization ...
    }

    private void Update() {
        var job = new MoveBoidsJob {
            Positions = _positions,
            Velocities = _velocities,
            DeltaTime = Time.deltaTime,
        };

        // innerloopBatchCount = how many iterations to give one worker thread at a time
        JobHandle handle = job.Schedule(Count, 1024);
        handle.Complete();
    }

    private void OnDestroy() {
        _positions.Dispose();
        _velocities.Dispose();
    }
}

With Burst, 100 thousand boids update in ~0.5 ms on an 8-core CPU, versus ~50 ms for naive C#. Treat these as rough figures and measure on your target hardware with the Profiler. Either way, this is no longer “optimization”, it’s a different class of performance.
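
Both examples so far call Complete() right after Schedule(), so the main thread waits while the workers run. A common variation is to schedule early in the frame and complete late. The sketch below assumes nothing reads the positions between Update and LateUpdate. DeferredBoidsManager is a hypothetical name, and MoveBoidsJob is the job defined earlier.

```csharp
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;

// Hypothetical variant of BoidsManager: schedule in Update, complete in LateUpdate.
public class DeferredBoidsManager : MonoBehaviour
{
    private const int Count = 100_000;
    private NativeArray<float3> _positions;
    private NativeArray<float3> _velocities;
    private JobHandle _handle;

    private void Start() {
        _positions = new NativeArray<float3>(Count, Allocator.Persistent);
        _velocities = new NativeArray<float3>(Count, Allocator.Persistent);
    }

    private void Update() {
        var job = new MoveBoidsJob {
            Positions = _positions,
            Velocities = _velocities,
            DeltaTime = Time.deltaTime,
        };
        _handle = job.Schedule(Count, 1024);
        // Hand the work to the worker threads now rather than at Complete().
        JobHandle.ScheduleBatchedJobs();
    }

    private void LateUpdate() {
        // Other scripts' Update() calls ran while the workers were busy.
        _handle.Complete();
        // _positions is safe to read on the main thread from here on.
    }

    private void OnDestroy() {
        _handle.Complete(); // never dispose a container a job may still be using
        if (_positions.IsCreated) _positions.Dispose();
        if (_velocities.IsCreated) _velocities.Dispose();
    }
}
```

The trade-off is that the arrays belong to the job between Schedule() and Complete(). In the Editor, the safety system reports any main-thread access to them during that window.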

Chains of jobs

Jobs can be linked through JobHandle:

var firstJob = new ComputeForcesJob { /* ... */ };
JobHandle firstHandle = firstJob.Schedule(count, 64);

var secondJob = new IntegrateVelocityJob { /* ... */ };
JobHandle secondHandle = secondJob.Schedule(count, 64, firstHandle);
// secondJob will wait for firstJob before starting

secondHandle.Complete(); // block the main thread at the end

This is exactly the foundation of the DOTS paradigm: many small jobs with dependencies, and the scheduler parallelizes them across a pool of worker threads by itself.
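
The snippet above leaves both job types undefined. A minimal sketch of what they could look like follows. The job bodies, the constant-gravity model, and the PhysicsStep helper are all assumptions made for illustration, and the two arrays are assumed to have the same length.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical first stage: every body receives a constant gravity force.
[BurstCompile]
public struct ComputeForcesJob : IJobParallelFor
{
    public NativeArray<float3> Forces;
    public float Mass;

    public void Execute(int i) {
        Forces[i] = new float3(0f, -9.81f, 0f) * Mass;
    }
}

// Hypothetical second stage: reads the forces written by the first stage.
[BurstCompile]
public struct IntegrateVelocityJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float3> Forces;
    public NativeArray<float3> Velocities;
    public float Mass;
    public float DeltaTime;

    public void Execute(int i) {
        Velocities[i] += Forces[i] / Mass * DeltaTime;
    }
}

public static class PhysicsStep
{
    public static void Run(NativeArray<float3> forces, NativeArray<float3> velocities,
                           float mass, float deltaTime) {
        int count = velocities.Length;

        var first = new ComputeForcesJob { Forces = forces, Mass = mass };
        JobHandle firstHandle = first.Schedule(count, 64);

        var second = new IntegrateVelocityJob {
            Forces = forces, Velocities = velocities, Mass = mass, DeltaTime = deltaTime,
        };
        // Passing firstHandle makes the scheduler start this job only after the first one.
        JobHandle secondHandle = second.Schedule(count, 64, firstHandle);

        secondHandle.Complete(); // completing the last handle completes the whole chain
    }
}
```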

Safety system and common mistakes

The Unity Job System has a built-in safety system (in the Editor, not in release). It catches:

  • Race condition — two jobs write to the same NativeArray at the same time.
  • Read-after-write without a dependency — a job reads an array that another job writes to in parallel.
  • Disposed array — an attempt to use a NativeArray after .Dispose().

Mark fields the job only reads as [ReadOnly]. The safety system then lets several jobs read the same container at the same time instead of forcing them to run one after another.
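
As a sketch of the [ReadOnly] rule, the two jobs below read the same input array and are scheduled without any dependency between them. SumJob, MaxJob, and ReadOnlyExample are hypothetical names. The caller is assumed to allocate the one-element output arrays with Allocator.TempJob or Allocator.Persistent.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct SumJob : IJob
{
    [ReadOnly] public NativeArray<float> Input;
    public NativeArray<float> Output; // [0] holds the sum

    public void Execute() {
        float sum = 0f;
        for (int i = 0; i < Input.Length; i++) sum += Input[i];
        Output[0] = sum;
    }
}

[BurstCompile]
public struct MaxJob : IJob
{
    [ReadOnly] public NativeArray<float> Input;
    public NativeArray<float> Output; // [0] holds the maximum

    public void Execute() {
        float max = float.MinValue;
        for (int i = 0; i < Input.Length; i++) max = math.max(max, Input[i]);
        Output[0] = max;
    }
}

public static class ReadOnlyExample
{
    public static void Run(NativeArray<float> input,
                           NativeArray<float> sum, NativeArray<float> max) {
        // No dependency between the two: both only read `input`, so they may overlap.
        JobHandle sumHandle = new SumJob { Input = input, Output = sum }.Schedule();
        JobHandle maxHandle = new MaxJob { Input = input, Output = max }.Schedule();

        // Wait for both before the main thread reads sum[0] and max[0].
        JobHandle.CombineDependencies(sumHandle, maxHandle).Complete();
    }
}
```

Without [ReadOnly] on Input, the safety system in the Editor would reject the second Schedule() call, because it has to assume that both jobs might write to the same array.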

Burst and the UnityEngine API

Burst cannot call most of the UnityEngine API: Transform.position, GameObject.Find, and anything else that touches managed objects (Debug.Log is supported only in a limited form). What remains is Unity.Mathematics, Unity.Collections, and plain C# over structs and native containers. This is the price of optimization: you move the logic into a “pure” computational world, then feed the results back into Transforms on the main thread.
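
The last step, feeding results back, can be as simple as a loop on the main thread. This sketch assumes one Transform per simulated element, assigned in the Inspector. BoidsView and Apply are hypothetical names.

```csharp
using Unity.Collections;
using Unity.Mathematics;
using UnityEngine;

public class BoidsView : MonoBehaviour
{
    // Assumed: one Transform per simulated boid, in the same order as the job's array.
    [SerializeField] private Transform[] _boids;

    // Call on the main thread, after the job's handle has completed.
    public void Apply(NativeArray<float3> positions) {
        int count = math.min(_boids.Length, positions.Length);
        for (int i = 0; i < count; i++) {
            _boids[i].position = positions[i]; // float3 converts implicitly to Vector3
        }
    }
}
```

For very large object counts this loop can itself become the bottleneck. Unity also provides IJobParallelForTransform with TransformAccessArray for writing transforms from a job, which may be worth evaluating in that case.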

When it is worth using

  • Heavy computation: pathfinding for 100+ agents, flock simulation, LOD calculation, batch processing of assets.
  • Custom mesh generation: procedural landscapes, marching cubes, voxel terrain.
  • AI batch decision-making: ECS-style, where you have 1000 NPCs and each one’s “what to do” is computed separately.

When it is NOT needed

  • Little computation (10–100 iterations) — the Schedule overhead eats the gain.
  • Logic tightly coupled to the UnityEngine API — rewriting everything onto NativeArray is expensive.
  • A simple prototype — Burst isn’t needed as long as the FPS is fine.
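
For small workloads there is a middle ground: Run() executes a job immediately on the calling thread, so there is nothing to schedule or wait for, and the code is still Burst-compiled. The sketch below reuses DotProductJob from earlier. SmallWorkload is a hypothetical component, and whether this beats a plain C# loop for tiny inputs is something to profile rather than assume.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class SmallWorkload : MonoBehaviour
{
    private void Start() {
        var a = new NativeArray<float>(64, Allocator.TempJob);
        var b = new NativeArray<float>(64, Allocator.TempJob);
        var result = new NativeArray<float>(1, Allocator.TempJob);

        for (int i = 0; i < a.Length; i++) {
            a[i] = 1f;
            b[i] = 2f;
        }

        // Run() executes the job right here on the main thread: no JobHandle, no Complete().
        new DotProductJob { A = a, B = b, Result = result }.Run();

        Debug.Log($"Dot product: {result[0]}");

        a.Dispose();
        b.Dispose();
        result.Dispose();
    }
}
```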

Comparison with ECS / Entities

ECS (the Entities package) is the next step. There, data is stored in tightly packed native memory blocks (chunks) from the start, and systems are designed to run through jobs and Burst. If you are writing a simulation-heavy project (RTS, sandbox, MMO), DOTS gives a structural advantage over combining MonoBehaviours with jobs. But the entry barrier is higher.