Sandboxes - Lightdash

🛠 This page is for engineering teams self-hosting their own Lightdash instance. On Lightdash Cloud, sandboxes are fully managed for you — there’s nothing to configure.

What sandboxes are for

Some Lightdash features use an AI agent (Claude Code) that writes and runs code on your behalf. Today that’s:

AI writeback — the agent edits your dbt project (e.g. adds a metric or dimension), runs lightdash compile to validate it, and opens a pull request.
Data app generation — the agent generates and builds a small web app from a prompt.

Running model-generated code directly on the Lightdash server would be unsafe: the code is untrusted, can run arbitrary commands, and needs its own toolchain (git, dbt, the Lightdash CLI, Node). Lightdash instead runs each agent inside a sandbox — an isolated, disposable environment with a constrained network. The agent does its work there, Lightdash collects the result (a PR, a built app), and the sandbox is torn down. Sandboxes are also what make these features fast and multi-turn: a sandbox can be suspended between turns and resumed later, so a conversation with the agent keeps its state without holding a container open the whole time.

Sandbox providers

The sandbox backend is pluggable. Lightdash talks to a provider-neutral interface, so the same feature code runs on whichever backend your deployment is configured for. You select the provider with the SANDBOX_PROVIDER environment variable.

Provider	`SANDBOX_PROVIDER`	Use for	Isolation
E2B (default)	`e2b`	Production / managed	microVM
AWS Lambda MicroVMs	`lambda-microvm`	Production on AWS (self-hosted)	microVM
Azure Container Apps dynamic sessions	`azure-container-sessions`	Production on Azure (self-hosted)	Hyper-V-isolated container
Local Docker	`docker`	Local development only	container

E2B, AWS Lambda MicroVMs, and Azure Container Apps dynamic sessions are all supported production backends. E2B is the managed default; the AWS and Azure providers are for teams who want sandboxes to run inside their own cloud account. More providers (Kubernetes, ECS) are planned.

The local Docker provider is for development only. It launches plain Docker containers via the Docker socket, which is root-equivalent on the host and provides no real isolation between the sandbox and your machine. It refuses to start when NODE_ENV=production. Do not use it for a production deployment.
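
The selection rules above (default to E2B when the variable is unset, and refuse the Docker provider in production) can be sketched as follows. This is a minimal illustration, not Lightdash's actual implementation; the function name `resolve_sandbox_provider` and the exception types are assumptions made for the example.

```python
import os
from typing import Mapping

# Provider names come from the table above; everything else here is illustrative.
KNOWN_PROVIDERS = ("e2b", "lambda-microvm", "azure-container-sessions", "docker")


def resolve_sandbox_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return the configured provider, applying the documented default and guard."""
    provider = env.get("SANDBOX_PROVIDER", "e2b")  # an unset variable means E2B
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"Unknown SANDBOX_PROVIDER: {provider!r}")
    # The Docker provider is development-only and refuses to start in production.
    if provider == "docker" and env.get("NODE_ENV") == "production":
        raise RuntimeError("The docker sandbox provider cannot run with NODE_ENV=production")
    return provider
```

Passing the environment in as a mapping keeps the check easy to exercise without touching the real process environment.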

E2B (production default)

E2B runs each sandbox as a Firecracker microVM in E2B’s cloud. It’s the default — if you don’t set SANDBOX_PROVIDER, Lightdash uses E2B. To use it you need an E2B account and API key, and the agent needs an Anthropic API key:

SANDBOX_PROVIDER=e2b            # default, can be omitted
E2B_API_KEY=e2b_...            # from your E2B dashboard
ANTHROPIC_API_KEY=sk-ant-...   # the agent (Claude Code) runs inside the sandbox

The sandbox images are E2B templates. Lightdash uses separate templates for data apps and for AI writeback so they can be pinned or rolled back independently. These default to the published Lightdash templates and rarely need to be set:

E2B_TEMPLATE_NAME=lightdash/lightdash-data-app
E2B_TEMPLATE_TAG=<lightdash-version>             # defaults to your Lightdash version
E2B_AI_WRITEBACK_TEMPLATE_NAME=lightdash/lightdash-ai-writeback
E2B_AI_WRITEBACK_TEMPLATE_TAG=<lightdash-version>
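
To illustrate how the two templates can be pinned independently, here is a small sketch of the defaulting described above. The names `TemplateRef` and `resolve_e2b_templates` are hypothetical, and the Lightdash version is passed in as a plain argument because this page does not say how the backend discovers it.

```python
import os
from typing import Mapping, NamedTuple


class TemplateRef(NamedTuple):
    name: str
    tag: str


def resolve_e2b_templates(
    lightdash_version: str, env: Mapping[str, str] = os.environ
) -> dict[str, TemplateRef]:
    """Resolve both templates; each tag falls back to the Lightdash version."""
    return {
        "data-app": TemplateRef(
            env.get("E2B_TEMPLATE_NAME", "lightdash/lightdash-data-app"),
            env.get("E2B_TEMPLATE_TAG", lightdash_version),
        ),
        "ai-writeback": TemplateRef(
            env.get("E2B_AI_WRITEBACK_TEMPLATE_NAME", "lightdash/lightdash-ai-writeback"),
            env.get("E2B_AI_WRITEBACK_TEMPLATE_TAG", lightdash_version),
        ),
    }
```

Setting only `E2B_AI_WRITEBACK_TEMPLATE_TAG` rolls back the writeback template while the data app template keeps tracking the running Lightdash version.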

AWS Lambda MicroVMs (self-hosted production)

AWS Lambda MicroVMs run each sandbox as a Firecracker microVM inside your own AWS account, so untrusted agent code and your repository contents never leave your infrastructure. The microVMs have no public IP: your backend reaches each one through an AWS-managed endpoint that requires a short-lived per-microVM token, and you control their outbound network access (see Networking and IAM). This is the recommended sandbox provider for customers deploying Lightdash on AWS, because the sandbox boundary stays inside your existing AWS account and no agent workloads are sent to a third-party sandbox service.

Prerequisites

Provision these with your own IaC, in the same AWS account and region your Lightdash backend already runs in:

Two MicroVM images — one for data app generation and one for AI writeback (they bundle different toolchains). Build them from the Dockerfiles in the Lightdash repo (sandboxes/data-apps/, sandboxes/ai-writeback/, and the exec agent in sandboxes/microvm-agent/), push them to ECR, and register each as a Lambda MicroVM image on the AWS-managed al2023 base — ARM_64, 4 GB memory, with the agent’s /ready hook on port 8080. Each registration returns an image ARN for the config below.
Control-plane permissions on your backend’s existing IAM role — add RunMicrovm, GetMicrovm, SuspendMicrovm, ResumeMicrovm, TerminateMicrovm, and CreateMicrovmAuthToken.

Registering an image uses AWS’s create-microvm-image, which needs a build role (trusting lambda.amazonaws.com) and an S3 location to stage the build context. Both are build-time only: the staging location is not the S3 bucket Lightdash already uses for results and snapshots, and the backend never touches either at runtime.

Configure the provider

SANDBOX_PROVIDER=lambda-microvm
ANTHROPIC_API_KEY=sk-ant-...   # the agent (Claude Code) runs inside the microVM

# Region the microVMs run in (defaults to eu-west-1, the EU launch region)
LAMBDA_MICROVM_REGION=eu-west-1

# The image ARNs from registering the two MicroVM images above. Required.
LAMBDA_MICROVM_DATA_APP_IMAGE_ARN=arn:aws:lambda:<region>:<account>:microvm-image/...
LAMBDA_MICROVM_AI_WRITEBACK_IMAGE_ARN=arn:aws:lambda:<region>:<account>:microvm-image/...

The backend uses its ambient AWS credentials (instance role / IRSA / standard SDK credential chain) to call the Lambda MicroVMs control plane, so no access keys are configured here.
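
A quick sanity check on the two image ARNs can catch copy-paste mistakes before the backend starts. The sketch below assumes only the ARN shape shown above (`arn:aws:lambda:<region>:<account>:microvm-image/...`) and assumes that an image should live in the region the microVMs run in, which this page implies but does not state outright. The function name `check_image_arn` is hypothetical.

```python
def check_image_arn(arn: str, region: str) -> list[str]:
    """Return a list of problems found in a MicroVM image ARN (empty if none)."""
    # Expected shape: arn:<partition>:lambda:<region>:<account>:microvm-image/...
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "lambda":
        return [f"not a Lambda ARN: {arn!r}"]
    problems = []
    if not parts[5].startswith("microvm-image/"):
        problems.append(f"resource is not a microvm-image: {parts[5]!r}")
    if parts[3] != region:
        problems.append(f"ARN region {parts[3]!r} differs from LAMBDA_MICROVM_REGION {region!r}")
    return problems
```

Run it once per image variable, passing the value of `LAMBDA_MICROVM_REGION` (or `eu-west-1` when that variable is unset).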

Networking and IAM

Configure the network connectors before going to production. The AWS-managed defaults give the microVM open inbound and outbound access, which means untrusted agent code can reach the public internet from inside your AWS account. We currently recommend pointing the egress connector at a VPC connector that has no outbound access by default, and only opening up the destinations the agent actually needs (your dbt repository host, the Anthropic / Bedrock API, your ECR registry).

Override these to tighten the network boundary or to give the microVM an IAM role:

# IAM role the microVM assumes (e.g. to pull from a private ECR or reach AWS APIs)
LAMBDA_MICROVM_EXECUTION_ROLE_ARN=arn:aws:iam::...:role/lightdash-sandbox

# Network connectors. Default to AWS-managed open ingress/egress; point these at
# your own VPC connectors to constrain traffic. We strongly recommend an egress
# connector that denies outbound traffic by default.
LAMBDA_MICROVM_INGRESS_CONNECTOR_ARN=arn:aws:lambda:<region>:aws:network-connector:aws-network-connector:ALL_INGRESS
LAMBDA_MICROVM_EGRESS_CONNECTOR_ARN=arn:aws:lambda:<region>:aws:network-connector:aws-network-connector:INTERNET_EGRESS
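
Because the open defaults apply silently when these variables are unset, a pre-deployment check can flag connectors that are still on the AWS-managed values. This is a sketch under the assumption that the defaults can be recognised by the ARN suffixes shown above; `connectors_left_open` is a hypothetical helper, not part of Lightdash.

```python
import os
from typing import Mapping

# Suffixes of the AWS-managed open connectors, taken from the defaults above.
OPEN_DEFAULT_SUFFIXES = {
    "LAMBDA_MICROVM_INGRESS_CONNECTOR_ARN": ":aws-network-connector:ALL_INGRESS",
    "LAMBDA_MICROVM_EGRESS_CONNECTOR_ARN": ":aws-network-connector:INTERNET_EGRESS",
}


def connectors_left_open(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the connector variables that are unset or still point at an open default."""
    open_vars = []
    for name, suffix in OPEN_DEFAULT_SUFFIXES.items():
        arn = env.get(name)
        if not arn or arn.endswith(suffix):
            open_vars.append(name)
    return open_vars
```

An empty result means both connectors have been overridden; it does not prove that the replacement connectors actually deny traffic by default.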

Azure Container Apps dynamic sessions (self-hosted production)

Azure Container Apps dynamic sessions run each sandbox as a Hyper-V-isolated custom container inside your own Azure subscription, so untrusted agent code and your repository contents never leave your infrastructure. Sessions are allocated on demand from a warm pool and reached through the pool’s management endpoint. Your backend authenticates to that endpoint with a short-lived Microsoft Entra token, so no session ever has a public, unauthenticated ingress. This is the recommended sandbox provider for customers deploying Lightdash on Azure, because the sandbox boundary stays inside your existing Azure subscription and no agent workloads are sent to a third-party sandbox service. It pairs naturally with AKS via workload identity.

Dynamic sessions have no native memory snapshot, so this provider suspends a sandbox by tarring its workspace to S3-compatible object storage (the same bucket Lightdash already uses) and destroying the session, then restores it on the next turn. Object storage is therefore required — see external object storage.
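
The tar-based suspend and restore cycle can be pictured with the sketch below. It is a simplified model, not the provider's code: the archive is held in memory, the function names are hypothetical, and the caller is assumed to upload and download the resulting bytes with whatever object storage client it already uses.

```python
import io
import tarfile
from pathlib import Path


def snapshot_workspace(workspace: Path) -> bytes:
    """Pack the contents of the workspace directory into a gzipped tar in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for child in sorted(workspace.iterdir()):
            # Store paths relative to the workspace root so the snapshot is relocatable.
            archive.add(child, arcname=child.name)
    return buffer.getvalue()


def restore_workspace(snapshot: bytes, workspace: Path) -> None:
    """Unpack a snapshot into a (possibly new) workspace directory."""
    workspace.mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(snapshot), mode="r:gz") as archive:
        archive.extractall(workspace)
```

On suspend, the bytes from `snapshot_workspace` would be written to the bucket and the session destroyed. On the next turn, a fresh session downloads them and calls `restore_workspace`. Only files survive this round trip, so running processes and in-memory state are lost, which is the trade-off of not having a native memory snapshot.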

Prerequisites

Provision these with your own IaC, in the same Azure subscription your Lightdash backend runs in:

Two session-pool container images — one for data app generation and one for AI writeback (they bundle different toolchains). Build them from the Dockerfiles in the Lightdash repo (sandboxes/data-apps/, sandboxes/ai-writeback/, and the exec agent in sandboxes/microvm-agent/) for linux/amd64, and push them to a registry the pool can pull (e.g. Azure Container Registry).
A workload-profile Container Apps environment, and two custom-container dynamic session pools in it (one per image), each with its ingress target port set to 8080 (the exec agent’s port). Custom-container sessions require a workload-profile environment — a Consumption-only environment is rejected. Each pool exposes a pool management endpoint for the config below.
A managed identity assigned to each pool, granted the built-in Azure ContainerApps Session Executor role on the pool. Your backend authenticates as this identity (via workload identity on AKS, or any credential DefaultAzureCredential resolves) to allocate and drive sessions — so no client secret is configured.

Configure the provider

SANDBOX_PROVIDER=azure-container-sessions
ANTHROPIC_API_KEY=sk-ant-...   # the agent (Claude Code) runs inside the session

# The pool management endpoints for the two session pools above. Required.
AZURE_CONTAINER_SESSIONS_DATA_APP_POOL_ENDPOINT=https://<pool>.<env>.<region>.azurecontainerapps.io
AZURE_CONTAINER_SESSIONS_AI_WRITEBACK_POOL_ENDPOINT=https://<pool>.<env>.<region>.azurecontainerapps.io

The backend authenticates to the dynamic-sessions data plane with DefaultAzureCredential, so no keys or secrets are set here — grant the backend’s identity the Azure ContainerApps Session Executor role on each pool instead. In AKS, this is a user-assigned managed identity federated to the backend’s service account via workload identity.

Both AZURE_CONTAINER_SESSIONS_API_VERSION (dynamic-sessions data-plane API version) and AZURE_CONTAINER_SESSIONS_TOKEN_SCOPE (the Entra token scope) have sensible defaults and rarely need to be set.

Networking

Constrain a session’s outbound access on the pool’s own configuration (its egress settings / the environment’s network), not in Lightdash. As with any sandbox provider, we recommend denying outbound traffic by default and only allowing the destinations the agent needs (your dbt repository host, the Anthropic / Azure OpenAI API, your container registry).

Local Docker provider (development)

For local development you can run sandboxes as plain Docker containers on your own machine, with no E2B account required. This is the recommended way to work on or try the AI features locally. The images are built from the same sandbox definitions as the E2B templates, but as local Docker images. Two separate images are used (different toolchains), mirroring the two E2B templates:

Image (default tag)	Built from	Used by
`lightdash-sandbox:local`	`sandboxes/data-apps/`	Data app generation
`lightdash-ai-writeback:local`	`sandboxes/ai-writeback/`	AI writeback

Prerequisites

Docker running locally, with the daemon reachable from the Lightdash backend.
S3-compatible object storage configured (locally this is MinIO). Suspended-sandbox snapshots are tarred to object storage so a conversation survives the container being destroyed — see external object storage.
An Anthropic API key (ANTHROPIC_API_KEY) for the agent.

Setup

Build the local sandbox images (each builds from sandboxes/<feature>/):
```
./sandboxes/data-apps/build-local-image.sh        # -> lightdash-sandbox:local
./sandboxes/ai-writeback/build-local-image.sh     # -> lightdash-ai-writeback:local
```
These are large (the writeback image bundles dbt, the Lightdash CLI and Claude Code) and only need rebuilding when the sandbox toolchain changes.

Point Lightdash at the Docker provider:

SANDBOX_PROVIDER=docker
# optional — these are the defaults:
SANDBOX_DOCKER_IMAGE=lightdash-sandbox:local
SANDBOX_AI_WRITEBACK_DOCKER_IMAGE=lightdash-ai-writeback:local

Restart the backend and the scheduler so both pick up the new environment. Data app generation runs in the scheduler worker, so a stale SANDBOX_PROVIDER there will keep it on E2B. (With PM2, a plain restart reuses the cached env — delete and re-start the processes, or restart with --update-env, to actually reload the env file.)

Snapshot lifecycle

Every turn suspends its own sandbox, so in steady state nothing sits idle. For the Lambda MicroVMs provider, two timers configure the AWS-side idle policy as a backstop — when to auto-suspend a sandbox left running and when to auto-terminate a suspended one:

SANDBOX_IDLE_TIMEOUT_MS=1800000        # auto-suspend a running-but-idle microVM (default 30 min)
SANDBOX_SNAPSHOT_RETENTION_MS=604800000 # auto-terminate a suspended microVM (default 7 days); also how long a thread stays resumable

Only the Lambda MicroVMs provider reads these variables. The other providers handle idle sandboxes differently: E2B manages them itself; Azure Container Apps dynamic sessions are reclaimed by the session pool’s own cooldown lifecycle (configured on the pool), which loses nothing because the snapshot lives in object storage; and the Docker dev provider has no idle handling.
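
As a rough model of the two backstop timers, the sketch below maps a sandbox's state and elapsed time to the action the idle policy would take. The policy itself is enforced on the AWS side; `backstop_action` and its state names are assumptions made for illustration only.

```python
import os
from typing import Mapping

DEFAULT_IDLE_TIMEOUT_MS = 1_800_000          # 30 minutes
DEFAULT_SNAPSHOT_RETENTION_MS = 604_800_000  # 7 days


def backstop_action(state: str, elapsed_ms: int, env: Mapping[str, str] = os.environ) -> str:
    """Return 'suspend', 'terminate' or 'none'.

    state is 'running' (elapsed_ms = time spent idle) or
    'suspended' (elapsed_ms = time since the sandbox was suspended).
    """
    idle_timeout = int(env.get("SANDBOX_IDLE_TIMEOUT_MS", DEFAULT_IDLE_TIMEOUT_MS))
    retention = int(env.get("SANDBOX_SNAPSHOT_RETENTION_MS", DEFAULT_SNAPSHOT_RETENTION_MS))
    if state == "running" and elapsed_ms >= idle_timeout:
        return "suspend"
    if state == "suspended" and elapsed_ms >= retention:
        return "terminate"
    return "none"
```

With the defaults, a microVM left running and idle for 31 minutes would be suspended, and a thread whose microVM has been suspended for 8 days can no longer be resumed.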

Environment variable reference

Variable	Default	Description
`SANDBOX_PROVIDER`	`e2b`	Sandbox backend: `e2b`, `lambda-microvm`, `azure-container-sessions`, or `docker`.
`ANTHROPIC_API_KEY`	—	API key for the Claude Code agent running inside the sandbox.
`E2B_API_KEY`	—	E2B API key (required when `SANDBOX_PROVIDER=e2b`).
`E2B_TEMPLATE_NAME`	`lightdash/lightdash-data-app`	E2B template for data app sandboxes.
`E2B_TEMPLATE_TAG`	Lightdash version	Tag of the data app template to launch.
`E2B_AI_WRITEBACK_TEMPLATE_NAME`	`lightdash/lightdash-ai-writeback`	E2B template for writeback sandboxes.
`E2B_AI_WRITEBACK_TEMPLATE_TAG`	Lightdash version	Tag of the writeback template to launch.
`LAMBDA_MICROVM_REGION`	`eu-west-1`	AWS region the microVMs run in (`lambda-microvm`).
`LAMBDA_MICROVM_DATA_APP_IMAGE_ARN`	—	Image ARN for data app microVMs (required when `SANDBOX_PROVIDER=lambda-microvm`).
`LAMBDA_MICROVM_AI_WRITEBACK_IMAGE_ARN`	—	Image ARN for writeback microVMs (required when `SANDBOX_PROVIDER=lambda-microvm`).
`LAMBDA_MICROVM_EXECUTION_ROLE_ARN`	—	Optional IAM role the microVM assumes.
`LAMBDA_MICROVM_INGRESS_CONNECTOR_ARN`	AWS-managed `ALL_INGRESS`	Optional ingress network connector.
`LAMBDA_MICROVM_EGRESS_CONNECTOR_ARN`	AWS-managed `INTERNET_EGRESS`	Optional egress network connector.
`AZURE_CONTAINER_SESSIONS_DATA_APP_POOL_ENDPOINT`	—	Pool management endpoint for data app sessions (required when `SANDBOX_PROVIDER=azure-container-sessions`).
`AZURE_CONTAINER_SESSIONS_AI_WRITEBACK_POOL_ENDPOINT`	—	Pool management endpoint for writeback sessions (required when `SANDBOX_PROVIDER=azure-container-sessions`).
`AZURE_CONTAINER_SESSIONS_API_VERSION`	`2025-02-02-preview`	Dynamic-sessions data-plane API version.
`AZURE_CONTAINER_SESSIONS_TOKEN_SCOPE`	`https://dynamicsessions.io/.default`	Microsoft Entra token scope for the dynamic-sessions data plane.
`SANDBOX_DOCKER_IMAGE`	`lightdash-sandbox:local`	Local image for data app sandboxes (`docker` provider).
`SANDBOX_AI_WRITEBACK_DOCKER_IMAGE`	`lightdash-ai-writeback:local`	Local image for writeback sandboxes (`docker` provider).
`SANDBOX_IDLE_TIMEOUT_MS`	`1800000` (30 min)	Lambda MicroVMs idle policy: auto-suspend a running-but-idle microVM. Ignored by `e2b`/`azure-container-sessions`/`docker`.
`SANDBOX_SNAPSHOT_RETENTION_MS`	`604800000` (7 days)	Lambda MicroVMs idle policy: auto-terminate a suspended microVM (also how long a thread stays resumable). Ignored by `e2b`/`azure-container-sessions`/`docker`.
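
The "required when" notes in the table can be turned into a simple startup check. This sketch treats `ANTHROPIC_API_KEY` as required for every provider, which is how this page describes it, and uses a hypothetical helper name, `missing_variables`.

```python
import os
from typing import Mapping

# Variables the table marks as required for each provider.
REQUIRED_BY_PROVIDER = {
    "e2b": ["E2B_API_KEY"],
    "lambda-microvm": [
        "LAMBDA_MICROVM_DATA_APP_IMAGE_ARN",
        "LAMBDA_MICROVM_AI_WRITEBACK_IMAGE_ARN",
    ],
    "azure-container-sessions": [
        "AZURE_CONTAINER_SESSIONS_DATA_APP_POOL_ENDPOINT",
        "AZURE_CONTAINER_SESSIONS_AI_WRITEBACK_POOL_ENDPOINT",
    ],
    "docker": [],
}


def missing_variables(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variables that are unset or empty for the selected provider."""
    provider = env.get("SANDBOX_PROVIDER", "e2b")
    required = ["ANTHROPIC_API_KEY", *REQUIRED_BY_PROVIDER.get(provider, [])]
    return [name for name in required if not env.get(name)]
```

Run it in both the backend and the scheduler environments, since both processes read these variables.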
