Training observability for serious ML teams

The training tool you keep open all day.

InstantML is a W&B-style platform built for speed. Sub-second metric charts at 90k runs, instant run comparison, real artifacts and checkpoints, and predictable pricing — without tracked-hour billing.

Drop-in for W&B / MLflow / Neptune · Self-host or hosted
Project summary p95
78 ms
Measured at 90,000 runs. Charts open before the cursor stops moving.
Indexed search p95
118 ms
Server-side search/sort over name, tags, config, and notes — at scale.
First useful render
387 ms
Time-to-pixels on the production web dashboard. No spinner-driven UX.

How it works

One pip install. Three SDK calls.

The Python SDK is a thin, non-blocking layer over a Rust/Postgres backend. Metrics buffer in-process and flush asynchronously, so your training loop never waits on the network. If the server is slow or offline, events spool to disk and replay on reconnect.
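The buffer-then-spool behavior described above can be sketched in a few dozen lines. This is an illustrative sketch, not InstantML's actual SDK internals: the `BufferedLogger` class, its `send` callable, and the JSONL spool format are assumptions made for the example.

```python
import json
import pathlib
import queue
import threading


class BufferedLogger:
    """Buffer events in-process; a worker thread flushes them so the
    training loop never waits on the network. Failed flushes spool to
    disk for later replay. (Illustrative sketch, not the real SDK.)"""

    def __init__(self, spool_path, send, flush_every=256):
        self.q = queue.Queue()
        self.spool = pathlib.Path(spool_path)
        self.send = send              # callable: list[dict] -> None, may raise
        self.flush_every = flush_every
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, event: dict):
        self.q.put(event)             # O(1); never touches the network

    def _run(self):
        batch = []
        while not (self._stop.is_set() and self.q.empty()):
            try:
                batch.append(self.q.get(timeout=0.05))
            except queue.Empty:
                pass
            # Flush when the batch is full or the queue has drained.
            if batch and (len(batch) >= self.flush_every or self.q.empty()):
                self._flush(batch)
                batch = []

    def _flush(self, batch):
        try:
            self.send(batch)          # upload to the server
        except Exception:
            # Server slow or offline: append events to the disk spool.
            with self.spool.open("a") as f:
                for ev in batch:
                    f.write(json.dumps(ev) + "\n")

    def close(self):
        self._stop.set()
        self._worker.join()
```

On reconnect, a replay step would read the spool file back line by line and hand the decoded events to `send` again; that half is omitted here.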

[Architecture diagram: your training job on the GPU nodes (PyTorch · JAX · TRL) → Python SDK (buffered, async, offline spool) → InstantML API, the hot path (Rust · Postgres: typed attributes, summaries, indexed run search) → Next/React dashboard (runs, compare, charts, artifacts), with artifacts in S3 / R2 object storage.]

Why teams switch

The old tools work. They're just slow.

We've watched serious ML teams put up with run-list pages that take five seconds to load, charts that lag behind the mouse, and pricing that punishes you for actually using the product. InstantML is the third option.

01 · Status quo

Wait on a slow run list

Every project you open costs you focus. Spinners are the dominant UI.

02 · Status quo

Pay per tracked hour

Pricing scales with how hard your team is working. The wrong incentive.

03 · InstantML

The third option

Sub-second charts at 90k runs, flat predictable pricing, and a data model you can self-host.

What we fix

Three places the current tools hurt.

Jump to the SDK
01 · Today

Comparison is the killer

Side-by-side runs reload every chart. We render compare from materialized summaries, not raw scans.
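To make "materialized summaries, not raw scans" concrete: per-run aggregates are updated at write time, so a compare view reads one small row per run instead of scanning every logged point. The `RunSummaries` class below and its field names are a hypothetical sketch of the idea, not InstantML's storage code.

```python
from collections import defaultdict


class RunSummaries:
    """Maintain per-run metric summaries at ingest time so a compare
    view is O(runs) small lookups, not a scan over raw series.
    (Illustrative sketch; names and fields are hypothetical.)"""

    def __init__(self):
        # summaries[run_id][metric] -> {"last", "min", "max", "count"}
        self.summaries = defaultdict(dict)

    def ingest(self, run_id, metric, value):
        s = self.summaries[run_id].setdefault(
            metric, {"last": value, "min": value, "max": value, "count": 0})
        s["last"] = value
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
        s["count"] += 1

    def compare(self, run_ids, metric):
        # Side-by-side view: one pre-aggregated row per run.
        return {r: self.summaries[r].get(metric) for r in run_ids}
```

The trade-off is classic: a few extra writes per logged point buys constant-time reads on the page you stare at all day.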

02 · Today

Logging blocks training

Synchronous SDKs make your loop wait on HTTP. Ours buffers and spools — your trainer never blocks.

03 · Today

Your runs aren't yours

Export is a side-feature. Ours is a first-class GET /api/export with deterministic JSONL.
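Assuming the export endpoint returns one JSON object per line (the JSONL shape named above; the auth header and record fields here are illustrative guesses, not a documented API), a client can be very small:

```python
import json
import urllib.request


def parse_jsonl(text: str):
    """One JSON object per non-empty line: deterministic to
    re-serialize and trivially diffable between exports."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def export_runs(base_url: str, token: str):
    # Endpoint path comes from the page; the bearer-token header is
    # an assumption for the sketch.
    req = urllib.request.Request(
        f"{base_url}/api/export",
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return parse_jsonl(resp.read().decode("utf-8"))
```

Because each line is an independent record, a partial download is still a valid prefix of the export, which is what makes JSONL a good wire format for "your runs are yours".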

Capabilities

The daily workflow, measured in milliseconds.

Metric charts

Loss curves that keep up with your loop.

Streamed scalar series, grouped averages, smoothing, range zoom, hover tooltips. The chart you actually watch.

[Live chart: run llm-7b-sft · loss · streaming · train/eval series · step 28,400 · loss 0.52 · throughput 42k tok/s]
Benchmarked at scale

90,000 runs. No spinner.

Project summary p95 78 ms · search p95 118 ms · metric-best sort p95 66 ms · chart series p95 22 ms. Measured locally, reproducible by the included benchmark.
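The included benchmark itself is not shown here; for reference, this minimal sketch shows the nearest-rank p95 computation that the quoted percentiles imply.

```python
import math


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a latency sample, in ms."""
    xs = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(xs))  # nearest-rank method, 1-indexed
    return xs[rank - 1]
```

With 100 samples this returns the 95th-smallest value, so a p95 of 78 ms means at most five in a hundred project loads were slower than 78 ms.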

project p95 · 78 ms · at 90k runs
Non-blocking SDK

Buffered. Async. Offline-safe.

init · log · artifact · checkpoint · finish.

sdk.tail · run r_a4e2
tailing
12:21:41 run.init project=llm-7b-sft id=r_a4e2 config=24 keys
12:24:52 run.metric step=4200 loss=1.82 lr=2.0e-4 thr=41k tok/s
12:27:03 run.artifact name=eval/confusion bytes=128KiB mime=image/png
12:30:14 run.metric step=4800 loss=1.74 grad_norm=0.94
12:33:25 run.checkpoint step=5000 shard=0 bytes=12.4GiB sha=4f9c…
12:36:36 sdk.flush queued=2,048 uploaded=2,048 spool=0
12:39:47 run.metric step=5400 loss=1.66 vram_used=78.2GiB
12:42:58 run.finish status=ok duration=4h12m events=18,420
Real data model

Typed attributes, not stringly-typed dicts.

Configs, float series, string series, file series, histograms, and tags — first-class. Rich-object tables, audio, MP4 rollouts, and image artifacts come along for the ride.
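A rough sketch of what "typed attributes" can mean at the API layer, with hypothetical names: each attribute declares its type up front, and appends are checked against that declaration instead of accepting arbitrary strings.

```python
from dataclasses import dataclass, field
from enum import Enum


class AttrType(Enum):
    FLOAT_SERIES = "float_series"
    STRING_SERIES = "string_series"
    FILE_SERIES = "file_series"
    HISTOGRAM = "histogram"
    TAG = "tag"


@dataclass
class Attribute:
    """A typed run attribute. (Hypothetical sketch of the data model,
    not InstantML's actual schema.)"""
    name: str
    type: AttrType
    points: list = field(default_factory=list)  # (step, value) pairs

    def append(self, step, value):
        # Reject values that do not match the declared type, rather
        # than silently stringifying them.
        if self.type is AttrType.FLOAT_SERIES and not isinstance(value, (int, float)):
            raise TypeError(
                f"{self.name} is a float series, got {type(value).__name__}")
        if self.type is AttrType.FLOAT_SERIES:
            value = float(value)
        self.points.append((step, value))
```

Declaring the type once means the backend can index and aggregate each series natively, which is what makes server-side search and metric-best sort possible.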

floats · strings · files · histograms · tags
Drop-in

Import yesterday's runs.

First-class importers for W&B, MLflow, and Neptune JSON exports. Dual-log against your old tool during migration.
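Dual-logging during migration can be as thin as a fan-out wrapper. This duck-typed sketch (the `DualLogger` name and the backend protocol are assumptions, not InstantML's shipped dual-log feature) forwards each call to both trackers, so either dashboard stays complete while you compare them.

```python
class DualLogger:
    """Fan out tracking calls to two backends during a migration.
    Backends are duck-typed: anything with .log(dict) and .finish(),
    e.g. an InstantML run alongside a legacy tracker's run object."""

    def __init__(self, primary, secondary):
        self.backends = (primary, secondary)

    def log(self, metrics: dict):
        for b in self.backends:
            b.log(metrics)

    def finish(self):
        for b in self.backends:
            b.finish()
```

Once the imported history and the dual-logged runs agree, the secondary backend can be dropped without touching the training code.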

W&B · MLflow · Neptune · PyTorch · JAX · TRL

For developers

pip install instantml. Three lines.

The SDK is intentionally small. Three calls — init, log, finish — cover the daily loop. Artifacts and checkpoints are just files. Imports replay history from W&B, MLflow, and Neptune so you don't lose a year of training when you switch.

Python 3.11+ · Rust API · Postgres · Open SDK
train.py
$ pip install instantml && python train.py
# Three calls. No daemon, no dashboard tab to babysit.
import instantml as im

run = im.init(project="llm-7b-sft", config=cfg)

for step, batch in enumerate(loader):
    loss = train_step(batch)
    run.log({"loss": loss, "step": step})

run.log_artifact("checkpoint", "./ckpt")
run.finish()

What ships today

A real product, not a roadmap deck.

Python SDK
init / log / finish
Run compare
Side-by-side
Artifacts
Files · checkpoints
W&B import
JSON · CLI
MLflow import
JSON · CLI
Neptune import
JSON · CLI
Docker Compose
One command
Self-hosted
Available
Hosted SaaS
Design partners
Dual-log to W&B
In testing
Flat pricing
No tracked hours
Data export
GET /api/export

Run your next experiment on a tool that keeps up.

We're onboarding a small first cohort of design partners. Send a real email, get a real engineer. No sales calls — just your real runs, ingested, with our team helping you compare.

[email protected] · we reply same business day