Training observability for serious ML teams

The training tool you keep open all day.

InstantML is a W&B-style platform built for speed. Sub-second metric charts at 90k runs, instant run comparison, real artifacts and checkpoints, and predictable pricing — without tracked-hour billing.

Drop-in for W&B / MLflow / Neptune · Self-host or hosted
Project summary p95
78 ms
Measured at 90,000 runs. Charts open before the cursor stops moving.
Indexed search p95
118 ms
Server-side search/sort over name, tags, config, and notes — at scale.
First useful render
387 ms
Time-to-pixels on the production web dashboard. No spinner-driven UX.

How it works

One pip install. Three SDK calls.

The Python SDK is a thin, non-blocking layer over a Rust/Postgres backend. Metrics buffer in-process and flush asynchronously, so your training loop never waits on the network. If the server is slow or offline, events spool to disk and replay on reconnect.
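The buffer-then-spool behavior described above can be sketched in a few dozen lines. This is an illustrative sketch, not InstantML's actual SDK internals: the `BufferedLogger` class, its `send` callable, and the JSONL spool format are assumptions made for the example.

```python
import json
import pathlib
import queue
import threading


class BufferedLogger:
    """Buffer events in-process; a worker thread flushes them so the
    training loop never waits on the network. Failed flushes spool to
    disk for later replay. (Illustrative sketch, not the real SDK.)"""

    def __init__(self, spool_path, send, flush_every=256):
        self.q = queue.Queue()
        self.spool = pathlib.Path(spool_path)
        self.send = send              # callable: list[dict] -> None, may raise
        self.flush_every = flush_every
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, event: dict):
        self.q.put(event)             # O(1); never touches the network

    def _run(self):
        batch = []
        while not (self._stop.is_set() and self.q.empty()):
            try:
                batch.append(self.q.get(timeout=0.05))
            except queue.Empty:
                pass
            # Flush when the batch is full or the queue has drained.
            if batch and (len(batch) >= self.flush_every or self.q.empty()):
                self._flush(batch)
                batch = []

    def _flush(self, batch):
        try:
            self.send(batch)          # upload to the server
        except Exception:
            # Server slow or offline: append events to the disk spool.
            with self.spool.open("a") as f:
                for ev in batch:
                    f.write(json.dumps(ev) + "\n")

    def close(self):
        self._stop.set()
        self._worker.join()
```

On reconnect, a replay step would read the spool file back line by line and hand the decoded events to `send` again; that half is omitted here.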

[Architecture diagram: your training job on the GPU nodes (PyTorch · JAX · TRL) → Python SDK (buffered, async, offline spool) → InstantML API, the hot path (Rust · Postgres: typed attributes, summaries, indexed run search) → Next/React dashboard (runs, compare, charts, artifacts), with artifacts in S3 / R2 object storage.]

Why teams switch

The old tools work. They're just slow.

We've watched serious ML teams put up with run-list pages that take five seconds to load, charts that lag behind the mouse, and pricing that punishes you for actually using the product. InstantML is the third option.

01 · Status quo

Wait on a slow run list

Every project you open costs you focus. Spinners are the dominant UI.

02 · Status quo

Pay per tracked hour

Pricing scales with how hard your team is working. The wrong incentive.

03 · InstantML

The third option

Sub-second charts at 90k runs, flat predictable pricing, and a data model you can self-host.

What we fix

Three places the current tools hurt.

Jump to the SDK
01 · Today

Comparison is the killer

Side-by-side runs reload every chart. We render compare from materialized summaries, not raw scans.
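To make "materialized summaries, not raw scans" concrete: per-run aggregates are updated at write time, so a compare view reads one small row per run instead of scanning every logged point. The `RunSummaries` class below and its field names are a hypothetical sketch of the idea, not InstantML's storage code.

```python
from collections import defaultdict


class RunSummaries:
    """Maintain per-run metric summaries at ingest time so a compare
    view is O(runs) small lookups, not a scan over raw series.
    (Illustrative sketch; names and fields are hypothetical.)"""

    def __init__(self):
        # summaries[run_id][metric] -> {"last", "min", "max", "count"}
        self.summaries = defaultdict(dict)

    def ingest(self, run_id, metric, value):
        s = self.summaries[run_id].setdefault(
            metric, {"last": value, "min": value, "max": value, "count": 0})
        s["last"] = value
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
        s["count"] += 1

    def compare(self, run_ids, metric):
        # Side-by-side view: one pre-aggregated row per run.
        return {r: self.summaries[r].get(metric) for r in run_ids}
```

The trade-off is classic: a few extra writes per logged point buys constant-time reads on the page you stare at all day.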

02 · Today

Logging blocks training

Synchronous SDKs make your loop wait on HTTP. Ours buffers and spools — your trainer never blocks.

03 · Today

Your runs aren't yours

Export is a side-feature. Ours is a first-class GET /api/export with deterministic JSONL.
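Assuming the export endpoint returns one JSON object per line (the JSONL shape named above; the auth header and record fields here are illustrative guesses, not a documented API), a client can be very small:

```python
import json
import urllib.request


def parse_jsonl(text: str):
    """One JSON object per non-empty line: deterministic to
    re-serialize and trivially diffable between exports."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def export_runs(base_url: str, token: str):
    # Endpoint path comes from the page; the bearer-token header is
    # an assumption for the sketch.
    req = urllib.request.Request(
        f"{base_url}/api/export",
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return parse_jsonl(resp.read().decode("utf-8"))
```

Because each line is an independent record, a partial download is still a valid prefix of the export, which is what makes JSONL a good wire format for "your runs are yours".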

Capabilities

The daily workflow, measured in milliseconds.

Metric charts

Loss curves that keep up with your loop.

Streamed scalar series, grouped averages, smoothing, range zoom, hover tooltips. The chart you actually watch.

[Live chart: run llm-7b-sft · loss · streaming · train/eval series · step 28,400 · loss 0.52 · throughput 42k tok/s]
Benchmarked at scale

90,000 runs. No spinner.

Project summary p95 78 ms · search p95 118 ms · metric-best sort p95 66 ms · chart series p95 22 ms. Measured locally, reproducible by the included benchmark.
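The included benchmark itself is not shown here; for reference, this minimal sketch shows the nearest-rank p95 computation that the quoted percentiles imply.

```python
import math


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a latency sample, in ms."""
    xs = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(xs))  # nearest-rank method, 1-indexed
    return xs[rank - 1]
```

With 100 samples this returns the 95th-smallest value, so a p95 of 78 ms means at most five in a hundred project loads were slower than 78 ms.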

project p95 · 78 ms · at 90k runs
Non-blocking SDK

Buffered. Async. Offline-safe.

init · log · artifact · checkpoint · finish.

sdk.tail · run r_a4e2
tailing
12:21:41 run.init project=llm-7b-sft id=r_a4e2 config=24 keys
12:24:52 run.metric step=4200 loss=1.82 lr=2.0e-4 thr=41k tok/s
12:27:03 run.artifact name=eval/confusion bytes=128KiB mime=image/png
12:30:14 run.metric step=4800 loss=1.74 grad_norm=0.94
12:33:25 run.checkpoint step=5000 shard=0 bytes=12.4GiB sha=4f9c…
12:36:36 sdk.flush queued=2,048 uploaded=2,048 spool=0
12:39:47 run.metric step=5400 loss=1.66 vram_used=78.2GiB
12:42:58 run.finish status=ok duration=4h12m events=18,420
Real data model

Typed attributes, not stringly-typed dicts.

Configs, float series, string series, file series, histograms, and tags — first-class. Rich-object tables, audio, MP4 rollouts, and image artifacts come along for the ride.
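A rough sketch of what "typed attributes" can mean at the API layer, with hypothetical names: each attribute declares its type up front, and appends are checked against that declaration instead of accepting arbitrary strings.

```python
from dataclasses import dataclass, field
from enum import Enum


class AttrType(Enum):
    FLOAT_SERIES = "float_series"
    STRING_SERIES = "string_series"
    FILE_SERIES = "file_series"
    HISTOGRAM = "histogram"
    TAG = "tag"


@dataclass
class Attribute:
    """A typed run attribute. (Hypothetical sketch of the data model,
    not InstantML's actual schema.)"""
    name: str
    type: AttrType
    points: list = field(default_factory=list)  # (step, value) pairs

    def append(self, step, value):
        # Reject values that do not match the declared type, rather
        # than silently stringifying them.
        if self.type is AttrType.FLOAT_SERIES and not isinstance(value, (int, float)):
            raise TypeError(
                f"{self.name} is a float series, got {type(value).__name__}")
        if self.type is AttrType.FLOAT_SERIES:
            value = float(value)
        self.points.append((step, value))
```

Declaring the type once means the backend can index and aggregate each series natively, which is what makes server-side search and metric-best sort possible.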

floats · strings · files · histograms · tags
Drop-in

Import yesterday's runs.

First-class importers for W&B, MLflow, and Neptune JSON exports. Dual-log against your old tool during migration.
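Dual-logging during migration can be as thin as a fan-out wrapper. This duck-typed sketch (the `DualLogger` name and the backend protocol are assumptions, not InstantML's shipped dual-log feature) forwards each call to both trackers, so either dashboard stays complete while you compare them.

```python
class DualLogger:
    """Fan out tracking calls to two backends during a migration.
    Backends are duck-typed: anything with .log(dict) and .finish(),
    e.g. an InstantML run alongside a legacy tracker's run object."""

    def __init__(self, primary, secondary):
        self.backends = (primary, secondary)

    def log(self, metrics: dict):
        for b in self.backends:
            b.log(metrics)

    def finish(self):
        for b in self.backends:
            b.finish()
```

Once the imported history and the dual-logged runs agree, the secondary backend can be dropped without touching the training code.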

W&B · MLflow · Neptune · PyTorch · JAX · TRL

For developers

pip install instantml. Three lines.

The SDK is intentionally small. Three calls — init, log, finish — cover the daily loop. Artifacts and checkpoints are just files. Imports replay history from W&B, MLflow, and Neptune so you don't lose a year of training when you switch.

Python 3.11+ · Rust API · Postgres · Open SDK
train.py
$ pip install instantml && python train.py
# Three calls. No daemon, no dashboard tab to babysit.
import instantml as im

run = im.init(project="llm-7b-sft", config=cfg)

for step, batch in enumerate(loader):
    loss = train_step(batch)
    run.log({"loss": loss, "step": step})

run.log_artifact("checkpoint", "./ckpt")
run.finish()

What ships today

A real product, not a roadmap deck.

Python SDK
init / log / finish
Run compare
Side-by-side
Artifacts
Files · checkpoints
W&B import
JSON · CLI
MLflow import
JSON · CLI
Neptune import
JSON · CLI
Docker Compose
One command
Self-hosted
Available
Hosted SaaS
Design partners
Dual-log to W&B
In testing
Flat pricing
No tracked hours
Data export
GET /api/export

Run your next experiment on a tool that keeps up.

We're onboarding a small first cohort of design partners. Send a real email, get a real engineer. No sales calls — just your real runs, ingested, with our team helping you compare.

[email protected] · we reply same business day