nbdistributed
Notebook-friendly distributed training
citation
"""
citation:
@misc{zumot2025nbdistdemo,
  title        = {NBDistributed walkthrough},
  author       = {Zumot, Laith},
  howpublished = {\url{https://lazyevaluator.com/presentations/dist/nbdistributed.html}},
  date         = {2025-09-17},
  note         = {GitHub Gist}
}
"""
1) What is nbdistributed?
It is a small, pure-Python IPython extension that turns a single Jupyter notebook into a living distributed cluster. It was created by Zach Mueller (Hugging Face Accelerate). [https://pypi.org/project/nbdistributed/]
# Installation is simple with uv
uv pip install nbdistributed
# Load it once with
%load_ext nbdistributed
# spin up workers with
# --num-processes: the number of GPUs you want to use
# --gpu-ids: device IDs, if you want to be picky
%dist_init --num-processes 2 --gpu-ids 0,1
Using GPU IDs: [0, 1]
Starting 2 distributed workers...
✓ Successfully started 2 workers
Rank 0 -> GPU 0
Rank 1 -> GPU 1
Available commands:
%%distributed - Execute code on all ranks (explicit)
%%rank [0,n] - Execute code on specific ranks
%sync - Synchronize all ranks
%dist_status - Show worker status
%dist_mode - Toggle automatic distributed mode
%dist_shutdown - Shutdown workers
Distributed mode active: All cells will now execute on workers automatically!
Magic commands (%, %%) will still execute locally as normal.
Below are the auto-imported and special variables generated into each worker's namespace for you to use:
`torch`
`dist`: `torch.distributed` import alias
`rank` (`int`): The local rank
`world_size` (`int`): The global world size
`gpu_id` (`int`): The specific GPU ID assigned to this worker
`device` (`torch.device`): The current PyTorch device object (e.g. `cuda:1`)
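A quick way to see these in action is a plain cell that runs on every worker once distributed mode is active. This is a minimal sketch; the tensor and print format are purely illustrative:
# With distributed mode active, this plain cell executes on all ranks.
print(f"rank {rank}/{world_size} -> GPU {gpu_id} ({device})")
x = torch.ones(2, device=device) * rank   # lives on this worker's own GPU
print(x)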
# see status
%dist_status
Distributed cluster status (2 processes):
============================================================
Rank 0: ✓ PID 61856
├─ GPU: 0 (NVIDIA GeForce RTX 3090)
├─ Memory: 0.0GB / 24.0GB (0.0% used)
└─ Status: Running
Rank 1: ✓ PID 61857
├─ GPU: 1 (NVIDIA GeForce RTX 3090)
├─ Memory: 0.0GB / 24.0GB (0.0% used)
└─ Status: Running
Every cell you run can be executed on any subset of ranks, or on all of them, while you keep the interactive prompt in your hand.
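For example, a cell can be pinned to a single rank. This is a minimal sketch, assuming a single rank can be listed in the brackets; the variable name is just illustrative:
%%rank [0]
# Only rank 0 runs this cell; rank 1 simply skips it.
message = f"hello from rank {rank} on {device}"
print(message)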
# When you are done, call
%dist_shutdown
# and the GPUs are released again.
2) Tinkering cell by cell.
Early CUDA is allowed. You can probe `torch.cuda.device_count()` or allocate a tensor on `cuda:3` before you ever start the workers. The plugin spawns the `torch.distributed` group later, so nothing is locked in advance (a quick sketch follows below).
Cell-level targeting. Prefix a cell with `%%rank [0,1]` or `%%distributed` and only the chosen ranks run it. You stay in the notebook UI the whole time.
Fault isolation without full restart. If rank 1 throws a `NameError`, only that process shows the trace. Fix the variable in the next cell and rerun; no need to bring down the whole group.
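Here is a minimal sketch of that pre-init probing, assuming the machine actually exposes a fourth GPU so that `cuda:3` exists:
# Before %dist_init the notebook kernel still owns every GPU on the box.
import torch

print(torch.cuda.device_count())           # how many GPUs are visible
scratch = torch.randn(8, device="cuda:3")  # assumes a 4th GPU is present
print(scratch.device)                      # cuda:3 -- nothing is locked in yet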
3) Auto-created variables you can use right away
After `%dist_init` each worker wakes up with these names already in its namespace:
`torch`: the full PyTorch module
`dist`: alias for `torch.distributed`
`rank`: local rank id (`int`)
`world_size`: total number of workers (`int`)
`gpu_id`: the exact GPU index assigned to this worker (`int`)
`device`: a ready-made `torch.device(f'cuda:{gpu_id}')`
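As a hedged sketch of how these names combine, the cell below runs an all-reduce across the two workers; the all-reduce itself is plain `torch.distributed`, not something nbdistributed adds:
%%distributed
# Each rank contributes its own value; after all_reduce every rank holds the sum.
t = torch.tensor([float(rank)], device=device)
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {rank}: sum over {world_size} ranks = {t.item()}")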
4) A Classroom Analogy
With certain libraries you have to declare resources up front, and once an actor has started you cannot move it to another GPU without rebuilding the job.
nbdistributed treats GPUs like seats in a classroom: you tell it "take 2 GPUs" and you can still walk over to seat 3 or seat 4, or even evict one worker mid-session, all from the same notebook kernel.