The problem: AI model distribution is broken at scale
Large-scale AI model distribution presents challenges in performance, efficiency, and cost.
Consider a typical scenario: an ML platform team manages a Kubernetes cluster with 200 GPU nodes. A new version of a 70B-parameter model becomes available, weighing in at approximately 130 GB. Each node requires a local copy, resulting in 26 TB of data pulled from a single model hub, typically through shared origin infrastructure that is subject to bandwidth limits and rate limiting.
The scale of modern model hubs highlights these challenges:
- Hugging Face Hub serves over 1 million models, with individual files regularly exceeding 10 GB (safetensors, GGUF quantizations).
- ModelScope Hub hosts over 10,000 models — including large models such as Qwen, Yi, and inclusionAI’s Ling series — supporting a rapidly growing global user base.
These platforms have significantly improved access to open models, but distributing large artifacts across many nodes introduces system-level constraints:
- Git LFS, which underpins large file storage on these platforms, is optimized for versioning and access rather than large-scale fan-out distribution.
- Rate limits can affect both unauthenticated and authenticated requests under burst traffic.
- Network costs increase as the same data is transferred repeatedly across environments.
Existing approaches — such as NFS mounts, pre-built container images, or object storage mirrors — can help mitigate these issues, but may introduce operational complexity, stale-model risk, or additional storage overhead.
This raises an important question: how can infrastructure enable model distribution to scale efficiently, so that downloading to the 200th node is as fast as downloading to the first, regardless of the model hub?
That’s exactly what the new hf:// and modelscope:// protocol support in Dragonfly delivers.
What Is Dragonfly?
Dragonfly is a CNCF Graduated project that provides a P2P-based file distribution system. Originally built for container image distribution at Alibaba-scale (processing billions of requests daily), Dragonfly turns every downloading node into a seed for its peers.
Core Architecture:
Figure 1: End-to-end flow of the P2P model distribution in Dragonfly. The Seed Peer fetches the model from the origin hub once (Step 1), the Dragonfly Scheduler computes the P2P topology (Step 3), and GPU nodes share pieces via micro-task distribution (Step 5) — reducing origin traffic from 26 TB to ~130 GB across a 200-node cluster.
The magic: Dragonfly splits files into small pieces and distributes them across the P2P mesh. The origin (Hugging Face Hub or ModelScope Hub) is hit once by the seed peer. Critically, the Seed Peer does not need to finish downloading the entire model before sharing with other peers — as soon as any single piece is downloaded, it can be shared immediately. This piece-based streaming download means distribution begins in parallel with the initial fetch, dramatically reducing total transfer time. For a 130 GB model across 200 nodes, origin traffic drops from 26 TB to ~130 GB — a 99.5% reduction.
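The arithmetic above can be sketched in a few lines. The 4 MiB piece size is an illustrative assumption (Dragonfly's actual piece length is determined by the client configuration), and the function names are invented for this sketch:

```rust
// Illustrative piece size; not Dragonfly's actual setting.
const PIECE_BYTES: u64 = 4 * 1024 * 1024;

/// Origin traffic if every node downloads the full model independently.
fn naive_origin_bytes(model_bytes: u64, nodes: u64) -> u64 {
    model_bytes * nodes
}

/// Origin traffic with P2P: the seed peer fetches the model once, and
/// every piece is then served from the mesh, regardless of node count.
fn p2p_origin_bytes(model_bytes: u64) -> u64 {
    model_bytes
}

fn main() {
    let model: u64 = 130_000_000_000; // ~130 GB, as in the example
    let nodes: u64 = 200;
    let pieces = model.div_ceil(PIECE_BYTES);
    println!("pieces in the mesh: {pieces}");
    println!("naive origin traffic: {} TB", naive_origin_bytes(model, nodes) / 1_000_000_000_000); // 26 TB
    println!("p2p origin traffic: {} GB", p2p_origin_bytes(model) / 1_000_000_000); // 130 GB
}
```

Because peers begin serving a piece the moment they hold it, the 200-node fan-out overlaps with the seed peer's initial fetch rather than waiting behind it.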
Until now, Dragonfly supported HTTP/HTTPS, S3, GCS, Azure Blob Storage, Alibaba OSS, Huawei OBS, Tencent COS, and HDFS backends. But the two largest sources of AI model artifacts — Hugging Face and ModelScope — required users to pre-resolve hub URLs into raw HTTPS links, losing authentication context, revision pinning, and repository structure awareness.
Not anymore.
Introducing native model hub protocols in Dragonfly
With two new backends merged into the Dragonfly client, dfget (Dragonfly’s download tool) now natively understands both Hugging Face and ModelScope URLs. No proxies. No URL rewriting. No wrapper scripts.
The hf:// Protocol — Hugging Face hub
Merged via PR #1665, this backend adds first-class support for downloading from the world’s largest open-source model repository.
URL format:
hf://[&lt;repository_type&gt;/]&lt;owner&gt;/&lt;repository&gt;[/&lt;path&gt;]
Components:
| Component | Required | Description | Default |
| --- | --- | --- | --- |
| `repository_type` | No | `models`, `datasets`, or `spaces` | `models` |
| `owner/repository` | Yes | Repository identifier (e.g., `deepseek-ai/DeepSeek-R1`) | — |
| `path` | No | File path within the repo | Entire repo |
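To make the defaults concrete, here is a hypothetical sketch of how such a URL could be split into the components above. The function name and return shape are invented for illustration; this is not Dragonfly's actual parser:

```rust
// Hypothetical decomposition of an hf:// URL into
// (repository_type, owner/repository, optional path).
fn parse_hf(url: &str) -> Option<(String, String, Option<String>)> {
    let mut rest = url.strip_prefix("hf://")?;
    // repository_type is optional and defaults to "models".
    let mut repo_type = "models";
    for t in ["models", "datasets", "spaces"] {
        if let Some(r) = rest.strip_prefix(t).and_then(|r| r.strip_prefix('/')) {
            repo_type = t;
            rest = r;
            break;
        }
    }
    // What remains is <owner>/<repository>[/<path>].
    let mut parts = rest.splitn(3, '/');
    let owner = parts.next()?;
    let repo = parts.next()?;
    let path = parts.next().map(str::to_string);
    Some((repo_type.to_string(), format!("{owner}/{repo}"), path))
}

fn main() {
    // Single file in a model repo: type defaults to "models".
    println!("{:?}", parse_hf("hf://deepseek-ai/DeepSeek-R1/model.safetensors"));
    // Dataset file: explicit "datasets" type.
    println!("{:?}", parse_hf("hf://datasets/huggingface/squad/train.json"));
    // Whole repository: no path component.
    println!("{:?}", parse_hf("hf://deepseek-ai/DeepSeek-R1"));
}
```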
Usage examples:
# Download a single model file with P2P acceleration
dfget hf://deepseek-ai/DeepSeek-R1/model.safetensors \
-O /models/DeepSeek-R1/model.safetensors
# Download an entire repository recursively
dfget hf://deepseek-ai/DeepSeek-R1 \
-O /models/DeepSeek-R1/ -r
# Download a specific dataset
dfget hf://datasets/huggingface/squad/train.json \
-O /data/squad/train.json
# Access private repositories with authentication
dfget hf://owner/private-model/weights.bin \
-O /models/private/weights.bin \
--hf-token=hf_xxxxxxxxxxxxx
# Pin to a specific model version
dfget hf://deepseek-ai/DeepSeek-R1/model.safetensors --hf-revision v2.0 \
-O /models/DeepSeek-R1/model.safetensors
The modelscope:// Protocol — ModelScope hub
Merged via PR #1673, this backend brings the same P2P-accelerated experience to ModelScope Hub — Alibaba’s open model platform hosting thousands of models, with particularly strong coverage of Chinese-origin LLMs and multimodal models.
URL format:
modelscope://[&lt;repo_type&gt;/]&lt;owner&gt;/&lt;repo&gt;[/&lt;path&gt;]
Components:
| Component | Required | Description | Default |
| --- | --- | --- | --- |
| `repo_type` | No | `models` or `datasets` | `models` |
| `owner/repo` | Yes | Repository identifier (e.g., `deepseek-ai/DeepSeek-R1`) | — |
| `path` | No | File path within the repo | Entire repo |
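Going the other direction, a hypothetical helper can assemble a modelscope:// URL from these components, showing how the default `models` type may be omitted. The function is invented for illustration; dfget itself simply consumes such URLs:

```rust
// Hypothetical builder for modelscope:// URLs from the components
// in the table above.
fn modelscope_url(repo_type: &str, owner_repo: &str, path: Option<&str>) -> String {
    let mut url = String::from("modelscope://");
    // "models" is the default repo type, so it can be omitted.
    if repo_type != "models" {
        url.push_str(repo_type);
        url.push('/');
    }
    url.push_str(owner_repo);
    if let Some(p) = path {
        url.push('/');
        url.push_str(p);
    }
    url
}

fn main() {
    println!("{}", modelscope_url("models", "deepseek-ai/DeepSeek-R1", Some("config.json")));
    // modelscope://deepseek-ai/DeepSeek-R1/config.json
    println!("{}", modelscope_url("datasets", "damo/squad-zh", Some("train.json")));
    // modelscope://datasets/damo/squad-zh/train.json
}
```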
Usage examples:
# Download a model repository with P2P acceleration
dfget modelscope://deepseek-ai/DeepSeek-R1 \
-O /models/DeepSeek-R1/ -r
# Download a single file
dfget modelscope://deepseek-ai/DeepSeek-R1/config.json \
-O /models/DeepSeek-R1/config.json
# Download with authentication for private repos
dfget modelscope://deepseek-ai/DeepSeek-R1/config.json \
-O /tmp/config.json --ms-token=&lt;token&gt;
# Download a dataset
dfget modelscope://datasets/damo/squad-zh/train.json \
-O /data/squad-zh/train.json
# Download from a specific revision
dfget modelscope://deepseek-ai/DeepSeek-R1/config.json --ms-revision v2.0 \
-O /models/DeepSeek-R1/config.json
Under the hood: Technical deep dive
Both implementations live in the Dragonfly Rust client as new backend modules. Here’s how they work at the systems level.
1. Pluggable Backend Architecture
Dragonfly uses a pluggable backend system. Each URL scheme (http, s3, gs, hf, modelscope, etc.) maps to a backend that implements the Backend trait:
#[tonic::async_trait]
pub trait Backend {
    fn scheme(&self) -> String;
    async fn stat(&self, request: StatRequest) -> Result&lt;StatResponse&gt;;
    async fn get(&self, request: GetRequest) -> Result&lt;GetResponse&gt;;
}
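Dispatch by scheme can be pictured as a registry mapping each URL scheme to its backend. The sketch below is synchronous and uses invented type names (`HfBackend`, `Registry`); the real client's trait is async, as above:

```rust
use std::collections::HashMap;

// Simplified, synchronous sketch of scheme-based dispatch.
trait Backend {
    fn scheme(&self) -> String;
}

struct HfBackend;
impl Backend for HfBackend {
    fn scheme(&self) -> String { "hf".to_string() }
}

struct ModelScopeBackend;
impl Backend for ModelScopeBackend {
    fn scheme(&self) -> String { "modelscope".to_string() }
}

/// Maps each URL scheme to its backend, so a download request for an
/// hf:// or modelscope:// URL is routed without any URL rewriting.
struct Registry {
    backends: HashMap<String, Box<dyn Backend>>,
}

impl Registry {
    fn new(backends: Vec<Box<dyn Backend>>) -> Self {
        Self {
            backends: backends.into_iter().map(|b| (b.scheme(), b)).collect(),
        }
    }

    /// Resolve the backend for a URL such as "hf://owner/repo/file".
    fn for_url(&self, url: &str) -> Option<&dyn Backend> {
        let scheme = url.split("://").next()?;
        self.backends.get(scheme).map(|b| &**b)
    }
}

fn main() {
    let registry = Registry::new(vec![Box::new(HfBackend), Box::new(ModelScopeBackend)]);
    let backend = registry.for_url("hf://deepseek-ai/DeepSeek-R1").unwrap();
    println!("dispatched to: {}", backend.scheme()); // prints "dispatched to: hf"
}
```

The design means adding a hub is additive: a new backend module implements the trait, registers its scheme, and every existing Dragonfly feature (piece splitting, scheduling, peer exchange) applies unchanged.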