Audience: teams running snapshot_download, Transformers, or fine-tune jobs on a remote Mac over long-haul links or through an approved mirror / private Hub. Deliverables: bottleneck checklist, env table (HF_HUB_ENABLE_HF_TRANSFER, HF_ENDPOINT, caches, Xet concurrent range GETs), decision matrix, resume notes, CI cache keys. Links: homepage, blog index, instant AI model pulling, tiered model cache HowTo, Git/npm CI cache strategy—no login.

Scenarios & bottleneck checklist

Classify where the time goes first: metadata RTT to huggingface.co, Xet shard reconstruction, or APFS noise from symlink-heavy caches on shared runners.

  • Import-order env drift: huggingface_hub reads env vars at import time; exporting HF_HUB_CACHE after import huggingface_hub silently does nothing.
  • Wrong disk tier: pointing HF_HUB_CACHE at a network home folder turns every chunk into random SMB latency; NVMe local paths win for Xet parallel writes.
  • Mirror without contract: setting HF_ENDPOINT to a community mirror may violate compliance or break gated repos—obtain written allowance and test HF_TOKEN scopes.
  • Legacy transfer flags: cargo-culting HF_HUB_ENABLE_HF_TRANSFER=1 after the Hub migrated to Xet wastes review cycles; align with HF_XET_HIGH_PERFORMANCE instead when supported.
  • Cache poisoning: reusing one GitHub Actions cache key across branches that pin different revision hashes yields flaky “model works here, fails there” tickets.
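
The import-order pitfall at the top of this list is easy to demonstrate: huggingface_hub reads several HF_* variables when it is first imported, so they must be in os.environ (or the parent shell) before that import runs. A minimal sketch, with illustrative paths and the actual import left commented so it runs without the library installed:

```python
import os

# Configure cache locations BEFORE the first `import huggingface_hub`;
# the library snapshots several HF_* variables at import time, so
# exports made afterwards are silently ignored for this process.
os.environ["HF_HOME"] = "/usr/local/ci/huggingface"  # illustrative path
os.environ["HF_HUB_CACHE"] = os.environ["HF_HOME"] + "/hub"
os.environ["HF_XET_CACHE"] = os.environ["HF_HOME"] + "/xet"

# Only now is it safe to import the library:
# import huggingface_hub

print(os.environ["HF_HUB_CACHE"])
```

Reversing the order (import first, export after) reproduces the "silently does nothing" symptom described above.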

Environment variable reference table

Export these in the shell before import huggingface_hub runs (values are read at import time). Defaults follow upstream docs.

Each entry gives the variable, its role, and a typical CI value or note.

  • HF_HUB_ENABLE_HF_TRANSFER: legacy fast path via hf_transfer; deprecated in favor of Xet-backed transfers in modern huggingface_hub. CI: leave unset unless you are pinned to an older stack; if your security team still mandates it, set 1 only after proving Xet is unavailable.
  • HF_XET_HIGH_PERFORMANCE: raises CPU and network saturation for hf-xet; analogous in intent to the legacy high-throughput transfer mode. CI: 1 on dedicated M4 runners with spare CPU; keep unset on noisy neighbors to avoid starving Xcode compiles in the same pool.
  • HF_XET_NUM_CONCURRENT_RANGE_GETS: concurrent byte-range fetches per Xet-backed file (default 16). CI: try 8 on shared hosts; raise toward 16–24 only when nettop shows headroom and the disk queue stays flat.
  • HF_ENDPOINT: Hub API base URL (default https://huggingface.co). CI: private Hub or approved mirror base, e.g. an org-provided host; verify that LFS and Xet both honor the override in your SDK version.
  • HF_HOME: root for the token, the default hub cache parent, the Xet chunk cache, and assets. CI: /usr/local/ci/huggingface on fast APFS; avoids cluttering portable home directories.
  • HF_HUB_CACHE: snapshot and blob store for models, datasets, and spaces (default $HF_HOME/hub). CI: set $HF_HOME/hub explicitly; never an SMB mount.
  • HF_XET_CACHE: Xet chunk storage (default $HF_HOME/xet). CI: co-locate with HF_HOME on NVMe; large multi-repo pools may use a separate volume with monitoring.
  • HF_HUB_DISABLE_SYMLINKS: disables symlink tricks in the cache (duplicates files instead). CI: 1 when the cache path is on NAS or cross-OS shares; prefer local APFS when possible.
  • HF_HUB_DOWNLOAD_TIMEOUT: per-download HTTP timeout in seconds (default 10). CI: 120–300 for cross-border cold pulls; lower in preflight jobs that should fail fast.
  • HF_HUB_ETAG_TIMEOUT: metadata / ETag probe timeout in seconds (default 10). CI: 30–60 when warm caches exist but metadata calls still traverse a slow path.
  • HF_TOKEN: user access token for gated models. CI: inject via the CI secret store; file permission 600 if written to disk; never log it.

Concurrent downloads: Xet uses HF_XET_NUM_CONCURRENT_RANGE_GETS; also cap snapshot_download(..., max_workers=4) on shared Macs until telemetry is flat.
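
A minimal sketch of the capped snapshot_download call. The repo id and revision are hypothetical placeholders, and HF_CI_REAL_DOWNLOAD is a guard invented for this sketch so it can run without network access or huggingface_hub installed:

```python
import os

# Hypothetical repo and pinned revision; substitute your own.
REPO_ID = "org/some-model"
REVISION = "0123abcd"  # pin a commit SHA so retries resume the same snapshot

download_kwargs = dict(
    repo_id=REPO_ID,
    revision=REVISION,
    max_workers=4,  # cap parallel per-file downloads on shared Macs
)

# Opt-in guard (our own convention, not an HF flag): set
# HF_CI_REAL_DOWNLOAD=1 to actually pull the snapshot.
if os.environ.get("HF_CI_REAL_DOWNLOAD") == "1":
    from huggingface_hub import snapshot_download
    snapshot_download(**download_kwargs)

print(sorted(download_kwargs))
```

Pinning revision here is the same pinning that the resume and cache-key sections below depend on.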

# macOS remote agent — source before python -c "import transformers"
export HF_HOME="/usr/local/ci/huggingface"
export HF_HUB_CACHE="$HF_HOME/hub"
export HF_XET_CACHE="$HF_HOME/xet"
export HF_ENDPOINT="https://huggingface.co"
export HF_HUB_DOWNLOAD_TIMEOUT=180
export HF_HUB_ETAG_TIMEOUT=45
# export HF_XET_HIGH_PERFORMANCE=1
# export HF_XET_NUM_CONCURRENT_RANGE_GETS=12

Decision matrix: pick one column at a time

Change endpoint policy or concurrency per experiment—not both at once.

Each scenario lists endpoint/auth, transfer mode, a concurrency starter, and cache directory guidance.

  • Public OSS weights, weak cross-border link: default HF_ENDPOINT or an org-approved mirror, no token; default Xet transfer, optional HF_XET_HIGH_PERFORMANCE=1 off-peak; start at HF_XET_NUM_CONCURRENT_RANGE_GETS=8 with max_workers=4; HF_HUB_CACHE on local NVMe.
  • Gated commercial model: default endpoint plus HF_TOKEN from the vault; avoid unapproved mirrors and keep transfer defaults until stable; conservative concurrency (range GETs 8, workers 2–4); dedicated HF_HOME per tenant if compliance requires isolation.
  • Shared build pool with mixed jobs: same endpoint policy as above; skip HF_XET_HIGH_PERFORMANCE; defaults or slightly lower concurrency; single shared HF_HUB_CACHE with an LRU pruning job.
  • Legacy pin (pre-Xet stack): mirror only if legal approves; HF_HUB_ENABLE_HF_TRANSFER=1 only when verified compatible; let hf_transfer's internal chunking run and watch CPU; local SSD, monitor partial files manually.
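
One way to keep "one column at a time" honest is to encode the matrix's starter values as per-scenario env overrides and apply exactly one set per experiment. A sketch under our own scenario naming (not an HF API), using only values from the matrix:

```python
# Starter env overrides per decision-matrix scenario; values mirror the
# matrix above and should be adjusted per telemetry, one change at a time.
SCENARIO_ENV = {
    "public_weak_link": {"HF_XET_NUM_CONCURRENT_RANGE_GETS": "8"},
    "gated_commercial": {"HF_XET_NUM_CONCURRENT_RANGE_GETS": "8"},
    "shared_pool": {},  # stick to defaults; no HF_XET_HIGH_PERFORMANCE
    "legacy_pin": {"HF_HUB_ENABLE_HF_TRANSFER": "1"},  # only if verified compatible
}

def env_for(scenario: str) -> dict:
    """Return a copy of the env overrides for one scenario."""
    return dict(SCENARIO_ENV[scenario])

print(env_for("legacy_pin"))
```

Applying the returned dict to the job environment (rather than hand-editing exports) keeps experiments reproducible and diffable in CI config review.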

Resume behavior & CI cache key design

Resume: Clients reuse partial blobs under HF_HUB_CACHE when revision matches; wiping cache or swapping HF_ENDPOINT resets progress. Pin revision across retries.

CI cache (e.g. GitHub Actions): tarball $HF_HUB_CACHE after warm jobs; key parts:

  • Lockfile or manifest hash—e.g. hash of a checked-in models.lock.json listing repo_id@revision pairs.
  • Runner OS slice—e.g. macos-14-arm64 or your pool id—so ARM caches are not mixed with x86.
  • Hub endpoint fingerprint—short SHA of the exact HF_ENDPOINT string to avoid cross-mirror collisions.

Weaker fallback: hf-hub-${{ hashFiles('**/requirements.txt', '**/pyproject.toml') }} only if those files gate weights. See CI cache strategy for sizing.
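
The three key parts combine mechanically. A minimal hashlib sketch, assuming a checked-in models.lock.json holds the repo_id@revision manifest as described above:

```python
import hashlib
import json

def hub_cache_key(manifest_text: str, os_slice: str, endpoint: str) -> str:
    """Compose a CI cache key: manifest hash + OS slice + endpoint fingerprint."""
    manifest_hash = hashlib.sha256(manifest_text.encode()).hexdigest()[:12]
    endpoint_fp = hashlib.sha256(endpoint.encode()).hexdigest()[:8]  # mirror fingerprint
    return f"hf-hub-{os_slice}-{endpoint_fp}-{manifest_hash}"

# e.g. the contents of models.lock.json (repo_id@revision pairs)
manifest = json.dumps(
    [{"repo_id": "org/some-model", "revision": "0123abcd"}], sort_keys=True
)

key = hub_cache_key(manifest, "macos-14-arm64", "https://huggingface.co")
print(key)
```

Because the endpoint string is hashed into the key, switching mirrors invalidates the cache instead of silently mixing blobs from two Hubs.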

FAQ: failure retries & timeout thresholds

When should I raise HF_HUB_DOWNLOAD_TIMEOUT versus HF_HUB_ETAG_TIMEOUT on a remote Mac runner?

Raise HF_HUB_DOWNLOAD_TIMEOUT first when large shard or LFS-style downloads stall mid-stream; values such as 120 to 300 seconds are common on congested cross-border paths. Raise HF_HUB_ETAG_TIMEOUT when metadata probes to huggingface.co time out but blobs are already local—start near 30 to 60 seconds so cold jobs still resolve revisions, while warm cache hits stay fast.

Is HF_HUB_ENABLE_HF_TRANSFER still the right switch in 2026?

Official huggingface_hub documentation marks HF_HUB_ENABLE_HF_TRANSFER as deprecated because Hub transfers increasingly use the hf-xet stack. Treat it as a legacy compatibility knob only; prefer HF_XET_HIGH_PERFORMANCE=1 plus tuned HF_XET_NUM_CONCURRENT_RANGE_GETS when hf-xet is installed, and fall back to default hub downloads if your org forbids saturating CPU or disk.
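
For runners of mixed size, the range-GET knob can be derived rather than hard-coded. A rough heuristic of our own (not an upstream recommendation): scale with available CPUs, clamped to the 8–16 band discussed in the table above:

```python
import os

# Our own heuristic: start per-file range-GET concurrency near the CPU
# count, clamped to 8..16; shared hosts should stay at the low end.
cpus = os.cpu_count() or 4
os.environ["HF_XET_NUM_CONCURRENT_RANGE_GETS"] = str(max(8, min(16, cpus)))

print(os.environ["HF_XET_NUM_CONCURRENT_RANGE_GETS"])
```

As with every knob here, raise it further only when nettop shows network headroom and the disk queue stays flat.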

How do I design a CI cache key so branches do not poison each other's Hugging Face snapshots?

Never key only on runner OS: include a content hash of revision selectors such as a pinned commit SHA in your model card workflow, the digest of a manifest file listing repo@revision pairs, or the hash of poetry.lock or requirements.txt when those files gate which weights you fetch. Pair that with a stable HF_HUB_CACHE path on NVMe on the runner plus optional Actions cache restore keyed the same way.

What should I do when HF_HUB_CACHE lives on a network volume or mixed OS clients?

Set HF_HUB_DISABLE_SYMLINKS=1 to avoid broken symlink semantics across SMB or Linux to macOS mounts at the cost of duplicated large files. Better: keep HF_HUB_CACHE on local APFS NVMe for each remote Mac agent and export only metrics, not the cache directory, to observability stacks.
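
When the cache path varies by runner, the symlink decision can be probed instead of guessed. A sketch that attempts a real symlink in the cache directory and falls back to HF_HUB_DISABLE_SYMLINKS=1 when the filesystem refuses (the probe helper is our own, not an HF utility):

```python
import os
import tempfile
import uuid

def symlinks_supported(path: str) -> bool:
    """Probe whether the filesystem under `path` supports symlinks."""
    target = os.path.join(path, f".hfprobe-{uuid.uuid4().hex}")
    link = target + ".lnk"
    try:
        with open(target, "w") as fh:
            fh.write("probe")
        os.symlink(target, link)
        return True
    except OSError:  # e.g. SMB/NAS mounts that reject symlink creation
        return False
    finally:
        for p in (link, target):
            try:
                os.remove(p)
            except OSError:
                pass

cache_dir = os.environ.get("HF_HUB_CACHE", tempfile.mkdtemp())
if not symlinks_supported(cache_dir):
    os.environ["HF_HUB_DISABLE_SYMLINKS"] = "1"  # duplicate files instead
```

Run the probe once in runner bootstrap, before any huggingface_hub import, so the flag is in place when the cache layout is first created.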

Summary

Put HF_HOME / HF_HUB_CACHE on NVMe; use HF_ENDPOINT only with approved mirrors; prefer Xet over legacy HF_HUB_ENABLE_HF_TRANSFER; raise HF_HUB_DOWNLOAD_TIMEOUT before blaming the network. Manifest-based cache keys prevent branch poisoning. More ops detail: tiered model cache, instant model pulling.

Rent remote Mac capacity for warm HF_HUB_CACHE and stable egress. Next: homepage, pricing, purchase, help, blog.

Treat Hugging Face pulls like any other CI artifact: pin revisions, measure metadata versus payload time separately, and spend parallelism budget where telemetry proves it helps.

Remote Mac for HF Hub & ML CI

Dedicated M-series hosts with fast local cache paths—useful when large weights and Xcode jobs share the same pool.