Audience: teams running snapshot_download, Transformers, or fine-tune jobs on a remote Mac over long-haul links or through an approved mirror / private Hub. Deliverables: bottleneck checklist, env table (HF_HUB_ENABLE_HF_TRANSFER, HF_ENDPOINT, caches, Xet concurrent range GETs), decision matrix, resume notes, CI cache keys. Links: homepage, blog index, instant AI model pulling, tiered model cache HowTo, Git/npm CI cache strategy—no login.

Scenarios & bottleneck checklist

Classify where the time goes first: metadata RTT to huggingface.co, Xet shard reconstruction, or APFS noise from symlink-heavy caches on shared runners.

  • Import-order env drift: huggingface_hub reads env vars at import time; exporting HF_HUB_CACHE after import huggingface_hub silently does nothing.
  • Wrong disk tier: pointing HF_HUB_CACHE at a network home folder turns every chunk into random SMB latency; NVMe local paths win for Xet parallel writes.
  • Mirror without contract: setting HF_ENDPOINT to a community mirror may violate compliance or break gated repos—obtain written allowance and test HF_TOKEN scopes.
  • Legacy transfer flags: cargo-culting HF_HUB_ENABLE_HF_TRANSFER=1 after the Hub migrated to Xet wastes review cycles; align with HF_XET_HIGH_PERFORMANCE instead when supported.
  • Cache poisoning: reusing one GitHub Actions cache key across branches that pin different revision hashes yields flaky “model works here, fails there” tickets.
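
The import-order pitfall at the top of this list is easy to demonstrate: huggingface_hub reads several HF_* variables when it is first imported, so they must be in os.environ (or the parent shell) before that import runs. A minimal sketch, with illustrative paths and the actual import left commented so it runs without the library installed:

```python
import os

# Configure cache locations BEFORE the first `import huggingface_hub`;
# the library snapshots several HF_* variables at import time, so
# exports made afterwards are silently ignored for this process.
os.environ["HF_HOME"] = "/usr/local/ci/huggingface"  # illustrative path
os.environ["HF_HUB_CACHE"] = os.environ["HF_HOME"] + "/hub"
os.environ["HF_XET_CACHE"] = os.environ["HF_HOME"] + "/xet"

# Only now is it safe to import the library:
# import huggingface_hub

print(os.environ["HF_HUB_CACHE"])
```

Reversing the order (import first, export after) reproduces the "silently does nothing" symptom described above.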

Environment variable reference table

Export these in the shell before import huggingface_hub runs (values are read at import time). Defaults follow upstream docs.

Each entry gives the variable, its role, and a typical CI value or note.

  • HF_HUB_ENABLE_HF_TRANSFER: legacy fast path via hf_transfer; deprecated in favor of Xet-backed transfers in modern huggingface_hub. CI: leave unset unless you are pinned to an older stack; if your security team still mandates it, set 1 only after proving Xet is unavailable.
  • HF_XET_HIGH_PERFORMANCE: raises CPU and network saturation for hf-xet; analogous in intent to the legacy high-throughput transfer mode. CI: 1 on dedicated M4 runners with spare CPU; keep unset on noisy neighbors to avoid starving Xcode compiles in the same pool.
  • HF_XET_NUM_CONCURRENT_RANGE_GETS: concurrent byte-range fetches per Xet-backed file (default 16). CI: try 8 on shared hosts; raise toward 16–24 only when nettop shows headroom and the disk queue stays flat.
  • HF_ENDPOINT: Hub API base URL (default https://huggingface.co). CI: private Hub or approved mirror base, e.g. an org-provided host; verify that LFS and Xet both honor the override in your SDK version.
  • HF_HOME: root for the token, the default hub cache parent, the Xet chunk cache, and assets. CI: /usr/local/ci/huggingface on fast APFS; avoids cluttering portable home directories.
  • HF_HUB_CACHE: snapshot and blob store for models, datasets, and spaces (default $HF_HOME/hub). CI: set $HF_HOME/hub explicitly; never an SMB mount.
  • HF_XET_CACHE: Xet chunk storage (default $HF_HOME/xet). CI: co-locate with HF_HOME on NVMe; large multi-repo pools may use a separate volume with monitoring.
  • HF_HUB_DISABLE_SYMLINKS: disables symlink tricks in the cache (duplicates files instead). CI: 1 when the cache path is on NAS or cross-OS shares; prefer local APFS when possible.
  • HF_HUB_DOWNLOAD_TIMEOUT: per-download HTTP timeout in seconds (default 10). CI: 120–300 for cross-border cold pulls; lower in preflight jobs that should fail fast.
  • HF_HUB_ETAG_TIMEOUT: metadata / ETag probe timeout in seconds (default 10). CI: 30–60 when warm caches exist but metadata calls still traverse a slow path.
  • HF_TOKEN: user access token for gated models. CI: inject via the CI secret store; file permission 600 if written to disk; never log it.

Concurrent downloads: Xet uses HF_XET_NUM_CONCURRENT_RANGE_GETS; also cap snapshot_download(..., max_workers=4) on shared Macs until telemetry is flat.
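
A minimal sketch of the capped snapshot_download call. The repo id and revision are hypothetical placeholders, and HF_CI_REAL_DOWNLOAD is a guard invented for this sketch so it can run without network access or huggingface_hub installed:

```python
import os

# Hypothetical repo and pinned revision; substitute your own.
REPO_ID = "org/some-model"
REVISION = "0123abcd"  # pin a commit SHA so retries resume the same snapshot

download_kwargs = dict(
    repo_id=REPO_ID,
    revision=REVISION,
    max_workers=4,  # cap parallel per-file downloads on shared Macs
)

# Opt-in guard (our own convention, not an HF flag): set
# HF_CI_REAL_DOWNLOAD=1 to actually pull the snapshot.
if os.environ.get("HF_CI_REAL_DOWNLOAD") == "1":
    from huggingface_hub import snapshot_download
    snapshot_download(**download_kwargs)

print(sorted(download_kwargs))
```

Pinning revision here is the same pinning that the resume and cache-key sections below depend on.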

# macOS remote agent — source before python -c "import transformers"
export HF_HOME="/usr/local/ci/huggingface"
export HF_HUB_CACHE="$HF_HOME/hub"
export HF_XET_CACHE="$HF_HOME/xet"
export HF_ENDPOINT="https://huggingface.co"
export HF_HUB_DOWNLOAD_TIMEOUT=180
export HF_HUB_ETAG_TIMEOUT=45
# export HF_XET_HIGH_PERFORMANCE=1
# export HF_XET_NUM_CONCURRENT_RANGE_GETS=12

Decision matrix: pick one column at a time

Change endpoint policy or concurrency per experiment—not both at once.

Each scenario lists endpoint/auth, transfer mode, a concurrency starter, and cache directory guidance.

  • Public OSS weights, weak cross-border link: default HF_ENDPOINT or an org-approved mirror, no token; default Xet transfer, optional HF_XET_HIGH_PERFORMANCE=1 off-peak; start at HF_XET_NUM_CONCURRENT_RANGE_GETS=8 with max_workers=4; HF_HUB_CACHE on local NVMe.
  • Gated commercial model: default endpoint plus HF_TOKEN from the vault; avoid unapproved mirrors and keep transfer defaults until stable; conservative concurrency (range GETs 8, workers 2–4); dedicated HF_HOME per tenant if compliance requires isolation.
  • Shared build pool with mixed jobs: same endpoint policy as above; skip HF_XET_HIGH_PERFORMANCE; defaults or slightly lower concurrency; single shared HF_HUB_CACHE with an LRU pruning job.
  • Legacy pin (pre-Xet stack): mirror only if legal approves; HF_HUB_ENABLE_HF_TRANSFER=1 only when verified compatible; let hf_transfer's internal chunking run and watch CPU; local SSD, monitor partial files manually.
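
One way to keep "one column at a time" honest is to encode the matrix's starter values as per-scenario env overrides and apply exactly one set per experiment. A sketch under our own scenario naming (not an HF API), using only values from the matrix:

```python
# Starter env overrides per decision-matrix scenario; values mirror the
# matrix above and should be adjusted per telemetry, one change at a time.
SCENARIO_ENV = {
    "public_weak_link": {"HF_XET_NUM_CONCURRENT_RANGE_GETS": "8"},
    "gated_commercial": {"HF_XET_NUM_CONCURRENT_RANGE_GETS": "8"},
    "shared_pool": {},  # stick to defaults; no HF_XET_HIGH_PERFORMANCE
    "legacy_pin": {"HF_HUB_ENABLE_HF_TRANSFER": "1"},  # only if verified compatible
}

def env_for(scenario: str) -> dict:
    """Return a copy of the env overrides for one scenario."""
    return dict(SCENARIO_ENV[scenario])

print(env_for("legacy_pin"))
```

Applying the returned dict to the job environment (rather than hand-editing exports) keeps experiments reproducible and diffable in CI config review.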

Resume behavior & CI cache key design

Resume: Clients reuse partial blobs under HF_HUB_CACHE when revision matches; wiping cache or swapping HF_ENDPOINT resets progress. Pin revision across retries.

CI cache (e.g. GitHub Actions): tarball $HF_HUB_CACHE after warm jobs; key parts:

  • Lockfile or manifest hash—e.g. hash of a checked-in models.lock.json listing repo_id@revision pairs.
  • Runner OS slice—e.g. macos-14-arm64 or your pool id—so ARM caches are not mixed with x86.
  • Hub endpoint fingerprint—short SHA of the exact HF_ENDPOINT string to avoid cross-mirror collisions.

Weaker fallback: hf-hub-${{ hashFiles('**/requirements.txt', '**/pyproject.toml') }} only if those files gate weights. See CI cache strategy for sizing.
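
The three key parts combine mechanically. A minimal hashlib sketch, assuming a checked-in models.lock.json holds the repo_id@revision manifest as described above:

```python
import hashlib
import json

def hub_cache_key(manifest_text: str, os_slice: str, endpoint: str) -> str:
    """Compose a CI cache key: manifest hash + OS slice + endpoint fingerprint."""
    manifest_hash = hashlib.sha256(manifest_text.encode()).hexdigest()[:12]
    endpoint_fp = hashlib.sha256(endpoint.encode()).hexdigest()[:8]  # mirror fingerprint
    return f"hf-hub-{os_slice}-{endpoint_fp}-{manifest_hash}"

# e.g. the contents of models.lock.json (repo_id@revision pairs)
manifest = json.dumps(
    [{"repo_id": "org/some-model", "revision": "0123abcd"}], sort_keys=True
)

key = hub_cache_key(manifest, "macos-14-arm64", "https://huggingface.co")
print(key)
```

Because the endpoint string is hashed into the key, switching mirrors invalidates the cache instead of silently mixing blobs from two Hubs.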

FAQ: failure retries & timeout thresholds

When should I raise HF_HUB_DOWNLOAD_TIMEOUT versus HF_HUB_ETAG_TIMEOUT on a remote Mac runner?

Raise HF_HUB_DOWNLOAD_TIMEOUT first when large shard or LFS-style downloads stall mid-stream; values such as 120 to 300 seconds are common on congested cross-border paths. Raise HF_HUB_ETAG_TIMEOUT when metadata probes to huggingface.co time out but blobs are already local—start near 30 to 60 seconds so cold jobs still resolve revisions, while warm cache hits stay fast.

Is HF_HUB_ENABLE_HF_TRANSFER still the right switch in 2026?

Official huggingface_hub documentation marks HF_HUB_ENABLE_HF_TRANSFER as deprecated because Hub transfers increasingly use the hf-xet stack. Treat it as a legacy compatibility knob only; prefer HF_XET_HIGH_PERFORMANCE=1 plus tuned HF_XET_NUM_CONCURRENT_RANGE_GETS when hf-xet is installed, and fall back to default hub downloads if your org forbids saturating CPU or disk.
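
For runners of mixed size, the range-GET knob can be derived rather than hard-coded. A rough heuristic of our own (not an upstream recommendation): scale with available CPUs, clamped to the 8–16 band discussed in the table above:

```python
import os

# Our own heuristic: start per-file range-GET concurrency near the CPU
# count, clamped to 8..16; shared hosts should stay at the low end.
cpus = os.cpu_count() or 4
os.environ["HF_XET_NUM_CONCURRENT_RANGE_GETS"] = str(max(8, min(16, cpus)))

print(os.environ["HF_XET_NUM_CONCURRENT_RANGE_GETS"])
```

As with every knob here, raise it further only when nettop shows network headroom and the disk queue stays flat.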

How do I design a CI cache key so branches do not poison each other's Hugging Face snapshots?

Never key only on runner OS: include a content hash of revision selectors such as a pinned commit SHA in your model card workflow, the digest of a manifest file listing repo@revision pairs, or the hash of poetry.lock or requirements.txt when those files gate which weights you fetch. Pair that with a stable HF_HUB_CACHE path on NVMe on the runner plus optional Actions cache restore keyed the same way.

What should I do when HF_HUB_CACHE lives on a network volume or mixed OS clients?

Set HF_HUB_DISABLE_SYMLINKS=1 to avoid broken symlink semantics across SMB or Linux to macOS mounts at the cost of duplicated large files. Better: keep HF_HUB_CACHE on local APFS NVMe for each remote Mac agent and export only metrics, not the cache directory, to observability stacks.
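
When the cache path varies by runner, the symlink decision can be probed instead of guessed. A sketch that attempts a real symlink in the cache directory and falls back to HF_HUB_DISABLE_SYMLINKS=1 when the filesystem refuses (the probe helper is our own, not an HF utility):

```python
import os
import tempfile
import uuid

def symlinks_supported(path: str) -> bool:
    """Probe whether the filesystem under `path` supports symlinks."""
    target = os.path.join(path, f".hfprobe-{uuid.uuid4().hex}")
    link = target + ".lnk"
    try:
        with open(target, "w") as fh:
            fh.write("probe")
        os.symlink(target, link)
        return True
    except OSError:  # e.g. SMB/NAS mounts that reject symlink creation
        return False
    finally:
        for p in (link, target):
            try:
                os.remove(p)
            except OSError:
                pass

cache_dir = os.environ.get("HF_HUB_CACHE", tempfile.mkdtemp())
if not symlinks_supported(cache_dir):
    os.environ["HF_HUB_DISABLE_SYMLINKS"] = "1"  # duplicate files instead
```

Run the probe once in runner bootstrap, before any huggingface_hub import, so the flag is in place when the cache layout is first created.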

Summary

Put HF_HOME / HF_HUB_CACHE on NVMe; use HF_ENDPOINT only with approved mirrors; prefer Xet over legacy HF_HUB_ENABLE_HF_TRANSFER; raise HF_HUB_DOWNLOAD_TIMEOUT before blaming the network. Manifest-based cache keys prevent branch poisoning. More ops detail: tiered model cache, instant model pulling.

Rent remote Mac capacity for warm HF_HUB_CACHE and stable egress. Next: homepage, pricing, purchase, help, blog.

Treat Hugging Face pulls like any other CI artifact: pin revisions, measure metadata versus payload time separately, and spend parallelism budget where telemetry proves it helps.

Remote Mac for HF Hub & ML CI

Dedicated M-series hosts with fast local cache paths—useful when large weights and Xcode jobs share the same pool.