davefriedman.substack.com
Hugging Face and the Illusion of Infrastructure - Dave Friedman
Excerpt
At first glance, Hugging Face looks like a treasure trove: tens of thousands of language models, sprawling across every use case imaginable. But this abundance is an illusion. Just as the crypto world is littered with thousands of coins no one trades, Hugging Face is bloated with models no one uses. The power law is extreme: a few models like Llama, Mistral, and GPT-2 clones account for nearly all meaningful usage, while the rest serve as digital detritus: dead forks, vanity fine-tunes, or models that never worked in the first place. One Llama variant with eleven downloads sits next to Mistral with millions. The UX treats them as equals. That's not openness; it's entropy.

So what, then, is the value? Why does Hugging Face exist? The answer is deceptively simple: optionality. Hosting every model under the sun makes Hugging Face the default namespace for open AI. If a model exists, chances are it lives on Hugging Face. That optionality is not worthless. It creates surface area for innovation, remixing, and serendipity. But it is far from a business model.

Right now, Hugging Face is burn-heavy. It's a high-traffic, low-monetization platform subsidized by venture capital and driven by developer goodwill. Free users consume bandwidth and GPU cycles without paying for them. Enterprises poke around but are slow to commit. Like GitHub in its early days, or Reddit for most of its history, Hugging Face sits atop an ocean of usage with very little monetized throughput. The problem isn't traffic; it's capture.

…

To get there, Hugging Face needs to pivot from being a library to being a runtime. Right now, most developers treat it like an archive: a place to browse models, download weights, and tinker. But a runtime mindset means building, deploying, and *serving* production-grade AI applications directly from within the Hugging Face ecosystem. It means offering guarantees: uptime, latency, performance, cost predictability.
It means turning usage into throughput, not just traffic.

This pivot is structurally difficult. Hugging Face lacks proprietary IP. It has not trained any foundation model of note since BLOOM, and that effort was more symbolic than strategic. Without a vertically integrated model stack, Hugging Face depends entirely on others for core capability. This makes it fragile: if Meta, Mistral, or OpenRouter decide to host their own endpoints or build better APIs, Hugging Face becomes a middleman who can be disintermediated at any time.

It gets worse. The cloud hyperscalers are circling. AWS, Azure, and GCP all offer their own LLM platforms, increasingly bundled with model registries, inference endpoints, fine-tuning workflows, and enterprise governance layers. Hugging Face may partner with these providers today, but in the long run it risks being swallowed by them. If you're a Fortune 500 CIO already embedded in AWS, why would you trust your LLM stack to a thin layer of Python wrappers?

Then there is the branding paradox. Hugging Face is beloved by the open-source community precisely because it is open, chaotic, and free. But enterprise buyers don't want chaos. They want SLAs, audit logs, reproducibility, and compliance. They want control planes, not playgrounds. The GitHub comparison breaks down here. GitHub succeeded not just by hosting code but by embedding itself into CI/CD pipelines, IDEs, and permission hierarchies. Hugging Face hasn't crossed that Rubicon.

To make the leap, Hugging Face needs to own more of the development loop. Today, developers build elsewhere and come to Hugging Face to publish. Tomorrow, Hugging Face must become the place where you train, tune, deploy, and monitor your models end-to-end. That means native agent frameworks, first-class support for RAG architectures, and a deeply integrated CI/CD pipeline for model workflows. It means real-time evaluation tooling, live inference dashboards, and version-controlled APIs for app deployment.
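RAG is one of the workloads that pivot would have to support natively. Stripped to its core, RAG is just a retrieval step that selects relevant context, followed by prompt assembly that feeds it to a model. A minimal sketch of that loop, with a toy bag-of-words similarity standing in for a real embedding model (the documents and question here are hypothetical):

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stands in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Retrieval step: return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Hypothetical document store.
docs = [
    "Llama is a family of open-weight language models from Meta.",
    "Mistral releases high-performing open models.",
    "SourceForge once hosted most open-source projects.",
]

# Prompt assembly: retrieved context plus the user's question, ready for an LLM call.
question = "Who makes the Llama language models?"
context = retrieve(question, docs, k=1)[0]
prompt = f"Context: {context}\nQuestion: {question}"
```

In a production pipeline the toy `embed` would be a hosted embedding model and `prompt` would go to an inference endpoint; the shape of the loop is the same, which is why "first-class RAG support" mostly means owning these two steps as managed infrastructure.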
…

Because remaining a platform of zombie models and free-tier usage is a slow death. The only path forward is to operationalize. Hugging Face has to become the operating layer for enterprise AI. Not the storage layer. Not the archive. The runtime.

This transition is existential. If it fails, Hugging Face will go the way of SourceForge: a once-beloved host of open artifacts, slowly abandoned as serious users migrate to better-integrated, professionally managed alternatives. If it succeeds, it becomes the Docker, the Stripe, or the GitHub of AI. But only if it earns it.
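The power-law concentration described at the top of the excerpt is easy to make concrete. Under an illustrative assumption (not measured Hub data) that downloads fall off as 1/rank², a tiny sketch shows how thoroughly the head of the catalog dominates the tail:

```python
def top_k_share(downloads, k):
    """Fraction of total downloads captured by the k most-downloaded models."""
    ranked = sorted(downloads, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical catalog of 10,000 models whose downloads decay as 1/rank^2;
# the exponent is an assumption for illustration, not real Hub statistics.
catalog = [1_000_000 // rank**2 for rank in range(1, 10_001)]

print(f"top 10 of 10,000 models:  {top_k_share(catalog, 10):.0%} of all downloads")
print(f"top 100 of 10,000 models: {top_k_share(catalog, 100):.0%} of all downloads")
```

The exact exponent matters less than the shape: any steep power law produces exactly the UX problem the post describes, with a handful of heavily used models ranked alongside thousands of inert ones.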
Related Pain Points
Lack of integrated end-to-end development environment
Hugging Face functions primarily as an archive/storage layer rather than a runtime; developers must build models elsewhere and only publish on Hugging Face, lacking native support for training, deployment, monitoring, CI/CD pipelines, and RAG architectures in a unified platform.
No quality guarantee for community-contributed models
Models on Hugging Face Hub are community-contributed without formal vetting, leading to inconsistent quality, bugs, biases, and security issues. Models that work for research may not be suitable for production business use.
Limited enterprise features and SLA guarantees without paid plan
Without a paid plan, Hugging Face lacks the enterprise-grade features, SLAs, audit logs, reproducibility guarantees, and compliance controls that enterprise customers require, forcing upgrades to paid tiers.
Growing ecosystem competition fragmenting developer attention
Hugging Face faces intensifying competition from specialized tools and platforms across the AI stack, including OpenXLA, PyTorch, LangChain, Ray, AWS Bedrock, Vertex AI, CivitAI, and Replicate. Developers increasingly choose focused tools better integrated with enterprise systems over Hugging Face's general-purpose platform.