Hugpy — Inference You Own
A self-hosted LLM console. Pull models from the Hugging Face Hub, chat with them, mint API keys yourself, and pool GPUs across machines — all on your hardware.
OpenAI-compatible API on your own box
Serve /v1/chat/completions and /v1/models from your own machine and point any OpenAI SDK at it. Mint and revoke API keys from the console, or run it open with no key required.
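As a minimal sketch, the request below is what an OpenAI-style client sends to the chat endpoint. It uses only the Python standard library. The base URL `http://localhost:8000/v1` and the model name are hypothetical placeholders. Sending the key as a `Bearer` token is an assumption based on the OpenAI API convention that the OpenAI SDKs follow.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical address: substitute wherever your Hugpy instance listens.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(
    model: str,
    messages: list[dict],
    api_key: Optional[str] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Assumed: a console-minted key travels as a standard Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    # When the server runs open, no Authorization header is attached at all.
    return urllib.request.Request(
        f"{base_url}/chat/completions", data=body, headers=headers, method="POST"
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (OpenAI response shape)."""
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the official `openai` Python package, the equivalent is to pass the same address as `base_url` when constructing the client, e.g. `OpenAI(base_url="http://localhost:8000/v1", api_key="...")`.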
GPU worker fleet
Models too big for any single GPU are split across the fleet via llama.cpp RPC. A deterministic allocator picks the placement, and hugpy runs the lead node.
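The allocator's actual rule isn't described here. As a hypothetical illustration of what "deterministic" can mean, the sketch below splits a model's layers across workers in proportion to each worker's free VRAM. It uses integer largest-remainder rounding and a name-sorted worker order, so the same inputs always produce the same placement. `Worker`, `allocate_layers`, and the per-layer memory figure are illustrative assumptions, not Hugpy's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str      # hypothetical worker id, e.g. an RPC "host:port"
    free_mib: int  # free VRAM the worker reports, in MiB


def allocate_layers(n_layers: int, layer_mib: int, workers: list[Worker]) -> dict[str, int]:
    """Assign each worker a layer count proportional to its free VRAM."""
    if not workers:
        raise ValueError("no workers available")
    total_free = sum(w.free_mib for w in workers)
    if total_free <= 0 or n_layers * layer_mib > total_free:
        raise ValueError("model does not fit in the fleet's combined VRAM")

    # Sort by name so the order workers were passed in never changes the result.
    ordered = sorted(workers, key=lambda w: w.name)

    # Worker i ideally gets n_layers * free_i / total_free; keep it in integers.
    counts = [n_layers * w.free_mib // total_free for w in ordered]
    remainders = [n_layers * w.free_mib % total_free for w in ordered]

    # Largest-remainder rounding: leftover layers go to the biggest remainders,
    # with ties broken by position in the name-sorted order.
    leftover = n_layers - sum(counts)
    order = sorted(range(len(ordered)), key=lambda i: (-remainders[i], i))
    for i in order[:leftover]:
        counts[i] += 1

    placement = {}
    for w, count in zip(ordered, counts):
        # Rounding up can overfill a small worker; a fuller allocator would rebalance.
        if count * layer_mib > w.free_mib:
            raise ValueError(f"{w.name} cannot hold {count} layers")
        placement[w.name] = count
    return placement
```

A per-worker share like this maps naturally onto proportional split options such as llama.cpp's `--tensor-split`, though how Hugpy hands its placement to llama.cpp is not stated here.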
Phone & edge workers
Add phones as workers, not just GPUs. The phone-brick pool runs an ONNX-YOLO vision worker on each handset and combines their results into a live consensus verdict across the fleet.
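How the pool reaches its verdict isn't specified here. One simple rule, sketched below as a hypothetical, is a confidence-filtered majority vote. Each phone contributes its top ONNX-YOLO label only if its confidence clears a threshold, and the fleet verdict is the label backed by more than a quorum of reporting phones. The function name, the report format, and both thresholds are assumptions for illustration.

```python
from collections import Counter
from typing import Optional


def consensus(
    reports: dict[str, tuple[str, float]],
    min_conf: float = 0.5,
    quorum: float = 0.5,
) -> Optional[str]:
    """Return the fleet-wide label, or None if no label clears the quorum.

    reports maps a (hypothetical) phone id to that phone's top detection,
    given as (label, confidence).
    """
    # Each phone casts one vote, but only if it is confident enough.
    votes = Counter(label for label, conf in reports.values() if conf >= min_conf)
    if not votes:
        return None
    # Most votes wins; ties go to the alphabetically first label so the result is stable.
    label, count = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    # Require more than `quorum` of all reporting phones (0.5 means a strict majority).
    return label if count / len(reports) > quorum else None
```

Re-running this whenever a phone sends a new report is one way to keep the verdict live as detections change.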