# GPU Guide
NeoRun supports NVIDIA GPU passthrough for ML inference, image generation, fine-tuning, and more.
## Free Tier GPU Limits
| Limit | Value |
|---|---|
| Max VRAM | 8 GB |
| GPU jobs per day | 3 |
| Max runtime per session | 2 hours |
| Idle auto-stop | 30 minutes |
## Enabling GPU
1. In the deployment wizard, toggle the GPU switch
2. Select your VRAM requirement: 8 GB, 16 GB, or 24 GB
3. Deploy — NeoRun will schedule your job on a GPU-equipped worker
## Supported GPU Types
NeoRun workers are equipped with NVIDIA GPUs. GPU detection uses `pynvml` with an `nvidia-smi` fallback:
| GPU | VRAM | Compute Capability |
|---|---|---|
| RTX 3060 | 12 GB | 8.6 |
| RTX 3090 | 24 GB | 8.6 |
| RTX 4090 | 24 GB | 8.9 |
| A100 | 40/80 GB | 8.0 |
| H100 | 80 GB | 9.0 |
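The `pynvml`-first detection with an `nvidia-smi` fallback might look like the following minimal sketch. The helper name and the returned fields are illustrative, not NeoRun's actual code:

```python
import subprocess


def detect_gpus():
    """Detect NVIDIA GPUs via pynvml, falling back to parsing nvidia-smi output."""
    try:
        import pynvml
        pynvml.nvmlInit()
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older pynvml versions return bytes
                name = name.decode()
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({"name": name, "vram_mb": mem.total // 2**20})
        pynvml.nvmlShutdown()
        return gpus
    except Exception:
        pass  # pynvml missing or NVML init failed -- try nvidia-smi

    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []  # no driver / no GPU on this worker
    return [
        {"name": name.strip(), "vram_mb": int(mem)}
        for name, mem in (line.split(",") for line in out.strip().splitlines() if line)
    ]
```

On a worker without drivers, both paths fail and the function returns an empty list, which maps to the "No GPU available" error described under Troubleshooting.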
## GPU + Docker
Containers get NVIDIA GPU access via the NVIDIA Container Toolkit. NeoRun automatically:
- Adds `--gpus all` device requests to the container
- Sets `NVIDIA_VISIBLE_DEVICES=all` and `NVIDIA_DRIVER_CAPABILITIES=compute,utility`
- Allocates larger shared memory (`--shm-size=2g`) for PyTorch DataLoaders
- Scales CPU/memory limits for GPU workloads
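Expressed as an equivalent `docker run` invocation, those settings amount to flags like the following. This is a sketch only; the helper name is hypothetical and NeoRun's internal orchestration may differ:

```python
def gpu_run_args(image, command):
    """Assemble `docker run` flags matching the GPU settings listed above."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                                    # GPU device request
        "--shm-size", "2g",                                 # /dev/shm for DataLoaders
        "-e", "NVIDIA_VISIBLE_DEVICES=all",
        "-e", "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
        image, *command,
    ]


print(" ".join(gpu_run_args("nvidia/cuda:12.1.0-runtime-ubuntu22.04", ["nvidia-smi"])))
```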
### Container Isolation
GPU containers use the default `runc` runtime because gVisor does not support GPU passthrough.
To compensate, NeoRun applies a seccomp profile that blocks dangerous syscalls
(module loading, mount operations, namespace creation, `ptrace`).
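A Docker/OCI-style seccomp profile blocking those syscall groups could look like this fragment. It is illustrative only; NeoRun's actual profile, default action, and full syscall list may differ:

```json
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": [
        "init_module", "finit_module", "delete_module",
        "mount", "umount2",
        "unshare", "setns",
        "ptrace"
      ],
      "action": "SCMP_ACT_ERRNO"
    }
  ]
}
```

A profile like this is passed to Docker with `--security-opt seccomp=profile.json`; the blocked calls fail with an error instead of reaching the kernel.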
## Example GPU Projects
These templates are pre-configured for GPU use:
- Stable Diffusion WebUI — AUTOMATIC1111 image generation
- Open WebUI + Ollama — Local LLM chat interface
- ComfyUI — Node-based diffusion workflows
- FastAPI + PyTorch — ML model serving API
- Jupyter + CUDA — GPU-accelerated notebooks
Find them in the Template Gallery under the GPU category.
## Idle Detection
GPU pods are expensive, so NeoRun monitors network I/O and automatically stops idle pods:
- Network tracking: Measures incoming/outgoing bytes per 60-second interval
- Idle threshold: Less than 1 KB of traffic per interval (filters out DNS and health checks)
- Warning: After 25 minutes idle, a notification is sent
- Auto-stop: At 30 minutes idle, `desired_state` is set to `stopped`
- Max runtime: Hard cutoff at 2 hours (free tier)
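The policy above boils down to a per-interval idle counter. A minimal sketch, with constant and function names that are illustrative rather than NeoRun's internals:

```python
IDLE_BYTES_THRESHOLD = 1024   # < 1 KB per interval counts as idle
WARN_AFTER_MIN = 25           # minutes idle before the warning notification
STOP_AFTER_MIN = 30           # minutes idle before desired_state = "stopped"


def idle_actions(bytes_per_interval):
    """Given one traffic sample per 60-second interval, return the policy
    actions ('warn', 'stop') in the order they would fire."""
    idle_minutes = 0
    actions = []
    for traffic in bytes_per_interval:
        # Any interval with >= 1 KB of traffic resets the idle counter.
        idle_minutes = idle_minutes + 1 if traffic < IDLE_BYTES_THRESHOLD else 0
        if idle_minutes == STOP_AFTER_MIN:
            actions.append("stop")    # set desired_state = "stopped"
        elif idle_minutes == WARN_AFTER_MIN:
            actions.append("warn")    # send the idle notification
    return actions
```

Note that even a single interval with 1 KB or more of traffic resets the counter, which is why keeping the pod actively serving requests prevents auto-stop.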
To prevent auto-stop, keep your pod actively serving requests.
## Troubleshooting GPU
### "No GPU available"

- Check that the worker machine has NVIDIA drivers installed
- Verify that `nvidia-smi` returns GPU information
- Ensure `nvidia-container-toolkit` is installed
### "CUDA out of memory"

- Select a higher VRAM tier in the deployment wizard
- Reduce batch size or model precision in your code
- Use `torch.cuda.empty_cache()` to free unused GPU memory
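The "reduce batch size" tip can be automated with a retry-and-halve wrapper. A hedged sketch using a stand-in exception so it runs without a GPU (with real PyTorch you would catch `torch.cuda.OutOfMemoryError` instead):

```python
class CudaOOM(RuntimeError):
    """Stand-in for torch.cuda.OutOfMemoryError so this sketch runs anywhere."""


def run_with_smaller_batches(step, batch_size, min_batch=1):
    """Call step(batch_size); on OOM, halve the batch size and retry."""
    while batch_size >= min_batch:
        try:
            return step(batch_size)
        except CudaOOM:
            batch_size //= 2  # back off and try again with a smaller batch
    raise CudaOOM("model does not fit even at the minimum batch size")
```

The `step` callable and wrapper are hypothetical helpers, not part of any NeoRun or PyTorch API; the pattern itself (catch OOM, shrink, retry) is what matters.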
### GPU container starts but model doesn't load

- Ensure your `requirements.txt` includes CUDA-compatible PyTorch: `torch --index-url https://download.pytorch.org/whl/cu121`
- Check that the base image has the CUDA runtime (NeoRun uses `nvidia/cuda:12.1.0-runtime-ubuntu22.04` for GPU builds)