GPU Guide

NeoRun supports NVIDIA GPU passthrough for ML inference, image generation, fine-tuning, and more.

Free Tier GPU Limits

| Limit | Value |
| --- | --- |
| Max VRAM | 8 GB |
| GPU jobs per day | 3 |
| Max runtime per session | 2 hours |
| Idle auto-stop | 30 minutes |

Enabling GPU

  1. In the deployment wizard, toggle the GPU switch
  2. Select your VRAM requirement: 8 GB, 16 GB, or 24 GB
  3. Deploy — NeoRun will schedule your job on a GPU-equipped worker

Supported GPU Types

NeoRun workers are equipped with NVIDIA GPUs. GPU detection uses pynvml with an nvidia-smi fallback:

| GPU | VRAM | Compute Capability |
| --- | --- | --- |
| RTX 3060 | 12 GB | 8.6 |
| RTX 3090 | 24 GB | 8.6 |
| RTX 4090 | 24 GB | 8.9 |
| A100 | 40/80 GB | 8.0 |
| H100 | 80 GB | 9.0 |
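The pynvml-first, nvidia-smi-fallback detection described above could be sketched as follows. This is a minimal illustration, not NeoRun's actual implementation; the function names are hypothetical.

```python
import subprocess

def detect_gpus():
    """Enumerate NVIDIA GPUs, preferring pynvml, falling back to nvidia-smi."""
    try:
        import pynvml
        pynvml.nvmlInit()
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older pynvml versions return bytes
                name = name.decode()
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({"name": name, "vram_mb": mem.total // (1024 * 1024)})
        pynvml.nvmlShutdown()
        return gpus
    except Exception:
        pass  # pynvml missing or NVML unavailable; fall back to nvidia-smi

    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        return parse_smi_output(out)
    except (OSError, subprocess.CalledProcessError):
        return []  # no GPU detected

def parse_smi_output(csv_text):
    """Parse `nvidia-smi --query-gpu=name,memory.total` CSV output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem = (part.strip() for part in line.split(","))
        gpus.append({"name": name, "vram_mb": int(mem)})
    return gpus
```

On a worker without a GPU, `detect_gpus()` simply returns an empty list, which is how a scheduler can tell the machine is CPU-only.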

GPU + Docker

Containers get NVIDIA GPU access via the NVIDIA Container Toolkit. NeoRun automatically:

  • Adds --gpus all device requests to the container
  • Sets NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES=compute,utility
  • Allocates larger shared memory (--shm-size=2g) for PyTorch DataLoaders
  • Scales CPU/memory limits for GPU workloads
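The settings in the list above correspond roughly to the following `docker run` flags. This helper that assembles them is illustrative only (not part of NeoRun):

```python
def gpu_run_args(image, shm_size="2g"):
    """Build a `docker run` argv with the GPU-related flags described above."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                       # GPU device request via NVIDIA runtime
        "-e", "NVIDIA_VISIBLE_DEVICES=all",
        "-e", "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
        "--shm-size", shm_size,                # larger /dev/shm for PyTorch DataLoaders
        image,
    ]

# Example: " ".join(gpu_run_args("nvidia/cuda:12.1.0-runtime-ubuntu22.04"))
```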

Container Isolation

GPU containers use the default runc runtime because gVisor does not support GPU passthrough. To compensate, NeoRun applies a seccomp profile that blocks dangerous syscalls (module loading, mount operations, namespace creation, ptrace).
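A seccomp profile of the kind described (default-allow with an explicit deny list) might look like the sketch below. NeoRun's exact syscall list is not published, so the deny list here is an assumption covering the categories named above:

```python
import json

# Hypothetical deny list covering the categories mentioned above:
# module loading, mount operations, namespace creation, and ptrace.
BLOCKED_SYSCALLS = [
    "init_module", "finit_module", "delete_module",     # module loading
    "mount", "umount", "umount2", "pivot_root",         # mount operations
    "unshare", "setns",                                 # namespace creation
    "ptrace", "process_vm_readv", "process_vm_writev",  # process inspection
]

profile = {
    "defaultAction": "SCMP_ACT_ALLOW",   # allow everything not listed
    "syscalls": [
        {
            "names": BLOCKED_SYSCALLS,
            "action": "SCMP_ACT_ERRNO",  # blocked calls fail with an errno
            "errnoRet": 1,               # EPERM
        }
    ],
}

with open("gpu-seccomp.json", "w") as f:
    json.dump(profile, f, indent=2)
```

A profile like this would be passed to Docker with `--security-opt seccomp=gpu-seccomp.json`.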

Example GPU Projects

These templates are pre-configured for GPU use:

  • Stable Diffusion WebUI — AUTOMATIC1111 image generation
  • Open WebUI + Ollama — Local LLM chat interface
  • ComfyUI — Node-based diffusion workflows
  • FastAPI + PyTorch — ML model serving API
  • Jupyter + CUDA — GPU-accelerated notebooks

Find them in the Template Gallery under the GPU category.

Idle Detection

GPU pods are expensive. NeoRun monitors network I/O and automatically stops idle pods:

  1. Network tracking: Measures incoming/outgoing bytes per 60-second interval
  2. Idle threshold: Less than 1 KB of traffic (filters out DNS/healthchecks)
  3. Warning: After 25 minutes idle, a notification is sent
  4. Auto-stop: At 30 minutes idle, desired_state is set to stopped
  5. Max runtime: Hard cutoff at 2 hours (free tier)

To prevent auto-stop, keep your pod actively serving requests.
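The steps above amount to a simple decision over per-minute traffic samples. The sketch below uses the thresholds from the list; the function name is hypothetical, not NeoRun's actual code:

```python
IDLE_BYTES_THRESHOLD = 1024   # < 1 KB per 60-second interval counts as idle
WARN_AFTER_MINUTES = 25
STOP_AFTER_MINUTES = 30

def evaluate_idle(samples):
    """Given per-minute traffic byte counts (oldest first), return an action.

    Returns "ok", "warn", or "stop" depending on how many consecutive
    trailing intervals fall below the idle threshold.
    """
    idle_minutes = 0
    for bytes_seen in reversed(samples):   # count trailing idle intervals
        if bytes_seen < IDLE_BYTES_THRESHOLD:
            idle_minutes += 1
        else:
            break
    if idle_minutes >= STOP_AFTER_MINUTES:
        return "stop"   # set desired_state to stopped
    if idle_minutes >= WARN_AFTER_MINUTES:
        return "warn"   # send the idle notification
    return "ok"
```

Note that any interval with at least 1 KB of traffic resets the idle counter, which is why actively serving requests prevents auto-stop.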

Troubleshooting GPU

"No GPU available"

  • Check that the worker machine has NVIDIA drivers installed
  • Verify nvidia-smi returns GPU information
  • Ensure nvidia-container-toolkit is installed

"CUDA out of memory"

  • Select a higher VRAM tier in the deployment wizard
  • Reduce batch size or model precision in your code
  • Use torch.cuda.empty_cache() to free unused GPU memory

GPU container starts but model doesn’t load

  • Ensure your requirements.txt includes CUDA-compatible PyTorch. Note that pip requires the index URL option on its own line, not appended to the requirement:
    --extra-index-url https://download.pytorch.org/whl/cu121
    torch
  • Check that the base image has CUDA runtime (NeoRun uses nvidia/cuda:12.1.0-runtime-ubuntu22.04 for GPU builds)