About

We think tutorials are broken.

DevLeep exists because most DevOps training teaches you commands but never puts you in a situation where something is actually broken and the clock is ticking.

The Problem

You can finish a 40-hour Kubernetes course and still freeze the first time a production pod enters a crash loop. That is not your fault; it is a flaw in how the training was designed.

Sandboxed environments give you a clean, predictable system. Production gives you a system that has been running for months, touched by a dozen people, and is failing in a way the documentation never described.

The gap between "I know the commands" and "I can handle an incident" is experience. DevLeep is how you get that experience without waiting for it to happen to you at 2 AM.

How It Works

Your infrastructure

Labs run in your own AWS account. We provision a real EC2 instance, real VPC, real disk. When you fix the problem, you are fixing it on actual infrastructure — not a simulator.

Real incident scenarios

Every lab is based on a class of production incident: disk exhaustion, misconfigured nginx, OOM kills, failed deployments, certificate expiry. Each one has a root cause and a correct fix.

Automated validation

When you think you have fixed the incident, you run the validation suite. It executes real checks against your environment — no self-reporting, no honour system.
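As an illustration of what a "real check" can mean, a disk-exhaustion lab might confirm that the root filesystem is no longer nearly full. The Python sketch below is hypothetical: the function name `root_disk_below_threshold` and the 90% threshold are assumptions for the example, not DevLeep's actual validation rules, which are defined in each lab's YAML file. It only uses the standard library's `shutil.disk_usage`.

```python
import shutil


def root_disk_below_threshold(path: str = "/", max_percent: float = 90.0) -> bool:
    """Return True if the filesystem holding `path` is at most `max_percent` full.

    Hypothetical check: the 90% default is an assumed threshold, not a DevLeep value.
    """
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (in bytes)
    if usage.total == 0:
        return False  # treat a zero-sized filesystem as a failed check
    used_percent = usage.used / usage.total * 100
    return used_percent <= max_percent


if __name__ == "__main__":
    # The result depends on the machine running it, so the output varies.
    status = "PASS" if root_disk_below_threshold() else "FAIL"
    print(f"disk usage check: {status}")
```

The point of a check like this is that it measures the environment directly instead of asking the learner whether the problem is gone.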

Open lab definitions

The lab definitions — YAML files that describe the scenario, objectives, hints, and validation rules — are open source. Anyone can read them, improve them, or contribute new ones.

What We Cover

Three tracks, each progressing from orientation through intermediate diagnostics to advanced failure modes.

Linux Core

File systems, processes, networking, permissions, systemd, cron, disk management

Containers

Docker internals, failed builds, network isolation, volume mounts, compose failures

Kubernetes

Pod failures, scheduler issues, RBAC, ingress, persistent volumes, cluster degradation

Open Source

The lab definition format is public. Every scenario, together with its validation checks and hints, is defined in a YAML file in a public repository. We believe incident training improves when the community can inspect and improve the material.

If you have been through a real outage and want to turn it into a lab, the contributing guide explains the format in full.

Contact