Community

Open Source Lab Definitions

Every lab scenario on DevLeep is a YAML file in a public repository. The community writes, reviews, and improves them. If you have been through a real incident, you can turn it into a lab that helps thousands of engineers who haven't.


github.com/devleep/labs

YAML lab definitions · MIT Licence · PRs welcome

Ways to Contribute

HIGH VALUE

Submit a new lab based on a real incident

Turn an outage you have been through into a lab. The most effective scenarios come from real production failures — not synthetic exercises. Write the YAML, test the validation checks, include all three hint levels.

GOOD FIRST ISSUE

Improve existing hint quality

Many labs have sparse Level 2 and Level 3 hints. If you solved a lab the hard way and found the hints were not helpful, you are exactly the right person to improve them. Open the YAML, make it clearer, submit a PR.

ONGOING

Review open pull requests

Every new lab needs a technical review. Read the YAML, provision the lab, and verify that the validation checks actually catch what they should. A confirmed test result in the PR is worth more than a comment.

DISCUSSION

Propose new scenario ideas

If you have an incident pattern that would make a good lab but don't have time to implement it yet, open an issue describing the scenario. Include: what broke, what the operator would need to do, and what a validation check might look like.

How to Create a Lab

The complete field reference is in the documentation. These are the steps:

1. Fork github.com/devleep/labs and clone your fork
2. Add your lab to labdefinations.yaml following the YAML schema
3. Run the seed script locally to verify the YAML parses correctly
4. Provision the target module via terraform apply and reproduce the broken scenario
5. Apply the fix, confirm every validation check passes, and screenshot the output
6. Write Level 1 (directional), Level 2 (commands), and Level 3 (walkthrough) hints
7. Open a pull request with: the YAML changes, a one-paragraph description of the real incident, and your validation screenshot

What Makes a Good Lab

Based on a real failure pattern

Not a contrived exercise. There should be a class of production incidents that this lab teaches the operator to recognise and resolve.

Validation tests the outcome

Checks should verify the fix worked — service is running, config is correct, resource is available — not just that a file exists.

All three hint levels written

Level 1 is directional, Level 2 gives specific commands, and Level 3 is a full walkthrough. Someone completely stuck at 2 AM should be able to get unstuck.

Scenario is realistic

The broken state should look like something that could realistically end up in production — misconfiguration, version drift, resource exhaustion.

Objectives are specific

List what the operator will learn, not what they will do. 'Diagnose and repair a misconfigured systemd unit' beats 'Fix the service'.

Validation is idempotent

Running the validation twice should give the same result. Checks should not depend on timing or side effects from a previous check.
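
The two validation points above (test the outcome, stay idempotent) can be illustrated with a small Python sketch. This is not DevLeep's validation format, which is defined by the lab YAML schema; the function names, the app.conf file, and the listen directive are hypothetical and chosen only for illustration. The sketch contrasts a check that only confirms a file exists with one that confirms the setting is correct, then runs the second check twice to show the result does not change.

```python
import tempfile
from pathlib import Path


def check_file_exists(path: Path) -> bool:
    """Weak check: passes as soon as the file is present, even if it is still wrong."""
    return path.is_file()


def check_directive(path: Path, key: str, expected: str) -> bool:
    """Outcome check: passes only if `key` is set to `expected` in the file.

    The check is read-only, so running it repeatedly gives the same result.
    """
    if not path.is_file():
        return False
    for raw in path.read_text().splitlines():
        # Drop trailing comments and the optional terminating semicolon.
        line = raw.split("#", 1)[0].strip().rstrip(";")
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == key:
            return parts[1].strip() == expected
    return False


def run_demo() -> dict:
    """Build a hypothetical broken config, fix it, and record each check result."""
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "app.conf"  # hypothetical file name

        # Broken state: the file exists but carries the wrong port.
        conf.write_text("listen 8081;  # wrong port\n")
        broken = (check_file_exists(conf), check_directive(conf, "listen", "8080"))

        # Fixed state: run the outcome check twice to confirm it is idempotent.
        conf.write_text("listen 8080;\n")
        first = check_directive(conf, "listen", "8080")
        second = check_directive(conf, "listen", "8080")

        return {"broken": broken, "fixed_first": first, "fixed_second": second}


if __name__ == "__main__":
    print(run_demo())
```

In the broken state the existence check passes while the outcome check fails, which is the gap a good lab validation should close. Because check_directive only reads the file, repeating it cannot alter the outcome; a check that restarts a service or writes a marker file would not have that property.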

Track Roadmap

These are the tracks planned or in progress. Labs within each track are prioritised by how commonly the failure pattern appears in production.

Linux Core (ACTIVE)
Server orientation · Process management · Disk exhaustion · Permission errors · Systemd unit failures · Cron debugging · Log investigation

Containers (PLANNED)
Docker build failures · Container networking isolation · Volume mount issues · Compose startup order · Registry auth failures · Resource limits

Kubernetes (PLANNED)
Pod crash loops · ImagePullBackOff · RBAC permission denied · Ingress misconfiguration · PVC binding failures · Node pressure eviction · Network policy blocks

If you want to accelerate a particular area, open an issue or submit a lab.

Scenarios vs Labs

A scenario is the broken environment — an EC2 instance set up in a specific failed state. A lab is the task within that scenario.

One scenario can host multiple labs at different difficulty levels. For example, a scenario with scenario_id: nginx-config-error might host a beginner lab that asks the operator to find and fix a syntax error, and an intermediate lab that adds a failing upstream and a missing SSL certificate to the same environment.

The scenario_id field in the lab YAML links the lab to its scenario. The scenario defines the Terraform userdata that breaks the environment. The lab defines what the operator needs to do to pass.
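
As a rough illustration of that relationship, the Python sketch below models one scenario hosting two labs. Only scenario_id and the nginx-config-error example come from the text above; lab_id, difficulty, task, description, and both helper functions are hypothetical and do not represent the real YAML schema, which is covered in the field reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    scenario_id: str
    description: str  # hypothetical field, for illustration only


@dataclass(frozen=True)
class Lab:
    lab_id: str       # hypothetical field
    scenario_id: str  # links the lab to its scenario, as described in the text
    difficulty: str   # hypothetical field
    task: str         # hypothetical field


def labs_for(scenario: Scenario, labs: list[Lab]) -> list[Lab]:
    """Return every lab that runs inside the given scenario."""
    return [lab for lab in labs if lab.scenario_id == scenario.scenario_id]


def unlinked_labs(labs: list[Lab], scenarios: list[Scenario]) -> list[str]:
    """Return the ids of labs whose scenario_id matches no known scenario."""
    known = {s.scenario_id for s in scenarios}
    return [lab.lab_id for lab in labs if lab.scenario_id not in known]


NGINX = Scenario("nginx-config-error", "EC2 instance with a broken nginx setup")

LABS = [
    Lab("nginx-syntax", "nginx-config-error", "beginner",
        "Find and fix the syntax error"),
    Lab("nginx-upstream-ssl", "nginx-config-error", "intermediate",
        "Also repair the failing upstream and the missing SSL cert"),
    Lab("orphan-example", "no-such-scenario", "beginner",
        "Deliberately unlinked, to show the integrity check"),
]

if __name__ == "__main__":
    for lab in labs_for(NGINX, LABS):
        print(lab.difficulty, lab.lab_id)
    print("unlinked:", unlinked_labs(LABS, [NGINX]))
```

A check along the lines of unlinked_labs is one way a reviewer might catch a lab whose scenario_id was mistyped before provisioning anything.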

Get in Touch