Published on 25 February 2026 | Homelab / Career | 10 min read
The Question That Started This
I’ve been interviewing infrastructure engineers on and off for the best part of a decade. Mostly Azure and AWS roles – cloud architects, platform engineers, DevOps leads. The kind of people who should know how the internet works, because it’s literally their job to build on top of it.
And I’ve noticed a shift. It’s been gradual, but it’s undeniable.
Fewer candidates can walk me through what happens between typing a URL into a browser and seeing a page load. Not in deep, packet-level detail. Just the basics. DNS resolution. TCP handshake. TLS negotiation. The request hitting a web server. The response coming back.
This used to be a warm-up question. Now it trips people up.
I want to be clear – I’m not having a go at anyone specific. These are smart, capable engineers who can deploy complex infrastructure with Terraform, build CI/CD pipelines in their sleep, and architect solutions across multiple cloud regions. They’re good at their jobs.
But somewhere along the way, the foundations got skipped. And I think that matters more than the industry wants to admit.
How Did We Get Here?
Here’s the thing – this wasn’t an accident. The industry pushed hard toward abstraction, and for good reason. Abstraction is how you scale. It’s how you let businesses focus on their product instead of managing servers. It’s how a team of five can run infrastructure that would have needed fifty people twenty years ago.
I’ve spent most of my career on the cloud consulting side of this. I’ve sat in rooms and told CTOs, honestly and correctly, that they should move to Azure. That managed services would save them money, reduce risk, and let their teams focus on what actually matters. And in the vast majority of cases, I was right. Cloud is brilliant for what it does.
But the messaging that came with it created a culture. “You don’t need to know how it works underneath.” “Just use the managed service.” “Why would you run your own database when RDS exists?”
And then DevOps tooling layered on top of that. Terraform means you don’t need to understand what you’re provisioning – you just describe the desired state and let the provider figure it out. Kubernetes abstracts away the machines entirely. Serverless abstracts away the containers. Each layer removes another reason to understand what’s below it.
Again – these are good tools. I use them. I recommend them. But when you combine “you don’t need to know” with tools that mean you genuinely never have to learn, you end up with engineers who can operate the abstraction but couldn’t rebuild it from scratch.
And that’s a problem, because the abstraction is someone else’s product. When you don’t understand what’s underneath, you can’t evaluate it. You can’t troubleshoot it properly. And you certainly can’t replace it.
What a Homelab Actually Teaches You
I run what’s probably an unreasonable amount of infrastructure at home. Five Beelink mini PCs, eight-odd Raspberry Pis, thirty-plus services across containers and VMs. It grew organically from a single Pi running Pi-hole to something that occasionally makes my electricity bill uncomfortable.
But here’s what that journey taught me – and keeps teaching me – that cloud certifications and managed services never could.
DNS and Networking
When you run services at home, DNS isn’t something that “just works.” You have to understand it, because you’re the one making it work. Split-horizon DNS, internal resolution, wildcard records, reverse proxy configuration – none of it happens automatically.
I’ve configured more DNS records at home than I have in most client projects. I run split-horizon DNS across my network so that internal services resolve to local IPs without traffic ever leaving the house – the same pattern Azure Private DNS Zones use, just without the managed service doing it for you. And every time something breaks – because it will break – I have to actually understand what went wrong. There’s no “have you tried recreating the Private DNS Zone?” There’s just me, dig, nslookup, and whatever I misconfigured at midnight.
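For the curious, that midnight session usually starts with asking the internal and public resolvers the same question and comparing answers. A minimal sketch – the hostname, resolver addresses and private ranges here are placeholders, not my actual setup:

```shell
#!/usr/bin/env sh
# Placeholder values: app.home.example is an internal service,
# 192.168.1.2 is the LAN resolver, 1.1.1.1 is a public one.

# An internal answer should be an RFC 1918 address; anything else
# means the query leaked out to the public zone.
is_private_ip() {
  case "$1" in
    10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Ask both resolvers for the same name and compare.
if command -v dig >/dev/null 2>&1; then
  internal=$(dig +short +time=1 +tries=1 app.home.example @192.168.1.2 | tail -n1)
  public=$(dig +short +time=1 +tries=1 app.home.example @1.1.1.1 | tail -n1)
  if is_private_ip "$internal"; then
    echo "split-horizon OK: internal=$internal public=${public:-NXDOMAIN}"
  else
    echo "leak: internal query answered with '$internal' - check resolver order"
  fi
fi
```

It looks trivial, but that comparison is exactly what surfaces a misordered forwarder or a missing internal zone.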
That understanding carries over directly. I once diagnosed a client’s DNS resolution issue with Azure Private Endpoints in about ten minutes because I’d spent an entire weekend debugging the exact same split-horizon problem at home. The symptoms were identical – services resolving to public IPs instead of private ones, traffic hairpinning through the internet when it should have stayed internal. When you’ve felt that pain personally, you recognise it immediately.
When I’m designing Azure landing zones with Private DNS and hub-spoke networking, I’m not just following a reference architecture. I actually know what the DNS resolution flow looks like, because I’ve built equivalent setups from scratch.
Storage and Backups
At home, there’s no Azure Backup. No geo-redundant storage with automatic failover. There’s just drives in machines, and the knowledge that if something fails, nobody is going to recover it for you.
That changes how you think about data. You start asking questions that managed services let you skip. How many copies? Where are they? What’s the recovery time if this drive dies at 3am? What if the machine the backups are on also dies?
I use a mix of rsync, rclone, and some manual processes that I’m not entirely proud of. But the point is – I’ve had to think about it properly. Retention policies, offsite copies, testing restores. That’s the same thinking I apply to client backup strategies, except at home there’s no SLA promising someone else will sort it out.
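Stripped right down, the shape of it is a sync plus a retention prune. The paths, the backup host and the 30-day window below are illustrative, not a recommendation:

```shell
#!/usr/bin/env sh
# Illustrative paths and retention - adjust to your own layout.
SRC="${SRC:-/srv/appdata}"
DEST="${DEST:-backup-box:/tank/backups/appdata}"
KEEP_DAYS="${KEEP_DAYS:-30}"

# Delete archives older than the retention window.
prune_old() {
  dir="$1"; days="$2"
  find "$dir" -type f -mtime +"$days" -delete
}

# Mirror changed files only; --delete keeps the copy honest.
# Guarded so pasting this verbatim does nothing destructive.
if [ "${RUN_BACKUP:-0}" = 1 ]; then
  rsync -a --delete "$SRC/" "$DEST/"
  prune_old /srv/archives "$KEEP_DAYS"
fi
```

The commands aren't the hard part – the hard part is actually running a restore every so often to prove the copies are real.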
Security Hardening
There’s no NSG wizard in a homelab. No Azure Firewall Premium with managed rule sets. It’s UFW, iptables, fail2ban, and actually understanding why you’re opening port 443 and not just doing it because a tutorial said so.
Running CrowdSec across my home network taught me more about real-world threat intelligence than any security certification. I’ve got it deployed alongside a Wazuh SIEM with 16 agents across every host – mini PCs, Pis, the lot. I can see the actual attack patterns – the SSH brute force attempts, the automated scans hitting every port, the occasional something that makes me genuinely uncomfortable. That’s not theoretical. That’s happening to my network, and I’m the one dealing with it.
I also use Ansible to push identical security baselines across all 13 hosts in a single command. Firewall rules, SSH hardening, log forwarding – one playbook, consistent everywhere. That’s the same principle as Azure Policy or AWS Config, except I had to build the enforcement myself instead of toggling a compliance setting.
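The playbook below is a deliberately tiny, hypothetical version of that baseline – a single SSH hardening task pushed to every host – just to show the shape, not my real playbook:

```shell
#!/usr/bin/env sh
# Hypothetical minimal baseline - a real one would cover firewall
# rules and log forwarding too.
cat > baseline.yml <<'EOF'
- hosts: all
  become: true
  tasks:
    - name: Disallow SSH password authentication
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PasswordAuthentication'
        line: 'PasswordAuthentication no'
      notify: restart ssh
  handlers:
    - name: restart ssh
      ansible.builtin.service:
        name: ssh
        state: restarted
EOF

# One command, every host (assumes an inventory listing all 13).
if command -v ansible-playbook >/dev/null 2>&1 && [ -f inventory.ini ]; then
  ansible-playbook -i inventory.ini baseline.yml
fi
```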
When I design security for clients now, I’m not just ticking compliance boxes. I’ve felt what it’s like to look at logs and see someone trying to get in. That context matters.
Monitoring and Observability
Setting up Prometheus, Grafana, and Uptime Kuma from scratch is a very different experience from clicking “Enable monitoring” in the Azure portal.
When you build your own monitoring stack, you learn what metrics actually matter. You learn how to write PromQL queries that tell you something useful instead of generating dashboard noise. You learn the difference between monitoring for the sake of it and monitoring because you genuinely need to know when something is broken.
I’ve got Prometheus scraping 13 hosts at home, with Grafana dashboards that are, frankly, overkill for a home network. But building them meant understanding cardinality, retention, scrape intervals, and alerting thresholds at a level that’s directly applicable to production monitoring design. The number of times I’ve seen organisations with Datadog bills the size of a small country but no useful alerting would depress you.
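As one example of an alerting threshold that earns its keep, here's a hypothetical rule that pages only when a host has been unreachable for five minutes – the job label and the window are illustrative choices:

```shell
#!/usr/bin/env sh
# Hypothetical Prometheus alerting rule, not a copy of my setup.
cat > host-down.rules.yml <<'EOF'
groups:
  - name: homelab
    rules:
      - alert: HostDown
        expr: up{job="node"} == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "{{ $labels.instance }} unreachable for 5 minutes"
EOF

# Validate the rule file before loading it, if promtool is installed.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules host-down.rules.yml
fi
```

The `for: 5m` is the whole point – a scrape blip stays quiet, a genuinely dead host wakes you up.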
Debugging Without a Support Ticket
This is probably the biggest one. When something breaks at home, there’s no support ticket to raise. There’s no P1 bridge call where someone from Microsoft walks you through a resolution. It’s just you, the logs, and whatever documentation the open-source project has – which varies, shall we say, considerably.
You learn to read logs properly. You learn to narrow things down systematically instead of shotgunning restarts and hoping for the best. You learn to check the obvious things first, because there’s nobody to escalate to when you’ve spent two hours on something that turned out to be a typo in a YAML file.
That debugging muscle is the single most transferable skill from self-hosting to professional infrastructure work. I’ve lost count of the number of times I’ve been in an incident call and been the person who found the problem – not because I’m smarter, but because I’ve got thousands of hours of practice working through issues where giving up isn’t an option. Last year a client had intermittent container failures that their team had been chasing for days. I recognised the symptoms immediately – it was a Docker storage driver issue I’d hit on one of my Pis when the SD card was dying. Different scale, same root cause, same fix.
The Dependency Question
Here’s where this goes beyond hobby and skill development. There’s a professional risk angle that I think doesn’t get enough airtime.
I ask this question in a lot of consulting engagements now: “If you had to leave your cloud provider, could you?”
It’s not a gotcha. I’m not trying to convince anyone to go on-prem. It’s a genuine risk assessment question, and the honest answer from most organisations is some variation of “probably not” or “it would take years.”
That’s partly a technology problem – lock-in through proprietary services, data gravity, architectural coupling. But increasingly, it’s also a skills problem. The people who built the original infrastructure, who understood what was being abstracted away, have often moved on. The current team knows Azure or AWS inside out, but couldn’t stand up an equivalent environment on bare metal.
That’s not a criticism of those engineers. It’s a criticism of an industry that told them they didn’t need those skills.
When the skills to run your own infrastructure disappear from your organisation, your relationship with your cloud provider changes. You’re not choosing to be there anymore. You’re there because you can’t be anywhere else. And that’s not a customer relationship – that’s dependency.
I’m not saying every company needs to maintain bare-metal skills. For plenty of organisations, committing fully to cloud is the right strategy. But they should make that choice knowingly, understanding what they’re giving up, not discover it when they need to leave and realise they can’t.
Where to Start
If any of this resonates and you’re thinking about starting a homelab, the good news is you don’t need much. You definitely don’t need five mini PCs and a drawer full of Raspberry Pis. That’s where I ended up, not where I started.
A single Raspberry Pi and a bit of curiosity is genuinely enough. Or an old laptop. Or a cheap mini PC from eBay. The hardware barely matters at the beginning.
Start with Docker. Get it running. Deploy one container – something useful like Pi-hole or a reverse proxy. Get it working, break it, fix it, understand why it broke. That’s the learning loop.
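If it helps, that first deployment can be as small as this – a hypothetical starter compose file for Pi-hole, with ports, timezone and volume path as examples rather than requirements:

```shell
#!/usr/bin/env sh
# Hypothetical docker-compose.yml for a first container.
cat > docker-compose.yml <<'EOF'
services:
  pihole:
    image: pihole/pihole:latest
    ports:
      - "53:53/udp"
      - "53:53/tcp"
      - "8080:80"     # web UI on http://<host>:8080
    environment:
      TZ: Europe/London
    volumes:
      - ./etc-pihole:/etc/pihole
    restart: unless-stopped
EOF

# Then: docker compose up -d
# And when it breaks: docker compose logs pihole
```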
If you want a practical starting point, I wrote a post on the five containers every homelab should run. It covers the boring-but-essential infrastructure that makes everything else work – reverse proxy, monitoring, dashboards, VPN. Start there if you want a structured path in.
From there, it’s about following your curiosity. Want to understand networking? Set up a reverse proxy with proper DNS and TLS. Want to learn about databases? Deploy PostgreSQL and break the replication. Want to understand monitoring? Build a Grafana dashboard from scratch instead of importing one.
Every service you run at home is a skill you’ve tested under real conditions. That’s worth more than any lab environment in a certification course, because it’s yours, it’s persistent, and it breaks in the same unpredictable ways that production does.
The Point
I’m not saying leave the cloud. That would be hypocritical – I’ll be designing Azure architectures again on Monday and genuinely enjoying it.
What I’m saying is: understand what you’re standing on.
The best cloud architects I know are the ones who could, in theory, build the whole thing from scratch. They don’t need to. They probably never will. But the understanding informs everything – their designs are better, their troubleshooting is faster, their risk assessments are more honest.
A homelab is how you build that understanding. Not by reading about it. By doing it, breaking it, and fixing it at midnight because nobody else is going to.
The cloud isn’t going anywhere. Neither is abstraction. But skills that nobody maintains eventually disappear. And once they’re gone, getting them back is a lot harder than keeping them in the first place.
Run something yourself. See what it teaches you. You might be surprised.
This post is part of the self-hosting and digital sovereignty series on readthemanual.tech.
About the author – Eric Lonsdale is an Azure and Infrastructure Architect who designs cloud platforms for businesses and self-hosts everything personal. He writes about both because they are not mutually exclusive. Connect on LinkedIn.
If you are the kind of person who reads man pages before Stack Overflow, you might appreciate the RTFM store. Just saying.