homelab
Intro
Two main things started my journey into tinkering with servers. The first was my initial interest in Raspberry Pis: I wanted to set up a simple cluster and experiment with Linux. The second was my time at Shopee, where I got exposure to distributed systems that served meaningful applications.
I’ve built this slowly over 2 years, and I’ll try to go into the details from the bottom up, starting with the hardware and ending with the applications and use cases.
Hardware
Shoutout to Taobao. Without Taobao, the server-grade hardware would have been absolutely unaffordable.
UPS - Huawei UPS2000-G
I had planned to operate highly available services and therefore needed to avoid downtime during maintenance, such as when shifting the rack around.
The two 9kA batteries can sustain a 400W load for about 50 minutes, and the 2kVA rating is more than enough for the current draw of the system.
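As a rough sanity check on those figures, the energy delivered over that runtime and the headroom against the 2kVA rating can be worked out with simple arithmetic. This is only a back-of-the-envelope sketch: the power factor of 0.8 is an assumption rather than a datasheet value, and the function names are made up for illustration.

```python
def energy_delivered_wh(load_watts: float, runtime_minutes: float) -> float:
    """Energy drawn from the batteries over the runtime, in watt-hours."""
    return load_watts * runtime_minutes / 60.0


def load_fraction(load_watts: float, rating_va: float, power_factor: float) -> float:
    """Share of the UPS's real-power capacity (VA x power factor) used by the load."""
    return load_watts / (rating_va * power_factor)


# Assumption: a power factor of 0.8, not taken from the UPS datasheet.
ASSUMED_POWER_FACTOR = 0.8

# Figures from the text: a 400 W load for about 50 minutes on a 2 kVA unit.
print(f"{energy_delivered_wh(400, 50):.0f} Wh delivered")
print(f"{load_fraction(400, 2000, ASSUMED_POWER_FACTOR):.0%} of capacity")
```

Under that assumed power factor, a 400W load uses about a quarter of the unit's real-power capacity, which matches the claim that the rating leaves plenty of room.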
Network Switch - H3C S1226FX
All RJ45 ports are 1GbE, and there are two 10GbE SFP+ ports that leave headroom for upgrading local network bandwidth later. Although the switch is a possible single point of failure, redundant switches were not worth the cost.
I decided to go with an unmanaged switch as it uses less power, is cheaper, is easier to set up, and I had not planned to manage VLANs any time soon. In the future, I can use a smaller managed switch to configure and route VLANs if I want to.
Router - ASUS RT-AC88U
Nothing interesting here. Just documenting it. Connects to WAN with 1GbE.
Servers - Orbis, Ludibrium, Magatia
'Orbis' is a small, low-power x86 machine that handles homelab administration such as networking and provisioning.
'Ludibrium' is a 5U rack-mount x86 machine that hosts all the user- and public-facing services and has enough hardware resources to support them. It also handles the hard drives.
'Magatia' is a cluster of super low-power ARM-based Raspberry Pis that would help me learn distributed-computing technologies.
Networking
Public Domain Hosting - Vodien
Chances are, you are reading this on "dannylty.com". I bought the domain from Vodien and use their public DNS servers to point it at my public IP.
Port Forwarding - Router
Port forwarding is minimal since I use a reverse proxy to route HTTP traffic to services.
The exposed ports are 80 (HTTP) and 443 (HTTPS). Some non-HTTP applications, like games, connect only to a specific port (e.g. Minecraft is on 25565), but I'm currently not running those.
OpenVPN - Router
I do a lot of remote development and administration, e.g. when I was in Germany for a university exchange semester. I believe that OpenVPN is a more secure option than exposing SSH, as an OpenVPN compromise only exposes the network resources. In comparison, if SSH were compromised, attackers would gain direct access to the machines.
DHCP - Router
There are some manually assigned DHCP leases, which I prefer over setting static IPs one by one on every end device, since all these 'static IP' configurations can be done in one place on the router. Also, by using DHCP, end devices still automatically receive the address of the internal DNS servers.
Internal DNS - Pi-hole - Orbis
'Orbis' serves as the primary local DNS, with backup DNS served by the router.
I use Pi-hole on 'Orbis' which also supports ad-filtering.
HTTPS Reverse Proxy - NGINX - Orbis
The reverse proxy is crucial for delivering my web services. The current NGINX configuration is much improved from when I started using it a year ago, when I had put everything into one .conf file. Now, individual subdomains are handled by their respective .conf files.
The public web apps routing image does not include routing rules for internal services and endpoints. Creating reverse proxies for internal HTTP endpoints adds a layer of abstraction over the services' exact IPs and ports.
Lastly, I use Let's Encrypt to obtain TLS certificates for my individual subdomains.
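To make the one-file-per-subdomain idea a little more concrete, here is a minimal sketch that renders a separate NGINX server block for each subdomain. Everything specific in it is an assumption for illustration: the subdomain names, the upstream addresses, the `example.com` domain, and the certificate paths (which follow certbot's default layout, although I have not said which Let's Encrypt client is in use). It is not my actual configuration.

```python
from string import Template

# "$$" is an escaped dollar sign, so NGINX variables such as $host survive.
SERVER_BLOCK = Template("""\
server {
    listen 443 ssl;
    server_name $fqdn;

    # Assumption: certificate paths in certbot's default layout.
    ssl_certificate     /etc/letsencrypt/live/$fqdn/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/$fqdn/privkey.pem;

    location / {
        proxy_pass http://$upstream;
        proxy_set_header Host $$host;
        proxy_set_header X-Forwarded-For $$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $$scheme;
    }
}
""")


def render_conf(subdomain: str, upstream: str, domain: str = "example.com") -> str:
    """Render the server block for one subdomain proxied to one upstream."""
    return SERVER_BLOCK.substitute(fqdn=f"{subdomain}.{domain}", upstream=upstream)


# Hypothetical services: subdomain -> internal address and port.
SERVICES = {
    "cloud": "192.168.1.20:8080",
    "media": "192.168.1.21:8096",
}

for name, upstream in SERVICES.items():
    print(f"# --- {name}.conf ---")
    print(render_conf(name, upstream))
```

Keeping each subdomain in its own file means a change to one service's routing cannot accidentally break another's, and the service's real IP and port only ever appear in one place.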
Infrastructure
Containerisation - Docker
Nothing much to say here. While most of my applications and services run as containers, there are still some that can be migrated but I have not gotten to doing...
Software Provisioning - Ansible - Magatia
Before ChatGPT became a thing, I managed my Raspberry Pi cluster using a simple SSH wrapper called fabric2 that could execute commands on remote machines in parallel. While Ansible was a thing, I didn't feel the need to learn and use it for simple things like executing apt-get update in parallel.
Well, thanks to ChatGPT, I can now churn out playbooks in seconds, so I now manage the Magatia cluster with Ansible (running on Orbis).
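For a sense of what the old parallel-SSH approach boiled down to, here is a small standard-library sketch of running one command on several machines at once. It is not fabric2's API and not my original wrapper; it swaps in `subprocess` and a thread pool to show the same idea. The host names are hypothetical, and the demo at the bottom uses a stand-in runner so that it works without any SSH access.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def ssh_run(host: str, command: str) -> str:
    """Run a command on one host through the system ssh client."""
    result = subprocess.run(
        ["ssh", host, command],
        capture_output=True, text=True, check=True, timeout=300,
    )
    return result.stdout


def run_parallel(
    hosts: Sequence[str],
    command: str,
    runner: Callable[[str, str], str] = ssh_run,
) -> Dict[str, str]:
    """Run the same command on every host concurrently; return output per host."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        # result() re-raises any exception from the worker, so failures surface here.
        return {host: future.result() for host, future in futures.items()}


def fake_runner(host: str, command: str) -> str:
    """Stand-in for ssh_run so the demo runs without real machines."""
    return f"[{host}] would run: {command}"


# Hypothetical host names for the Raspberry Pi nodes.
nodes = ["magatia-1", "magatia-2", "magatia-3"]
for output in run_parallel(nodes, "sudo apt-get update", runner=fake_runner).values():
    print(output)
```

Dropping the `runner=fake_runner` argument makes it use real SSH. Ansible covers the same ground and adds inventories, idempotent modules, and reporting, which is why it took over once writing playbooks became cheap.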
Container Orchestration - Docker Swarm - Magatia
For the Magatia cluster, I considered k3s and Docker Swarm, and decided on Swarm since it was a lot simpler to manage and configure.
I set up Orbis as the swarm manager in 'drain' mode so that containers do not run on it, and instead will run on worker Magatia nodes.
Container Management - Portainer - Orbis
Portainer provides a simple graphical webui to manage my containers, services, and swarm.
Hypervisor - Proxmox - Ludibrium
Proxmox runs directly on bare metal as a Debian-based type-1 hypervisor. It has support for running LXCs, but I mainly use it for running VMs.
I normally favour VMs to run applications as I think they're more secure. Some of my applications (like storage hosting) require direct access to machine resources, which can create a vulnerability when run in privileged Linux containers. The many open-source Docker images that I run are also attack surfaces.
Some of my applications also involve several highly cohesive services/dependencies that I prefer to put together into one VM so that they do not affect other applications and services. The tradeoff of virtualisation overhead is not severe.
Lastly, OS virtualisation provides me a playground to test, learn, and experiment. Severe crashes and errors in guest OSes will not affect the host OS.
Apart from the applications shown in pictures on this page, I run a lot of random small applications, scripts, and servers.
Network Storage - TrueNAS Scale - Ludibrium
Scale runs as a virtual machine in Proxmox. However, it is given the PCIe storage controller card directly via hardware-based I/O passthrough. Scale is an OS and environment built to manage storage.
I set up a simple 4-drive RAIDZ1 to serve as my main storage pool. RAIDZ1 uses one drive's worth of capacity for parity, so the pool survives a single drive failure. I find this a good balance: with 4 drives I get 3 drives of usable storage, and if a drive fails, I have 2 cold-spare disks I can use to replace the faulty one. I prefer ZFS over ext4 mainly due to its automatic corruption detection and healing, as well as its ability to snapshot efficiently.
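The capacity trade-off described above is simple arithmetic, sketched below. The 4 TB drive size is a hypothetical figure for illustration (I have not stated the real drive size), and the calculation ignores ZFS metadata and padding overhead, so real usable space is somewhat lower.

```python
def raidz_usable_drives(total_drives: int, parity_drives: int = 1) -> int:
    """Number of drives' worth of usable capacity in a single RAIDZ vdev."""
    if total_drives <= parity_drives:
        raise ValueError("a RAIDZ vdev needs more drives than parity drives")
    return total_drives - parity_drives


def raidz_usable_tb(total_drives: int, drive_tb: float, parity_drives: int = 1) -> float:
    """Approximate usable capacity in TB, ignoring filesystem overhead."""
    return raidz_usable_drives(total_drives, parity_drives) * drive_tb


# 4-drive RAIDZ1 from the text; the 4 TB drive size is hypothetical.
print(raidz_usable_drives(4), "drives of usable storage")
print(raidz_usable_tb(4, drive_tb=4.0), "TB usable (approximate)")
```

Passing `parity_drives=2` models RAIDZ2, which on the same 4 drives would leave only 2 drives of usable space in exchange for surviving two failures.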
Datasets with appropriate ACLs are defined within the pool. Some of these datasets are shared via Samba to the local network. These shares are used for anything from family use to storing scheduled backups to being mounted by machines as shared storage space.
One additional benefit of using Scale is that there is an ecosystem of "applications" that are provided by the community. These applications are actually Helm charts that will deploy to the k3s Kubernetes platform bundled natively within Scale.
Applications
There are two kinds of applications that I run. What I call 'public apps' are ones that are available... well publicly. You could download it or deploy it somehow, and you immediately have it up and running. I feel that these applications are a bit trivial as long as the deployment environment has been established; anyone could run a new webservice by running a simple 'docker run' command.
So instead I will briefly talk about my two 'personal' applications/services that I've written, before simply listing out some of the public apps that I have integrated into my system.
www.dannylty.com
This website is statically generated by Jekyll. However, apart from the organisational framework that Jekyll provides, such as support for templates, pages, etc, all of the styling and scripting was designed by me (and written by ChatGPT).
The site itself is served by NGINX, with a simple rsync script that syncs the statically generated code into the web root directory.
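A deploy step like this can be sketched in a few lines. The sketch below is an assumption-laden illustration rather than my actual script: `_site` is Jekyll's default output directory, while `/var/www/html` is a hypothetical web root, and the choice of the `--archive` and `--delete` flags is mine for the example.

```python
import subprocess
from typing import List


def build_rsync_command(source_dir: str, web_root: str, dry_run: bool = False) -> List[str]:
    """Build an rsync command that mirrors the generated site into the web root."""
    cmd = ["rsync", "--archive", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying anything
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    cmd += [source_dir.rstrip("/") + "/", web_root]
    return cmd


def deploy(source_dir: str, web_root: str, dry_run: bool = False) -> None:
    """Run the sync; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(source_dir, web_root, dry_run), check=True)


# Only print the command here; call deploy() to actually run it.
print(" ".join(build_rsync_command("_site", "/var/www/html", dry_run=True)))
```

The `--delete` flag removes files from the web root that no longer exist in the generated output, so deleted pages do not linger on the live site. Running with `dry_run=True` first is a cheap way to check the paths before anything is removed.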
holoscrape
This is a Python application that scrapes live chat logs from YouTube live streams. It is an open-source project I worked on for fun and can be found at https://github.com/dannylty/holoscrape.
I run this within a spare Proxmox container, and it connects to a MariaDB instance for data storage.
That's all folks. Everything else is community/open-source applications. Let's list them:
*Arr media stack - Media Server
The *Arr media stack is a well-known ecosystem of applications that server enthusiasts deploy for media management and media streaming.
At the front is Jellyfin, a server that scans for media on disk and streams it over the web.
Nextcloud - Cloud Storage
A cloud storage platform that operates similarly to Google Drive.
Stable Diffusion - AI Image Generation
Local AI image processing executed on the graphics card of my main machine. Usually offline since my computer is not on 24/7.
Prometheus/Grafana - Metrics and Monitoring
Exporters are installed in nodes, producing metrics that are pulled by Prometheus. Grafana uses these metrics to create dashboards.
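To show what Prometheus actually pulls from an exporter, here is a minimal standard-library sketch of one. It is not one of the exporters I run: the metric name `homelab_load1`, the port, and the helper names are hypothetical. It serves a single gauge in the Prometheus text exposition format on `/metrics`. Note that `os.getloadavg()` is only available on Unix-like systems.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple


def render_metrics(metrics: Dict[str, Tuple[str, float]]) -> str:
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect() -> Dict[str, Tuple[str, float]]:
    """Gather current values; the metric name is hypothetical."""
    load1, _, _ = os.getloadavg()  # Unix-only
    return {"homelab_load1": ("1-minute load average", load1)}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


# Show the wire format with a fixed sample value instead of starting the server.
print(render_metrics({"homelab_load1": ("1-minute load average", 0.42)}), end="")
```

Calling `serve()` starts the endpoint. Prometheus would then be configured to scrape that address on a schedule, and Grafana would query Prometheus rather than the node itself.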
Pi-hole - DNS Server
MariaDB - SQL Database
Game Servers
I have hosted a few game servers that clients can connect to over the internet to play together. These games include Minecraft, Rust, and Project Zomboid. All of them require port forwarding in the router.
Homepage - Quite Literally a Homepage
A dashboard that can show certain information from different services via web APIs.