OPNsense Network Validation Checklist

Overview

ZeroLagHub uses two OPNsense routers. This document tracks what needs to be verified, validated, and tested before production launch.

Current state: both routers are deployed but have had only minimal configuration and no systematic audit. Work through this checklist systematically, in the priority order given under Action Items.


Router Inventory

| Router | Role | Status |
|---|---|---|
| Primary OPNsense | Core network routing, WireGuard | Needs audit |
| Secondary OPNsense | Failover / redundancy | Needs audit |

Network Model

ZeroLagHub does not use VLANs. Each network is a separate subnet tied to a dedicated Linux bridge (vmbr) on the Proxmox host. OPNsense routes between these bridges.

| Bridge | Subnet | Purpose |
|---|---|---|
| vmbr? | 10.60.0.0/24 | Core services: API, portal, DNS, artifacts |
| vmbr? | 10.70.0.0/24 | Proxy/edge: Traefik, zpack-proxy |
| vmbr? | 10.100.0.0/24 | Game/dev containers: LXC |
| vmbr? | WireGuard range | Admin VPN access |

Action: Fill in the actual vmbr numbers from the Proxmox network config. Check: Proxmox web UI → Node → System → Network.
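
The same information is stored in /etc/network/interfaces on the Proxmox host, so the bridge list can also be read from a shell. This is a minimal sketch, assuming the usual ifupdown layout with one `iface vmbrN ...` stanza per bridge. The helper name `list_bridges` is illustrative and not part of Proxmox.

```sh
# list_bridges [FILE]
# Print each vmbr bridge and its configured address ("-" if none is set).
# FILE defaults to /etc/network/interfaces (assumed standard Proxmox location).
list_bridges() {
  awk '
    # Print the bridge collected so far, if any.
    function emit() { if (name != "") print name, (addr != "" ? addr : "-") }
    # A new iface stanza starts: flush the previous one, track only vmbr*.
    $1 == "iface" { emit(); name = ($2 ~ /^vmbr/) ? $2 : ""; addr = ""; next }
    # Remember the address line that belongs to the current bridge.
    name != "" && $1 == "address" { addr = $2 }
    END { emit() }
  ' "${1:-/etc/network/interfaces}"
}

# Usage on the Proxmox host:
#   list_bridges
```

A bridge that shows "-" has no host address, which is expected if only OPNsense holds an IP on that subnet. Match each bridge to its subnet by checking which OPNsense interface is attached to it.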


Firewall Rules (Verify)

Must be confirmed open:

  • 10.60.0.245:4000 (API) reachable from 10.70.0.242 (Traefik)
  • 10.100.0.x:6000 (dev containers) reachable from 10.60.0.245 (API)
  • 10.100.0.x:18888 (agent) reachable from 10.60.0.245 (API)
  • 10.70.0.242:443 (Traefik) reachable from internet (public IP)
  • DNS resolution working from containers → 10.60.0.x (Technitium)

Must be confirmed blocked:

  • Direct container access from internet (no public IP on container VMs)
  • Direct container access to the management subnet (containers should go via the API)
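
The open and blocked expectations above can be checked with the same TCP probe, run from the source host named in each rule. This is a rough sketch, assuming `bash` (built with /dev/tcp support) and coreutils `timeout` are available on that host. The helper names `probe` and `verdict` are illustrative.

```sh
# probe HOST PORT
# Print "open" if a TCP connection succeeds within 3 seconds, else "blocked".
# Note: "blocked" covers both refused and timed-out connections.
probe() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo open
  else
    echo blocked
  fi
}

# verdict EXPECTED ACTUAL
# Print PASS when the observed state matches the expected one, FAIL otherwise.
verdict() {
  if [ "$1" = "$2" ]; then echo PASS; else echo FAIL; fi
}

# Usage, from the Traefik host (10.70.0.242):
#   verdict open "$(probe 10.60.0.245 4000)"
# Usage, from a dev container, with the management address filled in:
#   verdict blocked "$(probe <management-ip> 443)"
```

A timeout and an explicit refusal both count as blocked here. If the distinction matters (drop vs. reject rules), check the OPNsense firewall log for the same attempt.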

WireGuard (Verify)

  • Admin WireGuard tunnel active and stable
  • Peer configs for all admin machines documented
  • Confirm WireGuard survives router restart
  • Test: disconnect and reconnect from admin machine
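
Tunnel health after a restart can be judged from `wg show <interface> latest-handshakes`, which prints one public key and one Unix timestamp per peer. This is a small sketch that flags peers with no handshake or an old one. The interface name `wg0` and the 180-second threshold are assumptions, and `stale_peers` is an illustrative helper name.

```sh
# stale_peers NOW MAX_AGE
# Read "PUBLIC_KEY<TAB>UNIX_TIMESTAMP" lines on stdin (the format printed by
# `wg show <if> latest-handshakes`) and print the keys of peers that have
# never completed a handshake (timestamp 0) or whose last handshake is older
# than MAX_AGE seconds.
stale_peers() {
  awk -v now="$1" -v max="$2" '$2 == 0 || now - $2 > max { print $1 }'
}

# Usage on the router after a restart (wg0 is an assumed interface name):
#   wg show wg0 latest-handshakes | stale_peers "$(date +%s)" 180
```

An idle peer without a persistent keepalive can legitimately show an old handshake, so generate some traffic from the admin machine before treating a flagged peer as a failure.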

Failover / Redundancy

  • Confirm whether CARP (OPNsense's built-in high-availability mechanism) is configured between primary and secondary
  • Test failover: shut down primary router, verify traffic continues
  • Confirm secondary has identical interface and firewall config
  • Document failover behavior — is it automatic or manual?
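
If CARP turns out to be the mechanism in use, each router reports its role per virtual IP in `ifconfig` output on lines of the form `carp: MASTER vhid 1 ...`. This sketch extracts those roles so they can be compared before and after a failover test. It assumes that FreeBSD output format, and `carp_states` is an illustrative helper name.

```sh
# carp_states
# Read `ifconfig` output on stdin and print "VHID STATE" for every CARP line,
# for example "1 MASTER" or "1 BACKUP".
carp_states() {
  awk '$1 == "carp:" {
    for (i = 3; i < NF; i++)
      if ($i == "vhid") print $(i + 1), $2
  }'
}

# Usage on each router, before and after shutting down the primary:
#   ifconfig | carp_states
```

In a healthy pair the primary would be expected to show MASTER and the secondary BACKUP for every VHID, with the secondary switching to MASTER while the primary is down. No output at all suggests CARP is not configured.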

DNS Resolution

  • Containers resolve internal hostnames via Technitium (zlh-dns)
  • dev-*.zerolaghub.dev resolves to Traefik IP (10.70.0.242) internally
  • External DNS (zerolaghub.dev) resolves correctly via Cloudflare
  • Reverse DNS for container IPs (nice to have, not blocking)
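
The internal-resolution items can be checked by comparing a resolver's answer against the expected address. This sketch assumes `dig` is installed in the container, and the hostname `dev-example.zerolaghub.dev` is a made-up instance of the `dev-*` pattern. `resolves_to` is an illustrative helper name.

```sh
# resolves_to EXPECTED_IP
# Read `dig +short` output on stdin and print PASS if any answer line is
# exactly EXPECTED_IP, FAIL otherwise.
resolves_to() {
  if grep -qxF "$1"; then echo PASS; else echo FAIL; fi
}

# Usage from a dev container (hostname is a hypothetical example):
#   dig +short dev-example.zerolaghub.dev | resolves_to 10.70.0.242
```

Running the same lookup from outside the network should return the public address instead, which is a quick way to confirm that internal and external resolution really differ.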

Routing Validation

Quick connectivity matrix to verify end-to-end:

```sh
# From API host (10.60.0.245)
curl http://10.100.0.x:18888/status        # Agent reachable
curl http://10.100.0.x:6000                # code-server reachable
curl http://10.70.0.242                    # Traefik reachable

# From Traefik host (10.70.0.242)
curl http://10.60.0.245:4000/api/health    # API reachable

# From dev container
curl http://10.60.0.251:8080               # Artifact server reachable
nslookup zerolaghub.dev                    # DNS working
```
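
To make the matrix above produce a recordable result, each command can be wrapped in a small runner that prints PASS or FAIL and counts failures. This is a sketch only: `run_check` and `failures` are illustrative names, and the curl flags (`-f` to fail on HTTP errors, `--max-time 5` to bound the wait) are suggested defaults rather than anything the checklist requires.

```sh
# Number of failed checks so far in this shell session.
failures=0

# run_check LABEL COMMAND [ARGS...]
# Run COMMAND quietly; print "PASS  LABEL" or "FAIL  LABEL" and count failures.
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Usage from the API host (10.60.0.245):
#   run_check "Traefik reachable" curl -fsS --max-time 5 http://10.70.0.242
#   echo "$failures check(s) failed"
```

Call `run_check` directly rather than inside `$(...)`, since a subshell would discard the updated failure count.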

Known Issues / History

  • Routers have not been systematically audited
  • Basic routing is confirmed working (cross-subnet curl tests pass)
  • WireGuard access confirmed working for admin
  • No formal failover test has been performed

Action Items (Priority Order)

  1. Fill in actual vmbr numbers in network model table above
  2. Audit both router configs — document actual interfaces and firewall rules
  3. Run connectivity matrix above and confirm all pass
  4. Test WireGuard reconnect after router restart
  5. Test failover between primary and secondary
  6. Document any gaps found and remediate

Notes

OPNsense dashboard is accessible via WireGuard. Do not expose OPNsense management interface to the internet.

Configuration backups: OPNsense has a built-in XML config export (System → Configuration → Backups). Export the config from both routers and store it in a secure location before making any changes.
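
Once a config has been exported, it helps to file it under a timestamped name so that before-and-after copies of each change stay distinguishable. This is a minimal sketch, assuming the exported XML is already on the admin machine. The destination directory is a placeholder and `snapshot_config` is an illustrative helper name.

```sh
# snapshot_config SRC DEST_DIR
# Copy an exported OPNsense config into DEST_DIR under a timestamped name,
# restrict it to the owner (the file contains secrets), and print the new path.
snapshot_config() {
  src=$1
  dest_dir=$2
  mkdir -p "$dest_dir" || return 1
  out="$dest_dir/$(basename "$src" .xml)-$(date +%Y%m%d-%H%M%S).xml"
  cp "$src" "$out" && chmod 600 "$out" && echo "$out"
}

# Usage (both paths are placeholders):
#   snapshot_config ~/Downloads/config-primary.xml ~/opnsense-backups
```

Taking a snapshot from both routers also gives two files to diff when confirming that the secondary matches the primary.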