diff --git a/network/opnsense-checklist.md b/network/opnsense-checklist.md
new file mode 100644
index 0000000..4639eab
--- /dev/null
+++ b/network/opnsense-checklist.md
@@ -0,0 +1,127 @@
+# OPNsense Network Validation Checklist
+
+## Overview
+
+ZeroLagHub uses two OPNsense routers. This document tracks what needs to be
+verified, validated, and tested before production launch.
+
+Current state: routers exist and have been minimally touched. This checklist
+should be worked through systematically.
+
+---
+
+## Router Inventory
+
+| Router | Role | Status |
+|--------|------|--------|
+| Primary OPNsense | Core network routing, VLANs, WireGuard | Needs audit |
+| Secondary OPNsense | Failover / redundancy | Needs audit |
+
+---
+
+## VLAN Structure (Verify)
+
+Expected VLANs (confirm these match actual config):
+
+| VLAN | Subnet | Purpose |
+|------|--------|---------|
+| Management | 10.60.0.0/24 | API, portal, core services |
+| Game/Dev containers | 10.100.0.0/24 | LXC containers |
+| Proxy/Edge | 10.70.0.0/24 | Traefik, zpack-proxy |
+| WireGuard | 10.x.x.x/24 | Admin VPN access |
+
+**Action:** Log into OPNsense dashboard and confirm actual VLAN assignments
+match the above. Update this table if different.
+
+---
+
+## Firewall Rules (Verify)
+
+### Must be confirmed open:
+
+- [ ] `10.60.0.245:4000` (API) reachable from `10.70.0.242` (Traefik)
+- [ ] `10.100.0.x:6000` (dev containers) reachable from `10.60.0.245` (API)
+- [ ] `10.100.0.x:18888` (agent) reachable from `10.60.0.245` (API)
+- [ ] `10.70.0.242:443` (Traefik) reachable from internet (public IP)
+- [ ] DNS resolution working from containers → `10.60.0.x` (Technitium)
+
+### Must be confirmed blocked:
+
+- [ ] Direct container access from internet (no public IP on container VMs)
+- [ ] Cross-VLAN access that bypasses API (containers cannot reach API directly)
+
+---
+
+## WireGuard (Verify)
+
+- [ ] Admin WireGuard tunnel active and stable
+- [ ] Peer configs for all admin machines documented
+- [ ] Confirm WireGuard survives router restart
+- [ ] Test: disconnect and reconnect from admin machine
+
+---
+
+## Failover / Redundancy
+
+- [ ] Confirm whether CARP/VRRP is configured between primary and secondary
+- [ ] Test failover: shut down primary router, verify traffic continues
+- [ ] Confirm secondary has identical VLAN and firewall config
+- [ ] Document failover behavior: is it automatic or manual?
+
+---
+
+## DNS Resolution
+
+- [ ] Containers resolve internal hostnames via Technitium (`zlh-dns`)
+- [ ] `dev-*.zerolaghub.dev` resolves to Traefik IP (`10.70.0.242`) internally
+- [ ] External DNS (`zerolaghub.dev`) resolves correctly via Cloudflare
+- [ ] Reverse DNS for container IPs (nice to have, not blocking)
+
+---
+
+## Routing Validation
+
+Quick connectivity matrix to verify end-to-end:
+
+```bash
+# From API host (10.60.0.245)
+curl http://10.100.0.x:18888/status      # Agent reachable
+curl http://10.100.0.x:6000              # code-server reachable
+curl http://10.70.0.242                  # Traefik reachable
+
+# From Traefik host (10.70.0.242)
+curl http://10.60.0.245:4000/api/health  # API reachable
+
+# From dev container
+curl http://10.60.0.251:8080             # Artifact server reachable
+nslookup zerolaghub.dev                  # DNS working
+```
+
+---
+
+## Known Issues / History
+
+- Routers have not been systematically audited
+- Basic routing is confirmed working (cross-subnet curl tests pass)
+- WireGuard access confirmed working for admin
+- No formal failover test has been performed
+
+---
+
+## Action Items (Priority Order)
+
+1. Audit both router configs: document actual VLAN and firewall rules
+2. Run connectivity matrix above and confirm all pass
+3. Test WireGuard reconnect after router restart
+4. Test failover between primary and secondary
+5. Document any gaps found and remediate
+
+---
+
+## Notes
+
+OPNsense dashboard is accessible via WireGuard. Do not expose OPNsense
+management interface to the internet.
+
+Configuration backups: OPNsense has built-in XML config export.
+Export and store in a secure location before making any changes.
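
The connectivity matrix under Routing Validation could be wrapped in a small script so that each check prints an explicit PASS or FAIL instead of raw curl output. The following is a minimal sketch, not part of the checklist itself, and it covers only the checks run from the API host. The names `check`, `failures`, `CONTAINER_IP`, and `RUN_MATRIX` are hypothetical, and `CONTAINER_IP` has to be supplied by the operator because the checklist leaves the container's last octet open. Any HTTP response counts as "reachable" here; the 5-second timeout is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: API-host portion of the connectivity matrix with PASS/FAIL output.
# check, failures, CONTAINER_IP and RUN_MATRIX are illustrative names.

failures=0

# check LABEL COMMAND...: run COMMAND quietly, print PASS or FAIL for LABEL,
# and count failures. Always returns 0 so later checks still run.
check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# The checks only run when RUN_MATRIX=1, so the file can be sourced safely.
if [ "${RUN_MATRIX:-0}" = "1" ]; then
  # Abort with a message if no container address was provided.
  : "${CONTAINER_IP:?set CONTAINER_IP to a dev container address}"

  # --max-time bounds each probe; without --fail, any HTTP reply is a PASS.
  check "agent (18888)"      curl --max-time 5 "http://${CONTAINER_IP}:18888/status"
  check "code-server (6000)" curl --max-time 5 "http://${CONTAINER_IP}:6000"
  check "traefik"            curl --max-time 5 "http://10.70.0.242"

  echo "failures: $failures"
  if [ "$failures" -ne 0 ]; then
    exit 1
  fi
fi
```

Saved under a name of your choosing on the API host, it would be run as `RUN_MATRIX=1 CONTAINER_IP=<container address> ./script.sh`. The non-zero exit status on any failure makes it usable as a gate before ticking off action item 2. The checks from the Traefik host and from a dev container would need the same pattern run on those machines.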