# OPNsense Network Validation Checklist
## Overview
ZeroLagHub uses two OPNsense routers. This document tracks what must be
verified, validated, and tested on them before production launch.

Current state: both routers exist but have had only minimal configuration
and no systematic audit. Work through this checklist systematically.

---
## Router Inventory
| Router | Role | Status |
|--------|------|--------|
| Primary OPNsense | Core network routing, WireGuard | Needs audit |
| Secondary OPNsense | Failover / redundancy | Needs audit |
---
## Network Model
ZeroLagHub does not use VLANs. Each network is a separate subnet tied to a
dedicated Linux bridge (`vmbr`) on the Proxmox host. OPNsense routes between
these bridges.
| Bridge | Subnet | Purpose |
|--------|--------|---------|
| vmbr? | 10.60.0.0/24 | Core services — API, portal, DNS, artifacts |
| vmbr? | 10.70.0.0/24 | Proxy/edge — Traefik, zpack-proxy |
| vmbr? | 10.100.0.0/24 | Game/dev containers — LXC |
| vmbr? | WireGuard range | Admin VPN access |

**Action:** Fill in the actual `vmbr` numbers from the Proxmox network config
(Proxmox → Node → Network tab).

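
The bridge numbers can also be read from the Proxmox host's shell. The sketch below assumes the host keeps its network config in `/etc/network/interfaces` (the Proxmox default); `list_bridges` is a hypothetical helper name, not an existing tool. It prints one line per `vmbr` stanza with the bridge name, its address (`-` if none), and its bridge ports.

```bash
# list_bridges [FILE]: print "NAME<TAB>ADDRESS<TAB>PORTS" for every vmbr
# bridge defined in an ifupdown-style interfaces file.
# FILE defaults to /etc/network/interfaces (assumed Proxmox default path).
list_bridges() {
  local file="${1:-/etc/network/interfaces}"
  awk '
    function emit() { if (name != "") printf "%s\t%s\t%s\n", name, addr, ports }
    # A new vmbr stanza starts: flush the previous one and reset fields.
    /^iface[ \t]+vmbr[0-9]+/ { emit(); name = $2; addr = "-"; ports = "-"; next }
    # Any other iface stanza ends the current bridge.
    /^iface[ \t]/            { emit(); name = ""; next }
    name != "" && $1 == "address"             { addr = $2 }
    name != "" && $1 ~ /^bridge[-_]ports$/    { $1 = ""; ports = substr($0, 2) }
    END { emit() }
  ' "$file"
}

# Usage on the Proxmox host:
#   list_bridges
```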
---
## Firewall Rules (Verify)
### Must be confirmed open:
- [ ] `10.60.0.245:4000` (API) reachable from `10.70.0.242` (Traefik)
- [ ] `10.100.0.x:6000` (dev containers) reachable from `10.60.0.245` (API)
- [ ] `10.100.0.x:18888` (agent) reachable from `10.60.0.245` (API)
- [ ] `10.70.0.242:443` (Traefik) reachable from internet (public IP)
- [ ] DNS resolution working from containers → `10.60.0.x` (Technitium)
### Must be confirmed blocked:
- [ ] Direct access to containers from the internet (container VMs must have no public IP)
- [ ] Direct access from containers to the management subnet (traffic should go via the API)
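
Each item above comes down to a TCP probe from a specific source host, with an expected result of open or blocked. A minimal sketch, assuming `nc` (netcat) with `-z` and `-w` support is installed on the source host; `probe` and `expect_port` are hypothetical helper names. A probe only proves something about the host it runs from, so each check must be run on the source named in the checklist item.

```bash
# probe HOST PORT: succeed if a TCP connection opens within 3 seconds.
probe() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# expect_port open|blocked HOST PORT: compare the probe result with the
# state the checklist expects, print PASS or FAIL, return non-zero on FAIL.
expect_port() {
  local want="$1" host="$2" port="$3" got=blocked
  if probe "$host" "$port"; then got=open; fi
  if [ "$got" = "$want" ]; then
    echo "PASS $want $host:$port"
  else
    echo "FAIL $host:$port is $got, expected $want"
    return 1
  fi
}

# Example, run from the Traefik host (10.70.0.242):
#   expect_port open 10.60.0.245 4000
# Example for a "must be blocked" item; TARGET_IP and TARGET_PORT are
# placeholders for the address and port under test:
#   expect_port blocked "$TARGET_IP" "$TARGET_PORT"
```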
---
## WireGuard (Verify)
- [ ] Admin WireGuard tunnel active and stable
- [ ] Peer configs for all admin machines documented
- [ ] Confirm WireGuard survives router restart
- [ ] Test: disconnect and reconnect from admin machine
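
One way to judge whether the tunnel is "active and stable" is to look at the age of each peer's last handshake. `wg show IFACE latest-handshakes` prints one `PUBKEY<TAB>EPOCH` line per peer, with `0` for a peer that has never completed a handshake. The sketch below filters that output; `stale_peers` is a hypothetical helper, and the interface name `wg0` and the 180-second threshold are assumptions. An idle tunnel without keepalives can legitimately show an old handshake, so treat a hit as a prompt to investigate rather than a failure.

```bash
# stale_peers MAX_AGE [NOW]: read "PUBKEY<TAB>EPOCH" lines on stdin and
# print peers whose last handshake is older than MAX_AGE seconds, or that
# have never completed a handshake. NOW defaults to the current time.
stale_peers() {
  local max_age="$1" now="${2:-$(date +%s)}"
  awk -v max="$max_age" -v now="$now" '
    $2 == 0        { print $1, "never"; next }
    now - $2 > max { print $1, (now - $2) "s ago" }
  '
}

# Usage on an admin machine (wg0 is an assumed interface name):
#   sudo wg show wg0 latest-handshakes | stale_peers 180
```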
---
## Failover / Redundancy
- [ ] Confirm whether CARP is configured between primary and secondary (OPNsense implements high availability with CARP rather than VRRP)
- [ ] Test failover: shut down primary router, verify traffic continues
- [ ] Confirm secondary has identical interface and firewall config
- [ ] Document failover behavior — is it automatic or manual?
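
If CARP is configured, each router's `ifconfig` output should contain a line of the form `carp: MASTER vhid 1 advbase 1 advskew 0` under every interface that carries a virtual IP. The sketch below extracts those lines so the two routers can be compared before and after a failover test. `carp_states` is a hypothetical helper, and SSH access to the routers and the `ROUTER` variable are assumptions; the parsing runs on the admin machine, so nothing has to be installed on OPNsense.

```bash
# carp_states: read FreeBSD `ifconfig` output on stdin and print
# "INTERFACE vhid N STATE" for every CARP virtual host ID found.
carp_states() {
  awk '
    # Interface header lines start in column 0, e.g. "igb0: flags=...".
    /^[^ \t]/      { iface = $1; sub(/:$/, "", iface) }
    # Indented CARP lines look like "carp: MASTER vhid 1 advbase 1 ...".
    $1 == "carp:"  { print iface, "vhid", $4, $2 }
  '
}

# Usage from an admin machine over WireGuard (ROUTER is a placeholder):
#   ssh root@"$ROUTER" ifconfig | carp_states
# No output means no CARP virtual IPs are configured on that router.
```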
---
## DNS Resolution
- [ ] Containers resolve internal hostnames via Technitium (`zlh-dns`)
- [ ] `dev-*.zerolaghub.dev` resolves to Traefik IP (`10.70.0.242`) internally
- [ ] External DNS (`zerolaghub.dev`) resolves correctly via Cloudflare
- [ ] Reverse DNS for container IPs (nice to have, not blocking)
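
The internal and external checks above can be scripted as "name resolves to expected address". A minimal sketch, assuming `dig` is installed where the check runs; `check_resolves` is a hypothetical helper, `dev-example.zerolaghub.dev` is a made-up name standing in for a real `dev-*` host, and `DNS_SERVER` is a placeholder for the Technitium address.

```bash
# check_resolves NAME EXPECTED [SERVER]: resolve NAME's A record (optionally
# against a specific SERVER) and compare the final answer with EXPECTED.
check_resolves() {
  local name="$1" expected="$2" server="${3:-}" answer
  # With +short, a CNAME chain is listed first; the last line is the address.
  answer=$(dig +short ${server:+"@$server"} "$name" A | tail -n 1)
  if [ "$answer" = "$expected" ]; then
    echo "PASS $name -> $answer"
  else
    echo "FAIL $name -> ${answer:-no answer} (expected $expected)"
    return 1
  fi
}

# Usage from a dev container (name and DNS_SERVER are placeholders):
#   check_resolves dev-example.zerolaghub.dev 10.70.0.242 "$DNS_SERVER"
```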
---
## Routing Validation
Quick connectivity matrix to verify end-to-end:
```bash
# Replace 10.100.0.x with a real dev container address before running.
# -sS hides progress but prints errors; -m 5 fails after 5 s instead of hanging.
# From API host (10.60.0.245)
curl -sS -m 5 http://10.100.0.x:18888/status     # Agent reachable
curl -sS -m 5 http://10.100.0.x:6000             # code-server reachable
curl -sS -m 5 http://10.70.0.242                 # Traefik reachable
# From Traefik host (10.70.0.242)
curl -sS -m 5 http://10.60.0.245:4000/api/health # API reachable
# From dev container
curl -sS -m 5 http://10.60.0.251:8080            # Artifact server reachable
nslookup zerolaghub.dev                          # DNS working
```
---
## Known Issues / History
- Routers have not been systematically audited
- Basic routing is confirmed working (cross-subnet curl tests pass)
- WireGuard access confirmed working for admin
- No formal failover test has been performed
---
## Action Items (Priority Order)
1. Fill in actual `vmbr` numbers in network model table above
2. Audit both router configs — document actual interfaces and firewall rules
3. Run connectivity matrix above and confirm all pass
4. Test WireGuard reconnect after router restart
5. Test failover between primary and secondary
6. Document any gaps found and remediate
---
## Notes
The OPNsense dashboard is accessible via WireGuard. Do not expose the OPNsense
management interface to the internet.

Configuration backups: OPNsense can export its full configuration as XML
(System → Configuration → Backups). Export a copy and store it in a secure
location before making any changes.
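
Exported configs are easier to keep track of when each copy is timestamped and locked down, since the XML can contain secrets such as password hashes and VPN keys. A minimal sketch of that step; `archive_config` is a hypothetical helper and the backup directory is an assumption.

```bash
# archive_config SRC DEST_DIR: copy an exported config into DEST_DIR under
# a UTC-timestamped name, restrict it to the owner, and print the new path.
archive_config() {
  local src="$1" dest_dir="$2" stamp dest
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  dest="$dest_dir/$(basename "$src" .xml)-$stamp.xml"
  mkdir -p "$dest_dir" && cp "$src" "$dest" && chmod 600 "$dest" && echo "$dest"
}

# Usage with a file exported from the web UI (paths are placeholders):
#   archive_config ~/Downloads/config.xml ~/opnsense-backups
```

The same helper works on a file fetched over SSH, assuming SSH is enabled on the router; OPNsense keeps its live configuration at `/conf/config.xml`.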