Add Dec 26 session - SSH/bastion work and public access blocker

jester 2025-12-28 00:31:55 +00:00
parent 7e926dd12a
commit f1d03b376d


@@ -52,3 +52,54 @@ Do not rewrite or reorder past entries.
Status: **Root cause resolved; implementation pending agent patch & installer updates.**
---
## 2025-12-26
- Goal: Enable SSH access to dev containers via bastion for external users.
- Split DNS architecture confirmed working: Cloudflare (public) + Technitium (internal).
### Dev container SSH (internal success)
- Root cause of initial SSH failures: **missing SSH host keys** in unprivileged LXC containers.
- Fixed by ensuring host keys exist before sshd starts.
- Verified internal SSH access works from WireGuard/LAN:
```
ssh -J zlh@10.100.0.48 root@dev-6038.zerolaghub.dev
```
- Confirmed not a firewall issue (iptables default ACCEPT, sshd listening on :22).
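
A minimal sketch of the host-key check, assuming the keys live in the default `/etc/ssh` directory. The helper names `has_host_keys` and `ensure_host_keys` are placeholders, not the agent's real functions. `ssh-keygen -A` only creates the default key types that are missing, so calling it repeatedly is harmless:

```shell
# Return 0 if at least one private host key exists in the given directory.
# The glob matches ssh_host_ed25519_key etc., but not the *.pub files.
has_host_keys() {
    key_dir="${1:-/etc/ssh}"
    for key_file in "$key_dir"/ssh_host_*_key; do
        [ -f "$key_file" ] && return 0
    done
    return 1
}

# Generate missing host keys in /etc/ssh (run as root, before sshd starts).
ensure_host_keys() {
    if ! has_host_keys /etc/ssh; then
        echo "no SSH host keys found, generating"
        ssh-keygen -A
    fi
}
```

One option (an assumption, not something confirmed in this log) is to call `ensure_host_keys` from the container bootstrap before the SSH service is started.
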
### Agent SSH provisioning requirements identified
- Agent must automate SSH setup for new containers:
  - Install and enable sshd
  - Generate SSH host keys if missing (add to `common.sh` or bootstrap)
  - Create `devuser` with sudo access
  - Configure authorized_keys for key-based auth
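
The requirements above could be sketched roughly as follows. Assumptions: a Debian/Ubuntu-based container (`apt-get`, a `sudo` group, an `ssh` service unit), and the names `run`, `provision_ssh`, and `DRY_RUN` are illustrative only, not part of the existing agent. With `DRY_RUN=1` the commands are printed instead of executed:

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# provision_ssh <public-key-file>   (run as root inside the container)
provision_ssh() {
    pubkey_file="$1"
    if [ -z "$pubkey_file" ]; then
        echo "usage: provision_ssh <public-key-file>" >&2
        return 2
    fi
    dev_user="devuser"
    dev_home="/home/$dev_user"

    run apt-get install -y openssh-server sudo   # assumption: Debian/Ubuntu image
    run ssh-keygen -A                            # create any missing host keys
    run useradd --create-home --shell /bin/bash "$dev_user"
    run usermod -aG sudo "$dev_user"             # assumption: group is named "sudo"
    run install -d -m 700 -o "$dev_user" -g "$dev_user" "$dev_home/.ssh"
    run install -m 600 -o "$dev_user" -g "$dev_user" "$pubkey_file" "$dev_home/.ssh/authorized_keys"
    run systemctl enable --now ssh               # unit may be "sshd" on other distros
}
```

Running it with `DRY_RUN=1` first gives a reviewable list of commands before anything touches the container.
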
### zlh-cli progress
- Built successfully in Go, deployed to bastion at `/usr/local/bin/zlh`.
- **Known bugs when running ON bastion**:
  - Incorrectly attempts to jump via `zlh-bastion.zerolaghub.dev` (should use localhost/direct)
  - User/host targeting logic needs fixes (was targeting bastion instead of dev container)
- Goal: reduce `ssh -J` complexity to simple `zlh ssh 6038` command.
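
The intended mapping from `zlh ssh 6038` to the underlying `ssh` call can be sketched in shell. This is not the Go implementation, only an illustration of the targeting logic. The function name `zlh_ssh_cmd` and the `ZLH_ON_BASTION` switch are hypothetical; the `root` user and the `dev-<id>.zerolaghub.dev` pattern are taken from the working command logged earlier in this entry:

```shell
# zlh_ssh_cmd <container-id>
# Prints the ssh command that `zlh ssh <id>` is meant to stand for.
# ZLH_ON_BASTION=1 is a hypothetical switch meaning "already on the bastion".
zlh_ssh_cmd() {
    container_id="$1"
    case "$container_id" in
        ''|*[!0-9]*)
            echo "usage: zlh_ssh_cmd <numeric-container-id>" >&2
            return 2
            ;;
    esac
    target="root@dev-${container_id}.zerolaghub.dev"   # always the dev container
    jump="zlh@10.100.0.48"                             # bastion, internal address
    if [ "${ZLH_ON_BASTION:-0}" = "1" ]; then
        echo "ssh $target"              # on the bastion: connect directly, no -J
    else
        echo "ssh -J $jump $target"     # elsewhere: hop through the bastion
    fi
}
```

Keeping the target fixed to the dev container and making only the jump host conditional addresses both bugs listed above.
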
### Bastion public SSH blocker (ACTIVE)
- **Critical issue**: Public SSH to bastion fails immediately.
- Symptoms:
  - TCP connection succeeds (three-way handshake completes)
  - SSH handshake never proceeds: `kex_exchange_identification: Connection closed by remote host`
  - Not an authentication failure: the connection closes before the banner exchange
- Verified:
  - Cloudflare DNS: `zlh-bastion.zerolaghub.dev` → `139.64.165.248` (DNS-only)
  - OPNsense NAT rule forwards :22 to 10.100.0.48
  - Internal SSH to bastion works perfectly
  - Fail2ban shows no blocks
  - Tried disabling OPNsense SSH, changing ports - same failure
- **This is NOT the container host-key issue** (that was internal and fixed).
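
Because an SSH server sends its identification line (`SSH-2.0-...`) first, the symptom can be reproduced without an SSH client by reading one line from the socket. A rough sketch, assuming bash (it relies on bash's `/dev/tcp` redirection and `read -t`); the function names are placeholders:

```shell
# Classify the first line received after the TCP connect.
classify_banner() {
    case "$1" in
        SSH-*) echo "banner received: $1" ;;
        "")    echo "no banner: closed or silent after connect" ;;
        *)     echo "unexpected data: $1" ;;
    esac
}

# probe_ssh_banner <host> [port]     (bash only)
probe_ssh_banner() {
    probe_host="$1"
    probe_port="${2:-22}"
    probe_line=""
    # Step 1: can a TCP connection be opened at all?
    if ! (: <"/dev/tcp/$probe_host/$probe_port") 2>/dev/null; then
        echo "tcp connect failed"
        return 1
    fi
    # Step 2: the server speaks first, so wait up to 5 s for its first line.
    { IFS= read -r -t 5 probe_line; } <"/dev/tcp/$probe_host/$probe_port" 2>/dev/null || true
    classify_banner "$(printf '%s' "$probe_line" | tr -d '\r')"
}
```

Running `probe_ssh_banner zlh-bastion.zerolaghub.dev` from outside and `probe_ssh_banner 10.100.0.48` from the LAN should show whether the banner is lost only on the public path.
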
### Next debug steps for bastion blocker
1. tcpdump on bastion during external connection attempt
2. OPNsense live firewall log during external attempt
3. Confirm NAT truly reaches bastion sshd (not terminating upstream)
4. Check for ISP/modem interference or hairpin NAT issues
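
For step 1, a narrow capture filter keeps the trace readable. A small sketch: the helper name `capture_filter` is made up, and `203.0.113.7` is a documentation address standing in for the external client's real IP:

```shell
# capture_filter <client-ip> [port]
# Builds a pcap filter matching only the external client's SSH traffic.
capture_filter() {
    echo "tcp port ${2:-22} and host $1"
}

# On the bastion, as root, while the external client retries (Ctrl-C to stop):
#   tcpdump -ni any "$(capture_filter 203.0.113.7)"
#
# Rough reading of the result (assumptions to verify, not conclusions):
#   - no packets at all        -> the connection is terminated upstream of
#                                 the bastion (steps 2-4)
#   - SYN seen, then FIN/RST
#     sent by the bastion      -> something on the bastion closes the session
#   - FIN/RST arriving from
#     the client side          -> a device in between is resetting the flow
```
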
Status: **Dev container SSH working internally; bastion public access blocked at network layer.**
---