diff --git a/SCRATCH/service-discovery.md b/SCRATCH/service-discovery.md
new file mode 100644
index 0000000..d7e00dc
--- /dev/null
+++ b/SCRATCH/service-discovery.md
@@ -0,0 +1,77 @@
+# Service Discovery — Env-Based Approach
+
+## Problem with internal.zlh DNS
+
+Service-to-service calls in the API and agent currently use internal.zlh FQDNs
+(e.g. `zpack-velocity.internal.zlh:8081`). This introduces:
+
+- DNS resolution overhead on every call
+- Silent failure if Technitium is down or slow
+- Hard-to-diagnose timing issues during provisioning
+- An extra dependency in the critical path for container creation
+
+The provisioning + DB consistency issues may be partly caused by DNS resolution
+delays or failures during rapid create/delete cycles.
+
+## Recommended Approach — Env File
+
+Replace internal.zlh FQDNs in service-to-service calls with env vars backed by IPs.
+Each VM reads from its own `.env` or `services.env` file.
+
+### Example services.env
+
+```env
+ZPACK_API_URL=http://10.60.0.18:4000
+ZLH_ARTIFACTS_URL=http://10.60.0.17
+ZPACK_VELOCITY_URL=http://10.70.0.10:8081
+ZLH_PBS_URL=http://10.60.0.24
+ZLH_MONITOR_URL=http://10.60.0.25
+ZLH_DNS_URL=http://10.60.0.14
+```
+
+### Benefits
+
+- Zero DNS resolution — direct IP, always predictable
+- No dependency on Technitium for service calls
+- If IPs ever change, update one env file per VM — no DNS record changes needed
+- Easier to debug — no DNS layer to troubleshoot
+- Faster provisioning — no resolution delay in the critical path
+
+### Scope
+
+Services that need to change:
+
+| Service | Current | Change to |
+|---------|---------|-----------|
+| zpack-api | `zpack-velocity.internal.zlh:8081` | `ZPACK_VELOCITY_URL` env var |
+| zpack-api | `zlh-artifacts.internal.zlh` | `ZLH_ARTIFACTS_URL` env var |
+| zpack-portal | `zpack-api.internal.zlh` | `ZPACK_API_URL` env var (server-side only) |
+| zlh-agent | artifact server FQDN | `ZLH_ARTIFACTS_URL` env var |
+
+### What internal.zlh is still useful for
+
+- Human navigation: SSH, browser access to admin tools, Proxmox
+- Non-critical paths where DNS resolution timing doesn't matter
+- Documentation and reference
+
+internal.zlh should NOT be used in:
+- Provisioning hot path
+- Agent → artifact server calls
+- API → Velocity calls
+- Any call that happens during container create/delete
+
+## Current IPs (Detroit host)
+
+See INFRASTRUCTURE.md for full IP table. Key service IPs:
+
+| Service | IP |
+|---------|----|
+| zpack-api | 10.60.0.18:4000 |
+| zpack-portal | 10.60.0.19 |
+| zlh-artifacts | 10.60.0.17 |
+| zpack-velocity | 10.70.0.10:8081 |
+| zlh-dns (Technitium) | 10.60.0.14 |
+| zlh-proxy (Caddy) | 10.60.0.16 |
+| zpack-proxy (Traefik) | 10.70.0.11 |
+| zlh-monitor | 10.60.0.25 |
+| zlh-back (PBS) | 10.60.0.24 |
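
The rule that hot-path calls must not use internal.zlh names could be checked mechanically against each VM's `services.env` before a service starts. The sketch below is not part of this change. It assumes the file holds plain `KEY=VALUE` lines and that every service address key ends in `_URL`, as in the example `services.env`. The helper names `parse_env` and `non_ip_urls` are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        entries[key.strip()] = value.strip()
    return entries


def non_ip_urls(entries):
    """Return the keys of *_URL entries whose host is not an IP literal."""
    offenders = []
    for key, value in entries.items():
        if not key.endswith("_URL"):
            continue
        host = urlsplit(value).hostname or ""
        try:
            # ip_address() raises ValueError for hostnames and empty strings.
            ipaddress.ip_address(host)
        except ValueError:
            offenders.append(key)
    return offenders


if __name__ == "__main__":
    sample = (
        "ZPACK_API_URL=http://10.60.0.18:4000\n"
        "ZPACK_VELOCITY_URL=http://zpack-velocity.internal.zlh:8081\n"
    )
    # Lists the keys that still point at a DNS name instead of an IP.
    print(non_ip_urls(parse_env(sample)))
```

A service could run a check like this at startup and refuse to boot when the list is non-empty, so a stray FQDN is caught at deploy time instead of during a create/delete cycle.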