From 3d462f264d99e0fc7a4bec02a38e3dfee48d661b Mon Sep 17 00:00:00 2001
From: jester
Date: Sat, 2 May 2026 22:03:00 +0000
Subject: [PATCH] Add PBS R2 offsite backup handoff

---
 .../2026-05-02-pbs-r2-offsite-handoff.md | 167 ++++++++++++++++++
 1 file changed, 167 insertions(+)
 create mode 100644 Session_Summaries/2026-05-02-pbs-r2-offsite-handoff.md

diff --git a/Session_Summaries/2026-05-02-pbs-r2-offsite-handoff.md b/Session_Summaries/2026-05-02-pbs-r2-offsite-handoff.md
new file mode 100644
index 0000000..66ef281
--- /dev/null
+++ b/Session_Summaries/2026-05-02-pbs-r2-offsite-handoff.md
@@ -0,0 +1,167 @@
+# Session Handoff — PBS to Cloudflare R2 Offsite Backup
+
+Date: 2026-05-02
+
+## Summary
+
+Worked through the Proxmox Backup Server offsite backup path for ZeroLagHub.
+
+Current decision: use PBS local datastore as the primary restore source, and use `rclone copy` to push a clean datastore baseline to Cloudflare R2 for offsite disaster recovery.
+
+Native PBS S3 datastore was investigated but is not the chosen path for now because the PBS/R2 S3 endpoint path was not behaving cleanly enough and PBS S3 datastore support should not be treated as the stable production path yet.
+
+## Confirmed architecture
+
+```text
+PVE / production Proxmox
+  -> PBS local datastore: z-back
+    -> rclone copy
+      -> Cloudflare R2 bucket: z-back-remote
+```
+
+Roles:
+- PBS local datastore = primary restore-ready infrastructure backup layer
+- Cloudflare R2 = offsite disaster recovery copy
+- Agent backups = local app-aware rollback only, not platform DR
+
+## R2 / rclone details
+
+Configured rclone remote:
+
+```text
+remote name: zback-remote
+provider: Cloudflare R2 / S3
+endpoint: https://526f4df41bcce7267d5d4a39883cdd21.r2.cloudflarestorage.com
+region: auto
+bucket: z-back-remote
+working path: zback-remote:z-back-remote
+```
+
+Important naming distinction:
+- `zback-remote` = rclone remote name
+- `z-back-remote` = Cloudflare R2 bucket name
+
+Connectivity was validated with a test write/list:
+
+```bash
+echo "r2 test from zlh-pbs $(date)" > /tmp/r2-test.txt
+
+rclone copy /tmp/r2-test.txt zback-remote:z-back-remote/test/ \
+  --s3-no-check-bucket \
+  --progress
+
+rclone lsf zback-remote:z-back-remote/test/ \
+  --s3-no-check-bucket
+```
+
+Expected/observed result:
+
+```text
+r2-test.txt
+```
+
+## PBS datastore dry run
+
+Ran a dry run against the current datastore:
+
+```bash
+rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
+  --dry-run \
+  --s3-no-check-bucket \
+  --progress \
+  --log-file=/var/log/zlh-pbs-r2-copy.log \
+  --log-level=INFO
+```
+
+Dry run result:
+
+```text
+Transferred: 131.169 GiB / 131.169 GiB, 100%
+Transferred: 97513 / 97513, 100%
+Elapsed time: 6.8s
+```
+
+This only proved rclone would copy the datastore. It did not upload because `--dry-run` was used.
+
+## Important current blocker / decision
+
+The current PBS datastore contents are old migration-era backups from March.
+
+User stated these backups are not useful for current production recovery and likely can be removed because the environment is far past migration.
+
+Decision:
+- Do not upload the current migration-era datastore to R2.
+- First clean PBS by removing old March backups.
+- Then create fresh production backups.
+- Then copy the clean baseline to R2.
+
+## Next steps
+
+1. In PBS, remove the old March backup snapshots/groups from datastore `z-back`.
+   - Prefer PBS UI: Datastore -> z-back -> Content -> remove/forget old snapshots.
+   - Be careful to delete only migration-era backups that are not needed.
+
+2. Run garbage collection on `z-back` after old snapshots are forgotten.
+
+```bash
+proxmox-backup-manager garbage-collection start z-back
+```
+
+3. From Proxmox VE, run fresh backups of current production VMs/LXCs to PBS datastore `z-back`.
+
+4. Verify the fresh PBS backups.
+
+5. Dry-run the R2 copy again and confirm it reflects only the clean baseline.
+
+```bash
+rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
+  --dry-run \
+  --s3-no-check-bucket \
+  --progress \
+  --log-file=/var/log/zlh-backups/pbs-r2-z-back-dryrun.log \
+  --log-level=INFO
+```
+
+6. Run the real offsite copy once PBS is quiet.
+
+```bash
+rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
+  --s3-no-check-bucket \
+  --progress \
+  --transfers=8 \
+  --checkers=16 \
+  --log-file=/var/log/zlh-backups/pbs-r2-z-back-$(date +%F-%H%M).log \
+  --log-level=INFO
+```
+
+7. Perform a restore test from R2 before considering offsite DR proven.
+
+## Operational constraints
+
+Do not run rclone while PBS is:
+- writing backups
+- pruning
+- garbage collecting
+- verifying
+
+Use `rclone copy`, not `rclone sync`, until restore-from-R2 has been proven. `copy` avoids remote deletions and is safer while establishing the first offsite baseline.
+
+## Security note
+
+The R2 access key and secret were pasted during the session. Treat them as compromised.
+
+Before real backup upload:
+- rotate/recreate the Cloudflare R2 access key and secret
+- update `/root/.config/rclone/rclone.conf`
+- verify test upload/list still works
+
+Recommended rclone config additions:
+
+```ini
+acl = private
+no_check_bucket = true
+```
+
+## Session stopping point
+
+rclone transport to R2 is working. The remaining work is PBS cleanup, fresh baseline backup, R2 copy, and restore validation.
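
The "do not run rclone while PBS is busy" constraint in this handoff could be enforced with a small wrapper rather than by hand. The following is a sketch only and was not run during the session. It assumes that `proxmox-backup-manager task list --output-format json` lists only running tasks and prints an empty array (`[]`) when PBS is idle; that behaviour should be confirmed on the actual PBS version first. The variables `PBS_MANAGER` and `LOG_DIR` and the function names `pbs_is_idle` and `run_offsite_copy` are hypothetical; the paths, remote, and rclone flags are the ones recorded in the handoff.

```bash
#!/bin/sh
# Sketch: guard the R2 copy so it only starts while PBS is idle.
# Paths, remote name, and rclone flags come from the handoff notes.
# PBS_MANAGER, LOG_DIR, pbs_is_idle and run_offsite_copy are hypothetical names.
set -eu

SRC="${SRC:-/mnt/datastore/z-back}"
DEST="${DEST:-zback-remote:z-back-remote/pbs/z-back}"
LOG_DIR="${LOG_DIR:-/var/log/zlh-backups}"
PBS_MANAGER="${PBS_MANAGER:-proxmox-backup-manager}"

# Succeeds (exit 0) only when PBS reports no running tasks.
# ASSUMPTION: "task list" without --all shows running tasks only, and the
# JSON output is an empty array ("[]") when nothing is running.
pbs_is_idle() {
  tasks="$("$PBS_MANAGER" task list --output-format json)" || return 2
  [ "$(printf '%s' "$tasks" | tr -d '[:space:]')" = "[]" ]
}

# Runs the real copy from step 6, but only if PBS is idle.
run_offsite_copy() {
  if ! pbs_is_idle; then
    echo "PBS is busy or its task list is unavailable; skipping rclone copy" >&2
    return 1
  fi
  mkdir -p "$LOG_DIR"   # make sure the log directory exists before rclone opens the log
  rclone copy "$SRC" "$DEST" \
    --s3-no-check-bucket \
    --progress \
    --transfers=8 \
    --checkers=16 \
    --log-file="$LOG_DIR/pbs-r2-z-back-$(date +%F-%H%M).log" \
    --log-level=INFO
}

# Only act when called with an explicit "run" argument.
if [ "${1:-}" = "run" ]; then
  run_offsite_copy
fi
```

Saved under a name of your choosing, the script would be invoked with a single `run` argument. The idle check is a point-in-time guard: a backup, prune, GC, or verify task that starts after the check is not detected, so the copy should still be scheduled outside the PBS job windows.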