Session Handoff — PBS to Cloudflare R2 Offsite Backup

Date: 2026-05-02

Summary

Worked through the Proxmox Backup Server offsite backup path for ZeroLagHub.

Current decision: use PBS local datastore as the primary restore source, and use rclone copy to push a clean datastore baseline to Cloudflare R2 for offsite disaster recovery.

Native PBS S3 datastore support was investigated but is not the chosen path for now. The PBS-to-R2 S3 endpoint did not behave cleanly enough, and PBS S3 datastore support should not be treated as a stable production path yet.

Confirmed architecture

PVE / production Proxmox
  -> PBS local datastore: z-back
  -> rclone copy
  -> Cloudflare R2 bucket: z-back-remote

Roles:

  • PBS local datastore = primary restore-ready infrastructure backup layer
  • Cloudflare R2 = offsite disaster recovery copy
  • Agent backups = local app-aware rollback only, not platform DR

R2 / rclone details

Configured rclone remote:

remote name: zback-remote
provider: Cloudflare R2 / S3
endpoint: https://526f4df41bcce7267d5d4a39883cdd21.r2.cloudflarestorage.com
region: auto
bucket: z-back-remote
working path: zback-remote:z-back-remote

Important naming distinction:

  • zback-remote = rclone remote name
  • z-back-remote = Cloudflare R2 bucket name
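
Because the remote name and the bucket name differ by a single hyphen, it can help to define each value once and build every path from them. The following is a minimal sketch; the variable names and the r2_path helper are illustrative and were not part of the session.

```shell
# Illustrative names; only the two values come from the session notes.
RCLONE_REMOTE="zback-remote"   # rclone remote name (section in rclone.conf)
R2_BUCKET="z-back-remote"      # Cloudflare R2 bucket name

# Build "<remote>:<bucket>/<subpath>", dropping a leading slash on the subpath.
r2_path() {
  printf '%s:%s/%s\n' "$RCLONE_REMOTE" "$R2_BUCKET" "${1#/}"
}
```

For example, `rclone lsf "$(r2_path test/)" --s3-no-check-bucket` would list the same location as the working path above.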

Connectivity was validated with a test write/list:

echo "r2 test from zlh-pbs $(date)" > /tmp/r2-test.txt

rclone copy /tmp/r2-test.txt zback-remote:z-back-remote/test/ \
  --s3-no-check-bucket \
  --progress

rclone lsf zback-remote:z-back-remote/test/ \
  --s3-no-check-bucket

Expected/observed result:

r2-test.txt
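
The same write/list check could be wrapped in a function so it can be repeated after the access key is rotated. This is a sketch only: r2_roundtrip is a hypothetical helper, it uses a per-process file name rather than r2-test.txt, and it leaves the marker object in the bucket.

```shell
# Hypothetical helper (not from the session): upload a marker file and
# confirm that it appears in the remote listing.
# Usage: r2_roundtrip zback-remote:z-back-remote/test/
r2_roundtrip() (
  dest="$1"
  tmpdir="$(mktemp -d)" || exit 1
  name="r2-test-$$.txt"
  echo "r2 test from $(uname -n) $(date)" > "$tmpdir/$name"

  if ! rclone copy "$tmpdir/$name" "$dest" --s3-no-check-bucket; then
    rm -rf "$tmpdir"
    exit 1
  fi
  rm -rf "$tmpdir"

  # lsf prints one entry per line; -x matches whole lines, -F disables regex.
  rclone lsf "$dest" --s3-no-check-bucket | grep -qxF "$name"
)
```

The function body runs in a subshell, so its variables do not leak into the calling shell. It returns non-zero if either the copy or the listing check fails.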

PBS datastore dry run

Ran a dry run against the current datastore:

rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
  --dry-run \
  --s3-no-check-bucket \
  --progress \
  --log-file=/var/log/zlh-pbs-r2-copy.log \
  --log-level=INFO

Dry run result:

Transferred:      131.169 GiB / 131.169 GiB, 100%
Transferred:        97513 / 97513, 100%
Elapsed time:         6.8s

This only proved that rclone can read the datastore and would transfer 97,513 files (131.169 GiB). Nothing was uploaded because --dry-run was set.
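
To compare dry runs before and after the PBS cleanup, the file count can be pulled out of the summary text. A minimal sketch, assuming the summary has been saved or piped in the same format as the result shown above; rclone_file_total is an illustrative name.

```shell
# Print the total file count from an rclone stats summary read on stdin.
# Only the "Transferred: <done> / <total>, <pct>" line with bare integers
# matches; the byte-count line carries a unit (e.g. GiB) and is skipped.
rclone_file_total() {
  awk '$1 == "Transferred:" && $3 == "/" && $4 ~ /^[0-9]+,$/ {
         sub(/,$/, "", $4)
         total = $4
       }
       END { if (total != "") print total }'
}
```

If the summary appears more than once in the input, the last occurrence wins. The function prints nothing when no matching line is found.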

Important current blocker / decision

The current PBS datastore contents are old migration-era backups from March.

The user stated that these backups are not useful for current production recovery and can likely be removed, because the environment has moved well past the migration.

Decision:

  • Do not upload the current migration-era datastore to R2.
  • First clean PBS by removing old March backups.
  • Then create fresh production backups.
  • Then copy the clean baseline to R2.
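
Before removing anything, a read-only listing can help identify which snapshots predate a cutoff. The sketch below assumes the default PBS on-disk layout of <datastore>/<vm|ct|host>/<id>/<RFC 3339 timestamp>/ with no namespaces in use; that assumption, the cutoff value, and the name list_old_snapshots are not from the session. It deletes nothing. Removal itself should still go through PBS.

```shell
# Read-only sketch: print snapshot directories older than a cutoff.
# Usage: list_old_snapshots /mnt/datastore/z-back 2026-04-01T00:00:00Z
list_old_snapshots() (
  store="$1"
  cutoff="$2"
  for dir in "$store"/vm/*/* "$store"/ct/*/* "$store"/host/*/*; do
    [ -d "$dir" ] || continue   # skips plain files and unmatched globs
    ts="${dir##*/}"
    # Same-format UTC timestamps order correctly as plain strings.
    if awk -v a="$ts" -v b="$cutoff" 'BEGIN { exit !(a < b) }'; then
      printf '%s\n' "${dir#"$store"/}"
    fi
  done
)
```

Each output line has the form type/id/timestamp, which can be cross-checked against the Content view in the PBS UI.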

Next steps

  1. In PBS, remove the old March backup snapshots/groups from datastore z-back.

    • Prefer PBS UI: Datastore -> z-back -> Content -> remove/forget old snapshots.
    • Be careful to delete only migration-era backups that are not needed.
  2. Run garbage collection on z-back after old snapshots are forgotten.

proxmox-backup-manager garbage-collection start z-back

  3. From Proxmox VE, run fresh backups of current production VMs/LXCs to PBS datastore z-back.

  4. Verify the fresh PBS backups.

  5. Dry-run the R2 copy again and confirm it reflects only the clean baseline.

rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
  --dry-run \
  --s3-no-check-bucket \
  --progress \
  --log-file=/var/log/zlh-backups/pbs-r2-z-back-dryrun.log \
  --log-level=INFO

  6. Run the real offsite copy once PBS is quiet.

rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
  --s3-no-check-bucket \
  --progress \
  --transfers=8 \
  --checkers=16 \
  --log-file=/var/log/zlh-backups/pbs-r2-z-back-$(date +%F-%H%M).log \
  --log-level=INFO

  7. Perform a restore test from R2 before considering offsite DR proven.

Operational constraints

Do not run rclone while PBS is:

  • writing backups
  • pruning
  • garbage collecting
  • verifying

Use rclone copy, not rclone sync, until restore-from-R2 has been proven. Unlike sync, copy never deletes files on the remote, which makes it the safer choice while establishing the first offsite baseline.
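
These constraints could be enforced by a small wrapper that refuses to start while PBS reports running tasks. This is a sketch under stated assumptions, not something run during the session. It assumes that `proxmox-backup-manager task list --output-format json` prints an empty JSON array when nothing is running. PBS_MANAGER, RCLONE, and LOG_DIR are hypothetical override variables, and --progress is omitted because the wrapper is meant for unattended runs.

```shell
# Guarded offsite copy (sketch). Override variables are illustrative.
SRC="/mnt/datastore/z-back"
DEST="zback-remote:z-back-remote/pbs/z-back"
LOG_DIR="${LOG_DIR:-/var/log/zlh-backups}"
PBS_MANAGER="${PBS_MANAGER:-proxmox-backup-manager}"
RCLONE="${RCLONE:-rclone}"

# Succeeds only when PBS lists no running tasks (backup, prune, GC, verify).
# Assumption: the JSON task list is "[]" when PBS is idle.
pbs_is_quiet() {
  tasks="$("$PBS_MANAGER" task list --output-format json)" || return 1
  [ "$(printf '%s' "$tasks" | tr -d '[:space:]')" = "[]" ]
}

offsite_copy() {
  if ! pbs_is_quiet; then
    echo "PBS has running tasks; skipping rclone copy" >&2
    return 1
  fi
  # rclone needs the log directory to exist before it can open the log file.
  mkdir -p "$LOG_DIR" || return 1
  # copy (not sync) never deletes objects on the remote side.
  "$RCLONE" copy "$SRC" "$DEST" \
    --s3-no-check-bucket \
    --transfers=8 \
    --checkers=16 \
    --log-file="$LOG_DIR/pbs-r2-z-back-$(date +%F-%H%M).log" \
    --log-level=INFO
}
```

The guard only checks once, at start. A backup, prune, or garbage collection that begins while the copy is in flight is not detected, so the copy should still be scheduled outside the PBS job windows.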

Security note

The R2 access key and secret were pasted during the session. Treat them as compromised.

Before real backup upload:

  • rotate/recreate the Cloudflare R2 access key and secret
  • update /root/.config/rclone/rclone.conf
  • verify test upload/list still works

Recommended rclone config additions:

acl = private
no_check_bucket = true
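
After editing the config, a quick check can confirm that both options landed in the right remote section. A minimal sketch; remote_has_option is a hypothetical helper that parses the INI-style file directly, so it never prints the stored secret.

```shell
# Succeeds if <remote>'s section in an rclone config file sets key = value.
# Usage: remote_has_option <config-file> <remote> <key> <value>
remote_has_option() {
  awk -v section="[$2]" -v key="$3" -v want="$4" '
    /^\[/ { inside = ($0 == section); next }   # section header line
    inside {
      eq = index($0, "=")
      if (eq == 0) next
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key && v == want) found = 1
    }
    END { exit !found }
  ' "$1"
}
```

For example, `remote_has_option /root/.config/rclone/rclone.conf zback-remote no_check_bucket true`. Once no_check_bucket = true is set in the config, the --s3-no-check-bucket flag on the command line becomes redundant but harmless.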

Session stopping point

rclone transport to R2 is working. The remaining work is PBS cleanup, fresh baseline backup, R2 copy, and restore validation.