# Session Handoff — PBS to Cloudflare R2 Offsite Backup

Date: 2026-05-02

## Summary

Worked through the Proxmox Backup Server (PBS) offsite backup path for ZeroLagHub.

Current decision: use the local PBS datastore as the primary restore source, and use `rclone copy` to push a clean datastore baseline to Cloudflare R2 for offsite disaster recovery.

The native PBS S3 datastore backend was investigated but is not the chosen path for now. The PBS-to-R2 S3 endpoint did not behave cleanly enough, and PBS S3 datastore support should not yet be treated as a stable production path.

## Confirmed architecture

```text
PVE / production Proxmox
-> PBS local datastore: z-back
-> rclone copy
-> Cloudflare R2 bucket: z-back-remote
```

Roles:

- PBS local datastore = primary restore-ready infrastructure backup layer
- Cloudflare R2 = offsite disaster recovery copy
- Agent backups = local app-aware rollback only, not platform DR

## R2 / rclone details

Configured rclone remote:

```text
remote name: zback-remote
provider: Cloudflare R2 / S3
endpoint: https://526f4df41bcce7267d5d4a39883cdd21.r2.cloudflarestorage.com
region: auto
bucket: z-back-remote
working path: zback-remote:z-back-remote
```
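
If the PBS host ever has to be rebuilt, the same remote can be recreated non-interactively instead of through the `rclone config` wizard. A minimal sketch, assuming rclone's `config create <name> <type> key=value ...` form; the function name and the `R2_ACCESS_KEY_ID` / `R2_SECRET_ACCESS_KEY` variables are hypothetical and should hold the rotated credentials, not the ones pasted during this session:

```bash
# Hypothetical helper: recreate the zback-remote rclone remote for Cloudflare R2.
# Expects rotated credentials in R2_ACCESS_KEY_ID / R2_SECRET_ACCESS_KEY
# (hypothetical variable names) so they are not typed into shell history.
create_zback_remote() {
  local endpoint="https://526f4df41bcce7267d5d4a39883cdd21.r2.cloudflarestorage.com"

  if [ -z "${R2_ACCESS_KEY_ID:-}" ] || [ -z "${R2_SECRET_ACCESS_KEY:-}" ]; then
    echo "R2_ACCESS_KEY_ID and R2_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi

  # Writes a [zback-remote] section of type s3 into rclone.conf.
  rclone config create zback-remote s3 \
    provider=Cloudflare \
    access_key_id="$R2_ACCESS_KEY_ID" \
    secret_access_key="$R2_SECRET_ACCESS_KEY" \
    endpoint="$endpoint" \
    region=auto \
    acl=private \
    no_check_bucket=true
}
```

The keys are still visible in the process list while `rclone config create` runs, so this belongs on the PBS host only. `acl` and `no_check_bucket` mirror the recommendations in the security note.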

Important naming distinction:

- `zback-remote` = rclone remote name
- `z-back-remote` = Cloudflare R2 bucket name

Connectivity was validated with a test write/list:

```bash
echo "r2 test from zlh-pbs $(date)" > /tmp/r2-test.txt

rclone copy /tmp/r2-test.txt zback-remote:z-back-remote/test/ \
  --s3-no-check-bucket \
  --progress

rclone lsf zback-remote:z-back-remote/test/ \
  --s3-no-check-bucket
```

Expected/observed result:

```text
r2-test.txt
```
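
The manual test above can be turned into a repeatable pass/fail check that also confirms the exact object name appears in the listing and cleans up afterwards. A sketch; the function name and the timestamped marker file name are hypothetical, and `rclone deletefile` removes only that single marker object:

```bash
# Hypothetical helper: write, list, and delete a marker object on R2.
# Returns 0 only if the uploaded file shows up in the listing.
r2_smoke_test() {
  local dest="zback-remote:z-back-remote/test/"
  local tmpdir name rc=0
  tmpdir=$(mktemp -d) || return 1
  name="r2-test-$(date +%s).txt"
  echo "r2 test from $(uname -n) $(date)" > "${tmpdir}/${name}"

  if ! rclone copy "${tmpdir}/${name}" "$dest" --s3-no-check-bucket; then
    echo "upload failed" >&2
    rc=1
  elif ! rclone lsf "$dest" --s3-no-check-bucket | grep -qxF "$name"; then
    echo "uploaded object not listed" >&2
    rc=1
  else
    # Remove the marker so test/ does not accumulate objects.
    rclone deletefile "${dest}${name}" --s3-no-check-bucket || rc=1
  fi

  rm -rf "$tmpdir"
  return "$rc"
}
```

Running `r2_smoke_test && echo ok` after the key rotation gives a yes/no answer instead of reading the `lsf` output by eye.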

## PBS datastore dry run

Ran a dry run against the current datastore:

```bash
rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
  --dry-run \
  --s3-no-check-bucket \
  --progress \
  --log-file=/var/log/zlh-pbs-r2-copy.log \
  --log-level=INFO
```

Dry run result:

```text
Transferred: 131.169 GiB / 131.169 GiB, 100%
Transferred: 97513 / 97513, 100%
Elapsed time: 6.8s
```

This only showed what rclone would copy (131.169 GiB in 97,513 files). Nothing was uploaded because `--dry-run` was set.
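
Because the dry run was logged at INFO level, the log can also show which backup snapshots it would have uploaded, a quick way to confirm the migration-era contents described below. A rough sketch, assuming rclone's usual dry-run log line (`<path>: Skipped copy as --dry-run is set ...`) and PBS's `<type>/<id>/<timestamp>/` snapshot directory layout; snapshots inside PBS namespaces (`ns/...` prefixes) are reported without their namespace, and the function name is hypothetical:

```bash
# Hypothetical helper: list unique PBS snapshot directories that appear in
# an rclone --dry-run log, e.g. "vm/100/2026-03-12T01:00:00Z".
list_dryrun_snapshots() {
  local log="${1:-/var/log/zlh-pbs-r2-copy.log}"
  grep -F 'Skipped copy as --dry-run is set' "$log" \
    | grep -oE '(vm|ct|host)/[^/ ]+/[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:]+Z' \
    | sort -u
}
```

For example, `list_dryrun_snapshots | cut -d/ -f3 | cut -c1-7 | sort | uniq -c` counts snapshots per month; after the cleanup, only the new baseline dates should remain. Chunk files under `.chunks/` are not matched, so this says nothing about leftover chunk data.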

## Important current blocker / decision

The current PBS datastore contents are old migration-era backups from March.

The user stated that these backups are not useful for current production recovery and can likely be removed, since the environment is well past the migration.

Decision:

- Do not upload the current migration-era datastore to R2.
- First clean PBS by removing the old March backups.
- Then create fresh production backups.
- Then copy the clean baseline to R2.

## Next steps

1. In PBS, remove the old March backup snapshots/groups from datastore `z-back`.
   - Prefer the PBS UI: Datastore -> z-back -> Content -> remove/forget old snapshots.
   - Be careful to delete only migration-era backups that are no longer needed.

2. Run garbage collection on `z-back` after the old snapshots are forgotten. Forgetting a snapshot only removes its index files; the chunk data is freed by GC. PBS GC only deletes chunks not accessed within a safety window (roughly 24 hours by default), so if a GC ran shortly before the forget, a second GC about a day later may be needed before the space is actually freed.

   ```bash
   proxmox-backup-manager garbage-collection start z-back
   ```

3. From Proxmox VE, run fresh backups of the current production VMs/LXCs to PBS datastore `z-back`.

4. Verify the fresh PBS backups.

5. Dry-run the R2 copy again and confirm it reflects only the clean baseline.

   ```bash
   # Ensure the log directory exists before rclone opens --log-file
   mkdir -p /var/log/zlh-backups

   rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
     --dry-run \
     --s3-no-check-bucket \
     --progress \
     --log-file=/var/log/zlh-backups/pbs-r2-z-back-dryrun.log \
     --log-level=INFO
   ```

6. Run the real offsite copy once PBS is quiet (see Operational constraints).

   ```bash
   rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
     --s3-no-check-bucket \
     --progress \
     --transfers=8 \
     --checkers=16 \
     --log-file=/var/log/zlh-backups/pbs-r2-z-back-$(date +%F-%H%M).log \
     --log-level=INFO
   ```

7. Perform a restore test from R2 before considering offsite DR proven.
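
Before the restore test in step 7, a cheaper consistency check can confirm that every file in the local datastore also exists in R2. A minimal sketch using `rclone check --one-way`, which only checks the local-to-R2 direction and compares sizes plus hashes where both sides expose a common one; the function name and default report path are hypothetical. It shows the upload is complete, not that PBS can restore from it:

```bash
# Hypothetical helper: compare the local PBS datastore with the R2 copy,
# one direction only. Files present locally but missing on R2 are written
# to the report file given as $1.
check_r2_copy() {
  local src="/mnt/datastore/z-back"
  local dst="zback-remote:z-back-remote/pbs/z-back"
  local report="${1:-/var/log/zlh-backups/pbs-r2-z-back-missing.txt}"

  if rclone check "$src" "$dst" --one-way \
       --missing-on-dst "$report" \
       --s3-no-check-bucket \
       --log-level=INFO; then
    echo "R2 copy matches local datastore"
  else
    echo "differences found; missing files listed in $report" >&2
    return 1
  fi
}
```

Run it after step 6 finishes and while PBS is quiet, since chunks written or removed during the check will show up as differences.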

## Operational constraints

Do not run rclone while PBS is:

- writing backups
- pruning
- garbage collecting
- verifying

Use `rclone copy`, not `rclone sync`, until restore from R2 has been proven. `copy` never deletes objects on the remote, which makes it the safer choice while establishing the first offsite baseline.
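
The rule above can be enforced by a wrapper that refuses to start the upload while PBS reports running tasks. A sketch under two assumptions that should be checked by hand on the PBS host before relying on it: that `proxmox-backup-manager task list` shows only running tasks unless `--all` is given, and that `--output-format json` prints `[]` when none are running. The function names and the `ZLH_LOG_DIR` override are hypothetical:

```bash
# Hypothetical helper: return 0 if PBS reports no running tasks.
# Assumes the JSON task list is exactly [] when nothing is running.
pbs_is_quiet() {
  local tasks
  tasks=$(proxmox-backup-manager task list --output-format json) || return 2
  # Strip whitespace so "[ ]" and "[]\n" both count as empty.
  [ "$(printf '%s' "$tasks" | tr -d '[:space:]')" = "[]" ]
}

# Hypothetical wrapper: run the offsite copy only when PBS is idle.
# Uses rclone copy (never sync), matching the constraint above.
run_offsite_copy() {
  local logdir="${ZLH_LOG_DIR:-/var/log/zlh-backups}"
  if ! pbs_is_quiet; then
    echo "PBS has running tasks (or task list failed); not starting rclone" >&2
    return 1
  fi
  mkdir -p "$logdir"
  rclone copy /mnt/datastore/z-back zback-remote:z-back-remote/pbs/z-back \
    --s3-no-check-bucket \
    --transfers=8 \
    --checkers=16 \
    --log-file="${logdir}/pbs-r2-z-back-$(date +%F-%H%M).log" \
    --log-level=INFO
}
```

The check runs only once, at start. A backup, prune, GC, or verify job that PBS starts during the upload is not blocked, so the copy should still be scheduled outside those job windows.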

## Security note

The R2 access key and secret were pasted during the session. Treat them as compromised.

Before the real backup upload:

- rotate/recreate the Cloudflare R2 access key and secret
- update `/root/.config/rclone/rclone.conf`
- verify that the test upload/list still works

Recommended additions to the `[zback-remote]` section of `rclone.conf` (`no_check_bucket = true` has the same effect as passing `--s3-no-check-bucket` on every command):

```ini
acl = private
no_check_bucket = true
```
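
Once new credentials have been created in the Cloudflare dashboard, the existing remote can be updated in place instead of editing `rclone.conf` by hand, applying the recommended options in the same step. A minimal sketch, assuming rclone's `config update <remote> key=value ...` form; the function name and the `R2_ACCESS_KEY_ID` / `R2_SECRET_ACCESS_KEY` variables are hypothetical:

```bash
# Hypothetical helper: swap rotated R2 credentials into zback-remote,
# apply acl/no_check_bucket, then confirm the bucket is still reachable.
rotate_zback_keys() {
  if [ -z "${R2_ACCESS_KEY_ID:-}" ] || [ -z "${R2_SECRET_ACCESS_KEY:-}" ]; then
    echo "export R2_ACCESS_KEY_ID and R2_SECRET_ACCESS_KEY first" >&2
    return 1
  fi

  rclone config update zback-remote \
    access_key_id="$R2_ACCESS_KEY_ID" \
    secret_access_key="$R2_SECRET_ACCESS_KEY" \
    acl=private \
    no_check_bucket=true || return 1

  # A successful listing shows the new key pair works against the bucket.
  rclone lsf zback-remote:z-back-remote/ --max-depth 1 >/dev/null
}
```

Updating `rclone.conf` does not invalidate the old key pair; it still has to be revoked in the Cloudflare dashboard.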

## Session stopping point

rclone transport to R2 is working. The remaining work is PBS cleanup, a fresh baseline backup, the R2 copy, and restore validation.