CYBER RESILIENCE AND BACKUP STANDARD

Recovery Tiers · Backup Protection · Restoration Testing · Regulated Overlays · BCP Interface


Document ID CERG-STD-RES-001
Version 1.21
Status Approved
Classification Public
Owner Engineering Pillar Leader (Resilience)
Parent Policy CERG-POL-001 - Cybersecurity Policy
Supporting Standards CERG-STD-IT-001 · CERG-STD-OT-001 · CERG-STD-CUI-001 · CERG-STD-CFG-001 · CERG-STD-CR-001 · CERG-STD-LM-001
Review Cycle Annual / On major platform change
Frameworks NIST 800-53r5 (CP) · NIST CSF 2.0 (RECOVER) · NIST 800-184 · ISO/IEC 27031
Regulations NERC-CIP CIP-009 · CMMC L2 (3.8 / 3.6) · SOX ITGC (Backups / Operations)
Environments Owned data center · IaaS / PaaS · SaaS Tier 1 · OT (BES and non-BES) · CUI scopes

Table of Contents

  1. Purpose and Scope
  2. Boundary: Cyber Resilience vs. Enterprise BCP
  3. Recovery Tiers and RTO/RPO Model
  4. Backup Protection and Immutability
  5. Restoration Testing
  6. Cloud and SaaS Recovery
  7. OT Recovery and CIP-009
  8. CUI Recovery
  9. SOX Availability Evidence
  10. Roles and BCP Interface
  11. Recovery Plan Template
  12. Document Control

1. Purpose and Scope

The policy, IT standard, OT standard, CUI standard, compliance matrix, and RMF all require documented and tested recovery plans, backup protection, and restoration evidence. Until this standard, those obligations had no shared executable definition, RTO/RPO tiers, immutability requirements, restoration evidence formats, and the boundary with enterprise BCP were left for each subordinate document to imply.

This standard closes that gap. It applies to every in-scope asset that has data, configuration, or workload state worth recovering.

Cyber Resilience Is Not BCP

Business continuity owns the business recovery: order processing keeps moving, customers are notified, alternate sites stand up. CERG owns the cyber recovery: the systems and data are restored from clean, immutable backups onto known-good baselines, with identity and detection re-enabled. The two programs coordinate; they do not substitute for each other.


2. Boundary: Cyber Resilience vs. Enterprise BCP

Question Owner
Are critical backups taken, protected, and restorable? CERG (this standard).
Are systems restored to a clean, hardened state with known-good baselines? CERG (this standard).
Is identity, detection, and logging brought back online on the recovered system? CERG (this standard).
Are operational workarounds and customer communications in place during the outage? Enterprise BCP.
Is the business processing relocated to an alternate site / process? Enterprise BCP.
Are non-cyber dependencies (people, facilities, supplies) recovered? Enterprise BCP.

CERG produces inputs to Enterprise BCP: tested cyber recovery plans, RTO/RPO actuals, and clean-restore preconditions. Enterprise BCP produces inputs to CERG: business criticality tiering, alternate-site cyber dependencies, and recovery priority calls during an event.


3. Recovery Tiers and RTO/RPO Model

Every in-scope asset has a CERG recovery tier. The tier drives RTO/RPO targets, backup frequency, immutability requirements, and restoration test cadence. Tiering is owned by Engineering in partnership with the business and Enterprise BCP, the business sets criticality, CERG sets the cyber tier consistent with criticality.

Tier Description RTO (Cyber) RPO Backup Frequency Restoration Test
T1 - Mission-Critical Outage causes safety, regulatory, or material financial loss within hours. ≤ 4 hours ≤ 15 minutes Continuous / near-continuous Quarterly full restore; annual full DR exercise
T2 - Business-Critical Outage materially impairs operations within one business day. ≤ 24 hours ≤ 4 hours Hourly to every 4 hours Semi-annual restore; annual exercise
T3 - Important Outage degrades operations but is tolerable for a few days. ≤ 72 hours ≤ 24 hours Daily Annual restore
T4 - Standard Outage is inconvenient; standard business hours recovery acceptable. ≤ 1 week ≤ 24 hours Daily Annual restore
T5 - Non-Production Dev/test/sandbox. Recovery is rebuild from code/IaC. Best effort n/a n/a (rebuild from IaC) n/a

The RTO Number Is Honest or It Is Useless

A T1 RTO of 4 hours that has never been demonstrated by a real restoration is a number on a slide. CERG only certifies an asset to a tier after a restoration test demonstrates the RTO/RPO. Until then the asset is classified Tier “Aspirational” and tracked in the resilience register as a Planned control.


4. Backup Protection and Immutability

Every backup of an in-scope asset meets the following requirements at minimum. Tier-specific tightening is named where it applies.

4.1 The Backup Protection Checklist

Requirement All Tiers T1 / T2 / BES / CUI
Backups taken at the frequency in Section 3
Backups encrypted at rest (CMK where Restricted/CUI per CERG-STD-CR-001)
At least one copy stored on immutable storage (object-lock / WORM / air-gap)
Backups stored in a separately-credentialled tenancy / account / domain
Backup admin role separate from production admin role; MFA enforced
Backup deletion / shortening requires two-person approval -
Backup retention period meets the longer of: business need · regulatory minimum · 13 months default
Backups validated for integrity at write and on a sample cadence
Backup of configurations / firmware / logic / historian (OT scope) per Section 7 n/a ✓ (BES)
One offline / “cold” copy of critical baselines and recovery preconditions -

4.2 The Air-Gap and Immutability Rationale

Ransomware tooling routinely attacks backup systems before triggering encryption. The immutability and credential separation requirements above exist specifically because production-attached, production-credentialled backups are not a defense.

If the Backup Lives on Production, It’s Not a Backup

A snapshot taken on the same storage system as production, accessible by the same credentials, is a convenience feature. It is not a cyber backup. The cyber backup lives somewhere a compromised production-domain identity cannot reach.


5. Restoration Testing

A backup that has never been restored is a hope, not a control. Restoration tests produce evidence per CERG-GOV-CB-001 control-to-evidence mapping.

5.1 Cadence

Tier Backup Integrity Check Sample Restore Full Restore Test Full DR Exercise
T1 Continuous Monthly Quarterly Annual
T2 Continuous Quarterly Semi-annual Annual
T3 Daily Quarterly Annual n/a
T4 Daily Annual Annual n/a
BES Daily Quarterly Annual (CIP-009 R3) Annual (CIP-009 R2)
CUI Daily Quarterly Annual Annual

5.2 Restoration Test Procedure

Each full restoration test follows the procedure below and produces the evidence artifact in Section 5.3.

  1. Pre-test brief, scope, success criteria, RTO/RPO targets, dependencies, business / OT owner notified.
  2. Isolate the recovery environment from production where applicable (no risk of overwriting production).
  3. Restore from immutable copy onto a clean baseline (DISH profile per CERG-STD-CFG-001).
  4. Bring up identity, logging, detection, and monitoring before declaring the system available.
  5. Validate data integrity against last known-good (hash, checksum, sample query).
  6. Measure RTO actual and RPO actual against targets. Document deltas.
  7. Tear down or promote per scope.
  8. Lessons learned captured into the resilience register and, where applicable, the risk register.

5.3 Restoration Test Evidence

Field Required? Notes
Asset / System ID From asset inventory
Tier Per Section 3
Backup Source (system, location, immutability evidence) -
Date / Duration -
Participants and roles -
RTO Target / Actual -
RPO Target / Actual -
Baseline applied during recovery DISH tier reference
Identity / logging / detection re-enabled Named systems
Integrity check method and result -
Issues encountered -
Lessons learned and follow-up actions Risk register IDs if any
Approver sign-off Engineering Pillar Leader + Asset Owner

6. Cloud and SaaS Recovery

Cloud and SaaS recovery is shaped by the shared-responsibility model. CERG does not assume provider recovery covers customer-side obligations.

6.1 Cloud-Native Workloads (IaaS / PaaS)

  • IaC reconstruction: production environments rebuildable from version-controlled IaC plus data backups.
  • Account / subscription / project failure modes: tested at the in-scope cadence, including region failure, account compromise, and IAM root account loss.
  • Provider-side recovery contacts and escalation paths: documented in the Recovery Plan; verified annually.
  • Cross-region or cross-cloud failover: where required by Tier, tested and evidenced.

6.2 Tier 1 SaaS

Provider native restore is the first line; CERG-managed SaaS backup is added where provider native restore is insufficient or where regulator requires customer-controlled backup.

Tier 1 SaaS Provider Native Restore Sufficient? CERG-Managed SaaS Backup?
M365 (Exchange / SharePoint / OneDrive / Teams) Limited (retention / point-in-time) Yes - for CUI workspaces and SOX-relevant mailboxes/sites
Salesforce Limited Yes - daily export for SOX in-scope orgs
ServiceNow Yes (provider managed) Audit-only export quarterly
Other Tier 1 Decision per onboarding review Documented in shared responsibility matrix

7. OT Recovery and CIP-009

OT recovery is uniquely prescriptive. CERG implements CIP-009 fully and extends it where multiple operating segments demand more.

7.1 OT Artifacts That Must Be Backed Up

CIP-009 R1 specifies recovery plans; CERG requires that the following OT artifacts are explicitly in backup scope:

  • Configuration files (SCADA, HMI, historian, jump server, network device, firewall)
  • Firmware images and versions in service (RTU, relay, OT switch, controller)
  • Software installers for the in-service OS, application, and engineering tool versions
  • Logic files (PLC ladder logic, RTU configuration, relay setting groups)
  • Historian data sets at the retention required for operational and regulatory needs
  • Relay setting groups with version, date, and operator metadata
  • Engineering tool images and licensing artifacts
  • Jump server images / known-good baselines
  • Documentation: as-built diagrams, ESP/EAP topology, vendor compatibility matrices

7.2 CIP-009 R1 / R2 / R3 Implementation

  • R1, Recovery Plan Specifications: documented per asset class; reviewed annually; includes roles, processes for backup management, and methods to preserve data essential for restoration.
  • R2, Recovery Plan Implementation and Testing: tested at least once every 15 months; one full operational exercise every 36 months (per CIP-009 R2.2).
  • R3, Lessons Learned and Plan Updates: documented within 90 days of any plan implementation or test.

7.3 OT Restoration Cautions

OT Recovery Has Safety Implications

Restoring a misconfigured relay setting group from backup can shed load or trip a substation. OT restoration is supervised by an operator with substation-engineering authority, never by Cyber alone. The Recovery Plan names the operator role explicitly.


8. CUI Recovery

CUI recovery aligns with NIST 800-171r3 3.8 (Media Protection, backups) and 3.6 (Incident Response, recovery from incident affecting CUI).

  • Backup encryption uses FIPS-validated cryptography per CERG-STD-CR-001.
  • Backup storage stays within the CUI enclave or its provider-equivalent boundary; no cross-spillage to non-CUI environments.
  • Restoration test validates that CUI labeling, access control, and logging are intact after recovery.
  • Post-incident updates to SSP and POA&M per CERG-PLN-CUI-001.
  • Incident-driven recovery: if a CUI system is recovered following an incident, the recovery itself is documented as POA&M follow-up.

9. SOX Availability Evidence

SOX-relevant systems (per CERG-PLN-SOX-001 SOX-Relevant System Register) require evidence of backup, restoration, and availability controls reusable in the SOX cycle.

SOX ITGC - Backup / Operations Reused CERG Evidence
Backups are taken Backup tool report; failure logs
Backups can be restored Restoration test evidence (this standard, Section 5.3)
Restoration is timely RTO actual against tier target
Failed jobs are remediated Backup job ticket history
Availability incidents are tracked Incident records (per CERG-PLN-IR-001) with cyber annotation

SOX evidence reuses these artifacts; no duplicate “SOX-only” restoration test is performed.


10. Roles and BCP Interface

Role Resilience Responsibilities
CERG - Engineering (Resilience) Owns this standard. Maintains the resilience register (tier per asset, last test result, next test date). Coordinates restoration tests. Maintains Recovery Plans (Section 11) for in-scope assets. Provides clean-restore preconditions to BCP.
CERG - Risk Tracks resilience-related risks (untested plans, expired tests, T1 gaps). Adversarial validation may include backup/recovery as an attack target.
CERG - Governance Tracks evidence currency (control CP-2, CP-4, CP-9, CP-10 in CERG-GOV-CB-001). Liaises with auditors.
Asset / Business Owners Define business criticality. Participate in restoration tests. Approve Recovery Plan changes.
OT Operations Lead OT restoration; consume Recovery Plan; supervise restorations with operational safety authority.
Enterprise BCP Owns business recovery. Consumes CERG cyber readiness signals; provides recovery priority decisions during an event.
Incident Response Activates Recovery Plans during incidents; CERG provides clean-restore baselines and recovery support.

11. Recovery Plan Template

Every in-scope asset (or asset cluster) has a Recovery Plan in the format below.

RECOVERY PLAN - <System / Cluster Name>           PLAN-RES-YYYY-NNNN

1. ASSET CONTEXT
   System ID(s)              :
   Tier                      :  (T1–T5)
   Business Owner            :
   Engineering Owner         :
   OT Operator (if OT)       :
   Regulatory Scope          :  CUI / BES / SOX / None
   Dependencies              :  (named upstream/downstream systems)

2. TARGETS
   RTO (cyber)               :
   RPO                       :
   Last Demonstrated RTO     :
   Last Demonstrated RPO     :
   Test Cadence              :

3. BACKUP CONFIGURATION
   Backup System(s)          :
   Frequency                 :
   Immutability Mechanism    :
   Retention                 :
   Backup Credentials/Owner  :  (separately credentialled)
   Encryption / Key Source   :  (CMK ref per [CERG-STD-CR-001](CERG-STD-CR-001_Cryptography_and_Key_Management_Standard.md))

4. RESTORATION PROCEDURE
   Pre-test brief            :  (process / template)
   Restore steps             :  (ordered; reference runbooks)
   Identity / logging / detection re-enable steps
   Validation steps          :
   Tear-down / promotion     :

5. CIP-009 NOTES (if BES)
   R1 / R2 / R3 references   :
   Operator role with restore authority :

6. CUI NOTES (if CUI)
   FIPS crypto evidence      :
   POA&M update path         :

7. SOX NOTES (if SOX in-scope)
   Evidence reuse mapping    :  (per Section 9)

8. INTERFACE TO ENTERPRISE BCP
   BCP Plan reference        :
   Hand-off triggers         :

9. CHANGE LOG

A populated example is maintained in the resilience program library; the template is reused without modification.


12. Document Control

Document ID CERG-STD-RES-001
Version 1.21
Approved By CISO
Next Review Annual / major platform change
Change Log 1.0 - Initial publication. Establishes recovery tiers, backup protection, restoration testing, regulated overlays, and BCP interface.

Source: standards/CERG-STD-RES-001_Cyber_Resilience_and_Backup_Standard.md · Download .md · View on GitHub