The full disaster recovery policy and drills are logged under Feature 210382: Add a system for testing disaster recovery for Azure client production data. Once complete, this article will be updated with the "in place" policy.
Contents
- Azure SQL Databases
- Azure Storage Accounts
- Reporting Back to Clients
- Client Access to Storage Account
- Client Access to Geo-Replication (UK West) Database
- Client Business Continuity
Azure SQL Databases
Hosted in Microsoft Azure UK South and replicated to UK West.
Backups
Current Policy
- Automated Backups: Azure SQL Database provides automatic backups by default (no action required).
- Point-in-Time Restore Verification: Restore backups into a separate SQL pool to verify data integrity and the recovery process.
- Frequency: Restore verification (monthly).
- Reporting: Results are stored for distribution to clients on request.
Planned Improvements
- Point-in-Time Restore Verification Reporting: Results added to client-specific audit events - User Story 275891.
- Routine Backup Shipping: Copies of the latest backup are shipped to the client-specific storage account - User Story 254857.
- Restore Stored Backup Verification: Restore the shipped backups into a separate SQL pool to verify data integrity and the recovery process - User Story 254857.
- Frequency: Backup shipping (weekly), restore verification (monthly).
- Reporting: Results are stored for distribution to clients on request and added to client-specific audit events - User Story 254857.
Performance Monitoring and Tuning
Current Policy
- Performance Monitoring: Use Azure SQL Analytics, Query Performance Insight, or other monitoring tools to track performance metrics.
- Index Maintenance: Rebuild indexes (5 databases chosen to re-index per night).
- Statistics: UpdateStats.
- Query Optimisation: Identify and optimise long-running queries.
- Frequency: Update Stats (daily), index maintenance (weekly), query optimisation (monthly).
- Reporting: Log slow-running queries to the Compucare 8 team.
Planned Improvements
- Index Maintenance: Rebuild indexes more frequently per database, with re-indexing targets chosen from client audits of the last re-index and scheduled consistently every week - User Story 275903.
- Alerting: Expanded Query Performance alerting alongside Database and Elastic Pool Alerts - User Story 275907.
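The planned target selection could be sketched as follows. This is a minimal illustration, not the implementation tracked under User Story 275903: the function name, the audit dictionary, and the database names are all hypothetical, and the batch size of 5 is borrowed from the current nightly policy.

```python
from datetime import date


def pick_reindex_targets(last_reindexed, batch_size=5):
    """Return the databases whose last re-index is oldest.

    last_reindexed maps a database name to the date of its last index
    rebuild, or None if no rebuild has been recorded (hypothetical audit data).
    """
    def sort_key(item):
        name, last = item
        # Databases with no recorded rebuild sort first, then oldest date
        # first; the name is a tie-breaker so the result is deterministic.
        return (last is not None, last or date.min, name)

    ordered = sorted(last_reindexed.items(), key=sort_key)
    return [name for name, _ in ordered[:batch_size]]


# Hypothetical audit records for illustration only.
audit = {
    "client_a": date(2024, 5, 1),
    "client_b": None,
    "client_c": date(2024, 4, 20),
    "client_d": date(2024, 5, 3),
}
print(pick_reindex_targets(audit, batch_size=2))
```

Sorting by the oldest rebuild date means no database can be skipped indefinitely, which is the consistency the planned improvement aims for.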
Security Management
TBC
Database Maintenance
Current Policy
- Full Integrity Check: Run DBCC CHECKDB.
- Update Statistics: Ensure statistics are updated to maintain query performance.
- Frequency: Full integrity check (monthly), Update statistics (daily).
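As a rough sketch of how this schedule might translate into a maintenance batch, the helper below builds the T-SQL statements due for one database on a given day. The function name and the choice of the first day of the month as the trigger for the integrity check are assumptions; the statements are assumed to run while connected to the database in question.

```python
from datetime import date


def maintenance_statements(database, today):
    """Build the T-SQL maintenance statements due for one database today."""
    # Bracket-quote the name, doubling any closing bracket.
    quoted = "[" + database.replace("]", "]]") + "]"

    # Statistics are refreshed every day.
    statements = ["EXEC sp_updatestats;"]

    # Assumption: the monthly integrity check runs on the 1st of the month.
    if today.day == 1:
        statements.append(
            f"DBCC CHECKDB ({quoted}) WITH NO_INFOMSGS, ALL_ERRORMSGS;"
        )
    return statements


# "Compucare" is used here only as an example database name.
for statement in maintenance_statements("Compucare", date(2024, 6, 1)):
    print(statement)
```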
Disaster Recovery Planning
Planned Policy
- DR Drills: Conduct disaster recovery drills to test failover and recovery procedures - User Story 212928.
- Review DR Plan: Review and update the disaster recovery plan based on drill outcomes.
- Frequency: DR drills (annually), DR plan review (annually).
RTO / RPO Objectives
- Recovery Time Objective (RTO): Azure SQL Point in Time Restore (PITR) combined with real-time replication from UK South to UK West keeps database downtime to a minimum and allows rapid recovery after a disruption. The RTO target is less than 15 minutes.
- Recovery Point Objective (RPO): PITR is configured with a 31-day retention period and is coupled with real-time replication from UK South to UK West. Data can be restored to any point within the last 31 days, which minimises data loss. The RPO target is less than 1 hour.
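To make the two targets concrete, the sketch below compares timings recorded during a drill against them. The function and field names are hypothetical; only the 15-minute and 1-hour thresholds come from the objectives above.

```python
from datetime import datetime, timedelta

RTO_TARGET = timedelta(minutes=15)  # recovery time target from this policy
RPO_TARGET = timedelta(hours=1)     # recovery point target from this policy


def evaluate_drill(outage_start, service_restored, last_recoverable_point):
    """Compare measured drill timings against the RTO and RPO targets."""
    # RTO measures how long the service was unavailable.
    recovery_time = service_restored - outage_start
    # RPO measures how much data, in time, was lost before the outage.
    data_loss_window = outage_start - last_recoverable_point
    return {
        "recovery_time": recovery_time,
        "data_loss_window": data_loss_window,
        "rto_met": recovery_time < RTO_TARGET,
        "rpo_met": data_loss_window < RPO_TARGET,
    }


# Hypothetical drill timings for illustration only.
result = evaluate_drill(
    outage_start=datetime(2024, 3, 1, 10, 0),
    service_restored=datetime(2024, 3, 1, 10, 12),
    last_recoverable_point=datetime(2024, 3, 1, 9, 50),
)
print(result["rto_met"], result["rpo_met"])
```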
Azure Storage Accounts
- Replication: Replicated across different geographical locations (Geo-Redundant Storage).
- Frequency: One-time setup with periodic review (annually).
Regular Backups
- Backup Strategy: Storage accounts are backed up to an Azure Backup Vault with a 30-day retention policy. Compucare also has soft delete enabled, allowing for user-configurable retention.
- Frequency: As per RPO (Recovery Point Objective) requirements.
Monitoring and Alerts
- Metrics and Logs: Enable and review metrics and logs for storage accounts to monitor usage and performance, and to detect anomalies.
- Alerts: Set up alerts for critical metrics and events (e.g., storage capacity, transaction rates).
- Frequency: Review of alerts and logs (weekly).
Data Integrity Checks
- Azure Blob Storage: Use features like Azure Blob Storage's lifecycle management policies to automatically check and maintain data integrity.
- Frequency: As per policy schedule (weekly).
Disaster Recovery Drills
- Failover Testing: Conduct failover testing to ensure that data can be successfully replicated and accessed from the secondary region.
- Recovery Procedures: Document and test the recovery procedures to ensure they are effective and up to date.
- Frequency: Drill (annually).
Geo-Replication Testing
- Read-Access Geo-Redundant Storage (RA-GRS): Regularly test accessing data from the secondary region in read-only mode to ensure it is available.
- Frequency: Test data access (quarterly).
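With RA-GRS, the read-only secondary endpoint uses the storage account name with a "-secondary" suffix in the host name. The sketch below rewrites a primary blob URL to its secondary counterpart so that the same read can be repeated against the secondary region; the function name, account name, and blob path are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_secondary_endpoint(primary_url):
    """Rewrite a primary storage URL to the RA-GRS secondary endpoint."""
    parts = urlsplit(primary_url)
    # The account name is the first label of the host name.
    account, _, rest = parts.netloc.partition(".")
    if account.endswith("-secondary"):
        return primary_url  # already points at the secondary endpoint
    return urlunsplit(parts._replace(netloc=f"{account}-secondary.{rest}"))


# Hypothetical account, container, and blob names.
url = "https://exampleaccount.blob.core.windows.net/reports/example.txt"
print(to_secondary_endpoint(url))
```

A quarterly test could then fetch the same blob from both URLs and compare the results, bearing in mind that the secondary copy may lag slightly behind the primary.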
Reporting Back to Clients
- Frequency: Report of checks and results of all of the above (quarterly).
Client Access to Storage Account
- In the unlikely event of Streets Heaver becoming insolvent, all Streets Heaver Azure Subscriptions will enter a Disabled state due to non-payment. This state is read-only, allowing data to still be downloaded.
- Each month, Streets Heaver will generate new Shared Access Signature (SAS) keys and share them with authorised parties. These SAS keys grant read access to the client's Azure storage account, including the latest shipped backup, ensuring continuous access.
- Clients can use these SAS keys to download their data at any time, both before and after the resources are disabled. However, it is important to act quickly in the event of insolvency, as the data will only be retained for a limited period, typically up to 90 days. After this retention period, the data may be permanently deleted.
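Because the keys are reissued monthly, a client may want to confirm that the key they hold is still valid. A SAS URL carries its expiry in the `se` query parameter and its permissions in `sp`, so a minimal check might look like the sketch below. The function name and the example URL, including its signature, are made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def sas_status(sas_url, now):
    """Report expiry and read permission for a SAS URL.

    now must be a timezone-aware datetime.
    """
    query = parse_qs(urlsplit(sas_url).query)
    # "se" holds the signed expiry time in ISO 8601 UTC.
    expiry = datetime.fromisoformat(query["se"][0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # A date-only expiry carries no offset; treat it as UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    # "sp" holds the signed permissions; "r" means read.
    permissions = query.get("sp", [""])[0]
    return {
        "expires": expiry,
        "expired": now >= expiry,
        "days_left": (expiry - now).days,
        "read_allowed": "r" in permissions,
    }


# Hypothetical SAS URL; the signature is a placeholder.
example = (
    "https://exampleaccount.blob.core.windows.net/container"
    "?sv=2022-11-02&sp=rl&se=2024-07-01T00%3A00%3A00Z&sig=REDACTED"
)
status = sas_status(example, datetime(2024, 6, 20, tzinfo=timezone.utc))
print(status["expired"], status["days_left"], status["read_allowed"])
```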
Client Access to Geo-Replication (UK West) Database
- Read access is available to the Compucare (and other) database(s) on request.
Client Business Continuity
- Clients are recommended to use Streets Heaver's Disaster Recovery Policy alongside their own to build standard operating procedures that cover incidents of escalating severity.