It’s happened a few times, I go out to assist with recovery from some sort of failure and the question of database backups results in an uncomfortable pause. One group thought the other group was taking the backups and the other group didn’t know that the backups needed to be taken. And neither group previously attempted to execute the recovery plan before a failure had occurred. The end result is that the miscommunication resulted in a lack of necessary backups.
The most common scenario where this occurs is when the network or backup team is taking the SQL Server backups through a third-party backup tool removing the DBA’s from the “backup” process. In these cases, the DBAs tend to relinquish their responsibility for the backup process placing complete trust that the databases are being backed up.
Trust No On With Backups
If you are a DBA in this situation, don’t trust that the backups are just happening. Confirm it. Every day. Verify it. Every day. It doesn’t matter the level of trust that exists between the DBAs and other groups, as a DBA it is critical that when a database failure occurs that the database can be recovered and that there is full confidence that the needed backups exist. Users will only blame the DBAs when data can’t be recovered because that is what the DBAs are there for. The users won’t care who should have taken the backups only that it doesn’t exist and the DBA let that happen.
As a disclaimer, this isn’t an admonishment of other groups having backup responsibilities. But the reality when other groups are active in the backup process,
Now checking on backups every day can be tedious. In the scenario I’m outlining in this post a separate team is executing the backups. Should the DBAs check each database to make certain that backups are occurring on schedule? Yes, but that doesn’t necessarily mean that each database needs to be opened in SSMS to verify this.
Instead, I have an stored procedure that is executed multiple times a day, typically every few hours, to alert if a full and/or differential backup hasn’t been taken in the last XX number of hours. This way the only concern is the databases who’s backup schedules have fallen outside of SLA.
First the Setup
Before unveiling the new stored procedure for checking backups, there are some other object that will be needed first. These tables are used to configure this and other alerts and maintenance procedures. I’m a big fan of having a single database management code base between all servers, rather than customizing stuff and losing changes from time to time.
IF NOT EXISTS(SELECT * FROM sys.schemas WHERE name = 'Maintenance') EXEC sys.sp_executesql 'CREATE SCHEMA [Maintenance] AUTHORIZATION [dbo]' GO IF OBJECT_ID('Maintenance.ExcludeType') IS NULL BEGIN CREATE TABLE [Maintenance].[ExcludeType] ( [ExcludeTypeID] [tinyint] NOT NULL IDENTITY(1,1), [Description] [varchar](500) NULL, [CreateDate] [smalldatetime] NOT NULL CONSTRAINT DF_ExcludeType_CreateDate DEFAULT GETDATE(), [CreateBy] [varchar](128) NOT NULL CONSTRAINT DF_ExcludeType_CreateBy DEFAULT SUSER_SNAME(), [UpdateDate] [smalldatetime] NOT NULL CONSTRAINT DF_ExcludeType_UpdateDate DEFAULT GETDATE(), [UpdateBy] [varchar](128) NOT NULL CONSTRAINT DF_ExcludeType_UpdateBy DEFAULT SUSER_SNAME(), CONSTRAINT [PK_ExcludeType] PRIMARY KEY CLUSTERED ([ExcludeTypeID] ASC) ON [Data] ) ON [Data] CREATE UNIQUE NONCLUSTERED INDEX UIX_ExcludeType_Description ON Maintenance.ExcludeType ([Description]) ON Data END GO IF NOT EXISTS(SELECT * FROM [Maintenance].[ExcludeType] WHERE [Description] = 'BackupCheck') BEGIN INSERT INTO [Maintenance].[ExcludeType] ([Description]) VALUES ('BackupCheck') END GO IF OBJECT_ID('Maintenance.ExcludeDatabase') IS NULL BEGIN CREATE TABLE [Maintenance].[ExcludeDatabase] ( [DatabaseName] [sysname] NOT NULL, [ExcludeTypeID] [tinyint] NOT NULL, [ExcludeReason] [varchar](255) NOT NULL, [CreateDate] [smalldatetime] NOT NULL CONSTRAINT DF_ExcludeDatabase_CreateDate DEFAULT GETDATE(), [CreateBy] [varchar](128) NOT NULL CONSTRAINT DF_ExcludeDatabase_CreateBy DEFAULT SUSER_SNAME(), [UpdateDate] [smalldatetime] NOT NULL CONSTRAINT DF_ExcludeDatabase_UpdateDate DEFAULT GETDATE(), [UpdateBy] [varchar](128) NOT NULL CONSTRAINT DF_ExcludeDatabase_UpdateBy DEFAULT SUSER_SNAME(), CONSTRAINT [PK_ExcludeDatabase] PRIMARY KEY CLUSTERED ([DatabaseName] ASC, [ExcludeTypeID] ASC) ON [Data] ) ON [Data] END GO IF NOT EXISTS (SELECT * FROM sys.foreign_keys WHERE name = 'FK_ExcludeDatabase_ExcludeTypeID_FROM_ExcludeType') BEGIN ALTER TABLE [Maintenance].[ExcludeDatabase] WITH CHECK ADD CONSTRAINT [FK_ExcludeDatabase_ExcludeTypeID_FROM_ExcludeType] FOREIGN KEY([ExcludeTypeID]) REFERENCES [Maintenance].[ExcludeType] ([ExcludeTypeID]) END
And then the Store Procedure
The next code window has the stored procedure code for this alert. A few things to point out for the alert. First, the alert assumes that all databases have the same SLA for checking backups. This point of this was that the alert would be configured for the most crucial databases and take the less crucial databases along for the ride. Overly cautious is better than being under prepared when it comes to backups.
Next, the procedure utilizes the table Maintenance.ExcludeDatabases. This table serves as an exclusion table for this alert and some other maintenance processes that will be introduced in later posts. On the cautionary side, I prefer processes that maintain all databases by default and excludes those that a conscious decision to exclude has been made. I’d rather be safe than sorry. Easier to keep my job this way.
To make certain the databases the alert isn’t checking databases that should be naturally ignored, the stored procedure ignore the following items as well:
- tempdb and model are ignored.
- Any database with a source_database_id which indicates that it is a snapshot database.
- Any database in standyby mode which represents scenario where logs are being restored.
- Any database with a RESTORING or OFFLINE state. These databases cannot be backed up in their current state.
Lastly, the procedure accepts a few parameters to configure the alert. The parameters are the following:
- @ToAddress: the person, group, or mailbox that needs to receive the alert.
- @SubjectWarning: a customizable message that can be added to the subject line of the e-mail alert. This was added so that when it is setup on pre-production and production servers the same alert code can be used on both servers but the subject line can include text informing the severity of the issue.
- @HoursSinceBackup: the number of hours that the backups can age before an alert should be generated.
Without further ado, here is the procedure:
IF NOT EXISTS(SELECT * FROM sys.schemas WHERE name = 'Alert') EXEC('CREATE SCHEMA [Alert] AUTHORIZATION [dbo]') GO IF EXISTS(SELECT * FROM sys.procedures WHERE object_id = OBJECT_ID('Alert.BackupCheck')) DROP PROCEDURE [Alert].[BackupCheck] GO /*================================================================================ Procedure: Alert.BackupCheck Author: Jason Strate Date: June 1, 2009 Synopsis: This procedure check the datbase backup is done within a number of hours passed into the procedure. ================================================================================ Revision History: Date: By Description ---------------------------------------------------------------------------------- ================================================================================*/ CREATE PROCEDURE [Alert].[BackupCheck] ( @ToAddress varchar(255) ,@SubjectWarning varchar(255) = '' ,@HoursSinceBackup smallint = 36 ) AS SET NOCOUNT ON DECLARE @SQL varchar(max) ,@Subject nvarchar(255) ,@Body nvarchar(max) ,@RowCount int ,@DBCount int SELECT @Subject = @SubjectWarning + Space(1) + @@Servername + ': Check Database Backup Not Done in the past ' + Convert(varchar, @HoursSinceBackup) + ' Hours.' ,@Body = 'The following database files are not backed up with in ' + Convert(varchar, @HoursSinceBackup) + ' hour(s). Please determine the cause of missing Backup(s).' + Char(13) ,@SQL = 'Set NoCount On; Select * From tempdb..tmpAlertResults' BEGIN TRY IF OBJECT_ID('tempdb..tmpAlertResults') IS NOT NULL DROP TABLE tempdb..tmpAlertResults ;WITH Excludes AS ( SELECT DatabaseName FROM Maintenance.ExcludeDatabase ed INNER JOIN Maintenance.ExcludeType et ON ed.ExcludeTypeID = et.ExcludeTypeID WHERE et.Description = 'BackupCheck' ) SELECT CONVERT(varchar, d.Name) AS [Database Name], MAX(backup_start_date) AS [Backup Start Date], MAX(backup_finish_date) AS [Backup Finish Date], DATEDIFF(hour, MAX(backup_start_date),GETDATE()) AS [Hour Since Backup] INTO tempdb..tmpAlertResults FROM sys.databases d LEFT OUTER JOIN msdb..backupset bs On d.Name = bs.database_name ANd bs.type IN ('D','I') LEFT OUTER JOIN Excludes e On d.Name = e.DatabaseName WHERE d.Name NOT IN ('tempdb', 'model') -- Back ups not necessarily needed AND source_database_id IS NULL -- If not NULL then database is a snapshot AND is_in_standby = 0 -- is_in_standby is read-only log shipped AND state_desc NOT IN ('RESTORING','OFFLINE') -- mirroring flag, offline check GROUP BY d.Name HAVING DATEDIFF(hour, MAX(backup_start_date),GETDATE()) > @HoursSinceBackup OR MAX(backup_start_date) IS NULL ORDER BY MAX(backup_start_date) ASC IF EXISTS (SELECT * FROM tempdb..tmpAlertResults) BEGIN Exec msdb.dbo.sp_send_dbmail @recipients = @ToAddress, @subject = @Subject, @body = @Body, @execute_query_database = 'DBADiagnostics', @query = @SQL, @query_result_separator = '|' END END TRY BEGIN CATCH DECLARE @ErrState int, @ErrSev int, @ErrMsg varchar(max) SELECT @ErrSev = ERROR_SEVERITY(), @ErrState = ERROR_STATE(), @ErrMsg = ERROR_MESSAGE() RAISERROR (@ErrMsg, @ErrSev, @ErrState) END CATCH
If anyone knows how to get the last modified date for a database that would be very helpful. I’d like to add additional logic to the procedure that ignores databases that have been backed up at least once since being made read-only.