Automatically Enable or Disable Jobs Based on HADR Role

Eitan Blumin

8 years ago

Imagine the following: You have a couple or more SQL Servers with some sort of High Availability solution between them (AlwaysOn Availability Groups, Database Mirroring, etc.). You also have a bunch of scheduled jobs which you need to be run on the Primary server.

But wait…. How would you make these jobs run only on the Primary server? After all, if they try to run on the Secondary, they would fail (whether because the database is inaccessible or because it’s read-only). Additionally, you would need to consider the possibility of a failover where the Primary and Secondary servers would switch roles.

This means that you would have to, first, create these jobs on both servers, and implement some sort of mechanism that would detect, for each job, whether the instance it’s being executed at is currently the Primary.

There are a few ways to go about doing this.

[For the sake of this article, let’s ignore the fact that most people don’t even think about it, and leave all of their important jobs on the Primary server only, while ignoring the risks of what would happen when their Primary server crashes and fails over to the Secondary]

The “Meh…” Solution: Adding an IF check in each job

One way is to add some sort of IF in the beginning of each job, which would check whether the current server is the Primary, otherwise stop the job. I personally don’t like this approach at all, because it means we’d have these jobs executing on all servers all the time, filling up the MSDB job history with a lot of useless junk (who cares about thousands of executions that didn’t do anything? We want to see what the jobs actually DID!).

I would prefer not to have these jobs run at all, if the current server isn’t the Primary. Also, I would prefer to have some sort of solution which would be easy to maintain. Could I really guarantee that whoever’s writing and/or updating the jobs would remember to put that IF check every time?

So, the best I could come up with is the following solution:

The “Awesome!” Solution: The Master Control Job

Instead of modifying the jobs themselves, we would have a sort of “Master Control Job” which would exist on both servers, and that job would know to automatically disable or enable the jobs, based on the current HADR role of the server.

Ideally, this Master Control Job would use some sort of predefined list of job names, which would be the specific jobs that it would control (more often than not, you would also have jobs which you would not want to disable or enable automatically).

For this purpose, I created a couple of scripts (click on the following links to see the code in my GitHub Gist):

Change Job Status Based on Availability Group Role

Change Job Status Based on Database Mirroring Role

They both use the same principles, essentially. Only difference is the type of HADR solution being in use.

These two scripts contain a stored procedure, which you would need to create in some sort of impartial database (which is not part of the HADR solution). It could also be one of the system databases (such as MSDB or Master)… TEMPDB is not a good choice for this, because it would be dropped every time the SQL service is restarted.

Configuring The Jobs List

Within these procedures, there’s a list of job names that would be automatically disabled or enabled. For example:

DECLARE @CurrJob NVARCHAR(500)
DECLARE JobsToUpdate CURSOR
READ_ONLY FORWARD_ONLY
FOR
select name
from msdb..sysjobs
where name in
(
'Job name 1',
'Job name 2',
'Job name 3',
'Job name 4',
'Job name 5'
)
and [enabled] <> @NeedToEnableJobs

Before deploying and using these stored procedures, you would need to update this list of job names, to whatever relevant jobs that you’re using in your servers (don’t forget to create them in BOTH servers).

PRO TIP

If you know how to use Table-Valued Parameters, you could make a few changes to the stored procedures so that they would receive the jobs list as a parameter, instead of being hard-coded within the procedure itself.

Choosing the Main HADR Database

The stored procedures both receive a parameter called @DBName. This parameter should receive the name of the database which should be used for determining which server is currently the Primary.

The reason for this is that it’s possible to have several different HADR solutions on the same SQL Server instance. For example, you could have a SQL Server which has both Availability Groups, as well as Database Mirroring. Or, you could have a SQL Server which has several different Availability Groups, each with their own database(s).

This is why we need to choose which database should serve as the “main” database. Because it would be the database after which all the jobs would “follow”.

This also means that, if we have several different HADR solutions, we could create several different “Master Control Jobs”, each controlling a different set of jobs, based on a different “main” database.

Creating the Master Control Job

Next step would be to create a scheduled job (again, on BOTH servers) which would run the relevant stored procedure every so often.

The schedule of this job would mostly depend on the most frequent schedule of the jobs you’re controlling. For example, let’s say that one of your jobs is running every 10 minutes. Then the Master Control Job would need to be run at least as often as every 10 minutes. But if you also have a job which runs, let’s say, every 1 minute, then it means you’re at risk of missing scheduled runs of this job, between when the HADR fails over, and until the Master Control Job enables all the jobs on the new Primary server. So this means that, in this case, you’d need to schedule the Master Control Job to be run at least as often as every 1 minute.

Here is an example create script of such a job:

USE msdb
GO
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N'HADR Control Jobs', 
		@enabled=1, 
		@notify_level_eventlog=0, 
		@notify_level_email=0, 
		@notify_level_netsend=0, 
		@notify_level_page=0, 
		@delete_level=0, 
		@description=N'No description available.', 
		@category_name=N'[Uncategorized (Local)]', 
		@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'control', 
		@step_id=1, 
		@cmdexec_success_code=0, 
		@on_success_action=1, 
		@on_success_step_id=0, 
		@on_fail_action=2, 
		@on_fail_step_id=0, 
		@retry_attempts=0, 
		@retry_interval=0, 
		@os_run_priority=0, @subsystem=N'TSQL', 
		@command=N'EXEC msdb..ChangeJobStatusBasedOnMirroring @DBName = ''MyMainDatabase''', 
		@database_name=N'master', 
		@flags=4
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Master Control Schedule', 
		@enabled=1, 
		@freq_type=4, 
		@freq_interval=1, 
		@freq_subday_type=4, 
		@freq_subday_interval=10, 
		@freq_relative_interval=0, 
		@freq_recurrence_factor=0, 
		@active_start_date=20180101, 
		@active_end_date=99991231, 
		@active_start_time=0, 
		@active_end_time=235959
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
    IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

This job would run the ChangeJobStatusBasedOnMirroring procedure within the MSDB database, with “MyMainDatabase” as the name of the “main” database, and it would be scheduled to run every 10 minutes.

Another PRO TIP

Theoretically, it should be possible to implement this logic even better: Instead of a scheduled job, the “automatic fail over” of jobs could be “event-based”. In other words: The jobs would “fail over” only when the HADR solution itself fails over.

In the case of Database Mirroring, you could create an SQL Alert which would start the same “Master Control Job” when a State Change event is fired.

Similarly, in the case of Availability Group, you could also create an SQL Alert which would start the “Master Control Job” when a Role Change event is fired.

Starting the “Master Control Job” based on events instead of schedules should be something preferable, if you want to avoid cluttering MSDB with useless job history junk.

Conclusion

Once you create the procedure and scheduled (or event-based) job on both servers, you would have a list of jobs that know how to automatically “fail over” whenever the HADR fails over.

You can find these scripts here:

See the gist on github.

The scripts are available in my GitHub Gists

And in my TechNet Gallery here:

https://gallery.technet.microsoft.com/scriptcenter/Automatically-Enable-or-ae2c7ec2