Site icon Eitan Blumin's blog

Hidden Risks in SQL Agent Job Logging: How Retry Settings Can Compromise Your Logs

In many SQL Server environments, DBAs configure SQL Agent jobs with retry attempts (which is a smart move to handle transient issues gracefully). But with it comes a hidden pitfall that often goes unnoticed:

Retry-enabled job steps can silently overwrite logs or cause runaway log growth, depending on how the output file is configured.

The culprit? Improper use of append settings and missing time-based tokens in output file paths.

What exactly is the risk?

When an Agent job step:

… you’re at risk of:

In other words, for job steps that have retries and an output file configured:

If append is disabled, then every failure and retry will clear out the previous attempt’s output. Basically, causing a “diagnostic data-loss” for when you want to investigate and troubleshoot your occasionally failing job steps.

If append is enabled, but there are no time-based tokens in the output file path, then every execution and every retry of this job step will append into the exact same output file, causing it to grow and grow without end every time the job runs.

Here’s how it looks like in SSMS when retry is enabled but append is disabled:

Or, when append is enabled, but the output file doesn’t contain time-based tokens:

Why this matters: Real-world impact

If unaddressed, this configuration can lead to:

Wait, what are “Time-Based Tokens”?

Every SQL Agent job step can have an “Output file” setting, and this output file path supports a group of built-in “tokens“. These “tokens” are essentially placeholders for a variety of system-generated information, such as the job name, job ID, step name, step ID, and also time-based tokens such as the current date, current time, job start date, and job start time.

To learn more about SQL Agent job tokens, please check out Use Tokens in Job Steps at Microsoft Learn.

If this is your first time learning about these tokens, you should also know that different versions of SQL Server support different tokens. With almost every new version of SQL Server, new tokens are being added to improve the usefulness of automatically generated output file paths.

Here is an example of a “standardized” output file path with tokens suitable for SQL Server 2016 and newer:

$(ESCAPE_SQUOTE(SQLLOGDIR))\$(ESCAPE_SQUOTE(JOBNAME))_$(ESCAPE_SQUOTE(STEPNAME))_$(ESCAPE_SQUOTE(DATE))_$(ESCAPE_SQUOTE(TIME)).txt

It just so happens that we have in our GitHub repository several examples of “standardized” output file paths that make use of tokens, depending on the SQL Server version that you have. You can find it here: standard_jobstep_output_logfile

Okay, so how do I detect it?

We developed a SQL script that scans msdb.dbo.sysjobsteps and identifies all steps with potentially dangerous logging configurations. Check it out:

The script is also available in our GitHub repository at microsoft-dbas-club.

This T-SQL script scans your msdb metadata and flags risky configurations. It returns a detailed description, and even a RemediationCommand column that you can use to quickly and easily fix each case.

Safe remediation

Each row in the output includes a sp_update_jobstep remediation command.

But please be sure to review the commands before executing them, since adding missing time-based tokens is a bit flaky (specifically, when your file path doesn’t end with .txt).

This allows you to:

These commands:

For example:

-- REVIEW BEFORE EXECUTION:
    EXEC msdb.dbo.sp_update_jobstep 
        @job_name = N'ARCHIVE_JOB', 
        @step_id = 1, 
        @output_file_name = N'C:\MSSQL\Log\ArchiveJob_$(ESCAPE_SQUOTE(DATE))_$(ESCAPE_SQUOTE(TIME)).txt', 
        @flags = 2

Continuing our example from the previous screenshots, this is how a “proper” job step configuration would look like after applying the fix:

Once your job step is configured in this way:

By the way, don’t forget to have a job that cleans up your .txt output files!
Ola Hallengren’s maintenance solution creates such a job by default, but you gotta schedule it for it to work.

For more details, see Keep Your MSDB Clean – Output .txt Files.

Conclusion

This kind of subtle misconfiguration is exactly the sort of thing that slips through routine job audits and when creating jobs without much thinking. But by proactively detecting and remediating it, you ensure:

I hope I helped you learn a new not-so-obvious pitfall that you may want to watch out for.

Have you encountered similar issues with SQL Agent? Got a war story or improvement idea? Drop it in the comments below!

Exit mobile version