Hidden Risks in SQL Agent Job Logging: How Retry and Output Settings Can Compromise Your Logs

Eitan Blumin

2 months ago

In many SQL Server environments, DBAs configure SQL Agent jobs with retry attempts (which is a smart move to handle transient issues gracefully). But with it comes a hidden pitfall that often goes unnoticed:

Retry-enabled job steps can silently overwrite logs.

And improper output file settings can cause runaway file growth.

The culprit? Improper use of append settings and missing time-based tokens in output file paths.

What exactly is the risk?

When an Agent job step:

Writes to an output file
Either:
- Has retries enabled and doesn’t append to the output file, or
- Uses a file name without time-based tokens

… you’re at risk of:

Losing logs from earlier attempts (due to overwrite), or:
Creating massive logs that grow uncontrollably (if appending without date/time tokens)

In other words, if append is disabled and retries are enabled, then every failure and retry will clear out the previous attempt’s output. Basically, causing a “diagnostic data-loss” for when you want to investigate and troubleshoot your occasionally failing job steps.

If there are no time-based tokens in the output file path, then every execution and every retry of this step will append into the exact same output file, causing it to grow and grow without end every time the job runs.

Here’s how it looks like in SSMS when retry is enabled but append is disabled:

Or, when append is enabled, but the output file doesn’t contain time-based tokens:

Why this matters: Real-world impact

If unaddressed, this configuration can lead to:

Log data loss during retries or successive job runs
Unmanageable log file sizes when append is enabled without time-based tokens
Delayed troubleshooting when logs are missing or bloated

Wait, what are “Time-Based Tokens”?

Every SQL Agent job step can have an “Output file” setting, and this output file path supports a group of built-in “tokens“. These “tokens” are essentially placeholders for a variety of system-generated information, such as the job name, job ID, step name, step ID, and also time-based tokens such as the current date, current time, job start date, and job start time.

To learn more about SQL Agent job tokens, please check out Use Tokens in Job Steps at Microsoft Learn.

If this is your first time learning about these tokens, you should also know that different versions of SQL Server support different tokens. With almost every new version of SQL Server, new tokens are being added to improve the usefulness of automatically generated output file paths.

Here is an example of a “standardized” output file path with tokens suitable for SQL Server 2016 and newer:

$(ESCAPE_SQUOTE(SQLLOGDIR))\$(ESCAPE_SQUOTE(JOBNAME))_$(ESCAPE_SQUOTE(STEPNAME))_$(ESCAPE_SQUOTE(DATE))_$(ESCAPE_SQUOTE(TIME)).txt

It just so happens that we have in our GitHub repository several examples of “standardized” output file paths that make use of tokens, depending on the SQL Server version that you have. You can find it here: standard_jobstep_output_logfile

Okay, so how do I detect it?

We developed a SQL script that scans msdb.dbo.sysjobsteps and identifies all steps with potentially dangerous logging configurations. Check it out:

See the gist on github.

The script is also available in our GitHub repository at microsoft-dbas-club.

This T-SQL script scans your msdb metadata and finds risky configurations. It returns a detailed description, and even a RemediationCommand column that you can use to quickly and easily fix each case.

Safe remediation

Each row in the output includes a sp_update_jobstep remediation command.

But please be sure to review the commands before executing them, since adding missing time-based tokens is a bit flaky (specifically, when your file path doesn’t end with .txt).

This allows you to:

Review each fix
Batch approve changes
Prevent accidental mass modifications

These commands:

Enable append if it’s not already enabled, and
Insert safe, time-based tokens before the .txt suffix (if they’re missing)

For example:

-- REVIEW BEFORE EXECUTION:
    EXEC msdb.dbo.sp_update_jobstep 
        @job_name = N'ARCHIVE_JOB', 
        @step_id = 1, 
        @output_file_name = N'C:\MSSQL\Log\ArchiveJob_$(ESCAPE_SQUOTE(DATE))_$(ESCAPE_SQUOTE(TIME)).txt', 
        @flags = 2

Continuing our example from the previous screenshots, this is how a “proper” job step configuration would look like after applying the fix:

Once your job step is configured in this way:

✔ You’ll be able to see output of job steps that failed, even if they eventually succeeded thanks to retries.
✔ Your output will not bloat a single file but spread across different files based on time of execution.

Don’t forget about retention depth!

By the way, don’t forget to have a job that cleans up your .txt output files!
Ola Hallengren’s maintenance solution creates such a job by default, but you gotta schedule it for it to work.

For more details, see Keep Your MSDB Clean – Output .txt Files.

Conclusion

This kind of subtle misconfiguration is exactly the sort of thing that slips through routine job audits and when creating jobs without much thinking. But by proactively detecting and remediating it, you ensure:

Safer, more reliable job logging
Faster troubleshooting during failures
Lower operational risk in production

I hope I helped you learn a new not-so-obvious pitfall that you may want to watch out for.

Have you encountered similar issues with SQL Agent? Got a war story or improvement idea? Drop it in the comments below!

NOTE: AI was used for some formatting of this post. But the content is original 😊.