In many SQL Server environments, DBAs configure SQL Agent jobs with retry attempts (which is a smart move to handle transient issues gracefully). But with it comes a hidden pitfall that often goes unnoticed:
Retry-enabled job steps can silently overwrite logs or cause runaway log growth, depending on how the output file is configured.
The culprit? Improper use of append settings and missing time-based tokens in output file paths.
What exactly is the risk?
When an Agent job step:
- Has retries enabled
- Writes to an output file
- Either:
- Doesn’t append to the output file (default behavior), or
- Appends, but uses a file name without time-based tokens
… you’re at risk of:
- Losing logs from earlier attempts (due to overwrite), or:
- Creating massive logs that grow uncontrollably (if appending without date/time tokens)
In other words, for job steps that have retries and an output file configured:
If append is disabled, then every failure and retry will clear out the previous attempt’s output. Basically, causing a “diagnostic data-loss” for when you want to investigate and troubleshoot your occasionally failing job steps.
If append is enabled, but there are no time-based tokens in the output file path, then every execution and every retry of this job step will append into the exact same output file, causing it to grow and grow without end every time the job runs.
Here’s how it looks like in SSMS when retry is enabled but append is disabled:
Or, when append is enabled, but the output file doesn’t contain time-based tokens:
Why this matters: Real-world impact
If unaddressed, this configuration can lead to:
- 🚫 Log data loss during retries or successive job runs
- 🐘 Unmanageable log file sizes when append is enabled without time-based tokens
- 🕵️ Delayed troubleshooting when logs are missing or tangled
Wait, what are “Time-Based Tokens”?
Every SQL Agent job step can have an “Output file” setting, and this output file path supports a group of built-in “tokens“. These “tokens” are essentially placeholders for a variety of system-generated information, such as the job name, job ID, step name, step ID, and also time-based tokens such as the current date, current time, job start date, and job start time.
To learn more about SQL Agent job tokens, please check out Use Tokens in Job Steps at Microsoft Learn.
If this is your first time learning about these tokens, you should also know that different versions of SQL Server support different tokens. With almost every new version of SQL Server, new tokens are being added to improve the usefulness of automatically generated output file paths.
Here is an example of a “standardized” output file path with tokens suitable for SQL Server 2016 and newer:
$(ESCAPE_SQUOTE(SQLLOGDIR))\$(ESCAPE_SQUOTE(JOBNAME))_$(ESCAPE_SQUOTE(STEPNAME))_$(ESCAPE_SQUOTE(DATE))_$(ESCAPE_SQUOTE(TIME)).txt
It just so happens that we have in our GitHub repository several examples of “standardized” output file paths that make use of tokens, depending on the SQL Server version that you have. You can find it here: standard_jobstep_output_logfile
Okay, so how do I detect it?
We developed a SQL script that scans msdb.dbo.sysjobsteps
and identifies all steps with potentially dangerous logging configurations. Check it out:
This T-SQL script scans your msdb
metadata and flags risky configurations. It returns a detailed description, and even a RemediationCommand
column that you can use to quickly and easily fix each case.
Safe remediation
Each row in the output includes a sp_update_jobstep
remediation command.
But please be sure to review the commands before executing them, since adding missing time-based tokens is a bit flaky (specifically, when your file path doesn’t end with .txt
).
This allows you to:
- Review each fix
- Batch approve changes
- Prevent accidental mass modifications
These commands:
- Enable append if it’s not already enabled, and
- Insert safe, time-based tokens before the
.txt
suffix (if they’re missing)
For example:
-- REVIEW BEFORE EXECUTION: EXEC msdb.dbo.sp_update_jobstep @job_name = N'ARCHIVE_JOB', @step_id = 1, @output_file_name = N'C:\MSSQL\Log\ArchiveJob_$(ESCAPE_SQUOTE(DATE))_$(ESCAPE_SQUOTE(TIME)).txt', @flags = 2
Continuing our example from the previous screenshots, this is how a “proper” job step configuration would look like after applying the fix:
Once your job step is configured in this way:
- ✔ You’ll be able to see output of job steps that failed, even if they eventually succeeded thanks to retries.
- ✔ Your output will not bloat a single file but spread across different files based on time of execution.
By the way, don’t forget to have a job that cleans up your .txt
output files!
Ola Hallengren’s maintenance solution creates such a job by default, but you gotta schedule it for it to work.
For more details, see Keep Your MSDB Clean – Output .txt Files.
Conclusion
This kind of subtle misconfiguration is exactly the sort of thing that slips through routine job audits and when creating jobs without much thinking. But by proactively detecting and remediating it, you ensure:
- Safer, more reliable job logging
- Faster troubleshooting during failures
- Lower operational risk in production
I hope I helped you learn a new not-so-obvious pitfall that you may want to watch out for.
Have you encountered similar issues with SQL Agent? Got a war story or improvement idea? Drop it in the comments below!