I think it’s safe to say that cron is relied upon at almost every tech company to schedule mission-critical scripts. Whether it’s hourly log cleanup or monthly financial reports, cron is critical to the system’s function. Traditionally, scripts are scheduled at minute-level granularity. Today I want to talk about how we sometimes leverage cron to behave in a slightly different way: to replace long-running daemons.
“What?” you say. “A cronjob can’t be a daemon! It’s not always running! It can’t share state among runs and definitely can’t run more often than once a minute!” And yes, you are right. I am not here to advocate switching all of your daemons to cronjobs, and some applications definitely can’t or shouldn’t be cronjobs. If you truly need real-time processing, cronjobs as a daemon might not be right for you. If you have a lot of shared state that is much more efficient to keep in memory, cronjobs as a daemon might not be right for you either.
What does Cronjob As A Daemon even look like?
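Turns out, pretty much like a normal cron entry, except with more sleep! Let’s say, for instance, you want to keep a cache up to date. You could write a daemon for it, but since cron fires at most once a minute, we instead add twelve entries that all fire at the top of the minute, each sleeping five seconds longer than the previous one before invoking the script. The result is our caching script running every five seconds: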
    * * * * * python Data/refresh_cache.py
    * * * * * sleep 5; python Data/refresh_cache.py
    * * * * * sleep 10; python Data/refresh_cache.py
    * * * * * sleep 15; python Data/refresh_cache.py
    ...
    ...
    * * * * * sleep 40; python Data/refresh_cache.py
    * * * * * sleep 45; python Data/refresh_cache.py
    * * * * * sleep 50; python Data/refresh_cache.py
    * * * * * sleep 55; python Data/refresh_cache.py
It might seem a bit verbose, but it does get around the once-a-minute restriction. We rarely change these entries, but the duplication is open to programmer error, so be careful when editing the crontab.
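Because the entries differ only in their sleep offset, one way to avoid hand-editing mistakes is to generate them. The sketch below is a hypothetical helper, not part of our actual setup: `crontab_lines` takes the command and an interval in seconds, which must divide 60 evenly, and emits one every-minute entry per offset.

```python
def crontab_lines(command: str, interval: int) -> list[str]:
    """Return one every-minute crontab entry per sub-minute offset.

    interval is in seconds and must divide 60 evenly, so the last
    offset finishes before the next minute's first entry fires.
    """
    if interval <= 0 or 60 % interval != 0:
        raise ValueError("interval must be a positive divisor of 60")
    lines = []
    for offset in range(0, 60, interval):
        # The first entry runs at the top of the minute with no sleep.
        prefix = f"sleep {offset}; " if offset else ""
        lines.append(f"* * * * * {prefix}{command}")
    return lines


if __name__ == "__main__":
    for line in crontab_lines("python Data/refresh_cache.py", 5):
        print(line)
```

Run with an interval of 5, it prints the twelve entries listed earlier, which you could then paste into `crontab -e` instead of typing each sleep by hand.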
How do I benefit?
Since a fresh process is spun up at every invocation, we can’t share state between runs (in memory, that is). While this might be limiting for some applications, reducing shared state makes our programs easier to reason about. And since each process exits after its run and the operating system reclaims its memory, we don’t have to worry about memory leaks accumulating over time.
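When a run genuinely does need something from the previous one, that state has to live outside the process. Here is a minimal sketch, assuming a small JSON file is acceptable storage; `STATE_PATH`, `load_state`, `save_state`, and the `runs`/`last_run` keys are all hypothetical names chosen for illustration. Writing to a temporary file and renaming it over the old one means a run killed mid-write leaves the previous state intact.

```python
import json
import os
import tempfile
import time

# Hypothetical location for state carried between runs.
STATE_PATH = "refresh_cache_state.json"


def load_state(path: str = STATE_PATH) -> dict:
    """Return the state saved by the previous run, or {} on the first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_state(state: dict, path: str = STATE_PATH) -> None:
    """Write state to a temp file, then rename it over the old file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on POSIX


def main(path: str = STATE_PATH) -> dict:
    state = load_state(path)
    # ... the real cache refresh would happen here (out of scope) ...
    state["runs"] = state.get("runs", 0) + 1
    state["last_run"] = time.time()
    save_state(state, path)
    return state


if __name__ == "__main__":
    main()
```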
Deploying New Code is a Breeze
Deploys are one of my favorite reasons for using cronjobs as a daemon. This may be a bit specific to our deploys, but if you have a current directory that is a symlink to the most recently deployed code, this will be of interest to you. No longer do you have to kill the old daemon and restart it for code changes to take effect. Since the crontab entry points at the script under current, the next scheduled run after a deploy automatically picks up the new code.
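For the next run to pick up new code cleanly, the switch of current should itself be atomic, so a run that starts mid-deploy never sees a half-updated path. Our deploy tooling isn’t described here; the following is a hypothetical Python sketch of one common approach, with `point_current_at` and the release directory names invented for illustration: create the new symlink under a temporary name, then rename it over current with `os.replace`.

```python
import os
import tempfile


def point_current_at(release_dir: str, current_link: str) -> None:
    """Repoint the current symlink at release_dir in one atomic step."""
    tmp_link = current_link + ".tmp"
    # Clear a temporary link left behind by an interrupted earlier deploy.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # On POSIX this is rename(2), which replaces the link itself (it does not
    # follow symlinks), so a run sees either the old release or the new one.
    os.replace(tmp_link, current_link)


if __name__ == "__main__":
    # Demo in a scratch directory with two hypothetical release directories.
    root = tempfile.mkdtemp()
    for name in ("release-1", "release-2"):
        os.mkdir(os.path.join(root, name))
    current = os.path.join(root, "current")
    point_current_at(os.path.join(root, "release-1"), current)
    point_current_at(os.path.join(root, "release-2"), current)
    print(os.readlink(current))
```

A run that already started under the old release simply finishes with the old code; only runs launched after the rename see the new one.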
No restart-if-killed logic needed
If you’ve ever written a daemon, there is a pretty good chance you have had it die. To manage this, you probably use some sort of external monitoring to check that the process is still alive and restart it on failure. Using cron as a daemon removes the need for this extra layer of support. Restarting a daemon is not tricky, but it is one more piece that has to be managed. With Cronjob As A Daemon, there is always another run coming up.
That said, cronjobs as a daemon do introduce some additional complexities.
Multiple Executions Running at once
If you schedule a task to run every 5 seconds but it takes 15 seconds to complete, runs will start overlapping: each new run begins while the previous two are still going, so about three copies are running at any moment. Depending on the application, this can cause many problems: state could become corrupted, one run could interfere with another, or competing processes could grind to a halt. One way to solve the problem is to add a job-level lock that prevents multiple executions of the task from running at the same time.
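A minimal sketch of such a job-level lock, assuming a Unix host (Python’s `fcntl` module is not available on Windows). It takes a non-blocking `flock` on a hypothetical lock file; if a previous run still holds the lock, the new run returns immediately instead of overlapping. Because the kernel releases a `flock` lock once the holding process exits, a crashed run cannot leave a stale lock behind. `refresh_cache` here is a hypothetical placeholder for the script’s real work.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file location; any path all runs agree on works.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "refresh_cache.lock")


def acquire_lock(path: str = LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if another run has it."""
    lock_file = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


def refresh_cache() -> None:
    """Hypothetical placeholder for the real work of Data/refresh_cache.py."""


def main() -> None:
    lock = acquire_lock()
    if lock is None:
        # A previous invocation is still running: skip this tick.
        return
    with lock:  # closing the file at the end of the block releases the lock
        refresh_cache()


if __name__ == "__main__":
    main()
```

If you would rather not touch the script, the `flock` utility from util-linux gives the same skip-if-busy behavior from the crontab entry itself, e.g. `flock -n /tmp/refresh_cache.lock python Data/refresh_cache.py`.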
Truly Real-time Execution is difficult or sloppy
If you truly need real-time updates, crontab entries may not be right for you. However, I would argue that most of the time when you think you need real-time updates, you don’t. Usually being up to date within a few seconds suffices: with the five-second schedule above, cached data is at most roughly five seconds plus one run’s duration out of date.
In the end, I find that in some cases an entry in a crontab is much simpler than writing and managing an always-running daemon. Hopefully this was helpful! Let us know how you manage your background jobs below.