Cheatham is a service for monitoring cron jobs (see guide) and similar periodic processes:
Cheatham works as a dead man's switch for processes that need to run continuously or on a regular, known schedule. Some examples of jobs that would benefit from Cheatham monitoring:
Cheatham is not the right tool for:
A Check represents a single service you want to monitor. For example, when monitoring cron jobs, you would create a separate check for each cron job to be monitored. Each check has a unique ping URL, schedule, and associated integrations. For the available configuration options, see Configuring checks.
Each check is always in one of the following states, depicted by a status icon:
Ping URL. Each check has a unique Ping URL. Clients (cron jobs, background workers, batch scripts, scheduled tasks, web services) make HTTP requests to the ping URL to signal the start of an execution, a success, or a failure.
Cheatham supports two ping URL formats:
hc.zachcheatham.me/ping/<uuid>
hc.zachcheatham.me/ping/<project-ping-key>/<name-slug>
You can append /start, /fail or /<exitcode> to the base ping URL to send "start" and "failure" signals. The "start" and "failure" signals are optional. You don't have to use them, but you can gain additional monitoring insights if you do use them. See Measuring script run time and Signaling failures for details.
You should treat check UUIDs and project Ping keys as secrets. If you make them public, anybody can send telemetry signals to your checks and mess with your monitoring.
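As a rough illustration of how a client might use these URLs, the sketch below wraps a job with "start", success, and "fail" signals. It assumes the ping URLs are served over HTTPS and that the base ping URL is supplied through an environment variable; the variable name CHECK_PING_URL and the helper names signal_url, send_signal, and run_monitored are hypothetical and not part of Cheatham.

```python
import os
import urllib.request

# Hypothetical environment variable holding the base ping URL, for example
# https://hc.zachcheatham.me/ping/<uuid> (HTTPS is an assumption).
# Keeping it out of source code fits the advice to treat ping URLs as secrets.
PING_URL_ENV = "CHECK_PING_URL"


def signal_url(base_url, signal=None):
    """Build the URL for a signal: None (success), "start", "fail", or an exit code."""
    base = base_url.rstrip("/")
    if signal is None:
        return base
    return f"{base}/{signal}"


def send_signal(base_url, signal=None, timeout=10):
    """Send one signal. A failed ping is reported but never crashes the job."""
    try:
        urllib.request.urlopen(signal_url(base_url, signal), timeout=timeout)
    except OSError as exc:  # URLError, HTTPError, and timeouts are OSError subclasses
        print(f"ping failed: {exc}")


def run_monitored(job, base_url, send=send_signal):
    """Run job() between a "start" signal and a success or "fail" signal."""
    send(base_url, "start")
    try:
        job()
    except Exception:
        send(base_url, "fail")
        raise
    send(base_url)
```

A script could then call run_monitored(main, os.environ[PING_URL_ENV]), where main is the job's own entry point. The send parameter exists only so the wrapper can be exercised without network access.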
Read more about Ping URLs in Pinging API.
Grace Time is one of the configuration parameters you can set for each check. It is the additional time to wait before sending an alert when a check is late. Use this parameter to account for minor, expected deviations in job execution times.
When a check is considered late depends on whether the check uses a simple or cron schedule, and whether or not you are tracking job durations using the "start" events.
For simple schedules, the check is late when the check's configured period has passed. For example, consider a periodic task that should run every hour, and the gaps between runs should not deviate by more than 5 minutes (Period = 1 hour, Grace Time = 5 minutes). And let's say the last successful ping arrived at 12:00.
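The arithmetic for this example can be sketched in a few lines. This is an illustration of the rule as described here, not Cheatham's implementation: the function name and the "up" and "down" labels are assumptions, and treating the exact boundary instants as already late or down is a choice made for the sketch.

```python
from datetime import datetime, timedelta


def simple_schedule_status(last_ping, period, grace, now):
    """Classify a check on a simple schedule as "up", "late", or "down"."""
    late_at = last_ping + period   # the period has passed: the check is late
    alert_at = late_at + grace     # grace time has also passed: alerts go out
    if now < late_at:
        return "up"
    if now < alert_at:
        return "late"
    return "down"


# Period = 1 hour, Grace Time = 5 minutes, last successful ping at 12:00.
last_ping = datetime(2024, 1, 1, 12, 0)
period = timedelta(hours=1)
grace = timedelta(minutes=5)
```

Under these assumptions the check becomes late at 13:00, and an alert would be due at 13:05 if no ping arrives in the meantime.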
For cron and OnCalendar schedules, the check enters the late state at the exact moment when the current wall clock time matches the schedule. Let's consider a cron job with the schedule 10 * * * * (10 minutes past every hour) and grace time of 5 minutes. And let's say the last successful ping arrived at 12:30.
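The same example can be worked through in code. The sketch below handles only schedules of the form "<minute> * * * *" and ignores time zones; it is a hand-rolled illustration with hypothetical function names, not a general cron parser and not Cheatham's own logic.

```python
from datetime import datetime, timedelta


def next_hourly_run(after, minute):
    """Next time strictly after `after` that matches "<minute> * * * *"."""
    candidate = after.replace(minute=minute, second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(hours=1)
    return candidate


def cron_deadlines(last_ping, minute, grace):
    """Return (late_at, alert_at) for an hourly cron schedule."""
    late_at = next_hourly_run(last_ping, minute)  # wall clock matches the schedule
    return late_at, late_at + grace               # alert once grace time runs out


# Schedule 10 * * * *, grace time 5 minutes, last successful ping at 12:30.
late_at, alert_at = cron_deadlines(
    datetime(2024, 1, 1, 12, 30), minute=10, grace=timedelta(minutes=5)
)
```

With these inputs the next schedule match is 13:10, so the check would turn late at 13:10 and an alert would be due at 13:15.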
If you use "start" signals to measure job execution time, Grace Time also sets the maximum allowed time gap between "start" and "success" signals. If a job sends a "start" signal but does not send a "success" signal within grace time, Cheatham will assume failure and send out alerts.
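A minimal sketch of that rule, assuming the deadline is simply the "start" time plus grace time. The function names are hypothetical and the boundary handling is a choice made for illustration.

```python
from datetime import datetime, timedelta


def success_deadline(start_at, grace):
    """Latest time a "success" signal may arrive after a "start" signal."""
    return start_at + grace


def run_presumed_failed(start_at, grace, now, success_at=None):
    """True if no "success" signal arrived within grace time of "start"."""
    deadline = success_deadline(start_at, grace)
    if success_at is not None:
        return success_at > deadline
    return now > deadline
```

For a job that sent "start" at 12:00 with a grace time of 5 minutes, a "success" signal at 12:04 is in time, while no signal by 12:06 counts as a failure.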
An Integration is a specific method for delivering monitoring alerts when a check changes state. Cheatham supports many types of integrations: email, webhooks, SMS, Slack, PagerDuty, etc. You can set up multiple integrations. For each check, you can specify which integrations it should use.
For more information on integrations, see Configuring notifications.
Project. To keep things organized, you can group checks and integrations in Projects. Your account starts with a single default project, but you can create additional projects as needed. You can transfer existing checks between projects while preserving their configuration and ping URLs.
Each project has a configurable name, a separate set of API keys, and a separate project team. The project's team is the set of people you have granted read-only or read-write access to the project.
For more information on projects, see Projects and teams.