
Drupal Cron: A Better Way

Corey Pennycuff's picture
cron: The Unix clock daemon that executes
commands at specified dates and times according to
instructions in a "crontab" file.
Image from openclipart.org.

Drupal is an amazing system; there is no denying it. Every once in a while, however, small problems come up that no one anticipates, and your entire system grinds to a halt. This recently happened to me when I needed Drupal to run some heavy processes in cron. It croaked. Drupal got stuck in a loop of trying to run the same ill-fated task over and over, and my site stopped updating. Drupal needs cron, and all the normal ways of running cron were failing me.

Cron is an interesting creature. Many new sysadmins are mystified by its cryptic symbology and voodoo-like power, so they avoid it. Some shared hosting environments don't even allow users to set cron jobs! Back in the old days (Drupal 5 and 6), we solved this problem with the Poormanscron module, which started the cron jobs at the end of a page load, after the response had already been sent to the user. It was virtually transparent to the end user, and the method was so popular that it made its way into Drupal 7 core.

The Problem

The problem with this "tacked onto the end of a page load" method was two-fold: (1) What if the cron job took a long time? (2) What if nobody ever visited the site? Most sites, however, are not running time-critical processes, so most people overlook the ad hoc solution's shortcomings and are never the worse for wear. This was me... until I started building more serious sites.

I actually had two problems, either one of which necessitated the search for this solution. First, I began using the Boost module on my website. (As an aside, it is amazing, and you should be running it on all of your websites!) The problem with Boost is that, if all traffic is hitting cached pages, then cron will never be run to expire those pages! Second, I had a lot of things to do during cron (think in terms of minutes or hours) and I couldn't be limited to the standard timeout of normal page loads. (If you are wondering what could take hours of cron time, I was using Storage API and was transferring multiple gigs of files during the cron, as well as subscription-based credit card processing.)

I have read all of D.org's info on running cron in different ways, and none of them seemed to fit my needs. My requirements were as follows:

  1. Only one cron job could run at a time.
  2. The cron job could run for however long it needed to.
  3. The cron job needed to be run entirely server-side, bypassing the webserver (i.e., no curl or wget).
  4. It had to support multiple installations.
  5. It had to be easy to modify.

My Solution

Full disclosure: I referenced Ryan Solomon's blog post on the subject, but made my own alterations. I chose to wrap my execution using Perl so that I could detect simultaneous processes without bootstrapping Drupal or relying on its mechanisms.

My approach comes in two parts: Perl and PHP. I use Perl to ensure singular execution, and I use PHP to bootstrap Drupal with the necessary environment changes.

I decided to put all of my code into a folder that I created called /var/cron. There's nothing special about it, and you can use your own folder; just update the paths within the scripts to match.

Perl

It goes without saying that you need Perl installed on your machine. You will also need a few modules. I prefer to use cpanminus for installing modules. In order to compile the modules, you will also need a compiler installed. All of this can generally be done with 3 commands:

apt-get install perl gcc
cpan App::cpanminus
cpanm Proc::ProcessTable

Depending on the state of your system, you may need to install more packages, but this is all I needed on my Ubuntu 14.04 system. Also, the first time that you install or run some of the commands, they may have to configure themselves. I opt for the automatic configuration, and my server has never melted.
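Before moving on, it may be worth confirming that the module actually loads. The following is a minimal sketch, assuming perl is on your PATH; have_perl_module is a hypothetical helper name used only for illustration:

```shell
# Return success if the named Perl module can be loaded by the perl on PATH.
# (have_perl_module is an illustrative helper, not part of any package.)
have_perl_module() {
  perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module Proc::ProcessTable; then
  echo "Proc::ProcessTable is available"
else
  echo "Proc::ProcessTable is missing; try: cpanm Proc::ProcessTable"
fi
```

If the second message appears, rerun the cpanm command above and read its build log for the missing package.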

Create a file called cron.10min.pl. Yes, I am running my cron every 10 minutes. That is far too often for most sites, so do whatever you want to. Put this into the file using your favorite text editor and afterward chmod it so you can execute it.

#!/usr/bin/perl
 
use strict;
use warnings;
use Proc::ProcessTable;
use Sys::Syslog;
 
# Exit if another copy of this process is already running
my $proc_table = Proc::ProcessTable->new;
 
# Find out what cmndline was used to initiate the cron request.
# getppid() is the parent's PID, i.e. the process that launched this script.
my $pid = getppid();
my $cmndline = '';
for my $process ( @{ $proc_table->table } ) {
  next unless $process->{pid} == $pid;
  $cmndline = $process->{cmndline};
}
# Do not continue if any other process has the same cmndline
for my $process ( @{ $proc_table->table } ) {
  next unless $process->{cmndline};
  exit if (($process->{pid} != $pid) && ($process->{cmndline} eq $cmndline));
}
 
# Add cron records for each site here, for example:
#`/usr/bin/php /var/cron/localcron.php --key <INSERT KEY HERE> --root '/var/www/WHEREVER'`;
 
`/usr/bin/php /var/cron/localcron.php --key Gh7-C45vViOBzJkq6jmSvgFNC5u7uyDSdfKflF8w4_c --root '/var/www/site1'`;
`/usr/bin/php /var/cron/localcron.php --key PGnG-X76UUZ71EpTwpH5HR6CunU21ytt5DiPmAViB8M --root '/var/www/site2'`;

This code is relatively straightforward. It makes sure that only one copy of itself is running (not an easy thing to do with Perl and without using lock files!) and then it calls the cron on two different sites, one after the other. The --key is the cron key which is found in the administration pages of your site, and --root is the physical location of the Drupal installation in the file system.

PHP

This file will also go into the /var/cron directory, with the file name localcron.php.

<?php
// Inspired from http://www.failover.co/blog/drupal-7-running-cron-command-line
 
// Remove time limits
set_time_limit(0);
ini_set("mysqli.reconnect", "On");
 
// Check that the arguments are supplied; if not, display help information.
if (in_array('--help', $argv) || $argc != 5) {
  ?>
  This is a script to run cron from the command line.
 
  It takes 2 arguments, which must both be specified:
  --key is the cron key for your website, found on your
  status report page (the part after ?cron_key=)
  --root is the path to your drupal root directory for
  your website.
 
  Usage:
  php localcron.php --key YOUR_CRON_KEY --root '/path/to/drupal/root'
 
<?php
} else {
  // Loop through arguments and extract the cron key and drupal root path.
  for ($i = 1; $i < $argc; $i++) {
    switch ($argv[$i]) {
      case '--key':
        $i++;
        $key = $argv[$i];
        break;
      case '--root':
        $i++;
        $path = $argv[$i];
        break;
    }
  }
 
  chdir($path);
  // Sets script name
  $_SERVER['SCRIPT_NAME'] = $argv[0];
 
  // Values as copied from drupal.sh
  $_SERVER['HTTP_HOST'] = 'default';
  $_SERVER['REMOTE_ADDR'] = '127.0.0.1';
  $_SERVER['SERVER_SOFTWARE'] = 'perl-cron';
  $_SERVER['REQUEST_METHOD'] = 'GET';
  $_SERVER['QUERY_STRING'] = '';
  $_SERVER['HTTP_USER_AGENT'] = 'perl-cron console';
  $_SERVER['REQUEST_URI'] = '';
 
  // Set cron key get variable to use below code with as
  // little modification as possible.
  $_GET['cron_key'] = $key;
  define('DRUPAL_ROOT', $path);
 
  // Code below is almost verbatim from cron.php, just the messages for
  // watchdog have been changed to indicate that the problem originated from
  // this script.
  include_once DRUPAL_ROOT . '/includes/bootstrap.inc';
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
 
  if (!isset($_GET['cron_key']) || variable_get('cron_key', 'drupal') != $_GET['cron_key']) {
    watchdog('cron', "Cron could not run via $argv[0] because an invalid key was used.", array(), WATCHDOG_NOTICE);
    drupal_access_denied();
  } elseif (variable_get('maintenance_mode', 0)) {
    watchdog('cron', "Cron could not run via $argv[0] because the site is in maintenance mode.", array(), WATCHDOG_NOTICE);
    drupal_access_denied();
  } else {
    db_query("SET wait_timeout=28800");
 
    drupal_cron_run();
  }
}

Notice several of the tweaks that make this work. First, the PHP time limit is removed. Second, mysqli.reconnect is enabled, and wait_timeout is set to a long time (28800 seconds, or 8 hours, in the db_query() call just before drupal_cron_run()). Both MySQL settings may not be necessary for every site, but setting both has made this script work on many different MySQL versions.

You can now check if your code works by running /var/cron/cron.10min.pl (if you chmodded it correctly, that is!). If it gives you any error messages, solve those before moving on. If it tells you that you are missing Perl modules, install them using the cpanm command as shown earlier.
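If the first run fails, a syntax check of both files can narrow things down. This is only a sketch, assuming the /var/cron layout used above and that perl and php are on your PATH; check_cron_scripts is a hypothetical helper name:

```shell
# Syntax-check both scripts without running cron.
# perl -c compiles the script and loads its "use" lines, so a missing
# module is reported; php -l lints the file for syntax errors.
check_cron_scripts() {
  dir="$1"
  perl -c "$dir/cron.10min.pl" && php -l "$dir/localcron.php"
}

# Usage: check_cron_scripts /var/cron && /var/cron/cron.10min.pl
```

The function returns non-zero as soon as either check fails, so the real run in the usage line only starts when both files pass.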

The Crontab

Adding the crontab entry is easy. Just run crontab -e, which will open a file, your "cron table", for editing. Add the following entry:

*/10 * * * * /usr/bin/perl /var/cron/cron.10min.pl

This sets the cron to run every ten minutes. To run it every two hours instead, use 0 */2 * * * for the time component (a * in the minute field would fire every minute of every second hour). For more examples, check out this post.
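As a quick reference, the five time fields are minute, hour, day of month, month, and day of week. A * in the minute field means "every minute", so a schedule meant to fire once per period needs a fixed minute. The sketch below only prints a few candidate lines to paste into crontab -e; example_schedules is a hypothetical helper name, and the 03:30 entry is just an illustration:

```shell
# Print a few example crontab lines.
# Field order: minute hour day-of-month month day-of-week command
example_schedules() {
  cat <<'EOF'
# every ten minutes
*/10 * * * * /usr/bin/perl /var/cron/cron.10min.pl
# at minute 0 of every second hour (00:00, 02:00, 04:00, ...)
0 */2 * * * /usr/bin/perl /var/cron/cron.10min.pl
# once a day at 03:30
30 3 * * * /usr/bin/perl /var/cron/cron.10min.pl
EOF
}

example_schedules
```

Lines starting with # are comments in a crontab as well, so the output can be pasted as-is.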

My crontab example also assumes that perl is installed at /usr/bin/perl. You can check to see where it is installed on your system by running which perl. Use the path appropriate for your system. Depending on your installation, you may have to run the cron as another user in order to have access permissions to all the files in your installation, although I did not have this problem with my sites.

Possible Configurations

You may recall that my Perl file runs the cron for two sites, one after the other. If your site requires that 2 crons be called simultaneously, then create a separate Perl file for the second cron and create an entry for it in the crontab. The way that the process checking is written, Perl only stops if another process is running with the same file name. By creating a second Perl file, the two can run simultaneously, although it is advised that you do not do this for the same site.

Another customization is controlling which cron hooks are run in the PHP. Rather than calling drupal_cron_run() directly, you can call your own version of the function that picks and chooses which cron hooks are called. An example of this would be my site with the multi-hour cron runs. I can set one cron process to only execute the long-running procedure, while another cron job does everything else except the long-running procedure.

The possibilities truly are endless.
