Acquire Daily Earthquake Data Using Cron
Schedule Jobs 1: How to use Cron and a Python script to acquire and upload earthquake data to a staging database every day.

Scheduling regularly recurring tasks is a common requirement in many situations: automatic database backups, automated data acquisition and processing in ETL pipelines, automated ML model retraining and deployment, and so on. Depending on the requirements and the nature of the tasks, different tools can be used to achieve the goal. In this tutorial, I am going to demonstrate how to use Cron and a Python script to acquire earthquake data and upload it to a staging database every day.
What is a Cron Job?
Cron (also known as the cron command-line utility) is a job scheduler on Unix-like operating systems. Cron jobs can be scheduled to run at fixed times, dates, or repeating intervals, and cron is best suited for repetitive system maintenance or administration tasks.
Cron is a daemon that starts when the system boots. It checks the crontab files to see what tasks to run and when. Each user of the system can have their own crontab file, and crontab files can also be shared among users or defined at the system level for collaboration or similar purposes. User crontab files are stored under the /var/spool/cron directory, while system-wide crontab files include /etc/crontab and those stored under the /etc/cron.d directory.
$ pwd
/etc
$ ls -l cron*
-rw------- 1 root root 5 Dec 18 2020 cron.allow
-rw------- 1 root root 6 Nov 13 2019 cron.deny
-rw------- 1 root root 255 Nov 13 2019 crontab
cron.d:
total 20
-rw-r--r-- 1 root root 513 Aug 20 09:19 mysched_mystat
-rw-r--r-- 1 root root 51 Dec 3 2020 my-dfmon
-rw-r----- 1 root root 287 Aug 31 11:45 my-drop_caches
-rw-r--r-- 1 root root 0 Aug 30 14:48 my-gsctools
-rw-r--r-- 1 root root 98 May 26 10:22 my-perfmon
-rw-r--r-- 1 root root 329 Aug 31 11:45 my-sa
cron.daily:
total 36
-rwxr-xr-x 1 root root 778 Jun 24 2020 mdadm
-rwxr-xr-x 1 root root 1924 Feb 16 2017 mlocate.cron
-rwxr--r-- 1 root root 983 Jan 7 2020 suse-clean_catman
-rwxr-xr-x 1 root root 1879 Jan 21 2020 suse.de-backup-rc.config
-rwxr-xr-x 1 root root 2059 Sep 11 2014 suse.de-backup-rpmdb
-rwxr-xr-x 1 root root 566 Sep 11 2014 suse.de-check-battery
-rwxr-xr-x 1 root root 371 Sep 11 2014 suse.de-cron-local
-rwxr--r-- 1 root root 1693 Jan 7 2020 suse-do_mandb
cron.hourly:
total 0
cron.monthly:
total 0
cron.weekly:
total 0
$
$ cat cron.allow
root
$ cat cron.deny
guest
$ cat crontab
SHELL=/bin/sh
PATH=/usr/bin:/usr/sbin:/sbin:/bin:/usr/lib/news/bin
MAILTO=root
#
# check scripts in cron.hourly, cron.daily, cron.weekly, and cron.monthly
#
-*/15 * * * * root test -x /usr/lib/cron/run-crons && /usr/lib/cron/run-crons >/dev/null 2>&1
$
Next, I am going to create a new user so that I can schedule Cron jobs as this user.
$ useradd -m -g users demo
$ passwd demo
New password:
BAD PASSWORD: it is too short
Retype new password:
passwd: password updated successfully
$
$ pwd
/home
$ ls -l
total 16
drwxr-xr-x 7 demo users 4096 Oct 29 15:06 demo
drwxr-xr-x 7 dscuser users 4096 Dec 31 2020 dscuser
$
$ vi /etc/sudoers
71 ##
72 ## Runas alias specification
73 ##
74
75 ##
76 ## User privilege specification
77 ##
78 root ALL=(ALL) ALL
79 demo ALL=(ALL) ALL <=== add the new user here
80
81
82 ## Uncomment to allow members of group wheel to execute any command
83 # %wheel ALL=(ALL) ALL
84
"/etc/sudoers" [readonly] 91L, 3433C$ sudo su demo
demo@localhost:/home>
demo@localhost:/home> ls -l
total 16
drwxr-xr-x 7 demo users 4096 Oct 29 15:06 demo
drwxr-xr-x 7 dscuser users 4096 Dec 31 2020 dscuser
demo@localhost:/home> cd demo
demo@localhost:~> ls -l
total 8
drwxr-xr-x 2 demo users 4096 Jun 27 2017 bin
drwxr-xr-x 2 demo users 4096 Dec 18 2020 public_html
demo@localhost:~>
Cron jobs are defined in the crontab file, one job per line. The bottom line of the comment block below gives the format of such a line: five time/date specification fields stating when to execute the (shell) command, followed by one field specifying the command to execute.
# ┌─────────── minute (0 - 59)
# │ ┌─────────── hour (0 - 23)
# │ │ ┌─────────── day of the month (1 - 31)
# │ │ │ ┌─────────── month (1 - 12)
# │ │ │ │ ┌─────────── day of the week (0 - 6) (Sunday to Saturday;
# │ │ │ │ │ 7 is also Sunday on some systems)
# │ │ │ │ │
# │ │ │ │ │
# * * * * * <command to execute>
The time/date specification fields follow the unix-cron format. Whenever the current time and date match the specification, the system executes the specified command.
If “*” is used in a field, it stands for “first-last”. For example, if all five fields are filled with “*”, it means the command will be executed once every minute.
Ranges, lists, and “*/n” can also be used in different fields. One can refer to this page for detailed explanations.
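For example, the following time/date specifications (schedule part only, without the command field; they are illustrative and not taken from the crontab files shown above) cover a few common patterns:
*/15 * * * *     every 15 minutes
0 1 * * *        at 01:00 every day
0 9-17 * * 1-5   at the top of every hour from 9 am to 5 pm, Monday through Friday
0 0 1,15 * *     at midnight on the 1st and 15th of every month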
A crontab file can be listed, edited, and removed using the crontab command.
Usage:
crontab [options] file
crontab [options]
crontab -n [hostname]
Options:
-u <user> define user
-e edit user's crontab
-l list user's crontab
-r delete user's crontab
-i prompt before deleting
-n <host> set host in cluster to run users' crontabs
-c get host in cluster to run users' crontabs
-s selinux context
-x <mask> enable debugging
In the next section, before demonstrating how to schedule the daily job with Cron, I am going to set up a PostgreSQL database to hold the staging earthquake table and show the Python code that acquires the earthquake data and uploads it to the staging database.
Python Script for Acquiring and Uploading Earthquake Data
Here, I use a Docker container to host the PostgreSQL database. For simplicity, I pull the latest PostgreSQL image and run the database from it.
demo@localhost:~> docker pull postgres:latest
latest: Pulling from library/postgres
7d63c13d9b9b: Pull complete
cad0f9d5f5fe: Pull complete
ff74a7a559cb: Pull complete
c43dfd845683: Pull complete
e554331369f5: Pull complete
d25d54a3ac3a: Pull complete
bbc6df00588c: Pull complete
d4deb2e86480: Pull complete
cb59c7cc00aa: Pull complete
80c65de48730: Pull complete
1525521889be: Pull complete
38df9e245e81: Pull complete
79300c1d4f7a: Pull complete
Digest: sha256:db927beee892dd02fbe963559f29a7867708747934812a80f83bff406a0d54fd
Status: Downloaded newer image for postgres:latest
docker.io/library/postgres:latest
demo@localhost:~> docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
postgres latest 317a302c7480 9 days ago 374MB
demo@localhost:~>
demo@localhost:~>
demo@localhost:~> docker volume create postgres-volume
postgres-volume
demo@localhost:~> docker volume ls
DRIVER VOLUME NAME
local postgres-volume
demo@localhost:~>
demo@localhost:~> docker run -d --name=pgdb_staging -p 5432:5432 -v postgres-volume:/home/demo/postgres/data -e POSTGRES_PASSWORD=dbc postgres
a3cfee411b31568559ef041e9abe762e9dceacd1791861d5dbac0609deb3602b
demo@localhost:~> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
be9bfb767fac postgres "docker-entrypoint.s…" 2 seconds ago Up 1 second 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp pgdb_staging
demo@localhost:~>
Now, we log into the database server and create the database and the user.
demo@localhost:~> psql -h localhost -p 5432 -d postgres -U postgres
Password for user postgres:
psql (14.0)
Type "help" for help.postgres=#
postgres=#
postgres=# CREATE DATABASE demo;
CREATE DATABASE
postgres=# SELECT datname, dattablespace FROM pg_catalog.pg_database;
datname | dattablespace
-----------+---------------
postgres | 1663
template1 | 1663
template0 | 1663
demo | 1663
(4 rows)
postgres=#
postgres=# CREATE USER dbc WITH PASSWORD 'dbc';
CREATE ROLE
postgres=#
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+-----------
dbc | | {}
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
postgres=#
postgres=# \q
demo@localhost:~> psql -h localhost -p 5432 -d demo -U dbc
Password for user dbc:
psql (14.0)
Type "help" for help.demo=> \dt
Did not find any relations.
demo=>
demo=> \q
demo@localhost:~>
Next, we need the Python script that retrieves the earthquake data and uploads it to the staging database.
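A minimal sketch of what such a script (earthquakes_daily_report.py, the file scheduled in the crontab later) could look like is shown below. It assumes the USGS FDSN event API, queried with requests, as the data source and psycopg2 for the database connection; the helper function names and the exact table schema are illustrative assumptions, so adapt them to your own setup.
# earthquakes_daily_report.py
# Sketch: fetch the past day's earthquakes from the USGS FDSN event API
# and insert them into the "earthquakes" staging table.
import datetime

import psycopg2
import requests

USGS_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def fetch_earthquakes():
    """Query the USGS API for all events of the last 24 hours (GeoJSON)."""
    end = datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(days=1)
    params = {
        "format": "geojson",
        "starttime": start.strftime("%Y-%m-%dT%H:%M:%S"),
        "endtime": end.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    resp = requests.get(USGS_URL, params=params, timeout=60)
    resp.raise_for_status()
    return resp.json()["features"]


def upload(features):
    """Insert the events into the staging table, skipping duplicates."""
    conn = psycopg2.connect(
        host="localhost", port=5432, dbname="demo", user="dbc", password="dbc"
    )
    with conn, conn.cursor() as cur:
        # Table schema inferred from the psql output shown later (assumption).
        cur.execute(
            """CREATE TABLE IF NOT EXISTS earthquakes (
                   id TEXT PRIMARY KEY,
                   time TIMESTAMP,
                   place TEXT,
                   mag REAL,
                   coordinates REAL[],
                   detail TEXT
               )"""
        )
        for f in features:
            props, geom = f["properties"], f["geometry"]
            cur.execute(
                """INSERT INTO earthquakes (id, time, place, mag, coordinates, detail)
                   VALUES (%s, to_timestamp(%s / 1000.0), %s, %s, %s, %s)
                   ON CONFLICT (id) DO NOTHING""",
                (
                    f["id"],
                    props["time"],        # event time in epoch milliseconds
                    props["place"],
                    props["mag"],
                    geom["coordinates"],  # [longitude, latitude, depth]
                    props["detail"],      # URL with the full event details
                ),
            )
    conn.close()


if __name__ == "__main__":
    upload(fetch_earthquakes())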
Scheduling the Cron Job
Now that everything is ready, we can go ahead and schedule the daily job using Cron.
demo@localhost:~>
demo@localhost:~> sudo crontab -u demo -e
no crontab for demo - using an empty one
crontab: installing new crontab
demo@localhost:~>
demo@localhost:~>
demo@localhost:~> sudo crontab -l
no crontab for root
demo@localhost:~> sudo crontab -u demo -l
# cron job (demo): executing the python script for acquiring and uploading earthquake data to staging database
0 1 * * * python /home/demo/earthquakes_daily_report.py
demo@localhost:~>
At 1:00 am every day, the script retrieves the day's earthquake data and uploads it to the staging database. The snippet below shows the data uploaded after the first day.
demo@localhost:~> psql -h localhost -p 5432 -d demo -U dbc
Password for user dbc:
psql (14.0)
Type "help" for help.demo=> \dt
List of relations
Schema | Name | Type | Owner
--------+-------------+-------+-------
public | earthquakes | table | dbc
(1 row)
demo=> SELECT COUNT(*) FROM earthquakes;
count
-------
210
(1 row)
demo=> SELECT * FROM earthquakes WHERE mag > 5;
id | time | place | mag | coordinates | detail
------------+-------------------------+-----------------------------------------------+-----+---------------------------+------------------------------------------------------------------------------------
us7000frfx | 2021-11-04 16:36:53.368 | 298 km SW of Bluff, New Zealand | 5.5 | {165.2157,-48.2606,10} | https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=us7000frfx&format=geojson
us7000frfu | 2021-11-04 16:25:05.135 | 35 km SW of Finschhafen, Papua New Guinea | 5.2 | {147.626,-6.7899,46.39} | https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=us7000frfu&format=geojson
us7000frem | 2021-11-04 15:46:38.965 | Maug Islands region, Northern Mariana Islands | 5.3 | {145.0334,19.0284,564.21} | https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=us7000frem&format=geojson
us7000frcu | 2021-11-04 08:57:06.817 | 65 km ESE of Nikolski, Alaska | 5.2 | {-167.9814,52.7003,35.82} | https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=us7000frcu&format=geojson
us7000fraj | 2021-11-04 02:42:44.037 | 63 km NE of Amahai, Indonesia | 5.7 | {129.3383,-2.9492,18} | https://earthquake.usgs.gov/fdsnws/event/1/query?eventid=us7000fraj&format=geojson
(5 rows)
demo=>
demo=> \q
demo@localhost:~>
demo@localhost:~>
If we no longer want a job to run, we can remove the corresponding line from the crontab file using the edit command, crontab -e. Since we currently have only one job defined, in the snippet below we simply remove the whole crontab file.
demo@localhost:~>
demo@localhost:~> sudo crontab -u demo -r
[sudo] password for demo:
demo@localhost:~> sudo crontab -u demo -l
no crontab for demo
demo@localhost:~>
If we wait until the next day and check the database again, we will see that no new data has been added.
Summary
In this tutorial, I have demonstrated how to schedule recurring tasks using Cron, arguably the simplest and most commonly used tool for scheduling jobs. However, Cron has its limitations. For example, tasks that recur more than once per minute cannot be scheduled. Another limitation is portability: while cron is available in all Unix distributions, it is not available on Windows machines, where an equivalent tool such as schtasks has to be used to achieve the same goal. Other limitations include (but are not limited to) the following: it usually requires root privileges, and it can be inconvenient for scheduling tasks in a distributed environment, where it may become a single point of failure.
In the next tutorial, I am going to demonstrate how to use another very popular job scheduling tool, Celery, to schedule jobs.