Step by Step of Airflow installation with Examples Author: Tan Thiam Huat Date: 23 January 2019

Part 1: Installation of Airflow, see Ref[1]

Make directory: \airflow\workspace

Installing 2 additional packages below: psycopg2, kubernetes

Take note that we have to issue airflow initdb first, followed by airflow version; otherwise there will be errors.

Check that the airflow_home directory contains those relevant files: airflow.cfg, airflow.db, unittests.cfg

Inside the airflow.cfg file, there are many different settings and configurations, one of which is for the webserver port.
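As a minimal sketch of locating that setting, the snippet below reads the port with Python's standard configparser. The sample text is a trimmed stand-in for airflow.cfg, and the dags_folder path in it is hypothetical; the section name webserver and key web_server_port are the ones Airflow uses, with 8080 as the usual default.

```python
import configparser

# Trimmed stand-in for airflow.cfg; a real file has many more sections and keys.
# The dags_folder path is hypothetical.
SAMPLE_CFG = """
[core]
dags_folder = /home/user/airflow/dags

[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
"""


def webserver_port(cfg_text):
    """Return the [webserver] web_server_port value as an int."""
    # interpolation=None leaves any '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.getint("webserver", "web_server_port")


port = webserver_port(SAMPLE_CFG)
print(f"Airflow UI expected at http://<IP_Server>:{port}")
```

To inspect a real installation, the text of the airflow.cfg file in the airflow_home directory could be read and passed to webserver_port in place of the sample.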

Open up a browser and point to your server with its web server port number, i.e., IP_Server:Server Port Number. Airflow UI should be shown as below.

Part 2: Airflow Example 1

Now we are all set to define some tasks for Airflow. The example below is taken from Ref[2].

(a) The first step is to set up a PostgreSQL database using the Python script (makeTable.py) below.

Below shows the WeatherDB with its table weather_table being created.
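The original makeTable.py is not reproduced here, so the following is only a rough sketch of the idea. It assumes hypothetical column names and uses an in-memory sqlite3 database as a stand-in so that it runs anywhere; with PostgreSQL, the connection would come from psycopg2 instead.

```python
import sqlite3

# Hypothetical columns; the schema used by the original makeTable.py is not known here.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS weather_table (
    city        TEXT,
    temperature REAL,
    observed_on TEXT
)
"""


def make_table(conn):
    """Create weather_table on a DB-API connection; safe to call repeatedly."""
    cur = conn.cursor()
    cur.execute(CREATE_SQL)
    conn.commit()
    cur.close()


# sqlite3 stand-in for the PostgreSQL WeatherDB (an assumption made for this sketch).
conn = sqlite3.connect(":memory:")
make_table(conn)
make_table(conn)  # second call changes nothing because of IF NOT EXISTS
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)
```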

(b) Next, the Python script (getWeather.py) uses the web API to get its relevant data.
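A sketch of what such a script might look like is given below. The endpoint URL, the payload shape and the file naming are all assumptions for illustration, not the contents of the original getWeather.py. The network call is passed in as a parameter so the function can be tried offline with a fake fetcher.

```python
import json
import os
import tempfile
import urllib.request

API_URL = "https://example.com/weather?city=Singapore"  # hypothetical endpoint


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_weather(out_dir, date_str, fetch=fetch_json):
    """Fetch the weather payload and write it to <out_dir>/<date_str>.json."""
    payload = fetch(API_URL)
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, date_str + ".json")
    with open(path, "w") as fh:
        json.dump(payload, fh)
    return path


# Offline demo: a fake fetcher replaces the real network call.
def fake_fetch(url):
    return {"name": "Singapore", "main": {"temp": 300.15}}


demo_dir = tempfile.mkdtemp()
written = get_weather(demo_dir, "2019-01-17", fetch=fake_fetch)
print(written)
```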

(c) After that, we define a function to transform the weather data and load it into the DB.
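The transform and load step could be sketched as follows. The payload shape, the Kelvin input and the column names are assumptions, and sqlite3 again stands in for PostgreSQL; note that psycopg2 uses %s placeholders where sqlite3 uses ?.

```python
import sqlite3


def transform(payload, date_str):
    """Flatten the nested payload into one (city, temperature, observed_on) row."""
    temp_c = round(payload["main"]["temp"] - 273.15, 2)  # assumes the API reports Kelvin
    return (payload["name"], temp_c, date_str)


def load(conn, row):
    """Insert one row; sqlite3 uses '?' placeholders, psycopg2 would use '%s'."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO weather_table (city, temperature, observed_on) VALUES (?, ?, ?)",
        row,
    )
    conn.commit()
    cur.close()


# In-memory stand-in database with hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather_table (city TEXT, temperature REAL, observed_on TEXT)")
row = transform({"name": "Singapore", "main": {"temp": 300.15}}, "2019-01-17")
load(conn, row)
stored = conn.execute("SELECT city, temperature, observed_on FROM weather_table").fetchall()
print(stored)
```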

(d) Before we run those scripts in Airflow, we want to make sure there is no error in them. Hence we test out those scripts individually first.

Part 3: DAG testing

The weatherDAG is defined by Weather_Dag.py, which contains 2 functions, get_weather and transform_load. We will test each of them individually.

We will test that both functions work perfectly with no error before we put the weatherDAG into the Airflow UI.

(a) airflow test weatherDAG get_weather 2019-01-17

Above confirms that the JSON file is properly written to its /data directory.

(b) airflow test weatherDAG transform_load 2019-01-17

Above shows that the weather data is properly written to the database dated 2019-01-17.

Now that we have tested both functions to be error-free, we can then use the Airflow UI to turn ON the weatherDAG.

Part 4: Airflow Webserver, Airflow Scheduler

The first step is to start the airflow webserver. See Ref[4], which explains how Systemd can be used to run the Airflow Webserver and Airflow Scheduler.

To bring the weatherDAG into the Airflow UI, execute “airflow scheduler” at the Linux prompt.

Below shows the weatherDAG inside the Airflow UI. We toggle its On / Off button to On and let it run as scheduled.

The above script shows that weatherDAG will run at 10-minute intervals. By setting catchup=False, it does not matter whether your start_date lies in the past or not: the scheduler does not backfill the missed intervals and executes from the current time onward. By setting end_date you can make a DAG stop running by itself. We see in the output below that the DB is correctly updated at 10-minute intervals and that weatherDAG is running successfully.
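The scheduling settings described above can be sketched as a set of keyword arguments for the DAG constructor. The dates used here are hypothetical, and the DAG construction itself is left as a comment so that the snippet runs without Airflow installed; it assumes the Airflow 1.10 style schedule_interval argument.

```python
from datetime import datetime, timedelta

# Keyword arguments for the DAG; both dates are hypothetical.
dag_kwargs = {
    "dag_id": "weatherDAG",
    "start_date": datetime(2019, 1, 17),         # may lie in the past
    "schedule_interval": timedelta(minutes=10),  # run every 10 minutes
    "catchup": False,                            # do not backfill missed intervals
    "end_date": datetime(2019, 1, 31),           # no runs are scheduled after this
}

# With Airflow 1.10 installed, the DAG would be built roughly as:
#   from airflow import DAG
#   dag = DAG(**dag_kwargs)

# Sanity check on the interval: how many runs fit into one day.
runs_per_day = timedelta(days=1) // dag_kwargs["schedule_interval"]
print(runs_per_day)
```

A cron string such as "*/10 * * * *" is another way to express the same 10-minute interval.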

Take note that “airflow scheduler”, once started at the Linux prompt, needs to keep running in the background for the scheduled tasks to keep on running. The scheduler is designed to be a long-running process, an infinite loop. It orchestrates the work that is being done and is the heart of Airflow. If it is not running, no more work is scheduled.

Part 5: Database Connections

The example in Part 2 (c) above puts the username and password directly in its database connection string. A better method to deal with the database connection is to use a PostgresHook object: we instantiate it and pass the Postgres connection ID, weather_id, to the constructor.
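A minimal sketch of this approach follows, assuming the Airflow 1.10 import path and hypothetical column names. The hook is created lazily inside make_hook so the snippet can be loaded without Airflow, and a small recording stand-in replaces the real hook for the demo.

```python
def make_hook():
    """Create the hook; requires Airflow and a saved connection named weather_id."""
    # Airflow 1.10 import path (an assumption about the installed version).
    from airflow.hooks.postgres_hook import PostgresHook
    return PostgresHook(postgres_conn_id="weather_id")


def load_weather(hook, row):
    """Insert one row through the hook; no credentials appear in the code."""
    hook.insert_rows(
        table="weather_table",
        rows=[row],
        target_fields=["city", "temperature", "observed_on"],  # hypothetical columns
    )


class RecordingHook:
    """Stand-in that records insert_rows calls instead of touching a database."""

    def __init__(self):
        self.calls = []

    def insert_rows(self, table, rows, target_fields=None):
        self.calls.append((table, list(rows), target_fields))


fake = RecordingHook()
load_weather(fake, ("Singapore", 27.0, "2019-01-17"))
print(fake.calls)
```

Inside a DAG task, load_weather(make_hook(), row) would be the call to use; the username and password then live only in the Airflow connection record.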

In this case, we need to set up the connection for weather_id as seen above on the right.

Press Save button and a message should show “Record was successfully saved.”

Take note that the above needs the crypto package to be installed (pip install 'apache-airflow[crypto]'); otherwise the message “Failed to update record. Could not create Fernet object: Incorrect padding” persists. See Ref[3].
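For background on that message: “Incorrect padding” is the error Python's base64 decoder raises on malformed input, and a Fernet key is 32 random bytes in URL-safe base64. The sketch below uses only the standard library to generate a value in that layout and to check whether a candidate decodes to 32 bytes. The helper names are hypothetical, and it is an assumption that a malformed fernet_key in airflow.cfg is the cause in any given installation.

```python
import base64
import binascii
import os


def generate_fernet_key():
    """Return 32 random bytes in URL-safe base64, the layout of a Fernet key."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def looks_like_fernet_key(key):
    """True if the key decodes from URL-safe base64 to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(key)) == 32
    except (binascii.Error, ValueError):
        # A malformed key fails here, e.g. with "Incorrect padding".
        return False


key = generate_fernet_key()
print(len(key), looks_like_fernet_key(key), looks_like_fernet_key("abc"))
```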

Part 6: Airflow Example 2

The example of updating Foreign Exchange Rate using Airflow is taken from Ref[5]. Its DB and Airflow outputs for its hourly update are as shown:

References

1) http://michal.karzynski.pl/blog/2017/03/19/developing-workflows-with-apache-airflow/

2) http://michael-harmon.com/blog/AirflowETL.html

3) https://airflow.readthedocs.io/en/stable/howto/secure-connections.html

4) http://site.clairvoyantsoft.com/installing-and-configuring-apache-airflow/

5) https://tech.marksblogg.com/airflow-postgres-redis-forex.html

Good Weblinks

https://caserta.com/data-blog/airflow-tips-tricks-pitfalls/

https://blog.pythian.com/airflow-scheduler-basics/

https://medium.com/@hafizbadrie/airflow-when-your-dag-is-far-behind-the-schedule-ea11bf02e44c

https://stackoverflow.com/questions/52948855/how-to-use-airflow-scheduler-with-systemd

https://airflow.readthedocs.io/en/stable/howto/run-with-systemd.html

https://stackoverflow.com/questions/39073443/how-do-i-restart-airflow-webserver

https://medium.com/@jayden.chua/use-supervisor-to-run-your-python-tests-13e91171d6d3

http://www.alphadevx.com/a/455-Installing-Supervisor-and-Superlance-on-CentOS

# Get the PID of the service you want to stop

ps -eaf | grep airflow