Celery Configuration
Tutorial to follow along
Configuring celery workers and celery beat for async task execution and scheduling alerts, reports, etc.
- Python and Python-venv (Version 3.10)
- Postgres (Metadata db)
- Redis
I am installing everything on a single VM, since creating several would cost a lot, but the steps should be easy to adapt to a multi-machine setup.
The process is the same as the normal installation we saw earlier, with a few changes. Let's start.
sudo apt update -y && sudo apt upgrade -y
Installing Redis
sudo apt install redis-server
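Before moving on, it can help to confirm that Redis is accepting connections. The sketch below is my own addition rather than part of the original steps. It assumes Redis listens on localhost:6379 without a password (the same defaults the Superset config in this guide relies on), and `redis_is_up` is a hypothetical helper name. It uses only the standard library and sends Redis an inline PING.

```python
import socket


def redis_is_up(host="localhost", port=6379, timeout=2.0):
    """Return True if a Redis server answers an inline PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timeout, DNS failure, etc.
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable:", redis_is_up())
```

If this prints False right after installation, checking `sudo systemctl status redis-server` is a reasonable next step.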
Installing Apache Superset
- Install dependencies
sudo apt-get install build-essential libssl-dev libffi-dev python3.10-dev python3-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev python3.10-venv libpq-dev
If the Python 3.10 packages cannot be found, you can add a repository that provides them using
sudo add-apt-repository ppa:savoury1/python
sudo apt update
- Create app directory for superset and dependencies
sudo mkdir /app
sudo chown user /app
cd /app
- Create python environment
mkdir superset
cd superset
python3 -m venv superset_env
. superset_env/bin/activate
pip install --upgrade setuptools pip
- Install python dependencies
pip install pillow
pip install apache-superset
pip install psycopg2
pip install gunicorn
pip install celery
pip install gevent
- Create superset config file and set environment variable
touch superset_config.py
export SUPERSET_CONFIG_PATH=/app/superset/superset_config.py
- Edit superset_config.py using
nano superset_config.py
and put the following code in it:
# Superset specific config
ROW_LIMIT = 5000
# Flask App Builder configuration
# Your App secret key will be used for securely signing the session cookie
# and encrypting sensitive information on the database
# Make sure you are changing this key for your deployment with a strong key.
# Alternatively you can set it with `SUPERSET_SECRET_KEY` environment variable.
# You MUST set this for production environments or the server will refuse
# to start and you will see an error in the logs accordingly.
SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
# The SQLAlchemy connection string to your database backend
# This connection defines the path to the database that stores your
# superset metadata (slices, connections, tables, dashboards, ...).
# Note that the connection information to connect to the datasources
# you want to explore are managed directly in the web UI
# The check_same_thread=false property ensures the sqlite client does not attempt
# to enforce single-threaded access, which may be problematic in some edge cases
SQLALCHEMY_DATABASE_URI = 'sqlite:////app/superset/superset.db?check_same_thread=false'
# The URI above keeps the metadata in a local SQLite file. To use the Postgres
# metadata database instead, point it at a URI of the form
# postgresql+psycopg2://<user>:<password>@<host>:5432/<database>
TALISMAN_ENABLED = False
WTF_CSRF_ENABLED = False
# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = ''
# Celery Redis configuration for async query execution
class CeleryConfig(object):
    broker_url = "redis://localhost:6379/0"
    imports = (
        "superset.sql_lab",
        "superset.tasks.scheduler",
    )
    result_backend = "redis://localhost:6379/0"
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
    }
CELERY_CONFIG = CeleryConfig
# On Redis
from flask_caching.backends.rediscache import RedisCache
RESULTS_BACKEND = RedisCache(
    host='localhost', port=6379, key_prefix='superset_results')
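The config above imports superset.tasks.scheduler but does not define a schedule for Celery beat, which alerts and reports need in order to fire. The fragment below is a sketch based on the alerts-and-reports setup described in the Superset documentation, not something taken from my original config. It assumes you replace the CeleryConfig class with this version and enable the ALERT_REPORTS feature flag. Notification (email/Slack) and screenshot settings are not covered here.

```python
from celery.schedules import crontab


# Assumption: this replaces the CeleryConfig class in superset_config.py,
# keeping the same Redis settings and adding a beat schedule.
class CeleryConfig(object):
    broker_url = "redis://localhost:6379/0"
    imports = (
        "superset.sql_lab",
        "superset.tasks.scheduler",
    )
    result_backend = "redis://localhost:6379/0"
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
    }
    beat_schedule = {
        # Check every minute for alerts and reports that are due
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        # Prune old report execution logs once a day at midnight
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
    }


CELERY_CONFIG = CeleryConfig

# Assumption: alerts and reports also need this feature flag enabled
FEATURE_FLAGS = {"ALERT_REPORTS": True}
```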
- Replace YOUR_OWN_RANDOM_GENERATED_SECRET_KEY in the file above with the value returned by the following command
openssl rand -base64 42
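If openssl is not at hand, a roughly equivalent key can be produced from Python. This is a minimal sketch using the standard library's `secrets` module; `generate_secret_key` is a hypothetical helper name, and the 42-byte length simply mirrors the openssl command above.

```python
import base64
import secrets


def generate_secret_key(num_bytes=42):
    """Return num_bytes of OS-level randomness, base64-encoded as text."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_secret_key())
```

Either way, the key should be generated once and kept stable, because Superset uses it to encrypt sensitive information stored in the metadata database.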
- Once done, let us initialize the database with the following commands
# Create an admin user in your metadata database (use `admin` as username to be able to load the examples)
export FLASK_APP=superset
superset db upgrade
superset fab create-admin
# This is going to be a production setup, so loading the examples is commented out; uncomment the next line if you need them
# superset load_examples
# Create default roles and permissions
superset init
- Now our environment is ready, so let's try running it. To run Superset I use a shell script that starts the server. Create the script using the following command.
nano run_superset.sh
and paste the following code into it.
#!/bin/bash
export SUPERSET_CONFIG_PATH=/app/superset/superset_config.py
. /app/superset/superset_env/bin/activate
gunicorn \
-w 10 \
-k gevent \
--timeout 120 \
-b 0.0.0.0:8088 \
--limit-request-line 0 \
--limit-request-field_size 0 \
--statsd-host localhost:8125 \
"superset.app:create_app()"
- To run the script we need to grant it execute permission. To do that, run the following command.
chmod +x run_superset.sh
- Let's run it and test whether it works:
sh run_superset.sh
- Check whether you can log in with the admin credentials at server-ip-address:8088. If everything is working, we can go ahead and create a service that starts automatically when the server boots or reboots.
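Besides logging in through the browser, the web server can be probed from a script. The sketch below assumes Superset's `/health` endpoint, which answers with the plain text OK when the web server is up; `superset_is_healthy` is a hypothetical helper name, and the base URL should be adjusted to your server address.

```python
import urllib.error
import urllib.request


def superset_is_healthy(base_url="http://localhost:8088", timeout=5.0):
    """Return True if GET <base_url>/health answers 200 with body 'OK'."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"OK"
    except (urllib.error.URLError, OSError):
        # Server not started yet, wrong host/port, or request timed out
        return False
```

Calling `superset_is_healthy("http://server-ip-address:8088")` from another machine also tells you whether port 8088 is reachable through the firewall. It only checks the web server, not the Celery worker.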
[!NOTE]
If you connect to a database with async queries enabled at this point, queries will just run indefinitely because the Celery worker is not running yet. We will set up Celery once the Superset service is created.
Let's create a service called superset using the following command
sudo nano /etc/systemd/system/superset.service
and paste the following code into it
[Unit]
Description = Apache Superset Webserver Daemon
After = network.target
[Service]
PIDFile = /app/superset/superset-webserver.PIDFile
Environment=SUPERSET_HOME=/app/superset
Environment=PYTHONPATH=/app/superset
WorkingDirectory = /app/superset
ExecStart = /app/superset/run_superset.sh
ExecStop = /bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Once copied, run the following commands to enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable superset.service
sudo systemctl start superset.service
Run and Test Celery
- To run Celery I use a shell script that starts the worker and beat. Create the script using the following command.
nano run_celery.sh
and paste the following code into it.
#!/bin/bash
export SUPERSET_CONFIG_PATH=/app/superset/superset_config.py
. /app/superset/superset_env/bin/activate
# Start the worker in the background, then run beat in the foreground
celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4 &
celery --app=superset.tasks.celery_app:app beat
In the script above, -c 4 sets how many processes the Celery worker runs concurrently.
- To run the script we need to grant it execute permission. To do that, run the following command.
chmod +x run_celery.sh
- Let's run it and test whether it works:
sh run_celery.sh
- Create the Celery service by creating or editing the file
sudo nano /etc/systemd/system/celery.service
and paste the following code into it
[Unit]
Description = Celery Worker and Beat Daemon for Apache Superset
After = network.target
[Service]
PIDFile = /app/superset/celery.PIDFile
Environment=SUPERSET_HOME=/app/superset
Environment=PYTHONPATH=/app/superset
WorkingDirectory = /app/superset
ExecStart = /app/superset/run_celery.sh
ExecStop = /bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Once copied, run the following commands to enable and start the service
sudo systemctl daemon-reload
sudo systemctl enable celery.service
sudo systemctl start celery.service
Yay! Your enterprise server is up and running. You can test it by restarting the server and checking that both services come back up.
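After a reboot, one way to confirm that both units came back is to ask systemd for their state. The sketch below is an illustration under my own assumptions: `service_states` is a hypothetical helper that shells out to `systemctl is-active`, and the `run` parameter exists only so the function can be exercised without systemd.

```python
import subprocess


def service_states(units, run=subprocess.run):
    """Map each systemd unit name to the state printed by `systemctl is-active`."""
    states = {}
    for unit in units:
        result = run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
        )
        # is-active prints e.g. "active", "inactive" or "failed"
        states[unit] = result.stdout.strip()
    return states
```

On the server, `service_states(["superset.service", "celery.service"])` should report "active" for both units if everything started correctly.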
If you have any issues you can contact me on [email protected] .