When deploying your dashboard it is better not to use the built-in flask
development server but use a more robust production server like
Probably gunicorn is a bit more fully featured and
faster but only works on unix/linux/osx, whereas
waitress also works
on Windows and has very minimal dependencies.
Install with either
pip install gunicorn or
pip install waitress.
Storing explainer and running default dashboard with gunicorn¶
Before you start a dashboard with gunicorn you need to store both the explainer instance and and a configuration for the dashboard:
from explainerdashboard import ClassifierExplainer, ExplainerDashboard explainer = ClassifierExplainer(model, X, y) db = ExplainerDashboard(explainer, title="Cool Title", shap_interaction=False) db.to_yaml("dashboard.yaml", explainerfile="explainer.joblib", dump_explainer=True)
Now you re-load your dashboard and expose a flask server as
from explainerdashboard import ExplainerDashboard db = ExplainerDashboard.from_config("dashboard.yaml") app = db.flask_server()
If you named the file above
dashboard.py, you can now start the gunicorn server with:
$ gunicorn dashboard:app
If you want to run the server server with for example three workers, binding to
8050 you launch gunicorn with:
$ gunicorn -w 3 -b localhost:8050 dashboard:app
If you now point your browser to
http://localhost:8050 you should see your dashboard.
Next step is finding a nice url in your organization’s domain, and forwarding it
to your dashboard server.
With waitress you would call:
$ waitress-serve --port=8050 dashboard:app
Although you can all use the
waitress directly from the dashboard by passing
use_waitress=True flag to
Deploying dashboard as part of Flask app on specific route¶
Another way to deploy the dashboard is to first start a
Flask app, and then
use this app as the backend of the Dashboard, and host the dashboard on a specific
route. This way you can for example host multiple dashboard under different urls.
You need to pass the Flask
server instance and the
url_base_pathname to the
ExplainerDashboard constructor, and then the dashboard itself can be found
from flask import Flask app = Flask(__name__) [...] db = ExplainerDashboard(explainer, server=app, url_base_pathname="/dashboard/") @app.route('/dashboard') def return_dashboard(): return db.app.index()
Now you can start the dashboard by:
$ gunicorn -b localhost:8050 dashboard:app
And you can visit the dashboard on
Deploying to heroku¶
In case you would like to deploy to heroku (which is normally the simplest option for dash apps, see dash instructions here). The demonstration dashboard is also hosted on heroku at titanicexplainer.herokuapp.com.
In order to deploy the heroku there are a few things to keep in mind. First of
all you need to add
requirements.txt (pinning is recommended to force a new build of your environment
whenever you upgrade versions):
Select a python runtime compatible with the version that you used to pickle
your explainer in
(supported versions as of this writing are
python-3.6.12, but check the
for the latest)
And you need to tell heroku how to start your server in
web: gunicorn dashboard:app
If you want to visualize individual trees inside your
model using the
dtreeviz package you will
need to make sure that
graphviz is installed on your
heroku dyno by
adding the following buildstack (as well as the
(you can add buildpacks through the “settings” page of your heroku project)
You can also deploy a dashboard using docker. You can build the dashboard and store it inside the container to make sure it is compatible with the container environment. E.g. generate_dashboard.py:
from sklearn.ensemble import RandomForestClassifier from explainerdashboard import * from explainerdashboard.datasets import * X_train, y_train, X_test, y_test = titanic_survive() model = RandomForestClassifier(n_estimators=50, max_depth=5).fit(X_train, y_train) explainer = ClassifierExplainer(model, X_test, y_test, cats=["Sex", 'Deck', 'Embarked'], labels=['Not Survived', 'Survived'], descriptions=feature_descriptions) db = ExplainerDashboard(explainer) db.to_yaml("dashboard.yaml", explainerfile="explainer.joblib", dump_explainer=True)
from explainerdashboard import ExplainerDashboard db = ExplainerDashboard.from_config("dashboard.yaml") db.run(host='0.0.0.0', port=9050, use_waitress=True)
FROM python:3.8 RUN pip install explainerdashboard COPY generate_dashboard.py ./ COPY run_dashboard.py ./ RUN python generate_dashboard.py EXPOSE 9050 CMD ["python", "./run_dashboard.py"]
And build and run the container exposing port
$ docker build -t explainerdashboard . $ docker run -p 9050:9050 explainerdashboard
Reducing memory usage¶
If you deploy the dashboard with a large dataset with a large number of rows (
and a large number of columns (
it can use up quite a bit of memory: the dataset itself, shap values,
shap interaction values and any other calculated properties are alle kept in
memory in order to make the dashboard responsive. You can check the (approximate)
memory usage with
explainer.memory_usage(). In order to reduce the memory
footprint there are a number of things you can do:
- Not including shap interaction tab.
Shap interaction values are shape
n*m*m, so can take a subtantial amount of memory, especially if you have a significant amount of columns
- Setting a lower precision.
By default shap values are stored as
'float64', but you can store them as
'float32'instead and save half the space:
`ClassifierExplainer(model, X_test, y_test, precision='float32')`. You can also set a lower precision on your
X_testdataset yourself ofcourse.
- Drop non-positive class shap values.
For multi class classifiers, by default
ClassifierExplainercalculates shap values for all classes. If you are only interested in a single class you can drop the other shap values with
- Storing row data externally and loading on the fly.
You can for example only store a subset of
10.000rows in the
explaineritself (enough to generate representative importance and dependence plots), and store the rest of your millions of rows of input data in an external file or database that get loaded one by one with the following functions:
explainer.set_X_row_func()you can set a function that takes an index as argument and returns a single row dataframe with model compatible input data for that index. This function can include a query to a database or fileread.
explainer.set_y_func()you can set a function that takes and index as argument and returns the observed outcome
yfor that index.
explainer.set_index_list_func()you can set a function that returns a list of available indexes that can be queried.
If the number of indexes is too long to fit in a dropdown you can pass
index_dropdown=Falsewhich turns the dropdowns into free text fields. Instead of an
index_list_funcyou can also set an
explainer.set_index_check_func(func)which should return a bool whether the
indexexists or not.
Important: these function can be called multiple times by multiple independent components, so probably best to implement some kind of caching functionality. The functions you pass can be also methods, so you have access to all of the internals of the explainer.
Setting logins and password¶
ExplainerDashboard supports dash basic auth functionality.
flask_simple_login for its user authentication.
You can simply add a list of logins to the
ExplainerDashboard to force a login
and prevent random users from accessing the details of your model dashboard:
ExplainerDashboard(explainer, logins=[['login1', 'password1'], ['login2', 'password2']]).run()
hub = ExplainerHub([db1, db2], logins=[['login1', 'password1'], ['login2', 'password2']])
Make sure not to check these login/password pairs into version control though,
but store them somewhere safe!
ExplainerHub stores passwords into a hashed
format by default.
Automatically restart gunicorn server upon changes¶
We can use the
explainerdashboard CLI tools to automatically rebuild our
explainer whenever there is a change to the underlying
model, dataset or explainer configuration. And we we can use
kill -HUP gunicorn.pid
to force the gunicorn to restart and reload whenever a new
is generated or the dashboard configuration
dashboard.yaml changes. These two
processes together ensure that the dashboard automatically updates whenever there
are underlying changes.
First we store the explainer config in
explainer.yaml and the dashboard
dashboard.yaml. We also indicate which modelfiles and datafiles the
explainer depends on, and which columns in the datafile should be used as
a target and which as index:
explainer = ClassifierExplainer(model, X, y, labels=['Not Survived', 'Survived']) explainer.dump("explainer.joblib") explainer.to_yaml("explainer.yaml", modelfile="model.pkl", datafile="data.csv", index_col="Name", target_col="Survival", explainerfile="explainer.joblib", dashboard_yaml="dashboard.yaml") db = ExplainerDashboard(explainer, [ShapDependenceTab, ImportancesTab], title="Custom Title") db.to_yaml("dashboard.yaml", explainerfile="explainer.joblib")
dashboard.py is the same as before and simply loads an
directly from the config file:
from explainerdashboard import ExplainerDashboard db = ExplainerDashboard.from_config("dashboard.yaml") app = db.app.server
Now we would like to rebuild the
explainer.joblib file whenever there is a
explainer.yaml by running
explainerdashboard build. And we restart the
gunicorn server whenever
there is a change in
dashboard.yaml by killing
the gunicorn server with
kill -HUP pid To do that we need to install
the python package
pip install watchdog[watchmedo]). This
package can keep track of filechanges and execute shell-scripts upon file changes.
So we can start the gunicorn server and the two watchdog filechange trackers
from a shell script
trap "kill 0" EXIT # ensures that all three process are killed upon exit source venv/bin/activate # activate virtual environment first gunicorn --pid gunicorn.pid gunicorn_dashboard:app & watchmedo shell-command -p "./model.pkl;./data.csv;./explainer.yaml" -c "explainerdashboard build explainer.yaml" & watchmedo shell-command -p "./explainer.joblib;./dashboard.yaml" -c 'kill -HUP $(cat gunicorn.pid)' & wait # wait till user hits ctrl-c to exit and kill all three processes
Now we can simply run
chmod +x start_server.sh and
get our server up and running.
Whenever we now make a change to either one of the source files
explainer.yaml), this produces a fresh
explainer.joblib. And whenever there is a change to either
dashboard.yaml gunicorns restarts and rebuild the dashboard.
So you can keep an explainerdashboard running without interuption and simply
model.pkl or a fresh dataset
data.csv into the directory and
the dashboard will automatically update.