Get started with Dask
Prerequisites
You need the following:
- Git
- Python >=3.8
- pip
Some of the libraries installed on Graal:
- adlfs==2022.2.0
- aiohttp==3.8.1
- gcsfs==2022.2.0
- lightgbm==3.3.2
- prometheus-client==0.13.1
- protobuf==3.19.4
- pyarrow==7.0.0
- python-socketio==5.4.1
- s3fs==2022.2.0
- scikit-learn==1.0.2
- joblib==1.1.0
- xgboost==1.5.2
- dask==2022.02.1
- dask_kubernetes==2022.1.0
- blosc==1.10.2
- lz4==3.1.10
- pandas==1.3.0
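To check that a local environment matches the versions listed above, a small script along these lines could help. It is only a sketch: the helper names `as_tuple` and `find_mismatches` are hypothetical, only three of the pins are copied into `PINNED` for brevity, and it assumes purely numeric dotted versions (which holds for the list above).

```python
from importlib import metadata  # standard library since Python 3.8

# A few pins copied from the list above; extend as needed.
PINNED = {
    "dask": "2022.02.1",
    "pandas": "1.3.0",
    "pyarrow": "7.0.0",
}


def as_tuple(version):
    """Turn '2022.02.1' into (2022, 2, 1) so that zero-padding is ignored.

    Falls back to the raw string when a part is not a plain integer.
    """
    try:
        return tuple(int(part) for part in version.split("."))
    except ValueError:
        return version


def find_mismatches(pins):
    """Return {name: (wanted, installed)} for every pin that does not match.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed is None or as_tuple(installed) != as_tuple(wanted):
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dictionary when every listed package is installed at the pinned version.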
Prerequisites for your package
Your code must live in one or more modules other than the __main__.py module.
In each module, you must define a function that takes a parameter named "client" and encloses the code to be executed. Not every Dask Distributed feature requires the "client" parameter, so it is normal if your function never uses it even though it appears in the signature.
For example, with Dask Distributed XGBoost you could define the function as follows:
def distributed_xgboost(client):
    ...  # the code to be executed goes here
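As a more complete illustration of the "client" convention, here is a minimal sketch of a module that is not __main__.py and exposes one such function. The names `square` and `distributed_squares` are hypothetical and do not come from the example project. The sketch assumes the object passed in behaves like a `dask.distributed.Client`, and it only uses the `Client.map` and `Client.gather` methods.

```python
# my_module.py (hypothetical module name, anything other than __main__.py)


def square(x):
    """Plain function that runs on the workers."""
    return x * x


def distributed_squares(client):
    """Entry point: encloses the code to be executed.

    `client` is assumed to be a connected dask.distributed.Client.
    """
    # Schedule one task per input value on the cluster.
    futures = client.map(square, range(10))
    # Block until all tasks are done and collect their results.
    results = client.gather(futures)
    return sum(results)
```

To try the function locally, one option is to create an in-process client with `Client(processes=False)` from `dask.distributed` and pass it to `distributed_squares`.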
Example
Clone the example project and use pip to build it.
The example project, named dask_ml_example, is composed of 4 modules that demonstrate applications of Dask Distributed. It includes implementations of distributed LightGBM, distributed XGBoost, and distributed scikit-learn.