Graal Platform Documentation

Get started with Dask

Prerequisites

You need the following:

  • Git
  • Python >=3.8
  • pip

The following libraries, among others, are installed on Graal Platform:

  • adlfs==2022.2.0
  • aiohttp==3.8.1
  • gcsfs==2022.2.0
  • lightgbm==3.3.2
  • prometheus-client==0.13.1
  • protobuf==3.19.4
  • pyarrow==7.0.0
  • python-socketio==5.4.1
  • s3fs==2022.2.0
  • scikit-learn==1.0.2
  • joblib==1.1.0
  • xgboost==1.5.2
  • dask==2022.02.1
  • dask_kubernetes==2022.1.0
  • blosc==1.10.2
  • lz4==3.1.10
  • pandas==1.3.0

Prerequisites for your package

Your code must live in one or more modules other than the __main__.py module.

In each module, you must define a function that takes a parameter named "client" and encloses the code to be executed. Not every Dask Distributed feature requires the "client" parameter, so it is normal if your function body never uses it even though it appears in the function signature.

For example, with Dask Distributed XGBoost you could define the following function:
def distributed_xgboost(client):
    ...  # the code to be executed goes here
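
As a rough illustration of the module layout described above, here is a minimal sketch of a module whose entry function accepts "client" and uses it to run work on the cluster. The module name, the function names square and sum_of_squares, and the input values are hypothetical and only serve the example; it also assumes that the object passed in behaves like a dask.distributed Client, which provides the map and gather methods used here. Whether the platform uses the function's return value is not stated in this guide, so the return is only for illustration.

```python
# sum_of_squares.py (hypothetical module name; any module other than __main__.py)


def square(x):
    """Small task that is meant to run on a Dask worker."""
    return x * x


def sum_of_squares(client):
    """Entry function: receives the Dask client through the "client" parameter.

    Assumption: "client" behaves like dask.distributed.Client.
    """
    numbers = [1, 2, 3, 4]  # illustrative input, not taken from the guide
    # Client.map submits one task per element and returns a list of futures.
    futures = client.map(square, numbers)
    # Client.gather waits for the futures and returns their results.
    results = client.gather(futures)
    return sum(results)
```

To try such a function outside the platform, you could call it with a dask.distributed Client connected to a local cluster, provided Dask Distributed is installed on your machine.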

Example

Clone the example project and use pip to build it.

The example project, dask_ml_example, consists of four modules that demonstrate applications of Dask Distributed. It includes distributed implementations based on LightGBM, XGBoost and scikit-learn.

Copyright © 2023 Graal Systems