Adding a Node in Your Saagie Cluster

Saagie uses the Kubernetes pod scheduler to dispatch jobs to available nodes. In most cases, you do not need to tell Saagie that you have added a node to your cluster. However, nodes with GPUs and nodes dedicated to the saagie-common namespace require additional configuration, as described below.

Adding Nodes With GPUs

Saagie supports scheduling jobs on GPUs. This feature has been tested with Saagie’s Python image on the following configuration:

  • Tesla P4 GPU

  • NVIDIA drivers v418.67

Currently, only NVIDIA GPUs are supported. The feature might also work with other NVIDIA hardware and driver versions, but these have not been tested.

To add a node with a GPU, follow these steps:

  1. Configure the Node resource by adding the nvidia.com/gpu=present:NoSchedule taint with the following command:

    kubectl taint nodes [NODE_NAME] nvidia.com/gpu=present:NoSchedule

    For more information, see the Kubernetes documentation about Taints and Tolerations.
  2. Enable GPU support in the Saagie settings component for each platform on which jobs using GPUs can be scheduled. You can also do this during platform installation.

  3. Check your configuration status by running the following commands:

    # Authenticate and store the returned token in TOKEN (no spaces around "=" in a shell assignment)
    TOKEN=$(curl -s -X POST -H "Content-Type:application/json" -H "Saagie-Realm:<realm>" https://<saagie_host>/authentication/api/open/authenticate --data '{"login":"<username>", "password":"<password>"}')
    
    # Query reading GPU setting
    curl -X GET -H "Content-Type:application/json" -H "Saagie-Realm:<realm>" -H "Authorization: Bearer $TOKEN" https://<saagie_host>/settings/api/v1/settings/platform/<platform_id>/gpu

    Where:

    • <realm> is the prefix that was determined during Saagie installation.

    • <prefix> must be replaced with the same value determined for your DNS entry at the beginning of the installation process.

    • <saagie_host> is your Saagie URL.

    • <username> and <password> must be the credentials of an admin user.

    • <platform_id> is the ID of the platform being configured.

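  4. Enable the ExtendedResourceToleration admission controller on the Kubernetes cluster so that jobs can be scheduled on the GPU node. This admission controller automatically adds a toleration for the nvidia.com/gpu taint to pods that request the nvidia.com/gpu resource, which is why the taint in step 1 uses nvidia.com/gpu as its key.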
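Steps 1 and 4 can be checked from the command line. The following Bash sketch defines hypothetical helper functions (they are not part of Saagie or kubectl): has_gpu_taint asks kubectl whether a node carries the taint from step 1, and admission_plugin_enabled scans kube-apiserver flags read from standard input for ExtendedResourceToleration. It assumes kubectl is configured for your cluster and, for the second check, a kubeadm-style control plane where the kube-apiserver flags are in /etc/kubernetes/manifests/kube-apiserver.yaml; managed Kubernetes services may not expose that file.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for checking a GPU node, not a Saagie tool.
# Assumes kubectl is configured to talk to the cluster running Saagie.

# The taint added in step 1, formatted as key=value:effect.
GPU_TAINT="nvidia.com/gpu=present:NoSchedule"

# Print each taint on node $1 as key=value:effect, one per line.
node_taints() {
  kubectl get node "$1" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# Succeed (exit status 0) if node $1 carries GPU_TAINT.
has_gpu_taint() {
  node_taints "$1" | grep -qxF "$GPU_TAINT"
}

# Read kube-apiserver flags (for example, a static pod manifest) on stdin and
# succeed if plugin $1 (default: ExtendedResourceToleration) is listed in
# --enable-admission-plugins.
# Assumption: the flag is written as --enable-admission-plugins=A,B,C.
admission_plugin_enabled() {
  local plugin="${1:-ExtendedResourceToleration}"
  grep -oE -- '--enable-admission-plugins=[^[:space:]"]+' |
    cut -d= -f2- |
    tr ',' '\n' |
    grep -qxF "$plugin"
}

# Example usage (gpu-node-1 is a placeholder node name):
#   has_gpu_taint gpu-node-1 && echo "GPU taint present"
#   admission_plugin_enabled < /etc/kubernetes/manifests/kube-apiserver.yaml && echo "enabled"
```

If the admission check fails, append ExtendedResourceToleration to the comma-separated value of --enable-admission-plugins in your kube-apiserver configuration.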

Adding Nodes Dedicated to the saagie-common Namespace

During installation, you can dedicate nodes to the pods of the saagie-common namespace.

If you did not dedicate nodes to the saagie-common namespace, or if you do not want to dedicate the new node to it, no action is required.

When nodes are dedicated to the saagie-common namespace, the Saagie platform runs only on those nodes, and a node with the saagie-common label runs nothing other than the Saagie platform.

If you dedicate multiple nodes to the saagie-common namespace, make sure they all have the same label key/value pairs.
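When you add a node to an existing set of dedicated nodes, one way to keep the label key/value pair consistent is to copy it from a node that is already dedicated. The following Bash sketch assumes the label key is saagie-common (set LABEL_KEY if your installation used a different key) and that kubectl is configured for the cluster; label_value and copy_common_label are hypothetical helpers, and they copy only this one label.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not a Saagie tool.
# Assumption: the dedicated nodes share a label whose key is "saagie-common";
# override LABEL_KEY if your installation used a different key.
LABEL_KEY="${LABEL_KEY:-saagie-common}"

# Print the value of LABEL_KEY on node $1 (empty if the label is absent).
label_value() {
  kubectl get node "$1" -o "jsonpath={.metadata.labels['${LABEL_KEY}']}"
}

# Copy LABEL_KEY and its value from an already dedicated node ($1)
# to the new node ($2). A label with an empty value is treated as missing.
copy_common_label() {
  local value
  value="$(label_value "$1")"
  if [ -z "$value" ]; then
    echo "node $1 has no ${LABEL_KEY} label" >&2
    return 1
  fi
  kubectl label node "$2" "${LABEL_KEY}=${value}" --overwrite
}

# Example usage (node names are placeholders):
#   copy_common_label existing-common-node new-node
```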

If you have installed Saagie without assigning nodes to the saagie-common namespace and would like to do so now, contact the Saagie support team.