Downloading and Configuring Saagie

Once you have downloaded the Saagie installer, saagiectl, run the configuration process in your terminal. Answer the prompts to generate the .mdl files, which are used during the installation process.

Default answers are in [brackets].
Before you begin:
  1. Claim and download the latest installer version from your account manager.

  2. Configure Saagie by following these steps:

    1. Launch the installer.

    2. Choose the deployment mode.

    3. Configure the cluster settings.

    4. Enable and configure the Lakehouse.

    5. Define the platform URL.

    6. Define the SMTP (Simple Mail Transfer Protocol).

    7. Configure the platform settings.

    8. Define the Kubernetes CIDR (Classless Inter-Domain Routing).

    9. Configure the automatic removal of obsolete data.

    10. Assign the pods.

    11. Define the access modes.

    12. Configure Prometheus authentication.

    13. Configure the HTTP proxy.

    14. Configure the storage space size settings.

    15. Configure EKS (Amazon EKS only).

    16. Configure ingress.

    17. Set the Docker registry (dedicated mode only).

    18. Configure the technology repository (dedicated mode only).

    19. Retrieve the saagie.mdl file.

    As you work through the configuration process, answer the prompts carefully. It is much easier to correct a mistake before moving on to the next prompt. If you make an error, follow these instructions:

    1. Close the installer.

    2. Delete the .mdl file that contains the mistake.

    3. Relaunch the installer.

    4. Continue to follow the prompts.
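
The recovery steps above can be sketched as a small shell helper. This is a minimal sketch, not part of saagiectl: the function name remove_mdl and the MDL_DIR variable are hypothetical, and the directory where your installer writes its .mdl files is an assumption you must supply yourself.

```shell
# Hypothetical helper: delete one generated .mdl file so that the installer
# asks for that section again. The directory is an assumption; pass the
# location where your installer wrote its .mdl files.
remove_mdl() {
    mdl_dir=$1
    section=$2
    target="$mdl_dir/$section.mdl"
    if [ -f "$target" ]; then
        rm -- "$target"
        echo "Removed $target"
    else
        echo "No such file: $target" >&2
        return 1
    fi
}

# Example: redo the SMTP section, then relaunch the installer.
# remove_mdl "$MDL_DIR" smtp && ./bin/saagiectl configure
```

Only the file for the faulty section is removed, so the answers stored in the other .mdl files are left untouched.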

Launching the Installer

  1. Launch the installer by running the following command:

    ./bin/saagiectl configure
  2. Answer all the prompts.
    Each prompt will generate a file.

    Each prompt is described in detail in the sections below. Follow along with this page as you complete the configuration process.

Deployment Mode

File generated → deploymentmode.mdl

What is your K8s provider ? (valid answers: GKE, EKS, AKS, CUSTOM) [CUSTOM]:

Where:

  • GKE is for Google Cloud.

  • EKS is for Amazon Web Services.

  • AKS is for Microsoft Azure.

  • CUSTOM is for any other type of Kubernetes cluster management.

What is the type of registry? (valid answers: 'OFFICIAL', 'CUSTOM') [OFFICIAL]:

Where:

  • If you choose OFFICIAL, Docker images will be pulled from the Saagie Docker registry.

  • If you choose CUSTOM, Docker images will be pulled from your Docker registry.

What is your installation Id:

Do you want to isolate execution nodes? (valid answers: 'true' or 'false') [true]

Where:

  • If you answer true, your cluster nodes must be labeled beforehand to dedicate their execution to a specific platform. The isolated mode allows you to separate the workload between your platforms on dedicated nodes. This prevents the executions of one platform from intruding on another platform’s resources. For more information, see Node Isolation.

    Node labelling must have been done beforehand, when configuring your cluster.
  • If you answer false, your cluster nodes will be undifferentiated. Your workload will be executed wherever there is space. In other words, your Saagie installation and all runs related to your Saagie platform(s) will be executed on any available node in your cluster.

You cannot switch between modes seamlessly. If you choose to configure Saagie to isolate your workload, it is not recommended to reconfigure it in non-isolated mode. If you choose to switch between modes anyway, you will have to manually restart all your apps.

To check whether Saagie is configured in isolated or non-isolated mode, run the following commands:

# Set INSTALLATION_ID, BASE_URL, LOGIN, and PASSWORD in your shell before running these commands.
TOKEN=$(curl -k -X "POST" -H "Content-Type: application/json" -H "Saagie-Realm: $INSTALLATION_ID" "$BASE_URL/authentication/api/open/authenticate" --data "{\"login\":\"$LOGIN\", \"password\":\"$PASSWORD\"}")
curl -X "GET" -k "$BASE_URL/settings/api/v1/settings" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json"

The response includes, among other settings, the nodeIsolation value: NONE in non-isolated mode or PLATFORM_ISOLATED in isolated mode.
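
To read the nodeIsolation value without scanning the whole response, you could filter it out in the shell. The sketch below assumes that the value appears in the JSON as a string field named nodeIsolation. The helper name node_isolation and the sample payload are invented for illustration and are not real Saagie output.

```shell
# Hypothetical helper: prints the nodeIsolation value found in JSON read
# from stdin. Assumes a string field such as "nodeIsolation":"NONE".
node_isolation() {
    grep -o '"nodeIsolation"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | head -n 1 \
        | sed 's/.*"\([^"]*\)"$/\1/'
}

# Sample payload invented for illustration only.
sample='{"nodeIsolation":"PLATFORM_ISOLATED","other":"value"}'
mode=$(printf '%s' "$sample" | node_isolation)

case "$mode" in
    NONE) echo "non-isolated mode" ;;
    PLATFORM_ISOLATED) echo "isolated mode" ;;
    *) echo "unexpected value: $mode" ;;
esac
```

In practice, you would pipe the output of the second curl command into node_isolation instead of the sample string.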

Authentication Mode (valid answers: LDAP, STANDARD, SSO) [STANDARD]:

Where:

  • LDAP is to rely on your corporate LDAP for identity and access management.

  • STANDARD is to rely on Saagie's built-in user management.

  • SSO is to rely on your identity provider for identity management.

Cluster Settings

File generated → settings.mdl

Metric data is mandatory for billing. It is also sent to the Saagie server. To opt out of sending anonymous data, choose false. (valid answers: 'true' or 'false') [true]:

Where:

  • true is to keep anonymous data tracking.

  • false is to avoid sending anonymous data.

Do you need a custom extra volume for FluentBit ? (valid answers: 'true' or 'false') [false]:

Where:

  • true is to customize the Fluent Bit volume.

  • false is to keep the default volume.

Expose Prometheus endpoint ? (valid answers: 'true' or 'false') [false]:

Where:

  • true is to expose Prometheus metrics via HTTP. You will be asked for a username and password later.

  • false is to keep the Prometheus endpoint unexposed.

We use Prometheus for monitoring purposes.
Do you need to define a HTTP/HTTPS proxy ? (valid answers: 'true' or 'false') [false]:

Where:

  • true is to define an HTTP/HTTPS proxy. You will be asked later to provide the proxy and the related credentials, if any.

  • false is to skip the proxy configuration.

Do you want to use OpenAI? (valid answers: 'true' or 'false') [false]:

Where:

  • true is to enable the use of OpenAI. You will be asked to provide your OpenAI API key in the next prompt. The AI option can be specified for each platform when configuring the platform settings.

  • false is to disable the use of OpenAI.

OpenAI API key:

Where:

  • The answer to this prompt is your OpenAI API key.

    You will be asked to answer this prompt only if you answered true to the previous prompt about the use of OpenAI.

Lakehouse Settings

File generated → lakehouse.mdl

Do you want to enable Lakehouse ? (valid answers: 'true' or 'false') [false]:

Where:

  • true is to enable the Lakehouse.

  • false is to disable the Lakehouse.

You will be asked the following prompts about the Lakehouse only if you answered true to the previous prompt.
Lakehouse: S3 access key:

Where:

  • The answer to this prompt is your S3 access key for your Lakehouse.

Lakehouse: S3 secret key:

Where:

  • The answer to this prompt is your S3 secret key for your Lakehouse.

Lakehouse: S3 endpoint:

Where:

  • The answer to this prompt is your S3 endpoint for your Lakehouse.

Lakehouse: S3 region:

Where:

  • The answer to this prompt is your S3 region for your Lakehouse.

Lakehouse: S3 bucket:

Where:

  • The answer to this prompt is your S3 bucket for your Lakehouse.

Lakehouse: PostgreSQL username:

Where:

  • The answer to this prompt is your PostgreSQL username for your Lakehouse.

Lakehouse: PostgreSQL password:

Where:

  • The answer to this prompt is your PostgreSQL password for your Lakehouse.

Lakehouse: PostgreSQL hostname:

Where:

  • The answer to this prompt is your PostgreSQL hostname for your Lakehouse.

Lakehouse: PostgreSQL port (number required):

Where:

  • The answer to this prompt is your PostgreSQL port for your Lakehouse.

Platform URL

File generated → url.mdl

The answers to these prompts must match what you determined for your DNS entry.
Platform url suffix:

Platform url domain:

→ For example, your answers can lead to the URL dunder-workspace.dundermifflin.com, where dunder is the installation ID that you defined when configuring the deployment mode.
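
One way to picture how the answers combine is the sketch below. Only the final URL and the installation ID come from the example. The split into a suffix of -workspace and a domain of dundermifflin.com is an assumption made for illustration, as are the variable names.

```shell
# Hypothetical illustration of how the answers could combine into the URL.
# The suffix/domain split is an assumption; only the final URL and the
# installation ID come from the example in the text.
installation_id="dunder"
url_suffix="-workspace"          # assumed answer to "Platform url suffix"
url_domain="dundermifflin.com"   # assumed answer to "Platform url domain"

platform_url="${installation_id}${url_suffix}.${url_domain}"
echo "$platform_url"
```

Whatever values you enter, check that the resulting host name matches your DNS entry.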

SMTP (Simple Mail Transfer Protocol)

File generated → smtp.mdl

SMTP Host:

Where:

  • The answer to this prompt is the IP or DNS name of the SMTP host. For example, smtp.mailgun.org.

SMTP Port (number required) [25]:

Where:

  • The answer to this prompt is usually either 25, 465, or 587.

SMTP: Enable authentication (valid answers: 'true' or 'false') [true]:

Where:

  • true is to enable the SMTP authentication, if any.

  • false is to disable the SMTP authentication, if you do not have any.

SMTP: Transport protocol (valid answers: SMTP, SMTPS) [SMTP]:

Where:

  • You have to choose between SMTP or SMTPS depending on your infrastructure.

SMTP: Enable starttls (valid answers: 'true' or 'false') [true]:

Where:

  • true is to allow the SMTP server to negotiate the use of TLS.

  • false is to prevent the SMTP server from negotiating the use of TLS.

SMTP username:
SMTP password:
Repeat for confirmation:

Where:

  • SMTP username and SMTP password are the credentials of the account from which Saagie emails will be sent.

    The password must contain at least eight characters, including upper case (A-Z), lower case (a-z), numbers (0-9), and special characters (!, $, #, %, etc).
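
To check a candidate password before typing it at the prompt, the stated rule can be sketched as follows. This is an approximation: the function name password_ok is hypothetical, "special character" is taken to mean any character that is not a letter or a digit, and the installer's own validation may differ.

```shell
# Sketch of the stated rule: at least eight characters, with upper case,
# lower case, a digit, and a special character. "Special" here means any
# character that is not a letter or a digit (an assumption).
password_ok() {
    pw=$1
    [ "${#pw}" -ge 8 ] || return 1
    case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac
    case $pw in *[[:lower:]]*) ;; *) return 1 ;; esac
    case $pw in *[[:digit:]]*) ;; *) return 1 ;; esac
    case $pw in *[![:alnum:]]*) ;; *) return 1 ;; esac
    return 0
}

if password_ok 'Paper#2024'; then echo "accepted"; else echo "rejected"; fi
```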
Platform email sender ? (your SMTP gateway must allow this email address as the sender):

Where:

  • The answer to this prompt is the email address used to send emails from Saagie. This email address can be used for job alerts and resetting your password.

Platform Settings

File generated → platforms.mdl

How many platform(s) do you want to create/configure/install? (number required):

Where:

  • The answer to this prompt must indicate the number of platforms you want.

What is the Platform name?:

Where:

  • The answer to this prompt is the name of your platform. It will be displayed in the Platforms menu. You can choose one that best suits your needs.

    The order in which the platforms are declared during configuration must match the order of the platform IDs you entered for the node pool when configuring your cluster.
What is the Platform authorized egress CIDR block? [0.0.0.0/0]:

Where:

  • The answer to this prompt defines the authorized destination network for egress (outgoing) communication from the platform.

Do you want to enable GPU option ? (valid answers: 'true' or 'false') [false]:

Where:

  • true is to enable the GPU option, which is required to run processes on GPU nodes.

  • false is to disable the GPU option.

Do you want to enable AI option? (valid answers: 'true' or 'false') [false]:

Where:

  • true is to enable the AI option for the platform you are configuring.

  • false is to disable the AI option for the platform you are configuring.

    You will be asked to answer this prompt only if you answered true to the OpenAI prompt when configuring your cluster settings.
Do you want to customize the data lake url ? (valid answers: 'true' or 'false') [false]

Where:

  • true is to define a custom URL for this platform’s data lake. You will be asked to provide your URL in the next prompt.

  • false is to have an automatically generated URL for this platform’s data lake.

Custom data lake url: http://www.mydatalakeurl.com

Where:

  • The answer to this prompt is your custom URL for this platform’s data lake. Replace the example http://www.mydatalakeurl.com with your value.

    You will be asked to answer this prompt only if you answered true to the previous prompt about the custom data lake URL.

Kubernetes CIDR (Classless Inter-Domain Routing)

File generated → k8scidr.mdl

K8S CIDR IP Range [0.0.0.0/0]:

Where:

  • The answer to this prompt specifies the IP address or range used to reach your Kubernetes API server. Use the IP or range of the physical network interface of your master server, not the cluster IP of the Kubernetes service in the default namespace.

    Remember to add /32 as a netmask if you specify only one address.
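
The netmask reminder above can be sketched as a small normalization step. The helper name with_netmask is hypothetical, and the addresses are placeholders rather than values from your cluster.

```shell
# Hypothetical helper: append /32 when the answer is a single address
# without a netmask; leave ranges such as 10.0.0.0/24 untouched.
with_netmask() {
    case $1 in
        */*) printf '%s\n' "$1" ;;
        *)   printf '%s/32\n' "$1" ;;
    esac
}

with_netmask 192.168.1.10    # single address, gets /32 appended
with_netmask 10.0.0.0/24     # already a range, printed unchanged
```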

Remove Junks: Automatic Removal of Obsolete Data

File generated → removejunks.mdl

Set the frequency of automatic purging of junk files via cron: [*/30 * * * *]:

Where:

  • The answer to this prompt is a cron expression that sets how often users’ junk instances are removed (every 30 minutes by default).

How many days minimum do you want to keep the job instances? [90]:

Where:

  • The answer to this prompt is the minimum number of days to keep job instances.

How many instances minimum do you want to keep by job? [40]:

Where:

  • The answer to this prompt is the minimum number of instances to keep per job.

How many instances maximum do you want to keep by job? [4000]:

Where:

  • The answer to this prompt is the maximum number of instances to keep per job.

How many Mongo documents maximum do you want to remove by purge's execution? [1000]:

Where:

  • The answer to this prompt is the maximum number of MongoDB documents that a single Remove Junks execution can remove.

Do you want to force junks listing for each purge's execution although other junks already exists? (valid answers: 'true' or 'false') [false]:

Where:

  • The answer to this prompt indicates whether you want to force the listing of junk at each purge execution, even if other junk already exists.

Do you want enable automatic purge of junks (does not prevent execution of listing)? (valid answers: 'true' or 'false') [true]:

Where:

  • The answer to this prompt indicates whether you want to enable deletion during purge execution. The listing runs in either case.

Pods Assignment

Fluent Bit Volume

File generated → fluentbit.mdl

You will be asked to answer the following prompt only if you answered true to the Fluent Bit volume prompt when configuring your cluster settings.
Fluent Bit volume mountPath:

Access Mode

Depending on your answer to the Authentication Mode prompt when choosing the deployment mode, you will be asked different prompts.

The passwords in the following prompts must contain all the following character types: upper case (A-Z), lower case (a-z), numbers (0-9), and special characters (!, $, #, %, etc).
  • If you answered STANDARD or SSO beforehand, you will be asked to answer the following prompts:

    • Standard Access

    • Customer Access

    File generated → keycloakaccess.mdl

    Standard Password:
    Repeat for confirmation:

    Where:

    • The answer to this prompt is the password of the default user, with administrative rights, that is used for Saagie internal communication. This password will be the one used by the M2M user.

    File generated → customeraccess.mdl

    Customer Password:
    Repeat for confirmation:

    Where:

    • The answer to this prompt is the password of the default user, with administrative rights, that you will use to connect to the Saagie user interface for the first time. This password will be the one used by the customer_admin user.

  • If you answered LDAP beforehand, you will be asked to answer the following prompts:

    File generated → ldapaccess.mdl

    LDAP Login:
    LDAP Password:
    Repeat for confirmation:

    Where:

    • The answers to these prompts are the credentials for the User DN that Saagie components will use to communicate with your LDAP service.

    LDAP Admin group:

    Where:

    • The answer to this prompt is your administration group name.

SSO

File generated → sso.mdl

You will be asked to answer the following prompts only if you answered SSO to the Authentication Mode prompt when configuring your deployment mode.
SSO CryptR service url:
SSO CryptR account domain:
SSO CryptR organisation domain:
SSO CryptR client Id:
SSO CryptR client secret:

Where:

  • The answers to all these prompts are provided by CryptR to Saagie.

Keycloak client Id: "exchange-cryptr"
Keycloak client secret:

LDAP

File generated → ldap.mdl

You will be asked to answer the following prompts only if you answered LDAP to the Authentication Mode prompt when configuring your deployment mode.
LDAP Vendor (valid answers: LDAP, AD, OTHER) [ad]:

Where:

  • LDAP is for LDAP.

  • AD is for Active Directory.

  • OTHER is for another vendor.

LDAP Host:

Where:

  • The answer to this prompt is the IP or hostname of the LDAP server. For example, ldap.priv.company.com.

LDAP Base DN:

Where:

  • The answer to this prompt is the base DN of the LDAP directory. For example, dc=company,dc=com.

LDAP User DN [CN=Users]:

Where:

  • The answer to this prompt is the DN prefix under which to search for users.

    Do not add the base DN here.
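
The reason for leaving out the base DN can be illustrated with a short sketch. In LDAP, a full DN is a relative part followed by the base DN, separated by a comma. How Saagie joins the two answers internally is an assumption here, and the variable names are hypothetical.

```shell
# Illustration only: a full LDAP DN is the relative prefix followed by the
# base DN, separated by a comma. How Saagie joins them is an assumption.
base_dn="dc=company,dc=com"   # example value from the LDAP Base DN prompt
user_dn="CN=Users"            # default answer to the LDAP User DN prompt

full_user_dn="${user_dn},${base_dn}"
echo "$full_user_dn"
```

If the base DN were also typed at the User DN prompt, it would appear twice in the combined DN under this assumption.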
LDAP User Object Classes (expecting a comma-separated list) [person, organizationalperson, user]:

Where:

  • The answer to this prompt is the comma-separated list of expected object classes for the user.

LDAP Username Attribute [cn]:

Where:

  • The answer to this prompt is the attribute used to identify a user.

LDAP RDN Attribute [cn]:

Where:

  • The answer to this prompt is the attribute used for the user’s RDN.

LDAP UUID Attribute [objectGUID]:

Where:

  • The answer to this prompt is the operational attribute, which is unique in the whole directory.

LDAP Bind DN attribute [cn]:

Where:

  • The answer to this prompt is the user and the user’s location in the LDAP directory.

LDAP Group Membership Attribute Type (valid answers: DN, UID) [DN]:

Where:

  • The answer to this prompt is how group members are defined in the LDAP directory.

LDAP Group Name Attribute [cn]:

Where:

  • The answer to this prompt is the attribute used to identify a group.

LDAP Group DN [ou=Groups]:

Where:

  • The answer to this prompt is the DN prefix under which to search for groups.

    Do not add the base DN here.
LDAP Group Membership Attribute [member]:

Where:

  • The answer to this prompt is the attribute used by a group to declare members.

LDAP Group Object Classes (expecting a comma-separated list) [group]:

Where:

  • The answer to this prompt is the comma-separated list of expected object classes for groups.

Prometheus

File generated → prometheus.mdl

You will be asked to answer the following prompts only if you answered true to the Prometheus prompt when configuring your cluster settings.
Prometheus endpoint HTTP Basic Authentication - Set user name [monitoring]:
Prometheus endpoint HTTP Basic Authentication - Set user password:

Where:

  • The answers to these prompts are the credentials for Prometheus HTTP monitoring.

Repeat for confirmation:

Where:

  • The answer to this prompt is the confirmation of the user password defined in the previous prompt.

HTTP Proxy

File generated → proxy.mdl

You will be asked to answer the following prompts only if you answered true to the HTTP proxy prompt when configuring your cluster settings.
Do you want to define a HTTP Proxy directive ? (valid answers: 'true' or 'false') [false]:

Where:

  • If you answer true, you will be asked to provide the HTTP Proxy directive in the next prompt.

Do you want to define a HTTPS Proxy directive ? (valid answers: 'true' or 'false') [false]:

Where:

  • If you answer true, you will be asked to provide the HTTPS Proxy directive in the next prompt.

Do you want to define a NO Proxy directive ? (valid answers: 'true' or 'false') [false]:

Where:

  • If you answer true, you will be asked to provide the NO Proxy directive in the next prompt.

Storage Space Size Settings

File generated → settingsservice.mdl

Settings storage size step (eq. minimum size) for apps (in MB) (number required) [1000]:

Where:

  • The answer to this prompt corresponds to both the minimum size of the storage space and the step size. The value of the step must be set according to the volume driver used by your cloud provider.

Settings max storage size for apps (must be a multiple of the step and in MB) (number required) [10000]:

Where:

  • The answer to this prompt is the maximum size of the storage space. The value must be a multiple of the step and expressed in MB. By default, the maximum value corresponds to 10 * the step value in MB. The default value, in brackets, automatically adapts to the previously specified step value.
    For example, if your step value is 1500 (MB), your default maximum storage space size will be 15000 MB, that is, 10 * the step value.
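
The arithmetic above can be sketched in the shell to check a value before entering it. The variable names and the valid_max helper are hypothetical, and the installer may apply additional checks that are not described here.

```shell
# Sketch of the arithmetic described above; names are hypothetical.
step_mb=1500
default_max_mb=$((step_mb * 10))
echo "default maximum: ${default_max_mb} MB"

# Returns 0 when a candidate maximum is at least one step and is an exact
# multiple of the step.
valid_max() {
    max=$1
    step=$2
    [ "$max" -ge "$step" ] && [ $((max % step)) -eq 0 ]
}

valid_max 15000 "$step_mb" && echo "15000 is valid"
valid_max 16000 "$step_mb" || echo "16000 is not a multiple of $step_mb"
```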

EKS Configuration (Amazon EKS Only)

File generated → eksconfig.mdl

For security matter, provide the ARN of the role to assign to the Saagie jobs (see documentation):

Restrict to private network (private network needed on VPC - internal load balancer) (valid answers: 'true' or 'false') [false]:

Where:

  • true is used if the Saagie frontend load balancer is not to be exposed to the Internet.

Ingress Configuration

File generated → ingressconfig.mdl

Does the cluster support load balancer auto-provisioning? (valid answers: 'true' or 'false') [true]:

Where:

  • Answer true if Saagie is deployed on a Kubernetes cluster that supports load balancer auto-provisioning. For more information, see the Kubernetes documentation on the LoadBalancer type.

  • Answer false if Saagie is deployed behind an external load balancer that you must configure. For more information, see the Kubernetes documentation on the NodePort type.

    You will be asked to answer this prompt only if you answered CUSTOM to the Kubernetes provider prompt when configuring your deployment mode.
What kind of loadbalancer is in front of K8s cluster? (valid answers: 'L3' or 'L4' or 'L7') [L4]:

Where:

  • Answer L3 if Saagie is deployed behind a network load balancer.

  • Answer L4 if Saagie is deployed behind a TCP load balancer.

  • Answer L7 if Saagie is deployed behind an HTTP load balancer.

You should also configure your cluster to collect users' IP addresses, as Saagie will block them if there are too many unsuccessful connection attempts.

Docker Registry (Dedicated Mode Only)

File generated → registry.mdl

Docker registry:

Where:

  • The answer to this prompt sets the Docker registry used to pull the images inside Kubernetes.

Technology Repository (Dedicated Mode Only)

File generated → technologiesrepository.mdl

Is the technologies repository an internal one, for offline deployment ? (valid answers: 'true' or 'false') [false]:

Where:

  • true is for an offline cluster where you have provided your technologies.zip file using the saagiectl command.

  • false is for a cluster where you will download the technologies.zip file via a URL.

Url of the technology repository:

Where:

  • The answer to this prompt is the URL of the technology repository.

Does the technology repository use a different Docker registry than the product ? (valid answers: 'true' or 'false') [false]:

Where:

  • Answer true if the Docker images for the technology repository are hosted on a different Docker registry than the Docker registry for the product.

  • Answer false if the Docker images for the technology repository and the product are hosted on the same Docker registry.

Docker registry for the technologies repository:

Where:

  • The answer to this prompt is the Docker registry of the technology repository.

Saagie File

File generated → saagie.mdl

The saagie.mdl file is automatically generated and compiles all the information from the configuration process.

If you find an error afterwards, you do not need to delete the saagie.mdl file to fix the error. Follow these instructions:

  1. Close the installer.

  2. Delete the .mdl file that contains the mistake.

  3. Relaunch the installer.

  4. Continue to follow the prompts.

Once you have corrected the error, and the new file is generated, the saagie.mdl file is automatically updated.