How Does Saagie Work?

Saagie consists of numerous components that work together in stacks. Kafka manages communication between stacks and between components within stacks. Stacks use MongoDB for data storage.

Outdated architecture.

Several modules of the following architecture are obsolete and no longer exist (for instance, the entire Data Lake Governance stack), and others are missing.

An update of this architecture is planned soon.

The Global diagram per stack illustrates the stacks and how they work together. The criticality of each component is also represented. Stacks and criticality are described in the sections below.

Figure 1. Global diagram per stack

Saagie Stacks and Components

Saagie has several stacks:

  • The Ingress stack exposes all other stacks.

  • The Authentication stack manages users, groups, and authorizations.

  • The Orchestration stack manages the execution of jobs, apps, and pipelines in Kubernetes.

  • The Data Lake Governance stack describes and qualifies the data and data structure using Hadoop.

Ingress Stack

The Ingress stack exposes pods outside of Kubernetes.

The following pods are exposed through ingress-nginx-controller:

  • admin-ui

  • authentication

  • conforama

  • datasetaccess

  • datasetaccess-ui

  • governance

  • login

  • profile

  • projects-and-jobs

  • projects-and-jobs-api

  • security

  • settings

  • technology-manager

  • traefik

If a request does not reach a pod listed above, ingress-nginx-defaultbackend provides a fallback path.
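The fallback behaviour described above can be pictured with a small sketch. It is an illustration only: the `route` helper and the idea of matching on a bare service name are assumptions made for this example, and real ingress rules match on hosts and paths rather than on a name lookup.

```python
# Pods exposed through ingress-nginx-controller, as listed above.
EXPOSED_PODS = frozenset({
    "admin-ui", "authentication", "conforama", "datasetaccess",
    "datasetaccess-ui", "governance", "login", "profile",
    "projects-and-jobs", "projects-and-jobs-api", "security",
    "settings", "technology-manager", "traefik",
})

# Component that answers when no exposed pod is reached.
FALLBACK = "ingress-nginx-defaultbackend"


def route(service: str) -> str:
    """Return the name of the pod that would answer (hypothetical helper)."""
    return service if service in EXPOSED_PODS else FALLBACK


if __name__ == "__main__":
    print(route("login"))            # an exposed pod answers
    print(route("unknown-service"))  # the default backend answers
```

The set is frozen because the list of exposed pods is fixed by the installation, not changed at request time.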

Authentication Stack

The Authentication stack is responsible for logins and the management of users and groups.

This stack is composed of both Saagie and third-party components:

Component Description

auth

auth is a CRUD component for users and groups. It communicates with the security pod to get the authorizations associated with the groups of a given user.

authentication

authentication uses the Keycloak third-party service to change passwords and manage tokens. This component also triggers verification emails sent to users.

security

security retrieves group authorizations. It uses the idmacl pod to determine the groups associated with a user. It calls the authentication pod to check a token’s validity, and can be called by other external components, such as projects-and-jobs-api.

idmacl

idmacl is an abstraction layer for the users and groups persistence system. Depending on the Saagie offer you chose, this persistence system can be an external LDAP directory, such as OpenLDAP or Active Directory, or Keycloak.

Keycloak

Keycloak is a third-party service used to manage tokens, including their creation and validation. Keycloak also keeps the sessions related to these tokens open.

admin-ui

admin-ui is the user interface that manages users and groups.

login

login is the user interface for signing in to the platform with a login and password.

profile

profile manages user profiles, such as email and job. It calls the authentication pod to trigger emails sent to users.
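The way these components cooperate to resolve a user's authorizations can be sketched as follows. This is a minimal in-memory illustration, not Saagie's implementation: the class and method names, the token value, and the authorization strings are all hypothetical, and the real components talk to each other over the network.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Authentication:
    """Stand-in for the authentication pod (backed by Keycloak)."""
    valid_tokens: dict = field(default_factory=dict)  # token -> user name

    def check_token(self, token: str) -> Optional[str]:
        # Return the user owning the token, or None if the token is invalid.
        return self.valid_tokens.get(token)


@dataclass
class Idmacl:
    """Stand-in for idmacl: hides where users and groups are stored."""
    groups_by_user: dict = field(default_factory=dict)  # user -> set of groups

    def groups_of(self, user: str) -> set:
        return set(self.groups_by_user.get(user, ()))


@dataclass
class Security:
    """Stand-in for the security pod."""
    authentication: Authentication
    idmacl: Idmacl
    authorizations_by_group: dict = field(default_factory=dict)

    def authorizations_for(self, token: str) -> set:
        # 1. Check the token with the authentication pod.
        user = self.authentication.check_token(token)
        if user is None:
            raise PermissionError("invalid or expired token")
        # 2. Ask idmacl for the user's groups, then merge group authorizations.
        result = set()
        for group in self.idmacl.groups_of(user):
            result |= self.authorizations_by_group.get(group, set())
        return result


if __name__ == "__main__":
    security = Security(
        Authentication({"token-1": "alice"}),        # hypothetical token
        Idmacl({"alice": {"data-engineers"}}),       # hypothetical group
        {"data-engineers": {"project:read", "project:write"}},
    )
    print(sorted(security.authorizations_for("token-1")))
```

Keeping idmacl behind its own class mirrors the abstraction described above: the security logic does not need to know whether groups come from LDAP or Keycloak.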

Orchestration Stack

The Orchestration stack is responsible for executing jobs, apps, and pipelines, as well as everything related to environment variables and Docker credentials. There are two API entry points that receive requests from Saagie users: projects-and-jobs-api and conforama.

Each platform has its own Kubernetes namespace containing a MinIO server. Each project also has a Kubernetes namespace containing a MinIO server and Argo. All executable elements, such as jobs, apps, and pipelines, are executed in the namespace of the corresponding project.

Due to the namespace system, projects are isolated from other projects, just as platforms are isolated from other platforms. Saagie components that are not in the project or platform namespaces are in the <installationId> namespace, where <installationId> is your installation ID. The installation ID must match the prefix you chose for your DNS entry.
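The namespace layout can be summarised as a small lookup table. The sketch below only records what the text states; the names `NamespaceKind`, `HOSTED_IN`, and `namespace_kinds_for` are hypothetical, and the actual names of platform and project namespaces are not modelled because they are not given here.

```python
from enum import Enum


class NamespaceKind(Enum):
    INSTALLATION = "installation"  # the <installationId> namespace
    PLATFORM = "platform"          # one namespace per platform
    PROJECT = "project"            # one namespace per project


# Where each element lives, according to the description above.
HOSTED_IN = {
    "MinIO": frozenset({NamespaceKind.PLATFORM, NamespaceKind.PROJECT}),
    "Argo": frozenset({NamespaceKind.PROJECT}),
    "job": frozenset({NamespaceKind.PROJECT}),
    "app": frozenset({NamespaceKind.PROJECT}),
    "pipeline": frozenset({NamespaceKind.PROJECT}),
}


def namespace_kinds_for(element: str) -> frozenset:
    """Return the kinds of namespace that host the given element.

    Anything not listed in HOSTED_IN is treated as a Saagie component
    living in the installation namespace (a simplification).
    """
    return HOSTED_IN.get(element, frozenset({NamespaceKind.INSTALLATION}))


if __name__ == "__main__":
    for name in ("job", "MinIO", "projects-and-jobs-api"):
        kinds = sorted(kind.value for kind in namespace_kinds_for(name))
        print(name, "->", kinds)
```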

This stack is composed of several components with different roles and features:

Component Description

projects-and-jobs

projects-and-jobs is the user interface for Saagie’s Projects module.

projects-and-jobs-api

projects-and-jobs-api is the API used by the projects-and-jobs component.

project-k8s-controller

project-k8s-controller is a Kubernetes controller (along with the CRD Project created by Saagie) that allows the creation and update of project namespaces.

platform-k8s-controller

platform-k8s-controller is a Kubernetes controller (along with the CRD Platform created by Saagie) that allows the creation and update of platform namespaces.

conforama

conforama is an HTTP API that saves files for a platform.

technology-manager

technology-manager allows users to manage their own technologies and repositories within Saagie.

Fluent Bit

Fluent Bit is a Kubernetes DaemonSet that reads Docker logs to extract job and app logs and make them available in projects-and-jobs-api.

scredz

scredz is used by projects-and-jobs-api to handle Docker credentials.

Traefik

Traefik is an ingress controller that allows HTTP access to apps from outside Saagie. It contains a sidecar container responsible for verifying the access rights for the requested app.

Data Lake Governance Stack

The Data Lake Governance stack is an application used to manage, document, and qualify your data lake. It allows you to manage user domains, provenances, trust levels, and data status. The stack is connected to the data lake, which enables it to extract the information it needs.

The Data Lake Governance stack is composed of the following components:

Component Description

governance

governance is the main component that manages all documentation and qualifications relating to the data lake.

datasetaccess-ui

datasetaccess-ui is the user interface that manages dataset access rights.

datasetaccess

datasetaccess is the API used by datasetaccess-ui.

rule-manager

rule-manager applies the access rights defined in datasetaccess to the data lake.

Other Stack

Component Description

settings

settings allows the configuration of the Saagie product. It uses a REST API exposed through the ingress-nginx-controller pod.

UMDC

UMDC, or User Metrics Data Capture, allows Saagie to collect anonymous metrics on product use.

Component Criticality

Each technical component used by Saagie has a level of criticality depending on its role. The impact varies when these components fail or are shut down.

Criticality Level  Color   Meaning
Minor              Yellow  Anomaly making it impossible for the customer to use one or more non-essential features of the solution.
Major              Orange  Anomaly reducing the use of the solution by preventing the use of certain essential functions.
Critical           Red     Anomaly making total use of the solution impossible.

The tables below show the criticality level and impact on the platform of a failure for each component.

Ingress Stack

Component                     Criticality  Impact
ingress-nginx-controller      Critical     No access to Saagie API.
ingress-nginx-defaultbackend  Minor        No default error page.

Authentication Stack

Component       Criticality  Impact
auth            Critical     Authentication stack is unusable, rendering Saagie unusable.
authentication  Critical     Authentication stack is unusable, rendering Saagie unusable.
security        Critical     Authentication stack is unusable, rendering Saagie unusable.
idmacl          Critical     Authentication stack is unusable, rendering Saagie unusable.
keycloak        Critical     Authentication stack is unusable, rendering Saagie unusable.
admin-ui        Minor        Cannot manage users and groups.
login           Critical     Cannot log in to Saagie.
profile         Minor        Cannot manage user profiles, such as jobs and email.

Orchestration Stack

Component                Criticality  Impact
projects-and-jobs        Major        Projects and jobs user interface is unavailable. projects-and-jobs-api remains available.
projects-and-jobs-api    Major        Cannot create jobs, pipelines, and scheduled jobs.
project-k8s-controller   Major        Impossible to create projects.
platform-k8s-controller  Minor        Impossible to create platforms.
conforama                Minor        Impossible to create, modify, or delete files on the MinIO platform. Does not block usage by jobs and apps.
technology-manager       Minor        No consequence.
Fluent Bit               Minor        No logs for jobs and apps.
scredz                   Minor        Impossible to create, update, or delete Docker credentials. Does not block usage by jobs and apps.
traefik                  Major        Impossible to access app ports.

Data Lake Governance Stack

Component         Criticality  Impact
governance        Major        Cannot access Governance.
datasetaccess-ui  Major        Cannot manage dataset access.
datasetaccess     Major        Cannot manage dataset access or use governance properly.
rule-manager      Major        Cannot grant authorizations to the data lake. When rule-manager is restarted, it grants the authorizations that were missed.

Other Stack

Component        Criticality  Impact
minio            Major        Impossible to run jobs and apps in the corresponding projects.
argo-controller  Major        Impossible to run jobs and apps in the corresponding projects.
kafka            Critical     Numerous Saagie components cannot work, rendering Saagie unusable.
schema-registry  Critical     Numerous Saagie components cannot work, rendering Saagie unusable.
zookeeper        Critical     Numerous Saagie components cannot work, rendering Saagie unusable.
mongodb          Critical     Numerous Saagie components cannot work, rendering Saagie unusable.
settings         Critical     Saagie user interface is unusable.
umdc             Minor        Metrics are not collected.
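When several components are down at once, the criticality tables can be used as a lookup to estimate the overall severity. The sketch below is one possible way to do this, assuming the overall severity is simply the highest level among the failed components; that rule, the `worst_criticality` helper, and the choice of components in the table are assumptions made for illustration, not part of Saagie.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Criticality(IntEnum):
    MINOR = 1     # yellow
    MAJOR = 2     # orange
    CRITICAL = 3  # red


# A subset of the criticality tables above.
CRITICALITY = {
    "ingress-nginx-controller": Criticality.CRITICAL,
    "ingress-nginx-defaultbackend": Criticality.MINOR,
    "keycloak": Criticality.CRITICAL,
    "admin-ui": Criticality.MINOR,
    "projects-and-jobs-api": Criticality.MAJOR,
    "traefik": Criticality.MAJOR,
    "kafka": Criticality.CRITICAL,
    "umdc": Criticality.MINOR,
}


def worst_criticality(failed: Iterable[str]) -> Optional[Criticality]:
    """Return the highest criticality among failed components.

    Returns None when nothing has failed. Raises KeyError for a
    component that is not in the CRITICALITY table.
    """
    levels = [CRITICALITY[name] for name in failed]
    return max(levels) if levels else None


if __name__ == "__main__":
    print(worst_criticality(["umdc", "traefik"]))
    print(worst_criticality(["admin-ui", "kafka"]))
```

Using an `IntEnum` makes the levels directly comparable, so `max` picks the most severe one without extra mapping code.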