Architecture

Saagie consists of numerous components that work together in stacks. Kafka manages communication between stacks and between components within stacks. Stacks use MongoDB for data storage.

Figure 1 (Global diagram per stack) illustrates the stacks and how they work together, and also shows the criticality of each component. The sections below describe the stacks and the criticality levels.

Figure 1. Global diagram per stack

1. Saagie stacks and components

Saagie has several stacks:

  • The Ingress stack exposes all other stacks.

  • The Authentication stack manages users, groups, and authorizations.

  • The Orchestration stack manages the execution of jobs, apps, and pipelines in Kubernetes.

  • The Datalake governance stack describes and qualifies the data and data structure using Hadoop.

1.1. Ingress stack

The Ingress stack exposes pods outside of Kubernetes.

The following pods are exposed through ingress-nginx-controller:

  • admin-ui

  • authentication

  • conforama

  • datasetaccess

  • datasetaccess-ui

  • governance

  • login

  • profile

  • projects-and-jobs

  • projects-and-jobs-api

  • reporter

  • security

  • settings

  • technology-manager

  • traefik

If a request does not match any of the pods listed above, ingress-nginx-defaultbackend handles it as a fallback and serves the default error page.
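The fallback behavior can be pictured as a lookup: a request that targets one of the exposed pods is forwarded to it, and anything else goes to the default backend. The following is a minimal Python illustration, not how ingress-nginx-controller actually evaluates its rules. In particular, the rule that the first path segment names the target pod is a hypothetical assumption; real ingress rules can match on host names and arbitrary path prefixes.

```python
# Pods exposed through ingress-nginx-controller (from the list above).
EXPOSED_PODS = frozenset({
    "admin-ui", "authentication", "conforama", "datasetaccess",
    "datasetaccess-ui", "governance", "login", "profile",
    "projects-and-jobs", "projects-and-jobs-api", "reporter",
    "security", "settings", "technology-manager", "traefik",
})

DEFAULT_BACKEND = "ingress-nginx-defaultbackend"


def route(path: str) -> str:
    """Return the pod that should receive a request for `path`.

    Hypothetical rule: the first path segment names the target pod,
    so "/login/form" goes to the login pod.
    """
    first_segment = path.lstrip("/").split("/", 1)[0]
    if first_segment in EXPOSED_PODS:
        return first_segment
    # No exposed pod matches: fall back to the default backend,
    # which serves the default error page.
    return DEFAULT_BACKEND
```

With this rule, `route("/login/form")` returns `"login"`, while `route("/does-not-exist")` returns `"ingress-nginx-defaultbackend"`.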

1.2. Authentication stack

The Authentication stack is responsible for logins and the management of users and groups.

This stack is composed of both Saagie and third-party components:

  • auth is a CRUD component for users and groups. auth communicates with the security pod to obtain the authorizations associated with a given user's groups.

  • authentication uses keycloak to change passwords and manage tokens. It also triggers the verification emails sent to users.

  • security retrieves group authorizations. It uses the idmacl pod to determine which groups a user belongs to, and calls the authentication pod to check a token's validity. Components outside the Authentication stack, such as projects-and-jobs-api, also call security.

  • idmacl is an abstraction layer over the persistence system for users and groups. Depending on your Saagie offer, this persistence can be an external LDAP directory, such as OpenLDAP or Active Directory, or keycloak.

  • keycloak is a third-party service used to manage tokens (for example, creating and checking them). keycloak also keeps open the sessions associated with these tokens.

  • admin-ui is the user interface that manages users and groups.

  • login is the user interface that manages connection to the platform (login and password).

  • profile manages user profiles, such as email address and job. profile calls authentication to trigger the emails sent to users.
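The authorization lookup performed by security can be summarized in three steps: validate the token with authentication, ask idmacl for the user's groups, then merge the authorizations of those groups. The sketch below models that flow with in-memory stand-ins. The class names, method names, and authorization strings (such as "project:read") are hypothetical and do not reflect Saagie's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthenticationStub:
    """Hypothetical stand-in for the authentication pod (backed by keycloak)."""
    valid_tokens: dict[str, str]  # token -> user name

    def check_token(self, token: str) -> Optional[str]:
        # Return the token's user if the token is valid, otherwise None.
        return self.valid_tokens.get(token)


@dataclass
class IdmaclStub:
    """Hypothetical stand-in for idmacl, which fronts LDAP or keycloak."""
    user_groups: dict[str, set[str]]  # user name -> group names

    def groups_of(self, user: str) -> set[str]:
        return self.user_groups.get(user, set())


@dataclass
class Security:
    """Hypothetical model of the security component's lookup flow."""
    authentication: AuthenticationStub
    idmacl: IdmaclStub
    group_authorizations: dict[str, set[str]] = field(default_factory=dict)

    def authorizations_for(self, token: str) -> set[str]:
        # 1. Ask authentication whether the token is valid.
        user = self.authentication.check_token(token)
        if user is None:
            raise PermissionError("invalid token")
        # 2. Ask idmacl which groups the user belongs to.
        groups = self.idmacl.groups_of(user)
        # 3. Merge the authorizations attached to each of those groups.
        authorizations: set[str] = set()
        for group in groups:
            authorizations |= self.group_authorizations.get(group, set())
        return authorizations
```

A caller such as projects-and-jobs-api would pass the user's token and then check whether the returned set contains the authorization it needs.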

1.3. Orchestration stack

The Orchestration stack is responsible for executing jobs, apps, and pipelines, as well as everything related to environment variables and Docker credentials. There are two API entry points that receive requests from Saagie users: projects-and-jobs-api and conforama.

Each platform has its own Kubernetes namespace containing a MinIO server. Each project also has its own Kubernetes namespace containing a MinIO server and Argo. All executable elements (jobs, apps, and pipelines) are executed in the namespace of the corresponding project.

Thanks to this namespace system, projects are isolated from one another, just as platforms are isolated from one another. Saagie components that do not belong to a project or platform namespace run in the saagie-common namespace.
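These placement rules can be condensed into a small decision function. This sketch assumes a simplified, hypothetical Workload record and a hypothetical naming scheme ("project-<id>", "platform-<id>"). The actual namespaces are created by project-k8s-controller and platform-k8s-controller, and their names may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Namespace shared by Saagie components outside any project or platform.
COMMON_NAMESPACE = "saagie-common"

# Executable elements always run in their project's namespace.
EXECUTABLE_KINDS = frozenset({"job", "app", "pipeline"})


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str  # "job", "app", "pipeline", or "component"
    project_id: Optional[str] = None
    platform_id: Optional[str] = None


def namespace_for(workload: Workload) -> str:
    """Return the namespace a workload runs in.

    The "project-<id>" / "platform-<id>" naming scheme is hypothetical.
    """
    if workload.kind in EXECUTABLE_KINDS and workload.project_id is None:
        raise ValueError(f"{workload.kind} {workload.name!r} must belong to a project")
    if workload.project_id is not None:
        # Jobs, apps, pipelines, and the project's MinIO server and Argo.
        return f"project-{workload.project_id}"
    if workload.platform_id is not None:
        # For example, the platform's MinIO server.
        return f"platform-{workload.platform_id}"
    return COMMON_NAMESPACE
```

Because every executable element resolves to its own project's namespace, two projects never share one, which is what provides the isolation described above.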

This stack is composed of several components with different roles and features:

  • projects-and-jobs is the user interface for Saagie’s Projects section.

  • projects-and-jobs-api is the API used by the projects-and-jobs component.

  • reporter is the API used for the Activity page in projects-and-jobs. It provides a global view of the activity of jobs, pipelines, and apps.

  • project-k8s-controller is a Kubernetes controller that, together with the Project custom resource definition (CRD) created by Saagie, creates and updates project namespaces.

  • platform-k8s-controller is a Kubernetes controller that, together with the Platform CRD created by Saagie, creates and updates platform namespaces.

  • conforama is an HTTP API that saves files for a platform.

  • technology-manager allows users to manage their own technologies and repositories within Saagie.

  • fluentbit is a Kubernetes DaemonSet that reads Docker logs to extract job and app logs and make them available in projects-and-jobs-api.

  • scredz is used by projects-and-jobs-api to manage Docker credentials.

  • traefik is an ingress controller that allows HTTP access to apps from outside Saagie. It contains a sidecar container responsible for verifying the access rights for the requested app.

1.4. Datalake governance stack

The Datalake governance stack is an application used to manage, document, and qualify your data lake. This is where you manage user domains, provenances, trust levels, and data status. The stack is connected to the data lake, from which it extracts the information it needs.

The Datalake governance stack is composed of a few components:

  • governance is the main component managing all data lake documentation and qualifications.

  • datasetaccess-ui is the user interface that manages dataset access rights.

  • datasetaccess is the API used by datasetaccess-ui.

  • rule-manager applies the access rights defined in datasetaccess to the data lake.

1.5. Other

  • settings allows the configuration of the Saagie product. It uses a REST API exposed through the ingress-nginx-controller pod.

  • UMDC (User Metrics Data Capture) allows Saagie to collect anonymous product-use metrics.

2. Component criticality

Each technical component used by Saagie has a criticality level that depends on its role. The level reflects the impact on the platform when the component fails or is shut down.

Criticality level | Color  | Meaning
------------------+--------+------------------------------------------------------------
Minor             | Yellow | Anomaly making it impossible for the Customer to use one or more non-essential features of the Solution.
Major             | Orange | Anomaly reducing the use of the Solution by preventing the use of certain essential functions.
Critical          | Red    | Anomaly making any use of the Solution impossible.
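The three levels form an ordered scale. One way to model them, sketched here as an illustration rather than anything Saagie ships, is a Python IntEnum whose values encode severity, so levels can be compared directly and the worst one picked with max(). The numeric values are an assumption; only their order matters.

```python
from enum import IntEnum


class Criticality(IntEnum):
    """Criticality levels; a larger value means a more severe impact."""
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3

    @property
    def color(self) -> str:
        # Color associated with each level in the table above.
        return _COLORS[self]


_COLORS = {
    Criticality.MINOR: "Yellow",
    Criticality.MAJOR: "Orange",
    Criticality.CRITICAL: "Red",
}
```

For example, `max([Criticality.MINOR, Criticality.CRITICAL])` is `Criticality.CRITICAL`, and `Criticality.MAJOR.color` is `"Orange"`.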

The tables below give, for each component, its criticality level and the impact of its failure on the platform.

2.1. Ingress stack

Component                    | Criticality | Impact
-----------------------------+-------------+------------------------------
ingress-nginx-controller     | Critical    | No access to the Saagie API.
ingress-nginx-defaultbackend | Minor       | No default error page.

2.2. Authentication stack

Component      | Criticality | Impact
---------------+-------------+--------------------------------------------------
auth           | Critical    | The Authentication stack is unusable, rendering Saagie unusable.
authentication | Critical    | The Authentication stack is unusable, rendering Saagie unusable.
security       | Critical    | The Authentication stack is unusable, rendering Saagie unusable.
idmacl         | Critical    | The Authentication stack is unusable, rendering Saagie unusable.
keycloak       | Critical    | The Authentication stack is unusable, rendering Saagie unusable.
admin-ui       | Minor       | Cannot manage users and groups.
login          | Critical    | Cannot log in to Saagie.
profile        | Minor       | Cannot manage user profiles (such as job and email address).

2.3. Orchestration stack

Component               | Criticality | Impact
------------------------+-------------+--------------------------------------------------
projects-and-jobs       | Major       | The user interface for the Projects section is unavailable. projects-and-jobs-api remains available.
projects-and-jobs-api   | Major       | Cannot create jobs, pipelines, and cronJobs. reporter cannot synchronize.
reporter                | Minor       | The Activity page is empty.
project-k8s-controller  | Major       | Impossible to create projects.
platform-k8s-controller | Minor       | Impossible to create platforms.
conforama               | Minor       | Impossible to create, modify, or delete files in the platform's MinIO server. This does not block usage by jobs and apps.
technology-manager      | Minor       | No consequence.
fluentbit               | Minor       | No logs for jobs and apps.
scredz                  | Minor       | Impossible to create, update, or delete Docker credentials. This does not block usage by jobs and apps.
traefik                 | Major       | Impossible to access app ports.

2.4. Datalake governance stack

Component        | Criticality | Impact
-----------------+-------------+--------------------------------------------------
governance       | Major       | Cannot access Governance.
datasetaccess-ui | Major       | Cannot manage dataset access.
datasetaccess    | Major       | Cannot manage dataset access or use governance properly.
rule-manager     | Major       | Cannot grant authorizations on the data lake. When rule-manager restarts, it grants the authorizations it missed.
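The catch-up behavior of rule-manager (granting missed authorizations after a restart) is what a reconciliation step produces: compare the access rights defined in datasetaccess with those actually present on the data lake, and apply the difference. The sketch below shows that generic technique in Python. It is not documented as rule-manager's implementation; the Grant fields are hypothetical, and the revocation half is an assumption, since the table above only mentions granting missed authorizations.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    """One access right; these fields are hypothetical."""
    group: str
    dataset: str
    permission: str  # for example "read" or "write"


def reconcile(desired: set[Grant], applied: set[Grant]) -> tuple[set[Grant], set[Grant]]:
    """Compare rights defined in datasetaccess with rights on the data lake.

    Returns (to_grant, to_revoke). Rights defined while the reconciler was
    down simply show up in to_grant on the next run.
    """
    to_grant = desired - applied   # defined but not yet applied (e.g. missed while down)
    to_revoke = applied - desired  # applied but no longer defined (an assumption)
    return to_grant, to_revoke
```

Because the comparison uses the full desired state rather than a stream of change events, no separate record of missed events is needed for the catch-up to work.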

2.5. Other

Component       | Criticality | Impact
----------------+-------------+--------------------------------------------------
MinIO           | Major       | Impossible to run jobs and apps in the corresponding projects.
Argo-controller | Major       | Impossible to run jobs and apps in the corresponding projects.
kafka           | Critical    | Numerous Saagie components cannot work, rendering Saagie unusable.
schema-registry | Critical    | Numerous Saagie components cannot work, rendering Saagie unusable.
Zookeeper       | Critical    | Numerous Saagie components cannot work, rendering Saagie unusable.
MongoDB         | Critical    | Numerous Saagie components cannot work, rendering Saagie unusable.
Settings        | Critical    | Saagie user interface is unusable.
UMDC            | Minor       | Metrics are not collected.
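When several components are down at the same time, a simple triage heuristic is to treat the most severe level among them as the overall impact. That rule is an assumption for illustration, not a documented Saagie procedure. The Python sketch below encodes the criticality tables above as a dictionary and applies the heuristic.

```python
from typing import Iterable, Optional

# Criticality of each component, copied from the tables above.
COMPONENT_CRITICALITY = {
    # Ingress stack
    "ingress-nginx-controller": "Critical",
    "ingress-nginx-defaultbackend": "Minor",
    # Authentication stack
    "auth": "Critical", "authentication": "Critical", "security": "Critical",
    "idmacl": "Critical", "keycloak": "Critical", "admin-ui": "Minor",
    "login": "Critical", "profile": "Minor",
    # Orchestration stack
    "projects-and-jobs": "Major", "projects-and-jobs-api": "Major",
    "reporter": "Minor", "project-k8s-controller": "Major",
    "platform-k8s-controller": "Minor", "conforama": "Minor",
    "technology-manager": "Minor", "fluentbit": "Minor",
    "scredz": "Minor", "traefik": "Major",
    # Datalake governance stack
    "governance": "Major", "datasetaccess-ui": "Major",
    "datasetaccess": "Major", "rule-manager": "Major",
    # Other
    "MinIO": "Major", "Argo-controller": "Major", "kafka": "Critical",
    "schema-registry": "Critical", "Zookeeper": "Critical",
    "MongoDB": "Critical", "Settings": "Critical", "UMDC": "Minor",
}

# A higher rank means a more severe impact on the platform.
RANK = {"Minor": 1, "Major": 2, "Critical": 3}


def overall_criticality(failed: Iterable[str]) -> Optional[str]:
    """Return the most severe criticality among the failed components.

    Returns None when no component has failed; raises KeyError for a
    component name that is not in the tables.
    """
    levels = [COMPONENT_CRITICALITY[name] for name in failed]
    if not levels:
        return None
    return max(levels, key=RANK.__getitem__)
```

For example, losing reporter and traefik together is a Major incident, while adding kafka to the failed set makes it Critical.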