2023
December 2023
Here are the highlights of new and updated features for this release:
- Product Updates (2023.05)
  The 2023.05 version of the Saagie product has been released with the following features:
  - Find the Saagie CI/CD GitHub Action on the GitHub marketplace. Its aim is to make it easier to set up the CI/CD process on your Saagie platform.
  - App storage space management has been improved. You can now move a storage space to another project, as well as increase the capacity of a storage space.
  - It is now possible to duplicate a pipeline.
  - A new capacity planning dashboard has been added to the Saagie Usage Monitoring (SUM) add-on.
  - Learn how to resume a failed pipeline execution from the point at which it stopped.
  - Learn how to build machine learning pipelines using BigQuery with Python in Saagie.
  - Learn how to push models to the Hugging Face Model Hub with Saagie.
- User Experience Improvements
  Access to Saagie Resources Monitoring (SRM) has been enhanced for a better user experience.
- Bug Fixes
  A number of known issues have been fixed.
- Saagie Technology Repository Updates
  New technologies have been added and others deprecated.
Product Updates (2023.05)
Saagie CI/CD GitHub Action
Find the Saagie CI/CD GitHub Action on the GitHub marketplace. It is designed to make it easier to set up the CI/CD process on your Saagie platform, and provides you with a set of customizable options to upgrade your jobs and pipelines.
For more information, see CI/CD Action for Saagie DataOps Platform.
Managing Storage Spaces
Moving a Storage
You can now move a storage space from one project to another, including to a project on a different platform, from the Storage page of your project.
Expanding a Storage
You can now expand the capacity of your storage space by editing it.
For storage expansion to work, you must add the allowVolumeExpansion: true option to the storage.yml file created when you configured your cluster(s). For more information, see Creating Storage Classes for Your Saagie Platform for EKS, AKS, GKE, and other service platforms.
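As an illustration of the option above, the following Python sketch builds a minimal Kubernetes StorageClass manifest with allowVolumeExpansion set to true and checks for it. The class name saagie-storage is hypothetical and ebs.csi.aws.com (the AWS EBS CSI driver) is only an example provisioner; the actual content of your storage.yml file depends on your cloud provider and on Saagie's storage class guide. The manifest is printed as JSON, which YAML parsers also accept.

```python
import json


def make_storage_class(name: str, provisioner: str) -> dict:
    """Build a minimal StorageClass manifest that allows volume expansion.

    The name and provisioner are illustrative placeholders, not Saagie defaults.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Kubernetes only lets you expand PersistentVolumeClaims whose
        # StorageClass sets this field to true.
        "allowVolumeExpansion": True,
    }


def supports_expansion(manifest: dict) -> bool:
    """Return True only if the manifest is a StorageClass with expansion explicitly enabled."""
    return (
        manifest.get("kind") == "StorageClass"
        and manifest.get("allowVolumeExpansion") is True
    )


storage_class = make_storage_class("saagie-storage", "ebs.csi.aws.com")
print(json.dumps(storage_class, indent=2))
print("expansion enabled:", supports_expansion(storage_class))
```

A StorageClass that omits the field is treated as not expandable, which is why the check requires an explicit true value.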
For more information, see Managing Storage Spaces.
Duplicating a Pipeline
From the pipeline library or its Overview page, you can now duplicate the Current version of your pipeline.
This saves you from starting from scratch and improves your productivity.
For more information, see Duplicating Pipelines.
New Saagie Usage Monitoring (SUM) Dashboard
A new default dashboard has been added to SUM. It gives you information on the capacity planning of your jobs and pipelines.
For more information, see the Saagie – Next Scheduling default dashboard.
Resuming a Pipeline
You can now resume a pipeline that has failed. A pipeline can fail to complete for many reasons, and restarting it from the beginning can be costly and time-consuming. Resuming a pipeline lets you pick up where it stopped and complete the remaining jobs.
For more information, see Resume a Pipeline.
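The resume behavior described above can be pictured with the following Python sketch. It assumes a simple linear pipeline and uses hypothetical status names (SUCCEEDED, FAILED, NOT_RUN) and a hypothetical run_job callable; it illustrates the idea of skipping already-succeeded jobs and is not Saagie's implementation.

```python
from typing import Callable, Dict, List


def resume_pipeline(
    jobs: List[str],
    statuses: Dict[str, str],
    run_job: Callable[[str], str],
) -> Dict[str, str]:
    """Resume a failed pipeline run from the first job that did not succeed.

    jobs: job names in execution order (hypothetical linear pipeline).
    statuses: status of each job in the failed run.
    run_job: callable that runs one job and returns its new status.
    """
    result = dict(statuses)
    # Jobs before the first non-succeeded job are kept, not rerun.
    start = next(
        (i for i, job in enumerate(jobs) if statuses.get(job) != "SUCCEEDED"),
        len(jobs),
    )
    for job in jobs[start:]:
        result[job] = run_job(job)
        if result[job] != "SUCCEEDED":
            break  # stop again at the next failure, so it can be resumed later
    return result


executed = []


def fake_run(job: str) -> str:
    """Stand-in for a real job execution: records the call and succeeds."""
    executed.append(job)
    return "SUCCEEDED"


failed_run = {"extract": "SUCCEEDED", "transform": "FAILED", "load": "NOT_RUN"}
final = resume_pipeline(["extract", "transform", "load"], failed_run, fake_run)
print(executed)  # only the failed job and the ones after it are rerun
print(final)
```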
Creating Machine Learning Pipelines With BigQuery in Saagie
Read our new how-to section to learn how to do machine learning with BigQuery using Python in Saagie. To do this, we will look at sentiment analysis applied to movie reviews in the IMDB (Internet Movie Database) and follow each step of this process. Our goal is to determine the polarity of the reviews, that is, whether they are positive or negative.
This section will include several articles, one for each step of the sentiment analysis process. The article Send Data to BigQuery is the first in a series of four. Stay tuned!
For more information, see Create Machine Learning Pipelines With BigQuery in Saagie.
Pushing Models to Hugging Face With Saagie
Learn how to push models to the Hugging Face Model Hub with Saagie, then use them with our Saagie Hugging Face Model Server add-on. This add-on includes the Saagie HF ModelServer TextCLF app, which is designed to facilitate the deployment and prediction of Hugging Face deep learning models for text classification.
For more information, see our article on how to Push Models to Hugging Face With Saagie.
User Experience Improvements
Enhanced Access to SRM
Access to Saagie Resources Monitoring (SRM) has been enhanced for a better user experience.
To access it, you can now click Resource Details in the secondary navigation menu of the Monitoring module. This opens SRM in a new tab.
For more information, see About Saagie Resources Monitoring.
Bug Fixes
- A job that runs in a scheduled pipeline with an invalid image problem now stops the pipeline.
- A job that runs in a scheduled pipeline can now be stopped manually.
- Stopping a scheduled job now sets the instance to Stopped and also stops its execution on the cluster.
Saagie Technology Repository Updates
The following technologies have been added or deprecated in the Saagie official technology repository:
Technology | New contexts | Deprecated contexts |
---|---|---|
R | - | |
Technology | New contexts | Deprecated contexts |
---|---|---|
CloudBeaver | - | |
RStudio | - | |
Saagie Usage Monitoring | | - |
Do not forget to synchronize your Saagie repositories to keep them up to date.
October 2023
Here are the highlights of new and updated features for this release:
- Product Updates (2023.04)
  The 2023.04 version of the Saagie product has been released with the following features:
  - Display a global view of your cluster consumption with Saagie Resources Monitoring.
  - The function for moving a job from one project to another has been improved.
  - You can now see and follow the details of your job execution time.
  - You can delete pipeline instances and versions.
  - You can delete job instances based on date criteria.
  - You can also delete job versions based on tag criteria.
  - You can integrate your Saagie projects into a CI/CD pipeline.
  - You can generate a job description with ChatGPT.
  - You can configure your Jupyter Notebook app for use with generative AI.
  - You can also configure your VS Code app for use with generative AI pair programmers.
  - A new add-on named Saagie Hugging Face Model Server has been released. It helps you deploy Hugging Face deep learning models for text classification and run predictions with them.
  - Another add-on named Saagie Code Search has been released. It helps you search and retrieve Python code snippets from an existing codebase.
  - Saagie now supports Kubernetes 1.25.x.
- User Experience Improvements
  The monitoring modules have been restructured.
- Saagie Technology Repository Updates
  New technologies have been added.
Product Updates (2023.04)
Viewing Your Cluster Resources With Saagie Resources Monitoring
Saagie Resources Monitoring (SRM) is a set of graphs providing an overview of your cluster’s resource consumption. Based on Grafana, SRM gives you a global view of RAM and CPU consumption for the nodes, jobs, and apps of your cluster. With its custom dashboard, you can quickly visualize and analyze this consumption.
For more information, see About Saagie Resources Monitoring.
Moving a Job to Another Project
Moving jobs from one project to another was already possible, but only between projects on the same platform. You can now move jobs from one project to another on a different platform.
From the job library or its Overview page, click the kebab menu and enter the required information. The moved job keeps its versions, instances, logs, packages, alerts, and resource settings.
This saves you from starting from scratch and improves your productivity.
For more information, see Moving a Job to Another Project.
Monitoring Job Execution Time
From your job’s Overview and Instances pages, you can now see the execution time of running and terminated jobs, along with the different statuses they have gone through.
This helps you assess the performance of your job. If it is not efficient enough, you can optimize it accordingly.
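To show how a total execution time breaks down by status, here is a Python sketch that sums the time spent in each status from a list of status transitions. The status names and timestamps are hypothetical, and the release note does not describe how Saagie records transitions internally.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def time_per_status(
    transitions: List[Tuple[str, datetime]], end: datetime
) -> Dict[str, timedelta]:
    """Sum the time spent in each status.

    transitions: (status, time the status was entered), in chronological order.
    end: when the last status ended (use the current time for a running job).
    """
    durations: Dict[str, timedelta] = {}
    # Pair each transition with the start of the next one (or `end` for the last).
    next_starts = [start for _, start in transitions[1:]] + [end]
    for (status, start), next_start in zip(transitions, next_starts):
        durations[status] = durations.get(status, timedelta()) + (next_start - start)
    return durations


t0 = datetime(2023, 10, 2, 9, 0, 0)
transitions = [
    ("REQUESTED", t0),
    ("QUEUED", t0 + timedelta(seconds=5)),
    ("RUNNING", t0 + timedelta(seconds=20)),
]
durations = time_per_status(transitions, end=t0 + timedelta(minutes=2))
for status, spent in durations.items():
    print(f"{status:<10} {spent}")
print("total", sum(durations.values(), timedelta()))
```

A long time in a waiting status rather than a running one points to resource contention rather than to the job's own code.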
Deleting Pipeline Instances and Versions
From the Instances and Versions pages of your pipeline, you can now delete instances and versions. This allows you to streamline your list, improve your user experience, and maintain control over storage. You can delete a single instance or version, or a selection of instances or versions, with or without filters.
For more information, see Deleting Pipeline Instances and Versions.
Deleting Job Instances and Logs Based On Date Criteria
The feature to delete job instances has been improved. You can now delete job instances and their logs using a date picker. From your job’s Instances page, select the All instances older than filter to delete all instances prior to the selected date.
This streamlines the list and improves your user experience.
For more information, see Deleting Job Instances and Job Versions.
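The All instances older than filter amounts to splitting instances around a cutoff date. The Python sketch below assumes a hypothetical JobInstance record and treats "prior to the selected date" as strictly before it; how Saagie handles instances started on the cutoff date itself is an assumption here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class JobInstance:
    """Hypothetical record of a job instance; real instances carry more fields."""
    instance_id: str
    started_on: date


def split_older_than(
    instances: List[JobInstance], cutoff: date
) -> Tuple[List[JobInstance], List[JobInstance]]:
    """Select instances started strictly before `cutoff` for deletion; keep the rest."""
    to_delete = [i for i in instances if i.started_on < cutoff]
    to_keep = [i for i in instances if i.started_on >= cutoff]
    return to_delete, to_keep


instances = [
    JobInstance("run-1", date(2023, 8, 30)),
    JobInstance("run-2", date(2023, 9, 15)),
    JobInstance("run-3", date(2023, 10, 1)),
]
to_delete, to_keep = split_older_than(instances, cutoff=date(2023, 9, 15))
print([i.instance_id for i in to_delete])  # ['run-1']
print([i.instance_id for i in to_keep])    # ['run-2', 'run-3']
```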
Deleting Job Versions Based On Tag Criteria
The job version deletion feature has been improved. You can now delete versions of a job based on tag criteria. From the Versions page of your job, select the desired filter to delete versions accordingly.
This streamlines the list and improves your user experience.
For more information, see Deleting Job Instances and Job Versions.
Integrating Your Projects Into a CI/CD Pipeline
You can now integrate your Saagie projects into a CI/CD pipeline using our Saagie Python API. By storing the source code of your jobs and pipelines in a leading Git tool like GitHub, you can enable CI/CD across all your Saagie platforms, from development to production. Development best practices such as pulling changes, reviewing, comparing, and committing help you better control changes and thus ensure the integrity and consistency of your production environment.
For more information, see Saagie CI/CD.
Generating a Job Description With ChatGPT
You can use ChatGPT to generate your job description. Click Generate with ChatGPT above the description field to send your request to ChatGPT.
To enable this option, you must upgrade Saagie to the latest version. When configuring saagiectl, you will have to answer new prompts about the use of OpenAI while configuring your cluster settings.
This feature is only available for the following job technologies: Spark with a Python context, Bash, R, Sqoop, and Python.
For more information, see Generating a Job Description With ChatGPT.
Using Jupyter Notebook With a Generative AI
You can now use the Jupyter Notebook app with generative AI, such as ChatGPT, SageMaker, or Bedrock. A new app called JupyterLab+GenAI 4.0 Python 3.10 has been added to the Saagie official technology repository for use with generative AI.
For more information, see Use Generative AI in Jupyter Notebook.
Using VS Code With a Generative AI Pair Programmer
You can now use the VS Code app with generative AI pair programmers, such as GitHub Copilot and Genie. Use the VS Code Python 4.15.0 app context to access this feature.
For more information, see Use VS Code Powered by Generative AI.
Saagie Hugging Face Model Server Add-On
The Saagie Hugging Face Model Server add-on is an app designed to facilitate the deployment and prediction of Hugging Face deep learning models for text classification.
For more information, see Saagie Hugging Face Model Server.
Saagie Code Search Add-On
The Saagie Code Search add-on is an app designed to help you search and retrieve Python code snippets from a default codebase or code repositories hosted on GitHub.
For more information, see Saagie Code Search.
Kubernetes 1.25.x Support
This new version of Saagie is compatible with Kubernetes 1.25.x.
The following specifications ONLY apply if you use Saagie on your own installation and want to upgrade your cluster to Kubernetes 1.25.x. If you already have a Saagie installation on your cluster but do NOT plan to upgrade it to Kubernetes 1.25.x, they do not apply to you.
If you plan to upgrade your cluster to Kubernetes 1.25.x and already have a Saagie installation on it, you must upgrade Saagie to the latest version BEFORE upgrading to Kubernetes 1.25.x.
Be careful: Kubernetes v1.25 comes with several major changes, including the removal of PSPs (Pod Security Policies). As a reminder, your administrator is responsible for your clusters and their security. As such, they are also responsible for removing PSPs, as well as any other Kubernetes resources removed in v1.25. For more information, see the official Kubernetes documentation on Kubernetes Removals and Major Changes In 1.25.
With the removal of PSPs, it is important to find another way of guaranteeing the security of your clusters, such as Pod Security Admission, the built-in Kubernetes replacement for PSPs.
Also, if your cluster is hosted on Microsoft AKS and you want to use Saagie on an AKS v1.25.x environment, you need to enforce cgroup_v1.
Why? For now, Saagie still uses cgroup_v1. Although Saagie is compatible with Kubernetes 1.25.x, it is not natively compatible with AKS 1.25.x, because with Kubernetes v1.25, Microsoft switched the AKS node OS to cgroup_v2.
To enforce cgroup_v1, refer to Azure’s README.md file on GitHub: Revert Kubernetes 1.25 to cgroup v1.
For more information on the compatible versions of Kubernetes, see Compatible Kubernetes Versions.
User Experience Improvements
Restructuring Monitoring Modules
The Monitoring and Operations modules have been restructured.
The Monitoring module has been deleted. As a reminder, this module was composed of the Platform Overview page, which provided an overview of node consumption and reservations for the selected platform.
The Operations module has been renamed Monitoring. Apart from its name, it is unchanged. For more information, see Monitoring Module.
Saagie Technology Repository Updates
The following technologies have been added to the Saagie official technology repository:
Technology | New contexts |
---|---|
R | |
Technology | New contexts |
---|---|
CloudBeaver | |
Jupyter Notebook | |
RStudio | |
Saagie HF ModelServer TextCLF | |
VS Code | |
Do not forget to synchronize your Saagie repositories to keep them up to date.
July 2023
Here are the highlights of new and updated features for this release:
- Product Updates (2023.03)
  The 2023.03 version of the Saagie product has been released with the following features:
  - You can now delete job instances and versions.
  - You can now duplicate a job.
  - Default values for CPU and RAM resources have been defined for all technologies, except for external technologies. In addition, these resource settings are now enabled by default, with the predefined default values, when creating jobs and apps.
- Saagie Python API Documentation
  Read the documentation to use our Python package saagieapi and interact with the Saagie platform in Python.
- Bug Fixes
  - A patch has been released to handle ambiguous floating values in the attributes of the technology’s metadata.yaml files.
  - Job executions lasting more than 15 minutes now end with an appropriate status, instead of Unknown.
  - Pagination has been implemented on the app History page to improve page loading fluidity.
- Saagie Technology Repository Updates
  New technologies have been added and others deprecated.
Product Updates (2023.03)
Deleting a Job Instance
From the job’s Instances page, you can now delete instances and associated logs to streamline the list, improve your user experience, and maintain control over storage. You can either delete a single instance, a selection of instances, or a selection of instances based on status filters.
For more information, see Deleting Job Instances and Job Versions.
Deleting a Job Version
From the job’s Versions page, you can now delete versions to streamline the list and improve your user experience. You can either delete a single version, or a selection of versions.
For more information, see Deleting Job Instances and Job Versions.
Duplicating a Job
From the job library or its Overview page, you can now duplicate the Current version of your job.
This saves you from starting from scratch and improves your productivity.
For more information, see Duplicating Jobs.
Default Resource Allocation for All Technologies and Contexts
To increase the reliability of job and app execution, better share limited resources with others, and guarantee simultaneous execution, our internal system has been enhanced.
Default values for CPU and RAM resources have been defined for all technologies and contexts in Saagie’s Technology Catalog, except for external technologies. These values ensure greater platform stability. You can see the details by clicking the technology in Catalog > Repositories > Saagie.
These values also exist at the technology context level and can override the values defined at the technology level. You can configure them when creating a job or app, or by modifying the Resources setting of your job or app.
In addition, the catalog schemas have been updated with new optional fields to add default values to the technologies in your custom repositories. If this field is left blank, the default values will be 1 CPU and 500 MB RAM. For more information, see Type-Specific Attribute Tables.
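The layering of defaults described above can be sketched as a lookup chain. This Python sketch assumes the precedence job setting, then context default, then technology default, then the 1 CPU / 500 MB fallback; the keys cpu and ram_mb and the function names are hypothetical and do not reflect Saagie's catalog schema.

```python
from typing import Optional

# Fallback defaults stated in the release note for custom technologies
# that leave the default resource fields blank.
FALLBACK_CPU = 1.0
FALLBACK_RAM_MB = 500


def first_defined(*values: Optional[float]) -> Optional[float]:
    """Return the first value that is not None."""
    return next((v for v in values if v is not None), None)


def resolve_resources(job: dict, context: dict, technology: dict) -> dict:
    """Pick CPU and RAM: job setting, then context default,
    then technology default, then the 1 CPU / 500 MB fallback.

    The keys "cpu" and "ram_mb" are hypothetical, not Saagie's schema.
    """
    return {
        "cpu": first_defined(
            job.get("cpu"), context.get("cpu"), technology.get("cpu"), FALLBACK_CPU
        ),
        "ram_mb": first_defined(
            job.get("ram_mb"), context.get("ram_mb"), technology.get("ram_mb"), FALLBACK_RAM_MB
        ),
    }


tech = {"cpu": 0.5, "ram_mb": 1024}
ctx = {"ram_mb": 2048}  # the context overrides RAM only
print(resolve_resources({}, ctx, tech))            # {'cpu': 0.5, 'ram_mb': 2048}
print(resolve_resources({"cpu": 2.0}, ctx, tech))  # {'cpu': 2.0, 'ram_mb': 2048}
print(resolve_resources({}, {}, {}))               # {'cpu': 1.0, 'ram_mb': 500}
```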
Saagie Python API Documentation
You can use our Python package saagieapi, which implements Python API wrappers to easily interact with the Saagie platform in Python.
For more information, see Saagie Python API documentation.
Bug Fixes
Handle Ambiguous Floating Values
Each technology has its own metadata.yaml file, composed of a variety of attributes requiring different types of values.
The parser is sensitive to float ambiguity when an attribute expects a value of type string.
This particularly affects technology version numbers.
For example, if you have Python 3.10, it will be read as 3.1 and not 3.10.
To remove this ambiguity in version 2023.03 of the Technology Catalog, you must:
- Modify your technology’s metadata.yaml file by adding quotation marks to the value of attributes requiring a string value. For example, write id: "3.10" instead of id: 3.10.
- Duplicate the technology context. One of the versions will have the identifier 3.2 and will be marked DEPRECATED. The other version will be identical, but with the identifier 3.20.
This concerns all attributes requiring a string value.
Saagie’s official technology repository will be updated automatically without any action on your part.
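The float ambiguity comes from YAML type resolution: an unquoted 3.10 is read as a number, and a number has no trailing zero to keep. The Python sketch below mimics that rule with float() so it runs without a YAML library; with PyYAML, yaml.safe_load("id: 3.10") likewise returns {'id': 3.1}. The resolve_scalar helper is hypothetical and far simpler than a real YAML parser.

```python
from typing import Union


def resolve_scalar(token: str) -> Union[str, float]:
    """Rough stand-in for YAML scalar resolution (not a real YAML parser):
    a quoted token stays a string, an unquoted numeric token becomes a float."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        return token[1:-1]  # quoted: kept verbatim, "3.10" stays "3.10"
    try:
        return float(token)  # unquoted: 3.10 becomes the float 3.1
    except ValueError:
        return token  # anything else is left as a plain string


for raw in ["3.10", '"3.10"', "3.20", '"3.20"']:
    value = resolve_scalar(raw)
    print(f"id: {raw:<8} -> {value!r}")
```

The unquoted 3.20 collapsing to 3.2 is the same effect that produces the DEPRECATED 3.2 context mentioned above.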
Job Status Unknown
Jobs lasting more than 15 minutes were automatically assigned the Unknown status.
They now end with an appropriate status.
Loading App History
To solve performance issues on the app History page, pagination has been implemented. Events are loaded progressively rather than all at once, improving page loading time and fluidity.
In addition, the timeline display on the app Overview page has also been modified accordingly. If your app history contains too many events, only the most recent will be displayed. Part of the beginning of the timeline will be grayed out to indicate that the oldest events cannot be displayed.
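Loading events progressively can be sketched as a generator that requests one page at a time and stops when a page comes back short. The fetch_page callable and the page size of 50 are assumptions for illustration; the release note does not specify how Saagie's History page paginates.

```python
from typing import Callable, Iterator, List


def iter_events(
    fetch_page: Callable[[int, int], List[dict]], page_size: int = 50
) -> Iterator[dict]:
    """Yield history events one page at a time instead of loading them all at once.

    fetch_page(offset, limit) is a hypothetical backend call that returns
    at most `limit` events starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return  # a short or empty page means there is nothing left to load
        offset += page_size


history = [{"event": i} for i in range(120)]
requested_offsets = []


def fetch_page(offset: int, limit: int) -> List[dict]:
    """In-memory stand-in for the backend: records which pages were requested."""
    requested_offsets.append(offset)
    return history[offset:offset + limit]


events = iter_events(fetch_page)
first_page = [next(events) for _ in range(50)]
print(requested_offsets)  # only one page fetched so far: [0]
remaining = list(events)
print(len(first_page) + len(remaining), requested_offsets)  # 120 [0, 50, 100]
```

Because the generator is lazy, a page is only requested when the reader actually needs more events.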
Saagie Technology Repository Updates
The following technologies have been added or deprecated in the Saagie official technology repository:
Technology | New contexts | Deprecated contexts |
---|---|---|
Bash | | - |
Python | - | |
Technology | New contexts |
---|---|
Airbyte | |
VS Code | |
Do not forget to synchronize your Saagie repositories to keep them up to date.
April 2023
Here are the highlights of new and updated features for this release:
- Product Updates (2023.02)
  The 2023.02 version of the Saagie product has been released with the following features:
  - New elements have been created to monitor the resource consumption of pipelines and your cluster.
  - Pipeline functionality has been enhanced to include more advanced orchestration logic, such as conditions on environment variables and job status.
  - Saagie now supports Google Cloud Platform (GCP).
- Saagie Technology Repository Updates
  New technology versions have been added.
Product Updates (2023.02)
Cluster and Pipeline Resource Monitoring
New resource monitoring elements have been added to monitor resource consumption of your cluster and pipelines.
At the cluster level, you can access the Operations module to see an overview of your cluster. This page displays the number of projects, jobs, pipelines, and apps created on each platform, as well as resource capacity metrics for CPU and RAM for each node in the platform.
In the Overview and Instances pages of pipelines, you can access graphs displaying runtime and resource consumption metrics.
This added focus on resource monitoring in Saagie will allow data engineers and platform administrators to have a complementary view of clusters and pipelines to track performance and better optimize resource usage on their platforms.
Smart Conditions in Pipelines
You can now create new types of conditions to build more relevant pipelines:
- Conditions based on environment variables
- Conditions based on job status
These new conditions allow you to implement more advanced logic in your pipelines.
For more information, see About Conditions in Pipelines.
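The two kinds of conditions can be pictured as predicates that choose the next branch of a pipeline. In this Python sketch, the variable TARGET_ENV, the job names, and the status values are hypothetical, and the functions only model the branching idea; they are not Saagie's condition syntax.

```python
from typing import Dict, Set


def env_var_condition(variables: Dict[str, str], name: str, expected: str) -> bool:
    """True if a pipeline environment variable has the expected value."""
    return variables.get(name) == expected


def job_status_condition(statuses: Dict[str, str], job: str, accepted: Set[str]) -> bool:
    """True if an upstream job ended with one of the accepted statuses."""
    return statuses.get(job) in accepted


def next_job(condition_met: bool, if_true: str, if_false: str) -> str:
    """Choose the branch to run after a condition node."""
    return if_true if condition_met else if_false


variables = {"TARGET_ENV": "production"}
statuses = {"train_model": "FAILED"}

# Deploy only when training succeeded; otherwise go to an alerting job.
after_training = next_job(
    job_status_condition(statuses, "train_model", {"SUCCEEDED"}),
    if_true="deploy_model",
    if_false="send_alert",
)
# Pick a target-specific export job from an environment variable.
export_job = next_job(
    env_var_condition(variables, "TARGET_ENV", "production"),
    if_true="export_to_prod",
    if_false="export_to_staging",
)
print(after_training, export_job)  # send_alert export_to_prod
```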
Saagie With Google Cloud Platform (GCP)
Saagie is now available on Google Cloud Platform (GCP).
Saagie Technology Repository Updates
The following technologies have been added to the official Saagie technology repository:
Technology | New contexts |
---|---|
Dataiku DSS | |
dbt | |
Google Cloud Data Transfer | |
Google Cloud Dataflow | |
Python | |
Do not forget to synchronize your Saagie repositories to keep them up to date.
January 2023
Here are the highlights of new and updated features for this release:
- Product Updates (2023.01)
  The 2023.01 version of the Saagie product has been released with the following features:
  - New elements to monitor resource consumption have been created.
  - A new add-on, called Saagie Usage Monitoring, can be deployed as an app inside projects.
  - Pipeline functionality has been enhanced to allow context propagation between jobs in a pipeline.
  - Saagie will now be installed with a ready-to-use example project, whose goal is to provide an intelligent learning pipeline able to detect sentiment in movie reviews.
  - Saagie now supports Kubernetes 1.23.x and 1.24.x.
  - The product version naming pattern has changed.
- Saagie Technology Repository Updates
  New technology versions and external job technologies have been added.
Product Updates (2023.01)
Resource Monitoring
New resource monitoring pages have been added throughout Saagie to monitor resource consumption, from a platform level down to a specific item.
At the platform level, you can access the Monitoring module to see an overview of your platform.
This page displays the number of projects, jobs, pipelines, and apps created on the selected platform, as well as resource capacity metrics for CPU and RAM for each node in the platform.
If node isolation has not been configured for your platform, the Monitoring module will not be fully operational, that is, no resource data will be displayed. However, you still have information about the number of projects, jobs, pipelines, and apps.
In the Overview page of jobs and apps, you can access new graphs displaying runtime and resource consumption metrics for the last running instance.
Besides the resource consumption limits that can already be defined for jobs and apps, Saagie’s focus on monitoring will help data engineers and platform administrators quickly identify bottlenecks, debug memory-starved jobs and apps, and better optimize resource usage on the platform.
Saagie Usage Monitoring
The new Saagie Usage Monitoring add-on can be installed on your platforms as an app to monitor:
- The number of jobs and apps created, with their high-level metadata.
- Metrics on the execution time and status of jobs and pipelines.
- Metrics on the global usage of the storage volume associated with Saagie.
This app, based on Grafana, is available as an app technology in Saagie’s official technology repository and can be installed in any project.
This app requires some configuration to work. Click the information icon to display the README help file directly in Saagie.
As this app is designed to display cross-project metrics, Saagie recommends deploying it in a dedicated administration project.
For more information, see Saagie Usage Monitoring.
Context Propagation Between Jobs in Pipelines
In addition to existing environment variables that are set at the global or project levels, you can now create environment variables inside a pipeline and use them to transfer information between jobs during a pipeline execution.
These variables can be dynamically modified by jobs as the pipeline execution progresses, with a table displaying for each job the input and output values of variables.
This feature allows you to build smarter pipelines and paves the way for conditions based on pipeline environment variables.
For more information, see the Pipeline Overview Page.
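The propagation mechanism can be modeled as jobs that each receive the current pipeline variables and return the values they set, with a per-job record of inputs and outputs like the table described above. This Python sketch uses hypothetical job functions and variable names; how a Saagie job actually reads and writes pipeline variables is not described in this note.

```python
from typing import Callable, Dict, List, Tuple

Variables = Dict[str, str]
Job = Callable[[Variables], Variables]


def run_pipeline(
    jobs: List[Tuple[str, Job]], initial: Variables
) -> Tuple[Variables, List[Tuple[str, Variables, Variables]]]:
    """Run jobs in order, propagating pipeline variables from job to job.

    Each job receives a copy of the current variables and returns the
    variables it wants to create or modify. The returned table records,
    for each job, its input and output values.
    """
    current = dict(initial)
    table = []
    for name, job in jobs:
        inputs = dict(current)
        current.update(job(dict(current)))
        table.append((name, inputs, dict(current)))
    return current, table


def extract(variables: Variables) -> Variables:
    # Hypothetical job: publishes how many rows it extracted.
    return {"ROW_COUNT": "1200"}


def check_quality(variables: Variables) -> Variables:
    # Hypothetical job: reads the value set by the previous job.
    return {"QUALITY": "ok" if int(variables["ROW_COUNT"]) > 0 else "empty"}


final_vars, table = run_pipeline(
    [("extract", extract), ("check_quality", check_quality)],
    initial={"RUN_DATE": "2023-01-15"},
)
for name, inputs, outputs in table:
    print(f"{name:<14} in={inputs} out={outputs}")
```

Passing each job a copy keeps a job from silently mutating the values recorded as its inputs.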
Saagie Project Example
Saagie will now be installed with a ready-to-use example project, whose goal is to provide an intelligent learning pipeline able to detect sentiment in movie reviews. It is accessible from your platform’s project library.
For more information, see Starting With the Saagie Project Example.
Kubernetes 1.23.x and 1.24.x Support
This new version of the Saagie installer is now also compatible with Kubernetes versions 1.23.x and 1.24.x.
For more information on supported versions of Kubernetes, see System Requirements.
Product Version Naming Convention
To make it clear which product version you are using, version names now follow a new naming convention made up of the year and the product version increment for that year.
For this version, it is 2023.01.
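The naming convention can be expressed as a small format and parse pair. This Python sketch infers two-digit zero padding from the examples 2023.01 to 2023.05; that padding is an assumption. Parsing into integers matters because a hypothetical version such as 2023.10 read as a float would become 2023.1.

```python
from typing import Tuple


def product_version(year: int, increment: int) -> str:
    """Format a version as <year>.<increment>, zero-padded like 2023.01."""
    return f"{year}.{increment:02d}"


def parse_product_version(version: str) -> Tuple[int, int]:
    """Split a version string into integers so versions sort chronologically."""
    year, increment = version.split(".")
    return int(year), int(increment)


releases = ["2023.05", "2023.01", "2023.04", "2023.03", "2023.02"]
print(product_version(2023, 1))                     # 2023.01
print(sorted(releases, key=parse_product_version))  # oldest first
```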
Saagie Technology Repository Updates
The following technologies have been added in the official Saagie technology repository:
Technology | New contexts |
---|---|
Bash | |
Java/Scala | |
Talend | |
GCP Cloud Functions | |
GCP Cloud Run | |
Technology | New contexts |
---|---|
Apache Superset | |
Grafana | |
Metabase | |
MLFlow Server | |
Saagie Usage Monitoring | For Saagie |
Do not forget to synchronize your Saagie repositories to keep them up to date.