About Jobs

The Jobs page of a project selected from the Projects module library gives you access to the library of embedded and external jobs of this project.

What is a job?

A job is a computation task performed inside projects on Saagie using one of the embedded or external technologies. Jobs run through a command line and can be launched individually or as part of a data pipeline.

Jobs are listed with some basic information, such as the name of the job (a), the technology icon (b), the type of the job (c), the job status (d), and the last executed job instance (e). You can also run (f), upgrade, revert to the last instance, or delete the job (g) from the job library.

A clock icon (h) is displayed for scheduled jobs: hover over the icon to view information about the next job run.
Focused image of listed jobs.

When creating a job, you choose a category and a technology for it. For each category, the available technologies depend on the technologies you selected when you created the project.

If the technology you need does not appear in the list of available technologies, that means it was not selected when the project was created. You’ll either have to choose another technology or update the project settings to include the technology you need.
In addition, the category and technology of a job cannot be changed after the job is created. If you want to change them, you’ll have to create a new job.

A job category does not affect job execution; it only helps you organize your jobs. There are three default job categories:

  • Extraction: For jobs that retrieve data.

  • Processing: For jobs that process data.

  • Smart Apps: For jobs that use or expose data.

When choosing your technology, note that each technology has its own requirements for the file type and the default shell command. For more information, see the following table:

Table 1. Job requirements by technology
Technology  File type                 Default shell command

Bash        Any file type (Optional)  echo "Saagie Bash"

Generic     Docker image              none

Java/Scala  .jar                      java -jar {file} arg1 arg2

Python      .py or .zip               python {file} arg1 arg2

R           .r                        Rscript {file} arg1 arg2

Spark       .jar                      spark-submit --class=Main {file} arg1 arg2

Sqoop       Any file type (Optional)  driver=xxx
                                      host="x.x.x.x"
                                      port=xxx
                                      username="xxx"
                                      password="xxxx"
                                      database="xxxx"
                                      table="xxxx"
                                      hdfsdest=hdfs:///tmp/sqoop_import
                                      sqoop import --connect jdbc:$driver://$host:$port/$database --username $username --password $password --as-textfile -m 1 --target-dir $hdfsdest --table "$table"

Talend      .zip                      sh {file} arg1 arg2
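
To make the {file} arg1 arg2 pattern in Table 1 more concrete, here is a minimal sketch of a Python job script. It assumes the default command python {file} arg1 arg2 is kept unchanged, so the two values reach the script as positional command-line arguments. The function names are hypothetical and chosen only for illustration.

```python
import sys


def describe_arguments(args):
    """Return one label per positional argument: arg1=..., arg2=..., and so on."""
    return [f"arg{i}={value}" for i, value in enumerate(args, start=1)]


def main(argv=None):
    # sys.argv[0] is the script path; the job arguments follow it.
    args = sys.argv[1:] if argv is None else list(argv)
    lines = describe_arguments(args)
    # Printed text goes to standard output of the command line.
    print("\n".join(lines) if lines else "no arguments received")
    return lines


if __name__ == "__main__":
    main()
```

Replacing arg1 and arg2 in the command line with real values is then enough to pass them to the script; the script itself does not need to change.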

Click a job to access its:

  • Overview page

  • Instances page

  • Versions page

The pages for embedded and external jobs are the same except for a few details, which are pointed out where relevant. The word "job" therefore refers to both embedded and external jobs.

Overview Page

The Overview page provides general information about your job.

This is the page that opens by default when you click a job in the project’s job library.

For more information about the project’s job library, read the About Jobs introduction.

The first part of the page (1) provides general information about the job, such as the name and description, the job category, the last instance status, the runtime mode (manual or scheduled), the start and end date of the last instance and its duration, the version used, the job creator, and the job settings.
This part also shows the logs of the job. You can choose to display Saagie Logs or Error Logs Only, and you can download them.

For external jobs, you can choose to display the External Connection Logs alone or simultaneously with the Saagie Logs.

The second part of the page (2) lists the pipelines related to the job and shows the technology, along with the job package for an embedded job or the external connection for an external job.

What is a package?

The package is either a file or a collection of files in a .zip file. Compatible file types change depending on the technology selected and are listed when uploading a package.
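
As an illustration of how the compatible file types depend on the technology, the sketch below checks a package file name against the file types listed in Table 1. It is not how the platform validates uploads: the mapping is copied from the table, the function name is hypothetical, and the case-insensitive comparison is an assumption of this sketch. The list shown when uploading a package remains the reference.

```python
from pathlib import Path

# File extensions per technology, copied from Table 1.
# None means any file type is accepted (the file is optional).
# "Generic" is omitted because it takes a Docker image, not a file.
ACCEPTED_EXTENSIONS = {
    "Bash": None,
    "Java/Scala": {".jar"},
    "Python": {".py", ".zip"},
    "R": {".r"},
    "Spark": {".jar"},
    "Sqoop": None,
    "Talend": {".zip"},
}


def is_compatible(technology, filename):
    """Return True if filename has an extension Table 1 lists for technology."""
    if technology not in ACCEPTED_EXTENSIONS:
        raise KeyError(f"unknown technology: {technology}")
    allowed = ACCEPTED_EXTENSIONS[technology]
    if allowed is None:
        return True
    # Assumption: extensions are compared case-insensitively (.R matches .r).
    return Path(filename).suffix.lower() in allowed
```

For example, is_compatible("Python", "etl.zip") returns True, while is_compatible("Spark", "etl.py") returns False.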

Screenshot of the "Overview" page of a job.

Instances Page

The Instances page provides information about your job instances and allows you to keep track of all executed instances.

What is an instance?

An instance is a single run of a job or pipeline in a project. The execution information and logs of all instances are saved on your platform.

Whenever you run a job, a new instance of that job is created. All instances are saved and remain accessible: they are listed on the right side of the page (1). You can view the information of an instance by selecting it in the list.

Screenshot of the "Instances" page of a job.

By default, the page opens on the last executed instance of the selected job.

The first part of the page (2) provides general information about the instance of the job, such as the instance number (specified by #001 in the title), its execution status, the start and end times, the duration, the version of the job used, and the external connection if it’s an external job.

The second part of the page (3) provides information about the logs of the job. You can choose to display Saagie Logs or Error Logs Only, and you can download them.

For external jobs, you can choose to display the External Connection Logs alone or simultaneously with the Saagie Logs.

Versions Page

The Versions page provides information about the current version of the job and keeps track of its previous versions.

What is a version?

A version is a single iteration of a job, pipeline, or app. Each new update is stored as a version, enabling you to roll back to previous iterations and keep track of successive changes.

Whenever you upgrade a job (1), a new version of that job is created and automatically defined as the Current version.
All versions are saved and remain accessible: they are listed on the right side of the page (2). You can view the information of a version by selecting it in the list.
You can switch back and forth between versions as required by selecting a version from the list and clicking Rollback to this version (3). Rolling back makes the selected version the new Current version.

You can also define a version as major to highlight the most stable version of a job.
Select a version from the list and click Set as major version (4) to label the version as Major version. Sparks will appear in front of the major version. Similarly, click Unset as a major version to remove the label from a version.
Screenshot of the "Versions" page of a job.
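
The upgrade, rollback, and major-version behavior described in this section can be summarized with a small toy model. This is only a sketch for reasoning about the behavior, not the platform's implementation: the class and method names are hypothetical, version numbers starting at 1 are an assumption, and the sketch does not restrict how many versions can carry the major label.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int
    release_note: str = ""
    is_major: bool = False


@dataclass
class JobVersions:
    """Toy model of a job's version list (illustrative only)."""
    versions: list = field(default_factory=list)
    current: int = 0  # number of the version tagged Current; 0 means none yet

    def _get(self, number):
        for version in self.versions:
            if version.number == number:
                return version
        raise ValueError(f"no version {number}")

    def upgrade(self, release_note=""):
        # Upgrading appends a new version and makes it the Current one.
        version = Version(len(self.versions) + 1, release_note)
        self.versions.append(version)
        self.current = version.number
        return version

    def rollback(self, number):
        # Rolling back only moves the Current tag; every version is kept.
        self.current = self._get(number).number

    def set_major(self, number, major=True):
        # Label (or unlabel) a version as major.
        self._get(number).is_major = major
```

In this model, two calls to upgrade() followed by rollback(1) leave both versions in the list with version 1 as the Current one.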

By default, the page opens on the version of the job currently in use, which is tagged with the Current badge.

The first part of the page (5) provides general information about the version, such as the release note, the creation date and creator, and the runtime context.

The second part of the page (6) displays:

  • For embedded jobs, information about the job package. You can edit it to modify the runtime context of the job technology, the package, and the command lines, and to add a release note.

  • For external jobs, information about the external connection. You can edit it to modify the name, the access key ID, the secret access key, and the region.