About Jobs

The Jobs page of a project selected from the Projects module library gives you access to the library of embedded and external jobs of this project.

The pages for embedded and external jobs are the same, except for a few details, which will be mentioned if necessary in the corresponding explanations. Thus, the word "job" includes both embedded and external uses.
What is a job?

A job is a computation task performed inside projects on Saagie using one of the embedded or external technologies. Jobs run through a command line and can be launched individually or as part of a data pipeline.

Jobs are listed with some basic information, such as the name of the job (a), the technology icon (b), the type of the job (c), the job status (d), and the last executed job instance (e). You can also run (f), upgrade, revert to the last instance, or delete the job (g) from the job library.

A clock icon (h) is displayed for scheduled jobs: hover over the icon to view information about the next job run.
Focused image of listed jobs.

When creating a job, you choose a category and a technology for it. For each category, the available technologies depend on the technologies you selected when you created the project.

If the technology you need does not appear in the list of available technologies, that means it was not selected when the project was created. You will either have to choose another technology or update the project settings to include the technology you need.
Besides, the category and technology of a job cannot be changed after the job is created. If you want to change them, you will have to create a new job.

A job category does not impact the job execution; it only helps you organize your jobs. There are three default job categories:

  • Extraction: For jobs that retrieve data.

  • Processing: For jobs that process data.

  • Smart Apps: For jobs that use or expose data.

When choosing your technology, it is important to note that the technologies have different requirements. For more information, see the following table:

Table 1. Job requirements by technology

Technology | File type | Default shell command
Bash | Any file type (optional) | echo "Saagie Bash"
Generic | Docker image | none
Java/Scala | .jar | java -jar {file} arg1 arg2
Python | .py or .zip | python {file} arg1 arg2
R | .r | Rscript {file} arg1 arg2
Spark | .jar | spark-submit --class=Main {file} arg1 arg2
Sqoop | Any file type (optional) | See the Sqoop default shell command after this table
Talend | .zip | sh {file} arg1 arg2

Sqoop default shell command:

driver=xxx
host="x.x.x.x"
port=xxx
username="xxx"
password="xxxx"
database="xxxx"
table="xxxx"
hdfsdest=hdfs:///tmp/sqoop_import

sqoop import --connect jdbc:$driver://$host:$port/$database --username $username --password $password --as-textfile -m 1 --target-dir $hdfsdest --table "$table"
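
As an illustration of the Python default shell command (python {file} arg1 arg2), here is a minimal sketch of a script that could serve as the job package. The meaning given to the two arguments (a source and a target) and the usage message are assumptions made for the example; only the command shape comes from the table.

```python
import sys


def main(argv):
    """Entry point for a script started as: python {file} arg1 arg2."""
    # argv[0] is the script path; the remaining items are the command-line arguments.
    args = argv[1:]
    if len(args) != 2:
        print("usage: python <file> arg1 arg2", file=sys.stderr)
        return 2  # a non-zero exit code conventionally signals failure to the shell
    source, target = args  # hypothetical meaning of arg1 and arg2
    print(f"Processing {source} -> {target}")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Returning an exit code from main, rather than calling sys.exit inside it, keeps the function easy to test outside the platform.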

Click a job to access its Overview, Instances, and Versions pages.

Overview Page

The Overview page provides general information about your job.

By default, the page opens when you click a job in the project’s job library.

The first part of the page (1) provides general information about the job, such as the name and alias, the description, the job category, the last instance status, the runtime mode (manual or scheduled), the start and end date of the last instance and its duration, the version used, the job creator, and other job settings.

A job alias is unique for each job within a project and allows you to refer to a job within another job. It can be used in pipelines when the Variables setting is enabled.

You also have information about the logs of the job. You can choose to display Saagie Logs, Pipeline Variable Logs, or Error Logs Only for each, and you can download them.

For external jobs, you can choose to display the External Connection Logs alone or simultaneously with the Saagie Logs.

The second part of the page (2) provides information about the job consumption through different graphic resources. Use these graphs to check your job consumption during and after its execution.

  • Hover over the legend (a) to see on which node the job has been executed.

  • Select a range on the graph to zoom in on the selected part of the line. Click Reset range to display the entire graph again.

Monitoring the RAM consumption of your job can help you anticipate potential memory issues. A job that consumes more than the available RAM limit goes into an Out Of Memory (OOM) state.
You can define a RAM limit for your job in its settings. If you have not defined a RAM limit for your job, it will run according to the overall RAM capacity of the node. In both cases, adjust the RAM limit for your node or job to ensure successful execution.
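
The reasoning above can be sketched as a simple headroom check. This is not a Saagie API: the function names, the MiB unit, and the 10% safety margin are assumptions chosen for illustration, and the samples stand for RAM values you would read off the consumption graph.

```python
def ram_headroom(samples_mib, limit_mib):
    """Return the fraction of the RAM limit left unused at peak consumption."""
    if limit_mib <= 0:
        raise ValueError("limit_mib must be positive")
    if not samples_mib:
        raise ValueError("samples_mib must contain at least one measurement")
    peak = max(samples_mib)
    return (limit_mib - peak) / limit_mib


def at_risk_of_oom(samples_mib, limit_mib, margin=0.1):
    """True when peak consumption leaves less than `margin` of the limit free."""
    # margin is a hypothetical safety threshold, not a platform setting.
    return ram_headroom(samples_mib, limit_mib) < margin
```

For example, at_risk_of_oom([410, 480, 505], 512) flags a job whose peak leaves under 10% of a 512 MiB limit free, which suggests raising the limit before the next run.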

For more information on monitoring your platform resources, see About the Monitoring Module.

The third part of the page (3) provides information about the pipelines related to the job and information about the technology, and either the job package if it is an embedded job, or the external connection if it is an external job.

What is a package?

The package is either a file, or a collection of files in a .zip file. Compatible file types change depending on the technology selected and are listed when uploading a package.

What is an external connection?

An external connection is a connection parameter specific to an external technology provider, used when creating external jobs in projects. It includes at least the remote host information and authentication requirements used to connect to the external job technology.

Screenshot of the "Overview" page of a job.

Instances Page

The Instances page provides information about your job instances and allows you to keep track of all executed instances.

What is an instance?

An instance is a single run of a job or pipeline in a project. The execution information and logs of all instances are saved on your platform.

Whenever you run a job, a new instance of that job is created. All instances are saved and remain accessible: they are listed on the right side of the page (1). You can view the information of an instance by selecting it in the list.

Screenshot of the "Instances" page of a job.

By default, the page opens on the last executed instance of the selected job.

The first part of the page (2) provides general information about the instance of the job, such as the instance number (specified by #001 in the title), its execution status, the start and end times, the duration, the version of the job used, and the external connection if it is an external job.
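
The relationship between a job and its instances can be pictured with a small data model. This is only a sketch: the class names, the status strings, and the assumption that instance numbers increase by one per run are illustrative, with the three-digit label mirroring the #001 format shown in the title.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    number: int
    status: str  # hypothetical status value, e.g. "SUCCEEDED" or "FAILED"

    @property
    def label(self):
        # Zero-padded to three digits, as in "#001".
        return f"#{self.number:03d}"


@dataclass
class Job:
    name: str
    instances: list = field(default_factory=list)

    def run(self, status="SUCCEEDED"):
        """Record a new instance; earlier instances stay in the list."""
        instance = Instance(number=len(self.instances) + 1, status=status)
        self.instances.append(instance)
        return instance
```

Every call to run appends to the list rather than replacing an entry, which matches the idea that all executed instances remain accessible.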

The second part of the page (3) provides information about the logs of the job. You can choose to display Saagie Logs, Pipeline Variable Logs, or Error Logs Only for each, and you can download them.

For external jobs, you can choose to display the External Connection Logs alone or simultaneously with the Saagie Logs.

The third part of the page (4) provides information about the job consumption through different graphic resources. Use these graphs to check your job consumption during and after its execution.

  • Hover over the legend (a) to see on which node the job has been executed.

  • Select a range on the graph to zoom in on the selected part of the line. Click Reset range to display the entire graph again.

Versions Page

The Versions page provides information about the current version of the job and keeps track of its previous versions.

What is a version?

A version is a single iteration of a job, pipeline, or app. Each new update is stored as a version, enabling you to roll back to previous iterations and keep track of successive changes.

Whenever you upgrade a job (1), a new version of that job is created and automatically defined as the Current version.
All versions are saved and remain accessible: they are listed on the right side of the page (2). You can view the information of a version by selecting it in the list.
You can switch back and forth between versions as required: select a version from the list and click Rollback to this version (3). Rolling back makes the selected version the new Current version.

You can also define a version as major to highlight the most stable version of a job.
Select a version from the list and click Set as major version (4) to label the version as Major version. Sparks will appear in front of the major version. Similarly, click Unset as a major version to remove the label from a version.
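
The upgrade, rollback, and major-version behavior described above can be modeled in a few lines. This is a conceptual sketch, not how the platform is implemented: the class and method names are hypothetical, and the sketch does not assume that only one version can be major at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int
    release_note: str = ""
    major: bool = False


@dataclass
class VersionedJob:
    versions: list = field(default_factory=list)
    current: int = 0  # number of the Current version; 0 means no version yet

    def upgrade(self, release_note=""):
        """Create a new version and make it the Current version."""
        version = Version(number=len(self.versions) + 1, release_note=release_note)
        self.versions.append(version)
        self.current = version.number
        return version

    def rollback(self, number):
        """Make an existing version Current without deleting any version."""
        self._get(number)  # raises KeyError if the version does not exist
        self.current = number

    def set_major(self, number, major=True):
        """Label or unlabel a version as major."""
        self._get(number).major = major

    def _get(self, number):
        for version in self.versions:
            if version.number == number:
                return version
        raise KeyError(f"no version {number}")
```

Note that rollback only moves the Current pointer: the version list is untouched, so you can roll forward again later.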
Screenshot of the "Versions" page of a job.

By default, the page opens on the version of the job in use, tagged with the Current badge.

The first part of the page (5) provides general information about the version, such as the release note, the creation date and creator, and the runtime context.

The second part of the page (6) displays:

  • For embedded jobs, information about the job package. You can edit it and modify the runtime context of the job technology, the package, the command lines, and add a release note.

  • For external jobs, information about the external connection. You can edit it and modify the name, the access key ID, the secret access key, and the region.

See also