Explanation: Jobs

1. Library

When you first navigate to your project, you’ll land in the job library for that project. You’ll also return to this library page whenever you select Jobs in the secondary navigation panel.

Screenshot of the jobs library homepage with eight sections outlined
  1. Access project-level environment variables in the secondary navigation panel.

  2. The secondary navigation panel also lists all Docker credentials for this project.

  3. The breadcrumbs in the top bar display your current location.

  4. The top bar title is the title of the current page. In this case, you’re on Jobs.

  5. Also in the top bar are buttons to:

    • Update (refresh) your page

    • Access project settings

    • Create a new job

  6. Use the search bar to search for jobs by name.

  7. Sort jobs by status, last instance, creation date, technology, or name (alphabetically). Group jobs by category, or choose not to group them.

    View options for list of jobs
  8. Finally, you’ll see a list of all of the jobs in this project. Continue reading for more details.

1.1. List of jobs

The list of jobs includes several details, with each line describing one job.

A job line includes the following information and options:

Job line details
  1. Technology icon

  2. Job name

  3. Badge indicating the job’s most recent status

  4. Most recent instance of the job

  5. Button to run the job

  6. Menu with shortcuts

Additionally, there is a small clock icon next to scheduled jobs. When you hover over the icon, you’ll see when the job is next scheduled to run.

Scheduled job icon and message

2. Overview

The overview page is the first page you’ll reach when you navigate to your job details. It displays essential information about the job and links to other important job details.

Screenshot of the job overview homepage with ten sections outlined.
  1. The second-level navigation column has links to the pages for a job’s overview, instances, and versions.

  2. The second-level navigation column displays the job’s current status and when the job last ran. There are also buttons to run or delete your job.

    In this example, the job is part of a pipeline and therefore cannot be deleted.
  3. The breadcrumbs in the top bar show which project and job you’re viewing. You can also use the breadcrumbs as navigation.

  4. The top bar title is the title of the current page.

  5. Also in the top bar are buttons to update (refresh) the page, access job settings, and upgrade your job.

  6. In the body of the page layout, there is a section featuring all job settings.

  7. Next is the technology section, which shows the job’s technology, that technology’s repository, its runtime context for the job’s current version, and when the technology was last updated for this job.

  8. The next section shows the job’s configuration: the job’s version information, the version’s release note (if one was added), its creation date, and who created it.

  9. Next is information about any pipelines to which this job belongs.

    • This matters because a job that belongs to a pipeline is subject to more restrictions on the actions you can take with it (for example, it cannot be deleted).

  10. The final section in the body displays information about the package, including technology type and shell command.

3. Job categories and technologies

You choose a job’s category and technology when creating the job.

A job’s category and technology cannot be changed after the job is created. If you need to change either the category or the technology, you’ll need to create a new job.

The selection screen looks similar for each category. The technologies available in each category depend on which technologies were selected for that category at the project level.

Job categories and technologies

3.1. Categories

There are three default job categories to organize your jobs. A job’s category does not impact how the job runs.

  • Extraction: retrieve data

  • Processing: treat data

  • Smart Apps: use or expose data

3.2. Technologies

You determine the technologies available for each job category at the project level.

Then, when creating your job, you’ll choose a single technology in one job category (such as a Python Processing job or a Spark Extraction job).

If the technology you need does not appear on the list of available technologies, that means it was not selected at the project level. You’ll need to either choose a different technology or update the project’s settings to include the technology you need.
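The rule above (a job uses exactly one technology from one category, and only technologies enabled for that category at the project level are offered) can be sketched as a simple lookup. This is a minimal illustration under assumptions: the project configuration is a made-up example and `can_create_job` is a hypothetical helper, not part of Saagie.

```python
# Hypothetical project-level configuration: the technologies enabled for each
# job category. The category and technology names come from this page; the
# selection itself is an invented example.
PROJECT_TECHNOLOGIES: dict[str, set[str]] = {
    "Extraction": {"Spark", "SQOOP", "Python"},
    "Processing": {"Python", "Spark", "R"},
    "Smart Apps": {"Generic"},
}


def can_create_job(category: str, technology: str,
                   project: dict[str, set[str]] = PROJECT_TECHNOLOGIES) -> bool:
    """Return True if `technology` is enabled for `category` in the project."""
    # An unknown category has no enabled technologies.
    return technology in project.get(category, set())
```

With this configuration, a Python Processing job could be created, while a Talend Extraction job would require adding Talend to the project’s Extraction technologies first.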

When choosing your technology, note that different technologies have different requirements, summarized in the following table.

Table 1. Job requirements by technology

Technology | File type                | Default shell command
Bash       | Optional (any file type) | echo "Saagie Bash"
Generic    | Docker image             | None
Java/Scala | .jar                     | java -jar {file} arg1 arg2
Python     | .py or .zip              | python {file} arg1 arg2
R          | .r                       | Rscript {file} arg1 arg2
Spark      | .jar                     | spark-submit --class=Main {file} arg1 arg2
SQOOP      | Optional (any file type) | Multi-line script (see below)
Talend     | .zip                     | sh {file} arg1 arg2

Default SQOOP shell command:

  driver=xxx
  host="x.x.x.x"
  port=xxx
  username="xxx"
  password="xxxx"
  database="xxxx"
  table="xxxx"
  hdfsdest=hdfs:///tmp/sqoop_import
  sqoop import --connect jdbc:$driver://$host:$port/$database --username $username --password $password --as-textfile -m 1 --target-dir $hdfsdest --table "$table"

4. Settings

Job settings include the following non-versioned features:

  • Name

  • Description

  • Alerts

  • Run type

4.1. Name

Names are always required.

Job names have only two restrictions:

  • Names cannot exceed 255 characters.

  • Each job name must be unique within a project.
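The naming rules above can be checked before creating a job. The sketch below is hypothetical (`validate_job_name` is not a Saagie function) and makes two assumptions the documentation does not confirm: length is counted in characters as Python counts them, and uniqueness is an exact, case-sensitive comparison.

```python
MAX_NAME_LENGTH = 255  # limit stated on this page


def validate_job_name(name: str, existing_names: set[str]) -> list[str]:
    """Return the problems with `name`; an empty list means it is valid.

    Assumptions: length is len(name), and uniqueness is an exact,
    case-sensitive match against the other job names in the project.
    """
    problems = []
    if not name:
        problems.append("A name is required.")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"Names cannot exceed {MAX_NAME_LENGTH} characters.")
    if name in existing_names:
        problems.append("This name is already used by another job in the project.")
    return problems
```

For example, `validate_job_name("daily-extract", {"weekly-report"})` returns an empty list, while reusing `"weekly-report"` in the same project returns one problem.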

4.2. Description

Descriptions are optional.

There are no restrictions on descriptions, though it’s good practice to keep them short and informative.

4.3. Alerts

By setting up alerts, you’ll receive an email each time your job’s status changes to a status you selected when creating or modifying the job.

Alerts can be sent to one or several email addresses. Choose each address from the drop-down menu or, if the address you need doesn’t appear, enter it manually.

You can receive alerts for one or several of the following statuses:

  • requested: you’ve asked for the job to run, and it is waiting for the resources it needs to launch

  • queued: the job is in line to run

  • running: the job is actively running

  • failed: the job’s run failed

  • killing: Saagie is in the process of stopping the job’s run

  • killed: the job’s run has stopped

  • succeeded: the job ran successfully
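The alert rule (an email each time the job’s status changes to one of the selected statuses) can be sketched as a filter over status changes. The status names come from the list above; the `JobStatus` enum and `alert_recipients` function are hypothetical illustrations, not Saagie’s implementation.

```python
from enum import Enum


class JobStatus(Enum):
    """The job statuses listed on this page."""
    REQUESTED = "requested"
    QUEUED = "queued"
    RUNNING = "running"
    FAILED = "failed"
    KILLING = "killing"
    KILLED = "killed"
    SUCCEEDED = "succeeded"


def alert_recipients(previous: JobStatus, new: JobStatus,
                     alert_statuses: set[JobStatus],
                     emails: list[str]) -> list[str]:
    """Return the addresses to email for a status change (empty if none).

    An alert is sent only when the status actually changes and the new
    status is one of those selected in the job's alert settings.
    """
    if new == previous or new not in alert_statuses:
        return []
    return list(emails)
```

With alerts set only for `failed`, a change from `running` to `failed` emails every configured address, while a change from `queued` to `running` sends nothing.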

4.4. Run types

There are two run types:

  1. Manual run requires you to launch the job manually by selecting the Run button in the Saagie user interface.

  2. Scheduled run launches the job according to the schedule you determine.

You can also launch a scheduled job manually.

4.4.1. Scheduled run modes

If you choose a scheduled run, there are three ways to determine the schedule: simple, shortcut, and expert.

Choosing your time and day
  1. All times in Saagie are in UTC. Be sure to convert from your local time zone.

  2. Many Kubernetes clusters are configured to use a time zone that observes daylight saving time. This affects scheduled runs:

    1. Any job scheduled between 12:01 and 12:59 AM in that time zone will not run on the days the clocks change.

    2. For half of the year, jobs will run one hour earlier or later than scheduled.

  3. Runs scheduled for the 29th, 30th, or 31st day of the month will not run in months that do not have that day. For example, a run scheduled for the 31st will not run in February, April, June, September, or November.
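The first and last points can be checked with Python’s standard library. This sketch converts a local wall-clock time to the UTC time you would enter in Saagie, showing that in a zone with daylight saving time the same local time maps to different UTC times across the year. It also lists the months that lack a given day. Europe/Paris is only an example zone, and both helper functions are hypothetical.

```python
import calendar
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system tz database


def local_to_utc(year: int, month: int, day: int, hour: int, minute: int,
                 tz_name: str) -> datetime:
    """Convert a local wall-clock time in `tz_name` to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


def months_without_day(day: int, year: int) -> list[int]:
    """Return the months (1-12) of `year` that have no `day`."""
    # calendar.monthrange returns (weekday of the 1st, number of days).
    return [m for m in range(1, 13) if calendar.monthrange(year, m)[1] < day]


if __name__ == "__main__":
    # 9:00 in Paris is 08:00 UTC in winter but 07:00 UTC in summer.
    print(local_to_utc(2024, 1, 15, 9, 0, "Europe/Paris").strftime("%H:%M"))
    print(local_to_utc(2024, 7, 15, 9, 0, "Europe/Paris").strftime("%H:%M"))
    print(months_without_day(31, 2024))
```

A schedule entered as a fixed UTC time therefore runs at a local time that shifts by one hour when your own time zone changes between standard and daylight saving time.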

Simple

In simple mode, you control each variable through a user interface. Here are the possible configurations.

  1. minute: every [xx] minutes

  2. hour: every [xx] hours at [xx] minutes past the hour

  3. day: every [xx] days at [xx]:[xx]

  4. week: every [xx] weeks on [day of week] at [xx]:[xx]

  5. month: every [xx] months on the [date of month] at [xx]:[xx]

  6. year: every [xx] years on the [date of month] of [month] at [xx]:[xx]

Scheduled run type in simple mode
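To relate simple mode to the cron format used in expert mode, here is one possible mapping from the simple-mode options to five-field cron expressions. It is a hypothetical sketch, not necessarily how Saagie stores simple-mode schedules. Note a cron limitation: a step such as */n restarts at each field boundary, so "every n" is exact only when n divides the field’s range evenly, and weekly or yearly intervals other than 1 have no standard five-field equivalent.

```python
def simple_to_cron(unit: str, every: int = 1, minute: int = 0, hour: int = 0,
                   day_of_week: int = 0, day_of_month: int = 1,
                   month: int = 1) -> str:
    """Translate a simple-mode schedule into a five-field cron expression.

    Hypothetical mapping for illustration. Fields are
    [minute] [hour] [day of the month] [month] [day of the week].
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    step = "*" if every == 1 else f"*/{every}"
    if unit == "minute":
        return f"{step} * * * *"
    if unit == "hour":
        return f"{minute} {step} * * *"
    if unit == "day":
        return f"{minute} {hour} {step} * *"
    if unit == "month":
        return f"{minute} {hour} {day_of_month} {step} *"
    # Weekly and yearly intervals larger than 1 cannot be expressed in cron.
    if unit in ("week", "year") and every != 1:
        raise ValueError(f"every {every} {unit}s has no five-field cron form")
    if unit == "week":
        return f"{minute} {hour} * * {day_of_week}"
    if unit == "year":
        return f"{minute} {hour} {day_of_month} {month} *"
    raise ValueError(f"unknown unit: {unit}")
```

For example, "every 1 week on Tuesday at 9:30" becomes `30 9 * * 2`, the same expression used in the expert mode example.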
Shortcut

In shortcut mode, you choose how often you want to run the job. All other settings are automatic.

  1. hourly: every hour, on the hour

  2. daily: every day at midnight UTC

  3. weekly: every Sunday at midnight UTC

  4. monthly: the first day of each month at midnight UTC

  5. annually: January 1 at midnight UTC

Scheduled run type in shortcut mode
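Each shortcut corresponds to a standard cron expression, which can help if you later switch a schedule to expert mode. The values below are the conventional expansions of cron’s @hourly, @daily, @weekly, @monthly, and @annually macros, and they match the times listed above. This mapping is an illustration; it does not claim that Saagie stores shortcuts this way.

```python
# Five-field cron equivalents of the shortcut-mode options (times in UTC).
# Fields: [minute] [hour] [day of the month] [month] [day of the week]
SHORTCUT_TO_CRON = {
    "hourly": "0 * * * *",    # minute 0 of every hour
    "daily": "0 0 * * *",     # 00:00 every day
    "weekly": "0 0 * * 0",    # 00:00 every Sunday (day of week 0)
    "monthly": "0 0 1 * *",   # 00:00 on the 1st of every month
    "annually": "0 0 1 1 *",  # 00:00 on January 1
}

if __name__ == "__main__":
    for shortcut, expression in SHORTCUT_TO_CRON.items():
        print(f"{shortcut:>8}: {expression}")
```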
Expert

In expert mode, you control each variable (as in simple mode), but you do so using cron format: [minute] [hour] [day of the month] [month] [day of the week]. For example, entering 30 9 * * 2 in the cron field schedules your job to run at 9:30 AM UTC every Tuesday.

  1. [minute] 0-59

  2. [hour] 0-23

  3. [day of the month] 1-31

  4. [month] 1-12

  5. [day of the week] 0-6 (Sunday to Saturday)

Scheduled run type in expert mode
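To check what an expert-mode expression means before saving it, the sketch below expands each field using the ranges listed above and tests whether a given UTC time matches. It is a hypothetical helper, not Saagie’s scheduler. It supports `*`, single values, ranges (`1-5`), lists (`1,15`), and steps (`*/15`). When both day fields are restricted, it treats them as "either may match", following the common Vixie cron convention; whether Saagie behaves the same way is an assumption.

```python
from datetime import datetime

# Allowed ranges for each field, in order, as listed on this page.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field ("*", "5", "1-5", "*/15", "1,15") into values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if step < 1 or not low <= start <= end <= high:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if `when` (assumed to be in UTC) matches `expression`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields")
    minutes, hours, doms, months, dows = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES))
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok = when.day in doms
    dow_ok = cron_dow in dows
    # Vixie cron convention: if both day fields are restricted, either matches.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return (when.minute in minutes and when.hour in hours
            and when.month in months and day_ok)
```

For example, `cron_matches("30 9 * * 2", datetime(2024, 1, 2, 9, 30))` is true because January 2, 2024 was a Tuesday.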

5. Upgrade

You can upgrade the job’s configuration using the Upgrade job button. Upgrading your job creates a new version of that job.

The job’s configuration includes the following versioned features:

  • Technology version

  • Package (file)

  • Command line

  • Release notes

5.1. Technology version

Supported versions of the technology are listed in the drop-down menu. You can use any supported version, though we indicate a recommended version for optimal performance.

The technology itself is chosen when you create the job and cannot be changed afterward. However, you can change the technology version by returning to the Upgrade job menu.

5.2. Package (file)

The package is either one file or a collection of files in a .zip file.

Compatible file types depend on the selected technology and are listed when you upload a package.

You can change the uploaded package by returning to the Upgrade job menu.

5.3. Command line

The command line section has an editable text box where you can define the Linux shell command to launch the job.

The default command line text depends on the technology selected for the job (see Table 1).
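The default commands in Table 1 contain a {file} placeholder for the uploaded package and sample arguments (arg1 arg2). The sketch below shows one way to turn a default into a concrete command. It assumes, without confirmation from this page, that {file} stands for the package file name; `render_command` is a hypothetical helper, and `shlex.quote` keeps names with spaces safe for the shell.

```python
import shlex

# A subset of the default command lines from Table 1.
DEFAULT_COMMANDS = {
    "Java/Scala": "java -jar {file} arg1 arg2",
    "Python": "python {file} arg1 arg2",
    "R": "Rscript {file} arg1 arg2",
    "Spark": "spark-submit --class=Main {file} arg1 arg2",
    "Talend": "sh {file} arg1 arg2",
}


def render_command(technology: str, package: str, args: list[str]) -> str:
    """Fill {file} with the package name and replace the sample arguments."""
    template = DEFAULT_COMMANDS[technology]
    # Drop the placeholder arguments shown in the defaults.
    base = template.replace(" arg1 arg2", "")
    command = base.replace("{file}", shlex.quote(package))
    return " ".join([command, *(shlex.quote(a) for a in args)])
```

For example, `render_command("Python", "main.py", ["--date", "2024-01-01"])` returns `python main.py --date 2024-01-01`.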

5.4. Release notes

Release notes are optional and can be a good way to keep track of changes from one job version to another. Use this field according to your needs.

6. Instances

An instance of your job is a single run of that job.

When you first arrive on your job’s instances page, you’ll see the information for the most recent job instance:

Job instances information page

6.1. Instance information

  • Current job instance number

  • Status of the most recent job instance

  • Current job version

  • Start time, end time, and duration of the instance

  • Corresponding pipeline (if the job ran as part of a pipeline)

6.2. Logs

The job instances page is where you can view and download logs. You can view and download all logs, or only the standard or error logs.

Use the buttons around the log box to scroll up and down, expand the box to fill the window, refresh the logs, and load more logs.

6.3. List of all instances

The job instances page features the list of instances of that job. Select an instance from the list to view its details.

7. Versions

Whenever your job is upgraded, a new version of that job is created. You can access and run any version of your job from the job versions interface.

When you first arrive on your job’s versions page, you’ll see the following information for the current version:

Job versions information page

7.1. Tags

There are two tags available to assign to job versions:

7.1.1. Current version

By default, the current version of the job is the most recent version.

However, you can tag any version of your job as the current version. For example, you can roll back to an older version by making it the current version.

When you create a new version of the job, it is tagged automatically as the current version.

7.1.2. Major version

Use the major version tag however you see fit.

For example, you might use it to highlight stable versions of the job.

You can remove the major version tag from a version by selecting the Unset as a major version button.
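The tagging rules above can be summarized as a small data structure: a new version automatically becomes current, any existing version can be tagged current (a rollback), and the major tag can be set and unset. This is a hypothetical model for illustration; the sequential version numbers and the assumption that several versions can carry the major tag at once are not confirmed by this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobVersions:
    """Hypothetical model of a job's versions and its two version tags."""

    versions: list[int] = field(default_factory=list)
    current: Optional[int] = None
    major: set[int] = field(default_factory=set)

    def _require(self, number: int) -> None:
        if number not in self.versions:
            raise ValueError(f"unknown version: {number}")

    def upgrade(self) -> int:
        """Create a new version; it is tagged as current automatically."""
        number = len(self.versions) + 1  # assumed sequential numbering
        self.versions.append(number)
        self.current = number
        return number

    def set_current(self, number: int) -> None:
        """Tag any existing version as current, for example to roll back."""
        self._require(number)
        self.current = number

    def set_major(self, number: int) -> None:
        """Add the major tag (assumed: several versions may carry it)."""
        self._require(number)
        self.major.add(number)

    def unset_major(self, number: int) -> None:
        """Remove the major tag, like the Unset as a major version button."""
        self.major.discard(number)
```

After a rollback with `set_current(1)`, the next `upgrade()` still creates a new version and makes it current.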

7.2. Version information

Each version page contains all information for that version, including:

  • Release note (if one was included)

  • Date created

  • Creator

  • Package information

  • Technology version

  • Shell command line

7.3. List of all versions

The job versions page features a list of all the versions of that job. Select a version from the list to view its details.