Remove Junks

Remove-Junks is a component created by Saagie that cleans up obsolete user data in Saagie’s databases.

Remove-Junks is a scheduled service that runs every 30 minutes by default to clean up the junk instances of your resources. For the moment, only job instances are cleaned up, together with all their logs.

User instance logs have been stored in ElasticSearch since Saagie release 2024.05. Job instances from before this release are still persisted in MongoDB. Remove-Junks works on both databases.
Not all instances are cleaned up: several rules determine whether an instance is junk or not.

Which Instances are Cleaned Up

Remove-Junks cleans up instances that meet the following conditions:

  • The instance:

    • is older than the minimum time to live, in days (90 days by default)

    • or is one of the oldest instances of its job and exceeds the maximum number of instances kept per job (4000 by default)

  • The instance is not one of the most recent instances of its job (at least N instances are kept per job, 40 by default)

Only standalone job instances are currently supported. Job instances linked to a pipeline will be supported in upcoming releases.
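
The selection rules listed above can be sketched in a few lines of Python. This is only an illustration, assuming the instances of a single job are given as a list of start dates. The function name select_junks, its parameters and the way ties are handled are hypothetical; only the three default thresholds come from this page.

```python
from datetime import datetime, timedelta, timezone

def select_junks(start_dates, now, min_days=90, min_keep=40, max_keep=4000):
    """Return the start dates of the instances of ONE job considered junk.

    Hypothetical helper: names and ordering details are assumptions,
    only the three thresholds come from the documentation.
    """
    newest_first = sorted(start_dates, reverse=True)
    cutoff = now - timedelta(days=min_days)
    junks = []
    for rank, started in enumerate(newest_first):
        if rank < min_keep:
            continue  # the most recent instances of the job are always kept
        too_old = started < cutoff
        over_limit = rank >= max_keep  # oldest instances beyond the per-job limit
        if too_old or over_limit:
            junks.append(started)
    return junks

now = datetime(2025, 1, 7, tzinfo=timezone.utc)
dates = [now - timedelta(days=d) for d in (1, 10, 100, 200)]
# With min_keep=1 and max_keep=3, the 100-day-old and 200-day-old instances are junk.
print(select_junks(dates, now, min_days=90, min_keep=1, max_keep=3))
```

The small thresholds in the usage lines are chosen only to keep the example short.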

How Remove-Junks Works

Remove-Junks works in two steps:

  • Step 1: list all junk instances to remove and store them in a registry (a Mongo collection).

  • Step 2: remove all instances stored in the registry.

    • A limit (a quota) on the number of removable documents is applied to each execution so that production usage of Saagie is not impacted. This limit applies per Saagie platform. When the quota is reached, the removal of logs and instances stops.

    • Before removing an instance, Remove-Junks must first remove all the logs associated with it.

    • To avoid spending the whole quota on logs and leaving instances without logs in place, the removal of instances without logs is prioritized. When all the logs of an instance have been removed, a place is kept in the quota to remove this now unbound instance.

To avoid impacting production usage of Saagie, each execution of Remove-Junks removes at most a given number of documents in Mongo and of logs in ElasticSearch. By default, these limits are 1000 documents in Mongo and 100000 logs in ElasticSearch.
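
The quota handling of step 2 can be pictured with the following Python sketch. It is a simplified simulation, not the actual implementation: it assumes that logs and instances share one document quota, that each instance costs its number of logs plus one place for the instance itself, and that an instance that does not fit entirely is left for a later run. The function remove_with_quota and its input format are hypothetical.

```python
def remove_with_quota(instances, quota):
    """Simulate one removal run on a list of (instance_id, log_count) pairs.

    Assumptions of this sketch: logs and instances share one document quota,
    and an instance is only processed if all its logs plus the instance fit.
    """
    remaining = quota
    logs_removed = 0
    instances_removed = []
    # Instances without logs go first (stable sort: False sorts before True).
    for instance_id, log_count in sorted(instances, key=lambda item: item[1] > 0):
        if remaining <= 0:
            break  # quota reached: stop removing logs and instances
        if log_count + 1 <= remaining:
            # All logs fit, plus one reserved place for the instance itself.
            logs_removed += log_count
            instances_removed.append(instance_id)
            remaining -= log_count + 1
        # Otherwise (assumed): the instance waits for a later run.
    return instances_removed, logs_removed, remaining

removed, logs, left = remove_with_quota([("a", 3), ("b", 0), ("c", 5)], quota=6)
print(removed, logs, left)
```

In this example, "b" is removed first because it has no logs, "a" fits with its 3 logs, and "c" is left for a later run because only one place remains in the quota.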

When Remove-Junks starts, if the registry contains more than the maximum number of documents removable per execution, the listing is skipped and only the removal is performed. This lets the number of documents in the registry decrease, oldest junk instances first, without listing new ones again.

If you want to force the listing of junk instances, you can set the REMOVE_JUNKS_FORCE_JUNKS_LISTING environment variable to true.
If you want to disable the deletion of junk instances, you can set the REMOVE_JUNKS_ENABLE_DELETION environment variable to false.
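
The way these two variables interact with the size of the registry can be summarized by the sketch below. It only restates the behavior described above; the function plan_run and its parameter names are hypothetical, and the real service may take other factors into account.

```python
def plan_run(registry_size, max_removable, force_listing=False, enable_deletion=True):
    """Decide which steps a run performs; returns (do_listing, do_removal).

    Hypothetical helper mirroring REMOVE_JUNKS_FORCE_JUNKS_LISTING and
    REMOVE_JUNKS_ENABLE_DELETION as described in the text.
    """
    backlog_too_large = registry_size > max_removable
    do_listing = force_listing or not backlog_too_large
    do_removal = enable_deletion
    return do_listing, do_removal

print(plan_run(1500, 1000))                        # listing skipped, removal done
print(plan_run(1500, 1000, force_listing=True))    # listing forced
print(plan_run(500, 1000, enable_deletion=False))  # listing only
```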

Settings of Remove-Junks

You can configure many settings of this automatic cleanup, such as the frequency of the service or the minimum number of days to keep instances. At the end of the pod’s logs for an execution of Remove-Junks, a table summarizes the settings used for that execution. By default, the settings displayed in the logs are:

Table 1. Configuration of Remove-Junks
CONFIG | VALUE
Force junks listing | false
Enable deletion | true
Timeout databases queries in seconds | 10.00
Max nb documents to remove in Mongo by run | 1000
Max nb logs to remove in ElasticSearch by query | 100000
Min days to keep job instances | 90
Date max to keep job instances | 2024-10-09 15:00:01.793793214 +0000 UTC
Min nb of instances by job to keep | 40
Max nb of instances by job to keep | 4000

The line Date max to keep job instances is only displayed in the logs and is not a configuration parameter. It is the date computed by Remove-Junks to determine the oldest instances to keep: the current datetime of the execution minus the minimum number of days to keep job instances.
The maximum number of documents to remove in Mongo by run applies to each execution of Remove-Junks and to each Saagie platform.
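
As an illustration of how this date is derived, here is a minimal Python sketch. The function name max_date_to_keep is hypothetical, and the run time is chosen so that the result matches the date shown in the table above (fractional seconds omitted).

```python
from datetime import datetime, timedelta, timezone

def max_date_to_keep(run_time, min_days_to_keep=90):
    """Instances older than the returned date are old enough to be junk."""
    return run_time - timedelta(days=min_days_to_keep)

# Assumed execution time, 90 days after the date displayed in the table.
run_time = datetime(2025, 1, 7, 15, 0, 1, tzinfo=timezone.utc)
print(max_date_to_keep(run_time))  # 2024-10-09 15:00:01+00:00
```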

How to Configure Remove-Junks

To configure Remove-Junks, you can use saagiectl or update the remove-junks configuration in the Kubernetes CronJob saagie-common-remove-junks in the saagie namespace.

  • Settings with saagiectl

    You can configure Remove-Junks with saagiectl

  • Update the Kubernetes CronJob directly

    Below is the YAML list of the settings that you can change in the CronJob (shown with their default values):

    - name: REMOVE_JUNKS_MIN_DAYS_TO_KEEP_JOB_INSTANCES
      value: "90"
    - name: REMOVE_JUNKS_LOG_LEVEL
      value: INFO
    - name: REMOVE_JUNKS_TIMEOUT_DB_QUERIES_IN_SECONDS
      value: "10"
    - name: REMOVE_JUNKS_MIN_NB_INSTANCES_BY_JOB_TO_KEEP
      value: "40"
    - name: REMOVE_JUNKS_MAX_NB_INSTANCES_BY_JOB_TO_KEEP
      value: "4000"
    - name: REMOVE_JUNKS_MAX_NB_DOCUMENTS_TO_LIST
      value: "1000"
    - name: REMOVE_JUNKS_MAX_NB_DOCUMENTS_TO_REMOVE_BY_RUN
      value: "1000"
    - name: REMOVE_JUNKS_MAX_NB_LOGS_TO_REMOVE_IN_ELASTIC_BY_QUERY
      value: "100000"
    - name: REMOVE_JUNKS_FORCE_JUNKS_LISTING
      value: "false"
    - name: REMOVE_JUNKS_ENABLE_DELETION
      value: "true"

Execution summary tables

At the end of the Remove-Junks logs, several tables let you monitor the execution of the service:

  • The first table is the configuration of Remove-Junks.

  • The second table gives stats on the junk instances listed by Remove-Junks, per Saagie platform. This table is not displayed when the listing is skipped.

  • The third table gives stats on the junk instances removed by Remove-Junks, per Saagie platform. This table is not displayed when deletion is disabled.

Example of listing stats table

Table 2. Stats on listing junks
DATABASE | COLLECTION | NUMBER OF DOCUMENTS | OLDER THAN DATE | EXCEEDING MAX QUOTA | TOTAL JUNKS FOUND | TOTAL JUNKS FOUND WITHOUT DUPLICATES | ALREADY LISTED | NEW INSERTED | NEW TOTAL JUNKS TO REMOVED
removejunks | jobInstances | 2341 | | | | | 2341 | 39 | 2380
projectsandjobs-saagie-1 | jobInstance | 2456 | 2380 | 0 | 2380 | 2380 | | |
projectsandjobs-saagie-2 | jobInstance | 0 | 0 | 0 | 0 | 0 | | |

Explanation of the table

Column explanations:

  • DATABASE: name of the MongoDB database. In this example, we find:

    • removejunks: database for instances to be removed, in other words, the junk registry

    • projectsandjobs-saagie-1: database for instances of platform 1

    • projectsandjobs-saagie-2: database for instances of platform 2

  • COLLECTION: name of the collection (in the database of the first column). In this example, we find:

    • jobInstances: collection in the registry concerning the job instances to be removed

    • jobInstance: collection of job instances for platforms 1 and 2

  • NUMBER OF DOCUMENTS: number of documents in the collection:

    • 2341 documents in the jobInstances collection of the junk registry (this is the total number of job instances to be removed that are already known)

    • 2456 documents in the jobInstance collection of platform 1

    • 0 documents in the jobInstance collection of platform 2

  • OLDER THAN DATE: number of job instances older than the date from which instances are considered obsolete.

  • EXCEEDING MAX QUOTA: number of job instances exceeding the maximum quota to keep per job (none in this case).

  • TOTAL JUNKS FOUND: total number of job instances to be removed. TOTAL JUNKS FOUND = OLDER THAN DATE + EXCEEDING MAX QUOTA.

  • TOTAL JUNKS FOUND WITHOUT DUPLICATES: total number of job instances to be removed without duplicates between EXCEEDING MAX QUOTA and OLDER THAN DATE.

  • ALREADY LISTED: number of obsolete job instances found during this execution of Remove-Junks but already listed in the junk registry:

    • 2341 job instances found during this execution of Remove-Junks were already listed in the junk registry (Remove-Junks had thus identified these obsolete instances during previous executions).

  • NEW INSERTED: number of obsolete job instances found during this execution of Remove-Junks and inserted into the junk registry:

    • 39 obsolete job instances were found during this execution of Remove-Junks and inserted into the junk registry. That is 2380 - 2341 = 39 new obsolete instances found.

  • NEW TOTAL JUNKS TO REMOVED: total number of job instances to be removed after this execution of Remove-Junks. NEW TOTAL JUNKS TO REMOVED = NUMBER OF DOCUMENTS + NEW INSERTED.
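
The relations between these columns can be checked with a short Python sketch. It assumes the junk instances are available as sets of ids, which is a simplification made for illustration; the function listing_stats is hypothetical and the ids are made up so that only the counts match the example table.

```python
def listing_stats(older_than_date, exceeding_max_quota, registry):
    """Compute the listing columns from sets of job instance ids (hypothetical input)."""
    unique = older_than_date | exceeding_max_quota  # an instance can match both rules
    new_inserted = unique - registry
    return {
        "TOTAL JUNKS FOUND": len(older_than_date) + len(exceeding_max_quota),
        "TOTAL JUNKS FOUND WITHOUT DUPLICATES": len(unique),
        "ALREADY LISTED": len(unique & registry),
        "NEW INSERTED": len(new_inserted),
        "NEW TOTAL JUNKS TO REMOVED": len(registry) + len(new_inserted),
    }

registry = set(range(2341))  # junks already known from previous executions
older = set(range(2380))     # junks found on platform 1 during this execution
stats = listing_stats(older, set(), registry)
print(stats["ALREADY LISTED"], stats["NEW INSERTED"], stats["NEW TOTAL JUNKS TO REMOVED"])
# 2341 39 2380
```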

Example of removal stats table

Table 3. Stats on removing junks
DATABASE | COLLECTION | TOTAL REMOVABLE | LOGS REMOVED | REMAINDER DOC DELETABLE | RESOURCES REMOVED
ElasticSearch | saagie_1_a3344d6d-56e5-406d-8ffa-31097be8a61b-logs | | 1815 | | 1815
ElasticSearch | saagie_1_1338a8d3-c10a-49c7-8ada-e9a0370b076b-logs | | 1452 | | 1452
ElasticSearch | saagie_1_1014258a-6a7d-4a8a-8b7b-ec61bd8dcf79-logs | | 381 | | 381
projects-logs | saagie_1_a3344d6d_56e5_406d_8ffa_31097be8a61b | 121 | 71 | 29 | 71
projects-logs | saagie_1_1338a8d3_c10a_49c7_8ada_e9a0370b076b | 0 | 0 | 29 | 0
projects-logs | saagie_1_1014258a_6a7d_4a8a_8b7b_ec61bd8dcf79 | 0 | 0 | 29 | 0
projectsandjobs-saagie-1 | jobInstance | 100 | 71 | 29 | 29
projectsandjobs-saagie-2 | jobInstance | 0 | | 100 | 0
removejunks | jobInstances | 2380 | | | 29

Explanation of the table

Deletion is performed per Saagie platform and per Saagie project.
In MongoDB, a log line is represented by a document.

Explanations of the DATABASE and COLLECTION columns:

  • DATABASE: name of the database. In this example we find:

    • ElasticSearch: log database

    • projects-logs: project log database

    • projectsandjobs-saagie-1: Saagie platform 1 database

    • projectsandjobs-saagie-2: Saagie platform 2 database

    • removejunks: database of instances to delete, in other words the junk registry

  • COLLECTION: name of the collection (in the database of the first column). In this example we find:

    • saagie_1_a3344d6d-56e5-406d-8ffa-31097be8a61b-logs: collection of logs from the project a3344d6d-56e5-406d-8ffa-31097be8a61b on platform 1

    • saagie_1_1338a8d3-c10a-49c7-8ada-e9a0370b076b-logs: collection of logs from the project 1338a8d3-c10a-49c7-8ada-e9a0370b076b on platform 1

    • saagie_1_1014258a-6a7d-4a8a-8b7b-ec61bd8dcf79-logs: collection of logs from the project 1014258a-6a7d-4a8a-8b7b-ec61bd8dcf79 on platform 1

    • jobInstance: collection of job instances for platforms 1 and 2

    • jobInstances: collection of the registry concerning the job instances to be deleted

For this example, the document deletion limit per run is 100 and most jobs produce one log line each.

Explanations of the remaining columns:

  • TOTAL REMOVABLE: total number of documents to remove for this collection taking into account the number of documents that can be removed per run of Remove-Junks

    • There is no limit on the number of documents to remove for ElasticSearch, only a per-query deletion limit inherent to ElasticSearch, so this column is empty

    • 121 documents are still removable for the MongoDB collection projects-logs of the project a3344d6d-56e5-406d-8ffa-31097be8a61b on platform 1

    • 0 documents are still removable for the collection projects-logs of the project 1338a8d3-c10a-49c7-8ada-e9a0370b076b on platform 1 (either the quota for the execution has already been reached, or there are no more documents to delete in this collection)

    • 0 documents are still removable for the collection projects-logs of the project 1014258a-6a7d-4a8a-8b7b-ec61bd8dcf79 on platform 1

    • 100 job instance documents are potentially removable for platform 1. Potentially, because an instance can only be deleted once it no longer has associated logs, so all its logs must be deleted first.

    • 0 job instance documents are removable for platform 2

    • 2380 job instance documents are removable from the junk registry. Note that a job instance is deleted from the registry only once it has actually been deleted, logs included. In addition, the limit of documents that can be deleted per execution does not apply to the junk registry.

  • LOGS REMOVED: Number of logs removed for this collection during this Remove-Junks run:

    • 1815 logs were removed in ElasticSearch for project saagie_1_a3344d6d-56e5-406d-8ffa-31097be8a61b-logs on platform 1

    • 1452 logs were removed in ElasticSearch for project saagie_1_1338a8d3-c10a-49c7-8ada-e9a0370b076b-logs on platform 1

    • 381 logs were removed in ElasticSearch for project saagie_1_1014258a-6a7d-4a8a-8b7b-ec61bd8dcf79-logs on platform 1

    • 71 logs were deleted in the MongoDB collection projects-logs for project a3344d6d-56e5-406d-8ffa-31097be8a61b on platform 1

    • 0 logs were deleted in the collection projects-logs for project 1338a8d3-c10a-49c7-8ada-e9a0370b076b on platform 1

    • 0 logs were deleted in the collection projects-logs for project 1014258a-6a7d-4a8a-8b7b-ec61bd8dcf79 on platform 1

    • 71 logs were deleted in total for job instances on platform 1

    • No logs were deleted in total for job instances on platform 2 (no logs to delete on this platform)

    • No logs to delete for the junk registry

  • REMAINDER DOC DELETABLE: remaining quota of documents that can be deleted per execution for each collection after the log deletion step:

    • ElasticSearch: no limit on documents to delete, so this column is empty

    • 29 documents can still be deleted in the MongoDB projects-logs collections of the 3 projects (initial quota of 100 documents - 71 logs deleted). The remaining logs will be deleted during future Remove-Junks executions. The quota was probably not fully used for log deletion so that the associated job instances could still be deleted.

    • 29 documents remain deletable for the collection of job instances of platform 1

    • 100 documents remain deletable for the collection of job instances of platform 2 (no deletable jobs on this platform)

    • The junk registry has no limit on the deletion of documents per execution, so the column is empty.

  • RESOURCES REMOVED: number of resources removed for each collection:

    • We find the number of logs removed for each ElasticSearch and MongoDB collection

    • 29 job instances were removed for platform 1 (which corresponds to the remaining quota of documents that can be removed for this collection)

    • No job instances were removed for platform 2

    • 29 job instances were removed from the junk registry because they were deleted during this execution of Remove-Junks