Launch GPU On Demand With OVH AI Training
Before you begin, check the following prerequisites:

- You must have an OVHcloud account.
- You must have a Public Cloud project in your OVHcloud account.
- You must have access to the OVHcloud Control Panel.

1. Log in to your OVHcloud Control Panel.
2. Create a user on your OVHcloud account with the following roles:

   - AI Training Operator
   - AI Training Reader
   - ObjectStore Operator

   For more information, see the OVHcloud instructions on how to manage AI users and roles.
3. Generate an application token for OVH AI tools. For more information, see the OVHcloud instructions on users and tokens.
4. Configure your OVHcloud Object Storage to store your job code or data. These storage spaces are accessible through an API interface and can be of different storage classes.
   Choose your object storage class according to your needs from the following:

   - The S3 object storage, with the Standard object storage - S3 API or High Performance object storage - S3 API classes. The S3 storage classes are compatible with the S3 protocol and are regularly updated.
   - The SWIFT object storage, with the Standard object storage - SWIFT API class. The SWIFT storage classes are from older generations and no longer benefit from further developments.
   Create your Object Storage bucket. For more information, see the OVHcloud instructions on how to create a bucket.

   - Alternatively, you can create an S3 bucket with the `ovhai` CLI. For more detailed information, see the OVHcloud documentation on S3 buckets.
   - Alternatively, you can create a Swift bucket through the REST API, with a `POST` request to `/cloud/project/{serviceName}/region/{regionName}/storage`, as sketched after this list. For more detailed information, see the OVHcloud documentation on Swift Object Storage.
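   For illustration, here is a minimal Python sketch of that Swift bucket creation call, using the official `ovh` client library to sign the request. The `containerName` body field and the credential variable names are assumptions, not confirmed by this guide; check the OVHcloud API console for the exact request schema.

   ```python
   import os

   import ovh  # Official OVHcloud API client: pip install ovh

   # Hypothetical sketch: the client signs requests with your application credentials.
   client = ovh.Client(
       endpoint="ovh-eu",
       application_key=os.environ["OVH_APP_KEY"],
       application_secret=os.environ["OVH_APP_SECRET"],
       consumer_key=os.environ["OVH_CONSUMER_KEY"],
   )

   service_name = "YOUR_PUBLIC_CLOUD_PROJECT_ID"
   region_name = "GRA"

   # Assumption: the request body takes the bucket name in a containerName field.
   result = client.post(
       f"/cloud/project/{service_name}/region/{region_name}/storage",
       containerName="YOUR_BUCKET_NAME",
   )
   print(result)
   ```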
   You can now put your code or data in the newly created Object Storage.

   Here is an example of code in Python to upload your data to Object Storage with the S3 API.

   ```python
   import logging
   import os

   import boto3
   from botocore.exceptions import ClientError


   def upload_file(s3_client, file_name, bucket, object_name=None):
       """Upload a file to an S3 bucket.

       :param s3_client: The boto3 client.
       :param file_name: The file to upload.
       :param bucket: The bucket to upload to.
       :param object_name: The S3 object name. If not specified, the file_name value is used.
       :return: Returns True if the file was uploaded, else returns False.
       """
       # If the S3 object_name value is not specified, use the file_name value.
       if object_name is None:
           object_name = os.path.basename(file_name)

       # Upload the file.
       try:
           s3_client.upload_file(file_name, bucket, object_name)
       except ClientError as e:
           logging.error(e)
           return False
       return True


   s3_bucket_name = "YOUR_BUCKET_NAME"
   s3_client = boto3.client(
       "s3",
       endpoint_url=os.environ["AWS_ENDPOINT_URL"],
       region_name=os.environ["AWS_REGION_NAME"],
       aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
       aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
   )

   upload_file(s3_client, "./resources/__main__.py", s3_bucket_name, object_name="__main__.py")
   ```
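   To check that the upload worked, you can list the bucket contents with the same client. A short sketch:

   ```python
   # Optional check: list the objects in the bucket to confirm the upload.
   response = s3_client.list_objects_v2(Bucket=s3_bucket_name)
   for obj in response.get("Contents", []):
       print(obj["Key"], obj["Size"])
   ```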
   Here is an example of code in Python to upload your data to Object Storage with the Swift API.

   ```python
   import logging
   import os

   import requests
   import swiftclient
   from swiftclient.exceptions import ClientException


   def upload_file_swift(url, token, file_path, bucket_name, name):
       """Upload a file to a Swift bucket.

       :param url: The Swift bucket endpoint, in String data type.
       :param token: The OpenStack token, in String data type. You can create it in OVH by
           navigating to Users & Roles -> Select a user -> Select the ellipsis menu ->
           Generate an OpenStack token.
       :param file_path: The path of the file to upload, in String data type.
       :param bucket_name: The bucket to upload to, in String data type.
       :param name: The object name, in String data type.
       :return: Returns True if the file was uploaded.
       """
       try:
           with open(file_path, "rb") as f:
               file_data = f.read()
           swiftclient.client.put_object(url=url, token=token, container=bucket_name,
                                         name=name, contents=file_data)
       except ClientException as e:
           logging.error(e)
           raise
       return True


   ovh_token_data = {
       "auth": {
           "identity": {
               "methods": ["password"],
               "password": {
                   "user": {
                       "name": os.environ["OVH_USER_LOGIN"],
                       "domain": {"id": "default"},
                       "password": os.environ["OVH_USER_PWD"],
                   }
               },
           },
           "scope": {
               "project": {
                   "name": os.environ["OVH_TENANT_NAME"],
                   "domain": {"id": "default"},
               }
           },
       }
   }

   s3_bucket_name = "YOUR_BUCKET_NAME"
   # The Swift bucket endpoint, for example:
   # https://storage.<region>.cloud.ovh.net/v1/AUTH_{TENANT_ID}/
   bucket_endpoint = os.environ["BUCKET_ENDPOINT_URL"]

   # Request an OpenStack token from the OVHcloud authentication endpoint.
   res_get_token = requests.post(
       url="https://auth.cloud.ovh.net/v3/auth/tokens",
       json=ovh_token_data,
       headers={"Content-Type": "application/json"},
   )
   openstack_token = res_get_token.headers["x-subject-token"]

   upload_file_swift(bucket_endpoint, openstack_token, "./resources/__main__.py",
                     s3_bucket_name, "__main__.py")
   ```
5. Create your job in Saagie to use GPU or CPU on demand. Example of a job code in Python.

   ```python
   import os

   import requests


   class BearerAuth(requests.auth.AuthBase):
       def __init__(self, token):
           self.token = token

       def __call__(self, r):
           r.headers["authorization"] = "Bearer " + self.token
           return r


   s3_bucket_name = "YOUR_BUCKET_NAME"
   command_line = "python ~/sample_project/__main__.py"  # You can customize the command line to suit your needs.
   ovh_token_gra = os.environ["OVH_TOKEN"]  # The token that you created at step 3.

   ovh_new_job = {
       "image": "YOUR_IMAGE",
       "region": "GRA",  # The region where the job will run.
       "volumes": [
           {
               "dataStore": {
                   "alias": "GRA",  # If you use Object Storage with the S3 API, use the custom alias that you created.
                   "container": s3_bucket_name,
                   "prefix": "",
               },
               "mountPath": "/workspace/sample_project",
               "permission": "RW",
               "cache": False,
           }
       ],
       "name": "YOUR_JOB_NAME",
       "unsecureHttp": False,
       "resources": {
           "gpu": 1,  # The number of GPUs that you need.
           "flavor": "ai1-1-gpu",
       },
       "command": ["bash", "-c", command_line],
       "envVars": [
           {
               "name": "YOUR_ENV_VAR_NAME",
               "value": os.environ["YOUR_ENV_VAR_NAME"],
           }
           # You can set other environment variables here.
       ],
       "sshPublicKeys": [],
   }

   # Send the request to create the job.
   response_create_job = requests.post(
       "https://gra.training.ai.cloud.ovh.net/v1/job",
       auth=BearerAuth(ovh_token_gra),
       json=ovh_new_job,
   )
   ```
   In your code, you must send a `POST` request to `https://<region>.training.ai.cloud.ovh.net/v1/job` with the user or token created at step 3, where `<region>` must be replaced with `gra` or `bhs`. In the request body, specify the number of GPUs or CPUs, and link the job to your Object Storage. This `POST` request returns the job ID.
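   For example, here is a minimal sketch of reading the job ID back from the creation response, assuming the API returns the created job as a JSON object with an `id` field (verify this against the actual response body):

   ```python
   # Minimal sketch, assuming the response body is the job JSON with an "id" field.
   response_create_job.raise_for_status()
   id_job = response_create_job.json()["id"]
   print(f"Created job {id_job}")
   ```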
You have launched your request for GPU or CPU on demand.
Monitor Your Job
Add the following code to your job's code file to get information about the job and its logs:
- Add a `GET` request to `https://<region>.training.ai.cloud.ovh.net/v1/job/{id_job}` to your code file to get information on your job, where `<region>` and `{id_job}` must be replaced with your values. Example of code in Python.

  ```python
  import requests

  # id_job is the job ID returned by the job creation request.
  # BearerAuth and ovh_token_gra are defined as in the job creation example.
  response_job = requests.get(
      f"https://gra.training.ai.cloud.ovh.net/v1/job/{id_job}",
      auth=BearerAuth(ovh_token_gra),
  )
  ```
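  You can then read the job state from the response. A minimal sketch, assuming the job JSON exposes its state under `status.state` (check the actual response body for the exact field names):

  ```python
  # Assumption: the job JSON carries its state under status.state.
  job_info = response_job.json()
  state = job_info.get("status", {}).get("state")
  print(f"Job {id_job} is in state {state}")
  ```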
- Add a `GET` request to `https://<region>.training.ai.cloud.ovh.net/v1/job/{id_job}/log` to your code file to get your job logs, where `<region>` and `{id_job}` must be replaced with your values. Example of code in Python.

  ```python
  import requests

  # id_job is the job ID returned by the job creation request.
  # BearerAuth and ovh_token_gra are defined as in the job creation example.
  response_job_logs = requests.get(
      f"https://gra.training.ai.cloud.ovh.net/v1/job/{id_job}/log",
      auth=BearerAuth(ovh_token_gra),
  ).text

  for line in response_job_logs.splitlines():
      print(line, flush=True)
  ```
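If you want to wait for the job to finish before reading the final logs, a simple polling sketch could look like the following. The terminal state names (`DONE`, `FAILED`, `ERROR`, `INTERRUPTED`) and the `status.state` field are assumptions; verify them against the AI Training API responses.

```python
import time

import requests

# Polling sketch: re-fetch the job until it reaches a terminal state,
# then print the full logs once. State names are assumptions.
TERMINAL_STATES = {"DONE", "FAILED", "ERROR", "INTERRUPTED"}

while True:
    job_info = requests.get(
        f"https://gra.training.ai.cloud.ovh.net/v1/job/{id_job}",
        auth=BearerAuth(ovh_token_gra),
    ).json()
    if job_info.get("status", {}).get("state") in TERMINAL_STATES:
        break
    time.sleep(30)  # Avoid hammering the API.

logs = requests.get(
    f"https://gra.training.ai.cloud.ovh.net/v1/job/{id_job}/log",
    auth=BearerAuth(ovh_token_gra),
).text
print(logs)
```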