Push Models to Hugging Face With Saagie
- Install the Transformers and Datasets libraries.

    !pip install transformers[torch] -U
    !pip install datasets
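  If needed, you can verify the installation by printing the library versions:

    import transformers
    import datasets

    print(transformers.__version__)
    print(datasets.__version__)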
- Load a pre-trained model from Hugging Face.
  - Find the model you need on the Model Hub. In our example, we use the bert-tiny model.
  - Load the model with the following code.

    from transformers import AutoModelForSequenceClassification

    ## Example NLP model for sentiment analysis
    model_name = "prajjwal1/bert-tiny:main" (1)
    if ':' in model_name:
        model_ver = model_name.split(':')[1]
        model_name = model_name.split(':')[0]
    else:
        model_ver = "main"
    model = AutoModelForSequenceClassification.from_pretrained(model_name, revision=model_ver)
  Where:
  (1) "prajjwal1/bert-tiny:main" can be replaced by another model name.
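  Before fine-tuning, you can optionally run a quick forward pass to check that the checkpoint loaded as expected. This is a minimal sketch; the classification head is still untrained at this point, so only the output shape is meaningful:

    from transformers import AutoTokenizer
    import torch

    tokenizer = AutoTokenizer.from_pretrained(model_name, revision=model_ver)
    inputs = tokenizer("This movie was great!", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    print(logits.shape)  # torch.Size([1, 2]) -- one raw score per label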
- Fine-tune your model.
  - Find the dataset you need on the Hugging Face Hub. In our example, we use the sst2 dataset for textual sentiment classification.
  - Load and pre-process your datasets with the following code.

    from transformers import AutoTokenizer, DataCollatorWithPadding, Trainer, TrainingArguments
    from datasets import load_dataset

    ## Loading Datasets
    dataset = load_dataset("sst2") (1)
    train_dataset = dataset['train']
    valid_dataset = dataset['validation']
    train_subset = 100
    eval_subset = 20
    seed = 42
    repo_name = "MyRepo" (2)

    ## Pre-processing Datasets
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def tokenize_function(examples):
        return tokenizer(examples["sentence"], padding="max_length", truncation=True)

    tokenized_train = train_dataset.map(tokenize_function, batched=True)
    tokenized_valid = valid_dataset.map(tokenize_function, batched=True)
    small_train_dataset = tokenized_train.shuffle(seed=seed).select(range(train_subset))
    small_eval_dataset = tokenized_valid.shuffle(seed=seed).select(range(eval_subset))
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
  Where:
  (1) "sst2" can be replaced by another dataset name.
  (2) "MyRepo" must be replaced by the name of your repository.
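  You can optionally inspect the tokenized subsets before training (a quick sanity check):

    print(len(small_train_dataset), len(small_eval_dataset))  # 100 20
    print(small_train_dataset[0]["sentence"])        # original text is kept alongside the token ids
    print(small_train_dataset[0]["input_ids"][:10])  # first ten token ids of the padded sequence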
  - Add the following code to configure the hyperparameters and train your model.

    ## Defining hyperparameters for fine-tuning
    training_args = TrainingArguments(
        output_dir=repo_name,
        num_train_epochs=2,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        logging_dir='./logs',
        logging_steps=10,
    )

    ## Fine-tune the model with the Trainer class (1)
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )
    trainer.train() (2)
  Where:
  (1) This block builds your Trainer object from your model, training arguments, training and evaluation datasets, tokenizer, and data collator.
  (2) This line runs the training. Relevant training results are displayed in the log of this step. It should look like the following:

    TrainOutput(global_step=14, training_loss=0.664779714175633, metrics={'train_runtime': 2.4304, 'train_samples_per_second': 82.29, 'train_steps_per_second': 5.76, 'total_flos': 17489048160.0, 'train_loss': 0.664779714175633, 'epoch': 2.0})
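  The Trainer above only reports the loss. If you also want an evaluation metric such as accuracy, you can pass a compute_metrics function when building the Trainer. This is a minimal sketch using NumPy; the function name and metric are our choice, not part of the example above:

    import numpy as np

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return {"accuracy": float((predictions == labels).mean())}

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )
    print(trainer.evaluate())  # reports 'eval_accuracy' alongside 'eval_loss'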
- Push your model to Hugging Face.
  - Log in directly to the Hub via the huggingface_hub library using your access token.

    from huggingface_hub import notebook_login

    notebook_login()
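  To check that the login succeeded, you can query the account tied to your token (an optional check):

    from huggingface_hub import whoami

    print(whoami()["name"])  # prints the username associated with your token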
  - Push your model.

    trainer.push_to_hub("MyModelName") (1)
  Where:
  (1) "MyModelName" must be replaced by a label of your choice. Note that in recent versions of transformers, this string is used as the commit message; the repository name itself is taken from the output_dir (here repo_name) set in your TrainingArguments.
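  If you want the Hub repository to have a different name than output_dir, TrainingArguments accepts a hub_model_id parameter. A sketch, with the other arguments unchanged:

    training_args = TrainingArguments(
        output_dir=repo_name,
        hub_model_id="MY_ORGANIZATION/MyRepo",  # explicit target repository on the Hub
        num_train_epochs=2,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        logging_dir='./logs',
        logging_steps=10,
    )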
  - OPTIONAL: Download the model to test its availability.

    from transformers import AutoModelForSequenceClassification

    model_name = "MY_ORGANIZATION/MyRepo" (1)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

  Where:
  (1) "MY_ORGANIZATION" and "MyRepo" must be replaced by your own values.
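  You can also exercise the downloaded model end to end with the high-level pipeline API. This sketch assumes the same "MY_ORGANIZATION/MyRepo" placeholder as above:

    from transformers import pipeline

    classifier = pipeline("text-classification", model="MY_ORGANIZATION/MyRepo")
    print(classifier("This movie was great!"))  # e.g. [{'label': 'LABEL_1', 'score': ...}]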
When running a Python script, for example via a job on the Saagie platform, the interactive notebook_login() is not available. Use the following code instead to log in and push your model to Hugging Face:

    from huggingface_hub import login

    login(token="MY_ACCESS_TOKEN") (1)
    trainer.push_to_hub("MyModelName")

Where:
(1) "MY_ACCESS_TOKEN" must be replaced by your Hugging Face access token.

As described in the Hugging Face documentation, trained model files must be packaged and manually uploaded to Hugging Face.
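On a platform job, it is usually safer to read the token from an environment variable than to hard-code it in the script. A sketch, assuming you expose the token under a variable named HF_TOKEN (a name we chose):

    import os
    from huggingface_hub import login

    login(token=os.environ["HF_TOKEN"])  # set HF_TOKEN in the job's environment settings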