{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook uses flaml to fine-tune a transformer model from the Hugging Face transformers library.\n", "\n", "**Requirements.** This notebook has additional requirements:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "#!pip install torch transformers datasets ipywidgets flaml[blendsearch,ray];" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tokenizer" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoTokenizer" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "MODEL_CHECKPOINT = \"distilbert-base-uncased\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT, use_fast=True)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "{'input_ids': [101, 2023, 2003, 1037, 3231, 102], 'attention_mask': [1, 1, 1, 1, 1, 1]}" ] }, "metadata": {}, "execution_count": 5 } ], "source": [ "tokenizer(\"this is a test\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "TASK = \"cola\"" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import datasets" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n" ] } ], "source": [ "raw_dataset = datasets.load_dataset(\"glue\", TASK)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], 
"source": [ "# define tokenization function used to process data\n", "COLUMN_NAME = \"sentence\"\n", "def tokenize(examples):\n", " return tokenizer(examples[COLUMN_NAME], truncation=True)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "output_type": "display_data", "data": { "text/plain": "HBox(children=(FloatProgress(value=0.0, max=9.0), HTML(value='')))", "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "ecc66e6795f848e0a41e6cf1ce37bdf2" } }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "\n" ] }, { "output_type": "display_data", "data": { "text/plain": "HBox(children=(FloatProgress(value=0.0, max=2.0), HTML(value='')))", "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "2d33fc70b80b403080ad8c0e77ed1891" } }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "\n" ] }, { "output_type": "display_data", "data": { "text/plain": "HBox(children=(FloatProgress(value=0.0, max=2.0), HTML(value='')))", "application/vnd.jupyter.widget-view+json": { "version_major": 2, "version_minor": 0, "model_id": "d2ab3feb1a354187abb2dded0ead404f" } }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "\n" ] } ], "source": [ "encoded_dataset = raw_dataset.map(tokenize, batched=True)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "{'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", " 'idx': 0,\n", " 'input_ids': [101,\n", " 2256,\n", " 2814,\n", " 2180,\n", " 1005,\n", " 1056,\n", " 4965,\n", " 2023,\n", " 4106,\n", " 1010,\n", " 2292,\n", " 2894,\n", " 1996,\n", " 2279,\n", " 2028,\n", " 2057,\n", " 16599,\n", " 1012,\n", " 102],\n", " 'label': 1,\n", " 'sentence': \"Our friends won't buy this analysis, let alone the next one we propose.\"}" ] }, "metadata": 
{}, "execution_count": 11 } ], "source": [ "encoded_dataset[\"train\"][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from transformers import AutoModelForSequenceClassification" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n", "- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", "- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n", "Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n", "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" ] } ], "source": [ "NUM_LABELS = 2\n", "model = AutoModelForSequenceClassification.from_pretrained(MODEL_CHECKPOINT, num_labels=NUM_LABELS)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "DistilBertForSequenceClassification(\n", " (distilbert): DistilBertModel(\n", " (embeddings): Embeddings(\n", " 
(word_embeddings): Embedding(30522, 768, padding_idx=0)\n", " (position_embeddings): Embedding(512, 768)\n", " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " )\n", " (transformer): Transformer(\n", " (layer): ModuleList(\n", " (0): TransformerBlock(\n", " (attention): MultiHeadSelfAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (q_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (k_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (v_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (out_lin): Linear(in_features=768, out_features=768, bias=True)\n", " )\n", " (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (ffn): FFN(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (lin1): Linear(in_features=768, out_features=3072, bias=True)\n", " (lin2): Linear(in_features=3072, out_features=768, bias=True)\n", " )\n", " (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " )\n", " (1): TransformerBlock(\n", " (attention): MultiHeadSelfAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (q_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (k_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (v_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (out_lin): Linear(in_features=768, out_features=768, bias=True)\n", " )\n", " (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (ffn): FFN(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (lin1): Linear(in_features=768, out_features=3072, bias=True)\n", " (lin2): Linear(in_features=3072, out_features=768, bias=True)\n", " )\n", " (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " )\n", " (2): TransformerBlock(\n", " (attention): MultiHeadSelfAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (q_lin): 
Linear(in_features=768, out_features=768, bias=True)\n", " (k_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (v_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (out_lin): Linear(in_features=768, out_features=768, bias=True)\n", " )\n", " (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (ffn): FFN(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (lin1): Linear(in_features=768, out_features=3072, bias=True)\n", " (lin2): Linear(in_features=3072, out_features=768, bias=True)\n", " )\n", " (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " )\n", " (3): TransformerBlock(\n", " (attention): MultiHeadSelfAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (q_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (k_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (v_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (out_lin): Linear(in_features=768, out_features=768, bias=True)\n", " )\n", " (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (ffn): FFN(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (lin1): Linear(in_features=768, out_features=3072, bias=True)\n", " (lin2): Linear(in_features=3072, out_features=768, bias=True)\n", " )\n", " (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " )\n", " (4): TransformerBlock(\n", " (attention): MultiHeadSelfAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (q_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (k_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (v_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (out_lin): Linear(in_features=768, out_features=768, bias=True)\n", " )\n", " (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (ffn): FFN(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (lin1): Linear(in_features=768, 
out_features=3072, bias=True)\n", " (lin2): Linear(in_features=3072, out_features=768, bias=True)\n", " )\n", " (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " )\n", " (5): TransformerBlock(\n", " (attention): MultiHeadSelfAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (q_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (k_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (v_lin): Linear(in_features=768, out_features=768, bias=True)\n", " (out_lin): Linear(in_features=768, out_features=768, bias=True)\n", " )\n", " (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " (ffn): FFN(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (lin1): Linear(in_features=768, out_features=3072, bias=True)\n", " (lin2): Linear(in_features=3072, out_features=768, bias=True)\n", " )\n", " (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", " )\n", " )\n", " )\n", " )\n", " (pre_classifier): Linear(in_features=768, out_features=768, bias=True)\n", " (classifier): Linear(in_features=768, out_features=2, bias=True)\n", " (dropout): Dropout(p=0.2, inplace=False)\n", ")" ] }, "metadata": {}, "execution_count": 14 } ], "source": [ "model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Metric" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "metric = datasets.load_metric(\"glue\", TASK)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Metric(name: \"glue\", features: {'predictions': Value(dtype='int64', id=None), 'references': Value(dtype='int64', id=None)}, usage: \"\"\"\n", "Compute GLUE evaluation metric associated to each GLUE dataset.\n", "Args:\n", " predictions: list of predictions to score.\n", " Each translation should be tokenized into a list of tokens.\n", " references: list of lists of 
references for each translation.\n", " Each reference should be tokenized into a list of tokens.\n", "Returns: depending on the GLUE subset, one or several of:\n", " \"accuracy\": Accuracy\n", " \"f1\": F1 score\n", " \"pearson\": Pearson Correlation\n", " \"spearmanr\": Spearman Correlation\n", " \"matthews_correlation\": Matthew Correlation\n", "Examples:\n", "\n", " >>> glue_metric = datasets.load_metric('glue', 'sst2') # 'sst2' or any of [\"mnli\", \"mnli_mismatched\", \"mnli_matched\", \"qnli\", \"rte\", \"wnli\", \"hans\"]\n", " >>> references = [0, 1]\n", " >>> predictions = [0, 1]\n", " >>> results = glue_metric.compute(predictions=predictions, references=references)\n", " >>> print(results)\n", " {'accuracy': 1.0}\n", "\n", " >>> glue_metric = datasets.load_metric('glue', 'mrpc') # 'mrpc' or 'qqp'\n", " >>> references = [0, 1]\n", " >>> predictions = [0, 1]\n", " >>> results = glue_metric.compute(predictions=predictions, references=references)\n", " >>> print(results)\n", " {'accuracy': 1.0, 'f1': 1.0}\n", "\n", " >>> glue_metric = datasets.load_metric('glue', 'stsb')\n", " >>> references = [0., 1., 2., 3., 4., 5.]\n", " >>> predictions = [0., 1., 2., 3., 4., 5.]\n", " >>> results = glue_metric.compute(predictions=predictions, references=references)\n", " >>> print({\"pearson\": round(results[\"pearson\"], 2), \"spearmanr\": round(results[\"spearmanr\"], 2)})\n", " {'pearson': 1.0, 'spearmanr': 1.0}\n", "\n", " >>> glue_metric = datasets.load_metric('glue', 'cola')\n", " >>> references = [0, 1]\n", " >>> predictions = [0, 1]\n", " >>> results = glue_metric.compute(predictions=predictions, references=references)\n", " >>> print(results)\n", " {'matthews_correlation': 1.0}\n", "\"\"\", stored examples: 0)" ] }, "metadata": {}, "execution_count": 16 } ], "source": [ "metric" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "def compute_metrics(eval_pred):\n", " predictions, labels = 
eval_pred\n", " predictions = np.argmax(predictions, axis=1)\n", " return metric.compute(predictions=predictions, references=labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training (aka Finetuning)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "from transformers import Trainer\n", "from transformers import TrainingArguments" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "args = TrainingArguments(\n", " output_dir='output',\n", " do_eval=True,\n", ")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "trainer = Trainer(\n", " model=model,\n", " args=args,\n", " train_dataset=encoded_dataset[\"train\"],\n", " eval_dataset=encoded_dataset[\"validation\"],\n", " tokenizer=tokenizer,\n", " compute_metrics=compute_metrics,\n", ")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "/home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n warnings.warn('Was asked to gather along dimension 0, but all '\n" ] }, { "output_type": "display_data", "data": { "text/plain": "", "text/html": "\n
" }, "metadata": {} }, { "output_type": "stream", "name": "stderr", "text": [ "/home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n warnings.warn('Was asked to gather along dimension 0, but all '\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "TrainOutput(global_step=804, training_loss=0.3209413462017306, metrics={'train_runtime': 115.5328, 'train_samples_per_second': 6.959, 'total_flos': 238363718990580.0, 'epoch': 3.0, 'init_mem_cpu_alloc_delta': 2336600064, 'init_mem_gpu_alloc_delta': 268953088, 'init_mem_cpu_peaked_delta': 257929216, 'init_mem_gpu_peaked_delta': 0, 'train_mem_cpu_alloc_delta': 2381066240, 'train_mem_gpu_alloc_delta': 806788096, 'train_mem_cpu_peaked_delta': 186974208, 'train_mem_gpu_peaked_delta': 550790144})" ] }, "metadata": {}, "execution_count": 21 } ], "source": [ "trainer.train()" ] }, { "source": [ "## Hyperparameter Optimization\n", "\n", "`flaml.tune` is a module for economical hyperparameter tuning. It frees users from manually tuning the many hyperparameters of software such as machine learning training procedures.\n", "The API is compatible with Ray Tune.\n", "\n", "### Step 1. Define training method\n", "\n", "We define a function `train_distilbert(config: dict)` that accepts a hyperparameter configuration dict `config`. 
The specific configurations are generated by flaml's search algorithm from a given search space.\n" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "import flaml\n", "\n", "def train_distilbert(config: dict):\n", "\n", "    # Load CoLA dataset and apply tokenizer\n", "    cola_raw = datasets.load_dataset(\"glue\", TASK)\n", "    cola_encoded = cola_raw.map(tokenize, batched=True)\n", "    train_dataset, eval_dataset = cola_encoded[\"train\"], cola_encoded[\"validation\"]\n", "\n", "    model = AutoModelForSequenceClassification.from_pretrained(\n", "        MODEL_CHECKPOINT, num_labels=NUM_LABELS\n", "    )\n", "\n", "    metric = datasets.load_metric(\"glue\", TASK)\n", "    def compute_metrics(eval_pred):\n", "        predictions, labels = eval_pred\n", "        predictions = np.argmax(predictions, axis=1)\n", "        return metric.compute(predictions=predictions, references=labels)\n", "\n", "    training_args = TrainingArguments(\n", "        output_dir='.',\n", "        do_eval=False,\n", "        disable_tqdm=True,\n", "        logging_steps=20000,\n", "        save_total_limit=0,\n", "        **config,\n", "    )\n", "\n", "    trainer = Trainer(\n", "        model,\n", "        training_args,\n", "        train_dataset=train_dataset,\n", "        eval_dataset=eval_dataset,\n", "        tokenizer=tokenizer,\n", "        compute_metrics=compute_metrics,\n", "    )\n", "\n", "    # train model\n", "    trainer.train()\n", "\n", "    # evaluate model\n", "    eval_output = trainer.evaluate()\n", "\n", "    # report the metric to optimize\n", "    flaml.tune.report(\n", "        loss=eval_output[\"eval_loss\"],\n", "        matthews_correlation=eval_output[\"eval_matthews_correlation\"],\n", "    )" ] }, { "source": [ "### Step 2. Define the search\n", "\n", "We are now ready to define our search. 
This includes:\n", "\n", "- The `search_space` for our hyperparameters\n", "- The metric and the mode ('max' or 'min') for optimization\n", "- The resources (`num_cpus`, `num_gpus`) and the constraints (`num_samples` and `time_budget_s`)" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "max_num_epoch = 64\n", "search_space = {\n", "    # You can mix constants with search space objects.\n", "    \"num_train_epochs\": flaml.tune.loguniform(1, max_num_epoch),\n", "    \"learning_rate\": flaml.tune.loguniform(1e-6, 1e-4),\n", "    \"adam_epsilon\": flaml.tune.loguniform(1e-9, 1e-7),\n", "    \"adam_beta1\": flaml.tune.uniform(0.8, 0.99),\n", "    \"adam_beta2\": flaml.tune.loguniform(98e-2, 9999e-4),\n", "}" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# optimization objective\n", "HP_METRIC, MODE = \"matthews_correlation\", \"max\"\n", "\n", "# resources\n", "num_cpus = 4\n", "num_gpus = 4\n", "\n", "# constraints\n", "num_samples = -1 # number of trials, -1 means unlimited\n", "time_budget_s = 3600 # time budget in seconds" ] }, { "source": [ "### Step 3. Launch with `flaml.tune.run`\n", "\n", "We are now ready to launch the tuning using `flaml.tune.run`:" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "2021-05-07 02:35:57,130\tINFO services.py:1172 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8265\u001b[39m\u001b[22m\n", "2021-05-07 02:35:58,044\tWARNING function_runner.py:540 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.\n", "Tuning started...\n" ] }, { "output_type": "display_data", "data": { "text/plain": "", "text/html": "== Status ==
Memory usage on this node: 26.0/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 4/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 1/infinite (1 RUNNING)

" }, "metadata": {} }, { "output_type": "stream", "name": "stderr", "text": [ "\u001b[2m\u001b[36m(pid=886303)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 30.9/251.6 GiB
Number of trials: 2/infinite (1 PENDING, 1 RUNNING)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_a0c303d0 completed. Last result: loss=0.5879864692687988,matthews_correlation=0.0\n", "\u001b[2m\u001b[36m(pid=886302)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.2/251.6 GiB
Number of trials: 3/infinite (1 PENDING, 1 RUNNING, 1 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_a0c303d1 completed. Last result: loss=0.6030182838439941,matthews_correlation=0.0\n", "\u001b[2m\u001b[36m(pid=886305)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.4/251.6 GiB
Number of trials: 4/infinite (1 PENDING, 1 RUNNING, 2 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_c39b2ef0 completed. Last result: loss=0.5865175724029541,matthews_correlation=0.0\n", "\u001b[2m\u001b[36m(pid=886304)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.7/251.6 GiB
Number of trials: 5/infinite (1 PENDING, 1 RUNNING, 3 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_f00776e2 completed. Last result: loss=0.5813134908676147,matthews_correlation=0.0\n", "\u001b[2m\u001b[36m(pid=892770)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 32.0/251.6 GiB
Number of trials: 6/infinite (1 PENDING, 1 RUNNING, 4 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_11ab3900 completed. Last result: loss=0.5855756998062134,matthews_correlation=0.0\n", "\u001b[2m\u001b[36m(pid=897725)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 30.9/251.6 GiB
Number of trials: 7/infinite (1 PENDING, 1 RUNNING, 5 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_353025b6 completed. Last result: loss=0.5316324830055237,matthews_correlation=0.38889272875750597\n", "\u001b[2m\u001b[36m(pid=907288)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.3/251.6 GiB
Number of trials: 8/infinite (1 PENDING, 1 RUNNING, 6 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_5728a1de completed. Last result: loss=0.5385054349899292,matthews_correlation=0.2805581766595423\n", "\u001b[2m\u001b[36m(pid=908756)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.6/251.6 GiB
Number of trials: 9/infinite (1 PENDING, 1 RUNNING, 7 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_9394c2e2 completed. Last result: loss=0.5391769409179688,matthews_correlation=0.3272948213494272\n", "\u001b[2m\u001b[36m(pid=912284)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.9/251.6 GiB
Number of trials: 10/infinite (1 PENDING, 1 RUNNING, 8 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_b6543fec completed. Last result: loss=0.5275164842605591,matthews_correlation=0.37917684067701946\n", "\u001b[2m\u001b[36m(pid=914582)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.0/251.6 GiB
Number of trials: 11/infinite (1 PENDING, 1 RUNNING, 9 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_0071f998 completed. Last result: loss=0.5162246823310852,matthews_correlation=0.417156672319181\n", "\u001b[2m\u001b[36m(pid=918301)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.2/251.6 GiB
Number of trials: 12/infinite (1 PENDING, 1 RUNNING, 10 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_2f830be6 completed. Last result: loss=0.5516289472579956,matthews_correlation=0.06558874629318973\n", "\u001b[2m\u001b[36m(pid=920414)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.7/251.6 GiB
Number of trials: 13/infinite (1 PENDING, 1 RUNNING, 11 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_7ce03f12 completed. Last result: loss=0.523731529712677,matthews_correlation=0.45354879777314566\n", "\u001b[2m\u001b[36m(pid=925520)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 32.3/251.6 GiB
Number of trials: 14/infinite (1 PENDING, 1 RUNNING, 12 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_aaab0508 completed. Last result: loss=0.5112878680229187,matthews_correlation=0.4508496945113286\n", "\u001b[2m\u001b[36m(pid=929827)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.2/251.6 GiB
Number of trials: 15/infinite (1 PENDING, 1 RUNNING, 13 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_14262454 completed. Last result: loss=0.5350601673126221,matthews_correlation=0.40085080763525827\n", "\u001b[2m\u001b[36m(pid=934238)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.8/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 4/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 16/infinite (1 PENDING, 1 RUNNING, 14 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_6d211fe6 completed. Last result: loss=0.609851062297821,matthews_correlation=0.5268023551875569\n", "\u001b[2m\u001b[36m(pid=942628)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.1/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 4/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 17/infinite (1 PENDING, 1 RUNNING, 15 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_c980bae4 completed. Last result: loss=0.5422758460044861,matthews_correlation=0.32496815807366203\n", "\u001b[2m\u001b[36m(pid=945904)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 32.2/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 4/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 18/infinite (1 PENDING, 1 RUNNING, 16 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_6d0d29d6 completed. Last result: loss=0.9238015413284302,matthews_correlation=0.5494735380761103\n", "\u001b[2m\u001b[36m(pid=973869)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.2/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 4/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 19/infinite (1 PENDING, 1 RUNNING, 17 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_b16ea82a completed. Last result: loss=0.5334658622741699,matthews_correlation=0.4513069078434825\n", "\u001b[2m\u001b[36m(pid=978003)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.2/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 4/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 20/infinite (1 PENDING, 1 RUNNING, 18 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_eddf7cc0 completed. Last result: loss=0.9832845330238342,matthews_correlation=0.5699304939602442\n", "\u001b[2m\u001b[36m(pid=1000417)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 31.4/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 4/4 CPUs, 4/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 21/infinite (1 PENDING, 1 RUNNING, 19 TERMINATED)

" }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "Trial train_distilbert_43008974 completed. Last result: loss=0.8574612736701965,matthews_correlation=0.5200220944545176\n", "\u001b[2m\u001b[36m(pid=1022436)\u001b[0m Reusing dataset glue (/home/chiw/.cache/huggingface/datasets/glue/cola/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", " 0%| | 0/9 [00:00", "text/html": "== Status ==
Memory usage on this node: 32.0/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 22/infinite (22 TERMINATED)

" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": "", "text/html": "== Status ==
Memory usage on this node: 32.0/251.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/4 GPUs, 0.0/150.39 GiB heap, 0.0/47.22 GiB objects (0/1.0 accelerator_type:V100)
Result logdir: /home/chiw/FLAML/notebook/logs/train_distilbert_2021-05-07_02-35-58
Number of trials: 22/infinite (22 TERMINATED)

| Trial name | status | loc | adam_beta1 | adam_beta2 | adam_epsilon | learning_rate | num_train_epochs | iter | total time (s) | loss | matthews_correlation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| train_distilbert_a0c303d0 | TERMINATED | | 0.939079 | 0.991865 | 7.96945e-08 | 5.61152e-06 | 1 | 1 | 55.6909 | 0.587986 | 0 |
| train_distilbert_a0c303d1 | TERMINATED | | 0.811036 | 0.997214 | 2.05111e-09 | 2.05134e-06 | 1.44427 | 1 | 71.7663 | 0.603018 | 0 |
| train_distilbert_c39b2ef0 | TERMINATED | | 0.909395 | 0.993715 | 1e-07 | 5.26543e-06 | 1 | 1 | 53.7619 | 0.586518 | 0 |
| train_distilbert_f00776e2 | TERMINATED | | 0.968763 | 0.990019 | 4.38943e-08 | 5.98035e-06 | 1.02723 | 1 | 56.8382 | 0.581313 | 0 |
| train_distilbert_11ab3900 | TERMINATED | | 0.962198 | 0.991838 | 7.09296e-08 | 5.06608e-06 | 1 | 1 | 54.0231 | 0.585576 | 0 |
| train_distilbert_353025b6 | TERMINATED | | 0.91596 | 0.991892 | 8.95426e-08 | 6.21568e-06 | 2.15443 | 1 | 98.3233 | 0.531632 | 0.388893 |
| train_distilbert_5728a1de | TERMINATED | | 0.926933 | 0.993146 | 1e-07 | 1.00902e-05 | 1 | 1 | 55.3726 | 0.538505 | 0.280558 |
| train_distilbert_9394c2e2 | TERMINATED | | 0.928106 | 0.990614 | 4.49975e-08 | 3.45674e-06 | 2.72935 | 1 | 121.388 | 0.539177 | 0.327295 |
| train_distilbert_b6543fec | TERMINATED | | 0.876896 | 0.992098 | 1e-07 | 7.01176e-06 | 1.59538 | 1 | 76.0244 | 0.527516 | 0.379177 |
| train_distilbert_0071f998 | TERMINATED | | 0.955024 | 0.991687 | 7.39776e-08 | 5.50998e-06 | 2.90939 | 1 | 126.871 | 0.516225 | 0.417157 |
| train_distilbert_2f830be6 | TERMINATED | | 0.886931 | 0.989628 | 7.6127e-08 | 4.37646e-06 | 1.53338 | 1 | 73.8934 | 0.551629 | 0.0655887 |
| train_distilbert_7ce03f12 | TERMINATED | | 0.984053 | 0.993956 | 8.70144e-08 | 7.82557e-06 | 4.08775 | 1 | 174.027 | 0.523732 | 0.453549 |
| train_distilbert_aaab0508 | TERMINATED | | 0.940707 | 0.993946 | 1e-07 | 8.91979e-06 | 3.40243 | 1 | 146.249 | 0.511288 | 0.45085 |
| train_distilbert_14262454 | TERMINATED | | 0.99 | 0.991696 | 4.60093e-08 | 4.83405e-06 | 3.4954 | 1 | 152.008 | 0.53506 | 0.400851 |
| train_distilbert_6d211fe6 | TERMINATED | | 0.959277 | 0.994556 | 5.40791e-08 | 1.17333e-05 | 6.64995 | 1 | 271.444 | 0.609851 | 0.526802 |
| train_distilbert_c980bae4 | TERMINATED | | 0.99 | 0.993355 | 1e-07 | 5.21929e-06 | 2.51275 | 1 | 111.799 | 0.542276 | 0.324968 |
| train_distilbert_6d0d29d6 | TERMINATED | | 0.965773 | 0.995182 | 9.9752e-08 | 1.15549e-05 | 13.694 | 1 | 527.944 | 0.923802 | 0.549474 |
| train_distilbert_b16ea82a | TERMINATED | | 0.952781 | 0.993931 | 2.93182e-08 | 1.19145e-05 | 3.2293 | 1 | 139.844 | 0.533466 | 0.451307 |
| train_distilbert_eddf7cc0 | TERMINATED | | 0.99 | 0.997109 | 8.13498e-08 | 1.28515e-05 | 15.5807 | 1 | 614.789 | 0.983285 | 0.56993 |
| train_distilbert_43008974 | TERMINATED | | 0.929089 | 0.993258 | 1e-07 | 1.03892e-05 | 12.0357 | 1 | 474.387 | 0.857461 | 0.520022 |
| train_distilbert_b3408a4e | TERMINATED | | 0.99 | 0.993809 | 4.67441e-08 | 1.10418e-05 | 11.9165 | 1 | 474.126 | 0.828205 | 0.526164 |
| train_distilbert_cfbfb220 | TERMINATED | | 0.979454 | 0.9999 | 1e-07 | 1.49578e-05 | 20.3715 | | | | |


" }, "metadata": {} }, { "output_type": "stream", "name": "stderr", "text": [ "2021-05-07 03:42:30,035\tINFO tune.py:450 -- Total run time: 3992.00 seconds (3991.90 seconds for the tuning loop).\n" ] } ], "source": [ "import time\n", "import ray\n", "start_time = time.time()\n", "ray.shutdown()\n", "ray.init(num_cpus=num_cpus, num_gpus=num_gpus)\n", "\n", "print(\"Tuning started...\")\n", "analysis = flaml.tune.run(\n", " train_distilbert,\n", " search_alg=flaml.CFO(\n", " space=search_space,\n", " metric=HP_METRIC,\n", " mode=MODE,\n", " low_cost_partial_config={\"num_train_epochs\": 1}),\n", " report_intermediate_result=False,\n", " # uncomment the following if report_intermediate_result = True\n", " # max_resource=max_num_epoch, min_resource=1,\n", " resources_per_trial={\"gpu\": num_gpus, \"cpu\": num_cpus},\n", " local_dir='logs/',\n", " num_samples=num_samples,\n", " time_budget_s=time_budget_s,\n", " use_ray=True,\n", ")\n", "\n", "ray.shutdown()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "n_trials=22\ntime=3999.769361972809\nBest model eval matthews_correlation: 0.5699\nBest model parameters: {'num_train_epochs': 15.580684188655825, 'learning_rate': 1.2851507818900338e-05, 'adam_epsilon': 8.134982521948352e-08, 'adam_beta1': 0.99, 'adam_beta2': 0.9971094424784387}\n" ] } ], "source": [ "best_trial = analysis.get_best_trial(HP_METRIC, MODE, \"all\")\n", "metric = best_trial.metric_analysis[HP_METRIC][MODE]\n", "print(f\"n_trials={len(analysis.trials)}\")\n", "print(f\"time={time.time()-start_time}\")\n", "print(f\"Best model eval {HP_METRIC}: {metric:.4f}\")\n", "print(f\"Best model parameters: {best_trial.config}\")\n" ] }, { "source": [ "## Next Steps\n", "\n", "Notice that we only reported the metric with `flaml.tune.report` at the end of full training loop. 
It is possible to report intermediate performance instead, which allows early stopping, as follows:\n", "\n", "- Hugging Face `transformers` provides _callbacks_, which can be used to insert the `flaml.tune.report` call inside the training loop.\n", "- Set `do_eval=True` in the `TrainingArguments` passed to `Trainer`, and adjust the evaluation frequency accordingly.\n", "- Set `report_intermediate_result=True` in `flaml.tune.run` and uncomment the `max_resource` and `min_resource` arguments in the tuning cell above." ], "cell_type": "markdown", "metadata": {} } ], "metadata": { "kernelspec": { "name": "python385jvsc74a57bd031f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6", "display_name": "Python 3.8.5 64-bit" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "metadata": { "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } } }, "nbformat": 4, "nbformat_minor": 4 }