"Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
"/home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n warnings.warn('Was asked to gather along dimension 0, but all '\n"
"text/html": "\n <div>\n <style>\n /* Turns off some styling */\n progress {\n /* gets rid of default border in Firefox and Opera. */\n border: none;\n /* Needs to be in here for Safari polyfill so background images work as expected. */\n background-size: auto;\n }\n </style>\n \n <progress value='2' max='804' style='width:300px; height:20px; vertical-align: middle;'></progress>\n [ 2/804 : < :, Epoch 0.00/3]\n </div>\n <table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: left;\">\n <th>Step</th>\n <th>Training Loss</th>\n </tr>\n </thead>\n <tbody>\n </tbody>\n</table><p>"
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stderr",
"text": [
"/home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n warnings.warn('Was asked to gather along dimension 0, but all '\n"
"`flaml.tune` is a module for economical hyperparameter tuning. It frees users from manually tuning many hyperparameters for a software, such as machine learning training procedures. \n",
"The API is compatible with ray tune.\n",
"\n",
"### Step 1. Define training method\n",
"\n",
"We define a function `train_distilbert(config: dict)` that accepts a hyperparameter configuration dict `config`. The specific configs will be generated by flaml's search algorithm in a given search space.\n"
"2021-05-07 02:35:57,130\tINFO services.py:1172 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8265\u001b[39m\u001b[22m\n",
"2021-05-07 02:35:58,044\tWARNING function_runner.py:540 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.\n",
"\u001b[2m\u001b[36m(pid=886303)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=886303)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=886303)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=886303)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=886303)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=886303)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=886303)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=886303)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars\n",
"\u001b[2m\u001b[36m(pid=886302)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=886302)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=886302)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=886302)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=886302)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=886302)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=886302)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=886302)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars\n",
"\u001b[2m\u001b[36m(pid=886305)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=886305)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=886305)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=886305)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=886305)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=886305)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=886305)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=886305)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars\n",
"\u001b[2m\u001b[36m(pid=886304)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=886304)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=886304)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=886304)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=886304)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=886304)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=886304)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=886304)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars\n",
"\u001b[2m\u001b[36m(pid=892770)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=892770)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=892770)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=892770)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=892770)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=892770)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=892770)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=892770)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars\n",
"\u001b[2m\u001b[36m(pid=897725)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=897725)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=897725)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=897725)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=897725)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=897725)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=897725)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=897725)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=897725)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=907288)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=907288)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=907288)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=907288)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=907288)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=907288)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=907288)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=908756)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=908756)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=908756)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=908756)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=908756)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=908756)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=908756)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=908756)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=908756)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=912284)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=912284)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=912284)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=912284)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=912284)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=912284)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=912284)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=914582)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=914582)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=914582)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=914582)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=914582)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=914582)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=914582)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=914582)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=914582)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=918301)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=918301)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=918301)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=918301)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=918301)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=918301)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=918301)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=920414)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=925520)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=925520)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=925520)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=925520)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=925520)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=925520)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=925520)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=925520)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=925520)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=929827)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=929827)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=929827)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=929827)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=929827)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=929827)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=929827)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=929827)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=929827)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=934238)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=942628)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=942628)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=942628)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=942628)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=942628)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=942628)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=942628)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=942628)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=942628)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=945904)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=973869)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=973869)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=973869)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=973869)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=973869)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=973869)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=973869)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=973869)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=973869)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=978003)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1000417)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_projector.bias']\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m - This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m - This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m /home/chiw/.local/lib/python3.8/site-packages/torch/nn/parallel/_functions.py:65: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
"\u001b[2m\u001b[36m(pid=1022436)\u001b[0m warnings.warn('Was asked to gather along dimension 0, but all '\n",
"print(f\"Best model eval {HP_METRIC}: {metric:.4f}\")\n",
"print(f\"Best model parameters: {best_trial.config}\")\n"
]
},
{
"source": [
"## Next Steps\n",
"\n",
"Notice that we only reported the metric with `flaml.tune.report` at the end of full training loop. It is possible to enable reporting of intermediate performance - allowing early stopping - as follows:\n",
"\n",
"- Huggingface provides _Callbacks_ which can be used to insert the `flaml.tune.report` call inside the training loop\n",