{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved. \n", "\n", "Licensed under the MIT License.\n", "\n", "# Fine-Tuning NLP Models with the FLAML Library\n", "\n", "\n", "## 1. Introduction\n", "\n", "FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n", "with low computational cost. It is fast and economical. Its simple and lightweight design makes it easy to use and extend, for example by adding new learners. FLAML can \n", "- serve as an economical AutoML engine,\n", "- be used as a fast hyperparameter tuning tool, or \n", "- be embedded in self-tuning software that requires low latency and low resource usage in repetitive\n", " tuning tasks.\n", "\n", "In this notebook, we demonstrate how to use the FLAML library to fine-tune an NLP language model with hyperparameter search. We have tested this notebook on a server with 4 NVIDIA V100 GPUs (32GB) and 400GB of CPU RAM.\n", "\n", "FLAML requires `Python>=3.7`. 
To run this notebook example, please install flaml with the `nlp,ray,notebook` and `blendsearch` option:\n", "```bash\n", "pip install flaml[nlp,ray,notebook,blendsearch];\n", "```" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%pip install flaml[nlp,ray,notebook,blendsearch]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let's run some examples. \n", "\n", "Note: throughout this notebook, you may see a few ModuleNotFoundErrors. As long as the cell successfully executes, you can ignore that error." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Sentiment Classification Example\n", "### Load data and preprocess\n", "\n", "The Stanford Sentiment treebank (SST-2) dataset is a dataset for sentiment classification. 
First, let's load this dataset into pandas dataframes:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Reusing dataset glue (/home/xliu127/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", "Reusing dataset glue (/home/xliu127/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n", "Reusing dataset glue (/home/xliu127/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n" ] } ], "source": [ "from datasets import load_dataset\n", "\n", "train_dataset = load_dataset(\"glue\", \"sst2\", split=\"train\").to_pandas()\n", "dev_dataset = load_dataset(\"glue\", \"sst2\", split=\"validation\").to_pandas()\n", "test_dataset = load_dataset(\"glue\", \"sst2\", split=\"test\").to_pandas()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take a look at the first 5 examples of this dataset:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
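<div>\n", "<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>sentence</th>\n", " <th>label</th>\n", " <th>idx</th>\n", " </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n", " <th>0</th>\n", " <td>hide new secretions from the parental units</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n",
" <tr>\n", " <th>1</th>\n", " <td>contains no wit , only labored gags</td>\n", " <td>0</td>\n", " <td>1</td>\n", " </tr>\n",
" <tr>\n", " <th>2</th>\n", " <td>that loves its characters and communicates som...</td>\n", " <td>1</td>\n", " <td>2</td>\n", " </tr>\n",
" <tr>\n", " <th>3</th>\n", " <td>remains utterly satisfied to remain the same t...</td>\n", " <td>0</td>\n", " <td>3</td>\n", " </tr>\n",
" <tr>\n", " <th>4</th>\n", " <td>on the worst revenge-of-the-nerds clichés the ...</td>\n", " <td>0</td>\n", " <td>4</td>\n", " </tr>\n",
" </tbody>\n",
"</table>\n", "</div>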
" ], "text/plain": [ " sentence label idx\n", "0 hide new secretions from the parental units 0 0\n", "1 contains no wit , only labored gags 0 1\n", "2 that loves its characters and communicates som... 1 2\n", "3 remains utterly satisfied to remain the same t... 0 3\n", "4 on the worst revenge-of-the-nerds clichés the ... 0 4" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_dataset.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Separate the data into X and y:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "custom_sent_keys = [\"sentence\"] # specify the column names of the input sentences\n", "label_key = \"label\" # specify the column name of the label\n", "\n", "X_train, y_train = train_dataset[custom_sent_keys], train_dataset[label_key]\n", "X_val, y_val = dev_dataset[custom_sent_keys], dev_dataset[label_key]\n", "X_test = test_dataset[custom_sent_keys]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run FLAML" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. 
Use pandas.Index with the appropriate dtype instead.\n", " from pandas import MultiIndex, Int64Index\n", "2022-08-13 16:52:12,092\tINFO services.py:1470 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8265\u001b[39m\u001b[22m\n" ] } ], "source": [ "# Import the AutoML class from the flaml package\n", "from flaml import AutoML\n", "automl = AutoML()\n", "\n", "# Start Ray once; the tuning trials below run on it\n", "import ray\n", "if not ray.is_initialized():\n", "    ray.init()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "TIME_BUDGET = 1800  # tuning time budget in seconds\n", "automl_settings = {\n", "    \"time_budget\": TIME_BUDGET,\n", "    \"task\": \"seq-classification\",  # sequence classification task\n", "    \"fit_kwargs_by_estimator\": {\n", "        \"transformer\": {\n", "            \"output_dir\": \"data/output/\",  # output directory\n", "            \"model_path\": \"google/electra-small-discriminator\",  # if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base\n", "        }\n", "    },\n", "    \"gpu_per_trial\": 1,  # set to 0 if no GPU is available\n", "    \"log_file_name\": \"seqclass.log\",  # file in which to save the HPO log\n", "    \"log_type\": \"all\",  # \"all\" logs every trial; \"better\" keeps only the better trials\n", "    \"use_ray\": {\"local_dir\": \"data/output/\"},  # run the trials with Ray, using this local directory\n", "    \"n_concurrent_trials\": 4,  # number of trials to run concurrently\n", "    \"keep_search_state\": True,  # keep the search state after fitting\n", "}" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "== Status ==
Current time: 2022-08-13 17:22:28 (running for 00:30:10.04)
Memory usage on this node: 25.3/376.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/96 CPUs, 0/4 GPUs, 0.0/249.66 GiB heap, 0.0/110.99 GiB objects (0.0/1.0 accelerator_type:V100)
Current best trial: ac41a40a with val_loss=0.07798165137614677 and parameters={'learning_rate': 3.864623804361677e-05, 'num_train_epochs': 3, 'per_device_train_batch_size': 32, 'seed': 24, 'global_max_steps': 9223372036854775807, 'learner': 'transformer', 'FLAML_sample_size': 67349}
Result logdir: /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17
Number of trials: 49/1000000 (49 TERMINATED)

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m {'loss': 0.1493, 'learning_rate': 9.269681649947959e-06, 'epoch': 2.26}\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m {'eval_loss': 0.3011920750141144, 'eval_automl_metric': 0.09633027522935778, 'eval_runtime': 6.9816, 'eval_samples_per_second': 124.899, 'eval_steps_per_second': 124.899, 'epoch': 2.0}\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m {'loss': 0.1551, 'learning_rate': 3.2693767566133266e-05, 'epoch': 1.9}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.2014, 'learning_rate': 1.2879756568772649e-05, 'epoch': 1.43}\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m {'loss': 0.1524, 'learning_rate': 7.4163905463114075e-06, 'epoch': 2.14}\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m {'eval_loss': 0.3561520278453827, 'eval_automl_metric': 0.11238532110091748, 'eval_runtime': 7.1398, 'eval_samples_per_second': 122.133, 'eval_steps_per_second': 122.133, 'epoch': 2.0}\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m {'loss': 0.1492, 'learning_rate': 7.788901833662341e-06, 'epoch': 2.38}\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m {'loss': 0.1292, 'learning_rate': 2.5632478674959775e-05, 'epoch': 2.14}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.1936, 'learning_rate': 1.169158714360912e-05, 'epoch': 1.66}\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m {'loss': 0.155, 'learning_rate': 5.373307751184297e-06, 'epoch': 2.38}\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m {'loss': 0.1405, 'learning_rate': 6.308122017376726e-06, 'epoch': 2.49}\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m {'loss': 0.1078, 'learning_rate': 1.8571189783786284e-05, 'epoch': 2.38}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.1975, 'learning_rate': 1.0503417718445592e-05, 'epoch': 1.9}\n", "\u001b[2m\u001b[36m(train 
pid=49559)\u001b[0m {'loss': 0.1215, 'learning_rate': 4.827342201091109e-06, 'epoch': 2.61}\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m {'loss': 0.1437, 'learning_rate': 3.330224956057188e-06, 'epoch': 2.61}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.1698, 'learning_rate': 9.315248293282063e-06, 'epoch': 2.14}\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m {'loss': 0.1083, 'learning_rate': 1.1509900892612791e-05, 'epoch': 2.61}\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m {'loss': 0.1355, 'learning_rate': 3.346562384805493e-06, 'epoch': 2.73}\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m {'loss': 0.1428, 'learning_rate': 1.287142160930079e-06, 'epoch': 2.85}\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m {'loss': 0.106, 'learning_rate': 4.4486120014393e-06, 'epoch': 2.85}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.1661, 'learning_rate': 8.127078868118535e-06, 'epoch': 2.38}\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m {'eval_loss': 0.2953557074069977, 'eval_automl_metric': 0.0905963302752294, 'eval_runtime': 7.0221, 'eval_samples_per_second': 124.18, 'eval_steps_per_second': 124.18, 'epoch': 3.0}\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m {'loss': 0.1445, 'learning_rate': 1.8657825685198765e-06, 'epoch': 2.85}\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m {'train_runtime': 416.8987, 'train_samples_per_second': 484.643, 'train_steps_per_second': 15.148, 'train_loss': 0.2114269960332464, 'epoch': 3.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m Num examples = 872\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m Batch size = 1\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m Didn't find file 
/data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_80bc1972_47_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-17-39/checkpoint-6315/added_tokens.json. We won't load it.\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_80bc1972_47_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-17-39/checkpoint-6315/vocab.txt\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_80bc1972_47_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-17-39/checkpoint-6315/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_80bc1972_47_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-17-39/checkpoint-6315/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=50106)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_80bc1972_47_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-17-39/checkpoint-6315/tokenizer_config.json\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m {'eval_loss': 0.3420582413673401, 'eval_automl_metric': 0.09633027522935778, 'eval_runtime': 8.4313, 'eval_samples_per_second': 103.424, 'eval_steps_per_second': 103.424, 'epoch': 3.0}\n", 
"\u001b[2m\u001b[36m(train pid=50265)\u001b[0m {'train_runtime': 404.0512, 'train_samples_per_second': 500.053, 'train_steps_per_second': 15.629, 'train_loss': 0.18374049291663574, 'epoch': 3.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m Num examples = 872\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m Batch size = 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.1614, 'learning_rate': 6.938909442955006e-06, 'epoch': 2.61}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_918542ec_48_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train__2022-08-13_17-18-07/checkpoint-6315/added_tokens.json. 
We won't load it.\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_918542ec_48_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train__2022-08-13_17-18-07/checkpoint-6315/vocab.txt\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_918542ec_48_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train__2022-08-13_17-18-07/checkpoint-6315/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_918542ec_48_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train__2022-08-13_17-18-07/checkpoint-6315/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=50265)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_918542ec_48_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train__2022-08-13_17-18-07/checkpoint-6315/tokenizer_config.json\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m {'loss': 0.136, 'learning_rate': 3.8500275223426025e-07, 'epoch': 2.97}\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m {'eval_loss': 0.3623480498790741, 'eval_automl_metric': 0.08600917431192656, 'eval_runtime': 6.9268, 'eval_samples_per_second': 125.888, 'eval_steps_per_second': 125.888, 'epoch': 3.0}\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m {'train_runtime': 794.5838, 'train_samples_per_second': 254.28, 'train_steps_per_second': 15.895, 
'train_loss': 0.20292378529045002, 'epoch': 3.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m Num examples = 872\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m Batch size = 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.1528, 'learning_rate': 5.750740017791478e-06, 'epoch': 2.85}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_bbde8ec8_45_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-12-09/checkpoint-12630/added_tokens.json. We won't load it.\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_bbde8ec8_45_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-12-09/checkpoint-12630/vocab.txt\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_bbde8ec8_45_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-12-09/checkpoint-12630/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=49559)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_bbde8ec8_45_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-12-09/checkpoint-12630/special_tokens_map.json\n", 
"\u001b[2m\u001b[36m(train pid=49559)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_bbde8ec8_45_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-12-09/checkpoint-12630/tokenizer_config.json\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'eval_loss': 0.29546257853507996, 'eval_automl_metric': 0.08830275229357798, 'eval_runtime': 7.0421, 'eval_samples_per_second': 123.827, 'eval_steps_per_second': 123.827, 'epoch': 3.0}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.1462, 'learning_rate': 4.56257059262795e-06, 'epoch': 3.09}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.1474, 'learning_rate': 3.374401167464421e-06, 'epoch': 3.33}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.134, 'learning_rate': 2.1862317423008924e-06, 'epoch': 3.56}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'loss': 0.138, 'learning_rate': 9.98062317137364e-07, 'epoch': 3.8}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'eval_loss': 0.32222986221313477, 'eval_automl_metric': 0.09518348623853212, 'eval_runtime': 7.0713, 'eval_samples_per_second': 123.315, 'eval_steps_per_second': 123.315, 'epoch': 4.0}\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m {'train_runtime': 539.1601, 'train_samples_per_second': 499.659, 'train_steps_per_second': 15.617, 'train_loss': 0.2059766846428008, 'epoch': 4.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m Num examples = 872\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m Batch size = 1\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m Didn't find file 
/data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_b7e08512_49_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-19-11/checkpoint-6315/added_tokens.json. We won't load it.\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_b7e08512_49_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-19-11/checkpoint-6315/vocab.txt\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_b7e08512_49_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-19-11/checkpoint-6315/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_b7e08512_49_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-19-11/checkpoint-6315/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=50443)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-08-13_16-52-17/train_b7e08512_49_FLAML_sample_size=67349,global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train__2022-08-13_17-19-11/checkpoint-6315/tokenizer_config.json\n", "2022-08-13 17:28:47,144\tINFO tune.py:747 -- Total run time: 2189.40 seconds (1803.73 seconds for the tuning loop).\n", "[flaml.automl: 08-13 17:28:52] {3322} INFO - selected model: None\n", "/data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: 
FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", " warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "{'loss': 0.417, 'learning_rate': 3.55863617139559e-05, 'epoch': 0.24}\n", "{'loss': 0.2872, 'learning_rate': 3.252648538429503e-05, 'epoch': 0.48}\n", "{'loss': 0.2452, 'learning_rate': 2.9466609054634162e-05, 'epoch': 0.71}\n", "{'loss': 0.2204, 'learning_rate': 2.6406732724973297e-05, 'epoch': 0.95}\n", "{'loss': 0.1823, 'learning_rate': 2.334685639531243e-05, 'epoch': 1.19}\n", "{'loss': 0.1743, 'learning_rate': 2.0286980065651558e-05, 'epoch': 1.43}\n", "{'loss': 0.1662, 'learning_rate': 1.722710373599069e-05, 'epoch': 1.66}\n", "{'loss': 0.1674, 'learning_rate': 1.416722740632982e-05, 'epoch': 1.9}\n", "{'loss': 0.1422, 'learning_rate': 1.1107351076668952e-05, 'epoch': 2.14}\n", "{'loss': 0.1271, 'learning_rate': 8.047474747008085e-06, 'epoch': 2.38}\n", "{'loss': 0.123, 'learning_rate': 4.987598417347216e-06, 'epoch': 2.61}\n", "{'loss': 0.1209, 'learning_rate': 1.927722087686347e-06, 'epoch': 2.85}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[flaml.automl: 08-13 17:36:12] {3465} INFO - retrain transformer for 440.2s\n", "[flaml.automl: 08-13 17:36:12] {3472} INFO - retrained model: None\n", "[flaml.automl: 08-13 17:36:12] {2749} INFO - fit succeeded\n", "[flaml.automl: 08-13 17:36:12] {2750} INFO - Time taken to find the best model: 1610.794395685196\n", "[flaml.automl: 08-13 17:36:12] {2761} WARNING - Time taken to find the best model is 89% of the provided time budget and not all estimators' hyperparameter search converged. 
Consider increasing the time budget.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "{'train_runtime': 415.3339, 'train_samples_per_second': 486.469, 'train_steps_per_second': 15.205, 'train_loss': 0.194211708848097, 'epoch': 3.0}\n" ] } ], "source": [ "# Run the hyperparameter search through FLAML's main AutoML API\n", "automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The best loss by FLAML: 0.07798165137614677\n" ] } ], "source": [ "print(\"The best loss by FLAML: {}\".format(automl.best_loss))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Best model and metric" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Best hyperparameter config: {'learning_rate': 1.4736175808553141e-05, 'num_train_epochs': 7.623375372739029, 'per_device_train_batch_size': 16, 'warmup_ratio': 0.21605876280261357, 'weight_decay': 0.11938244526496489, 'adam_epsilon': 7.353322403647365e-07, 'seed': 42, 'global_max_steps': 1878, 'learner': 'transformer'}\n", "Best accuracy on validation data: 0.9404\n", "Training duration of best run: 157.7 s\n" ] } ], "source": [ "# Retrieve the best config, its validation accuracy, and its training time\n", "print('Best hyperparameter config:', automl.best_config)\n", "print('Best accuracy on validation data: {0:.4g}'.format(1-automl.best_loss))\n", "print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", " warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "{'loss': 0.528, 'learning_rate': 8.898933352349567e-06, 'epoch': 1.0}\n", "{'eval_loss': 0.2549280524253845, 'eval_automl_metric': 0.08600917431192656, 'eval_runtime': 1.0003, 'eval_samples_per_second': 871.751, 'eval_steps_per_second': 54.984, 'epoch': 1.0}\n", "{'loss': 0.2278, 'learning_rate': 1.3880017803076292e-05, 'epoch': 2.0}\n", "{'eval_loss': 0.24966619908809662, 'eval_automl_metric': 0.06766055045871555, 'eval_runtime': 1.0201, 'eval_samples_per_second': 854.778, 'eval_steps_per_second': 53.914, 'epoch': 2.0}\n", "{'loss': 0.1455, 'learning_rate': 1.1410179501562432e-05, 'epoch': 3.0}\n", "{'eval_loss': 0.23046882450580597, 'eval_automl_metric': 0.059633027522935755, 'eval_runtime': 1.0097, 'eval_samples_per_second': 863.6, 'eval_steps_per_second': 54.47, 'epoch': 3.0}\n", "{'eval_loss': 0.23046882450580597, 'eval_automl_metric': 0.059633027522935755, 'eval_runtime': 0.9726, 'eval_samples_per_second': 896.568, 'eval_steps_per_second': 56.55, 'epoch': 3.0}\n", "{'train_runtime': 146.7879, 'train_samples_per_second': 519.346, 'train_steps_per_second': 32.462, 'train_loss': 0.30043953600021217, 'epoch': 3.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Using amp half precision backend\n", "***** Running Prediction *****\n", " Num examples = 872\n", " Batch size = 64\n" ] } ], "source": [ "import pickle\n", "automl.pickle(\"automl.pkl\")\n", "\n", "with open(\"automl.pkl\", \"rb\") as f:\n", " automl = pickle.load(f)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using amp half precision backend\n", "***** Running Prediction *****\n", " Num examples = 1821\n", " Batch size = 4\n" ] }, { "name": "stdout", "output_type": "stream", 
"text": [ "Predicted labels [0 0 1 ... 1 1 1]\n" ] } ], "source": [ "'''compute predictions of testing dataset''' \n", "y_pred = automl.predict(X_test, **{\"per_device_eval_batch_size\": 1})\n", "print('Predicted labels', y_pred)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Log history" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 313, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 313, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 1.4211966209891772e-05, 'num_train_epochs': 0.22667066737595704, 'per_device_train_batch_size': 4, 'warmup_ratio': 0.0638407085008166, 'weight_decay': 0.24365576482793252, 'adam_epsilon': 1.2017005181798623e-08, 'seed': 44, 'global_max_steps': 567, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 313, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 0.00010969207855501012, 'num_train_epochs': 0.5341197058945968, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.00727467370977386, 'weight_decay': 
0.2998042286577564, 'adam_epsilon': 2.5359832189101376e-08, 'seed': 44, 'global_max_steps': 168, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 0.00010969207855501012, 'num_train_epochs': 0.5341197058945968, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.00727467370977386, 'weight_decay': 0.2998042286577564, 'adam_epsilon': 2.5359832189101376e-08, 'seed': 44, 'global_max_steps': 168, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 6.105850155006292e-06, 'num_train_epochs': 0.5203392488419986, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.02982185620209511, 'weight_decay': 0.06967105125024514, 'adam_epsilon': 2.8808753274240916e-07, 'seed': 42, 'global_max_steps': 163, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 0.00010969207855501012, 'num_train_epochs': 0.5341197058945968, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.00727467370977386, 'weight_decay': 0.2998042286577564, 'adam_epsilon': 2.5359832189101376e-08, 'seed': 44, 'global_max_steps': 168, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 9.029571768192494e-05, 'num_train_epochs': 0.23081935882872334, 'per_device_train_batch_size': 16, 'warmup_ratio': 0.05927331754619176, 'weight_decay': 0.06990187648448716, 'adam_epsilon': 2.6275569358808137e-08, 'seed': 41, 'global_max_steps': 145, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 0.00010969207855501012, 'num_train_epochs': 0.5341197058945968, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.00727467370977386, 'weight_decay': 0.2998042286577564, 'adam_epsilon': 2.5359832189101376e-08, 'seed': 44, 'global_max_steps': 168, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current 
Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 0.00010208514406839324, 'num_train_epochs': 0.24916036398153807, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.28313267937361647, 'adam_epsilon': 1.5042842082223077e-08, 'seed': 44, 'global_max_steps': 78, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 0.00010969207855501012, 'num_train_epochs': 0.5341197058945968, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.00727467370977386, 'weight_decay': 0.2998042286577564, 'adam_epsilon': 2.5359832189101376e-08, 'seed': 44, 'global_max_steps': 168, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 0.00011786584823407102, 'num_train_epochs': 1.1449809097488275, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.015398649752204965, 'weight_decay': 0.3, 'adam_epsilon': 4.275263179285732e-08, 'seed': 43, 'global_max_steps': 313, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 0.00011786584823407102, 'num_train_epochs': 1.1449809097488275, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.015398649752204965, 'weight_decay': 0.3, 'adam_epsilon': 4.275263179285732e-08, 'seed': 43, 'global_max_steps': 313, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 3.59933640287326e-05, 'num_train_epochs': 1.7937390219164033, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.016745580745240057, 'weight_decay': 0.2892279950527897, 'adam_epsilon': 6.514301011066509e-08, 'seed': 44, 'global_max_steps': 313, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 3.59933640287326e-05, 'num_train_epochs': 1.7937390219164033, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.016745580745240057, 'weight_decay': 
0.2892279950527897, 'adam_epsilon': 6.514301011066509e-08, 'seed': 44, 'global_max_steps': 313, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 1.4736175808553141e-05, 'num_train_epochs': 7.623375372739029, 'per_device_train_batch_size': 16, 'warmup_ratio': 0.21605876280261357, 'weight_decay': 0.11938244526496489, 'adam_epsilon': 7.353322403647365e-07, 'seed': 42, 'global_max_steps': 1878, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.4736175808553141e-05, 'num_train_epochs': 7.623375372739029, 'per_device_train_batch_size': 16, 'warmup_ratio': 0.21605876280261357, 'weight_decay': 0.11938244526496489, 'adam_epsilon': 7.353322403647365e-07, 'seed': 42, 'global_max_steps': 1878, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 0.0001178658482340711, 'num_train_epochs': 1.1449809097488262, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.015398649752204969, 'weight_decay': 0.3, 'adam_epsilon': 4.275263179285729e-08, 'seed': 43, 'global_max_steps': 313, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.4736175808553141e-05, 'num_train_epochs': 7.623375372739029, 'per_device_train_batch_size': 16, 'warmup_ratio': 0.21605876280261357, 'weight_decay': 0.11938244526496489, 'adam_epsilon': 7.353322403647365e-07, 'seed': 42, 'global_max_steps': 1878, 'learner': 'transformer'}}\n" ] } ], "source": [ "from flaml.data import get_output_from_log\n", "time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \\\n", " get_output_from_log(filename=automl_settings['log_file_name'], time_budget=3000)\n", "for config in config_history:\n", " print(config)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "10\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "plt.title('Learning Curve')\n", "plt.xlabel('Wall Clock Time (s)')\n", "plt.ylabel('Validation Accuracy')\n", "print(len(valid_loss_history))\n", "plt.scatter(time_history, 1 - np.array(valid_loss_history))\n", "plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Spooky-author-identification example" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "== Status ==
Current time: 2022-07-22 07:20:52 (running for 00:30:13.14)
Memory usage on this node: 19.0/376.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/4 GPUs, 0.0/252.65 GiB heap, 0.0/112.27 GiB objects (0.0/1.0 accelerator_type:V100)
Current best trial: 504afb96 with val_loss=0.11011235955056176 and parameters={'learning_rate': 4.567279255636039e-05, 'num_train_epochs': 5, 'per_device_train_batch_size': 32, 'seed': 37, 'global_max_steps': 9223372036854775807, 'learner': 'transformer'}
Result logdir: /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38
Number of trials: 12/1000000 (12 TERMINATED)

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m {'train_runtime': 675.882, 'train_samples_per_second': 108.628, 'train_steps_per_second': 6.791, 'train_loss': 0.15587145374491324, 'epoch': 5.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: __index_level_0__. If __index_level_0__ are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m Num examples = 4895\n", "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m Batch size = 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m {'eval_loss': 0.47973549365997314, 'eval_automl_metric': 0.12134831460674156, 'eval_runtime': 42.4421, 'eval_samples_per_second': 115.334, 'eval_steps_per_second': 115.334, 'epoch': 4.0}\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m {'train_runtime': 482.2078, 'train_samples_per_second': 121.806, 'train_steps_per_second': 1.908, 'train_loss': 0.23729169679724652, 'epoch': 4.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: __index_level_0__. 
If __index_level_0__ are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m Num examples = 4895\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m Batch size = 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'loss': 0.1844, 'learning_rate': 2.7020242630660653e-06, 'epoch': 3.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_e041e7d6_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=5,per_device_trai_2022-07-22_07-09-19/checkpoint-4590/added_tokens.json. We won't load it.\n", "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_e041e7d6_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=5,per_device_trai_2022-07-22_07-09-19/checkpoint-4590/vocab.txt\n", "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_e041e7d6_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=5,per_device_trai_2022-07-22_07-09-19/checkpoint-4590/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m loading file 
/data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_e041e7d6_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=5,per_device_trai_2022-07-22_07-09-19/checkpoint-4590/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=78542)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_e041e7d6_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=5,per_device_trai_2022-07-22_07-09-19/checkpoint-4590/tokenizer_config.json\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_567b6878_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-22_07-12-37/checkpoint-920/added_tokens.json. We won't load it.\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_567b6878_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-22_07-12-37/checkpoint-920/vocab.txt\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_567b6878_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-22_07-12-37/checkpoint-920/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m loading file 
/data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_567b6878_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-22_07-12-37/checkpoint-920/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=78793)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_567b6878_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-22_07-12-37/checkpoint-920/tokenizer_config.json\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'eval_loss': 0.7813186049461365, 'eval_automl_metric': 0.13564862104187947, 'eval_runtime': 38.0461, 'eval_samples_per_second': 128.66, 'eval_steps_per_second': 128.66, 'epoch': 3.0}\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'loss': 0.0829, 'learning_rate': 2.3353000145500413e-06, 'epoch': 3.13}\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m {'eval_loss': 0.5877022743225098, 'eval_automl_metric': 0.11848825331971402, 'eval_runtime': 40.6459, 'eval_samples_per_second': 120.43, 'eval_steps_per_second': 120.43, 'epoch': 4.0}\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'loss': 0.0714, 'learning_rate': 1.9685757660340173e-06, 'epoch': 3.27}\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m {'loss': 0.0237, 'learning_rate': 3.91430499829805e-06, 'epoch': 4.36}\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'loss': 0.0584, 'learning_rate': 1.6018515175179931e-06, 'epoch': 3.41}\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'loss': 0.0836, 'learning_rate': 1.235127269001969e-06, 'epoch': 3.54}\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'loss': 0.0335, 'learning_rate': 8.684030204859451e-07, 'epoch': 3.68}\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m {'eval_loss': 0.6322397589683533, 'eval_automl_metric': 
0.11664964249233911, 'eval_runtime': 39.775, 'eval_samples_per_second': 123.067, 'eval_steps_per_second': 123.067, 'epoch': 5.0}\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m {'train_runtime': 607.6324, 'train_samples_per_second': 120.83, 'train_steps_per_second': 3.777, 'train_loss': 0.17213530145699163, 'epoch': 5.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: __index_level_0__. If __index_level_0__ are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m Num examples = 4895\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m Batch size = 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'loss': 0.0562, 'learning_rate': 5.01678771969921e-07, 'epoch': 3.81}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_9a34cd4a_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-22_07-14-32/checkpoint-2295/added_tokens.json. 
We won't load it.\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_9a34cd4a_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-22_07-14-32/checkpoint-2295/vocab.txt\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_9a34cd4a_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-22_07-14-32/checkpoint-2295/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_9a34cd4a_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-22_07-14-32/checkpoint-2295/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=79025)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_9a34cd4a_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-22_07-14-32/checkpoint-2295/tokenizer_config.json\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'loss': 0.0527, 'learning_rate': 1.3495452345389687e-07, 'epoch': 3.95}\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'eval_loss': 0.8104404211044312, 'eval_automl_metric': 0.12625127681307458, 'eval_runtime': 37.4885, 'eval_samples_per_second': 130.573, 'eval_steps_per_second': 130.573, 'epoch': 4.0}\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m {'train_runtime': 1051.1951, 'train_samples_per_second': 55.875, 'train_steps_per_second': 13.969, 
'train_loss': 0.2927411694969607, 'epoch': 4.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: __index_level_0__. If __index_level_0__ are not expected by `BertForSequenceClassification.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m Num examples = 4895\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m Batch size = 1\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_d0e0b7d6_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_train_2022-07-22_07-08-52/checkpoint-14684/added_tokens.json. We won't load it.\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_d0e0b7d6_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_train_2022-07-22_07-08-52/checkpoint-14684/vocab.txt\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_d0e0b7d6_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_train_2022-07-22_07-08-52/checkpoint-14684/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m loading file 
/data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_d0e0b7d6_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_train_2022-07-22_07-08-52/checkpoint-14684/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=78225)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-22_06-50-38/train_d0e0b7d6_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_train_2022-07-22_07-08-52/checkpoint-14684/tokenizer_config.json\n", "2022-07-22 07:27:20,126\tINFO tune.py:747 -- Total run time: 2201.85 seconds (1801.97 seconds for the tuning loop).\n", "[flaml.automl: 07-22 07:27:25] {3314} INFO - selected model: None\n", "/data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", " warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "{'loss': 0.2317, 'learning_rate': 5.95732076822092e-06, 'epoch': 4.35}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[flaml.automl: 07-22 07:35:19] {3457} INFO - retrain transformer for 474.4s\n", "[flaml.automl: 07-22 07:35:19] {3464} INFO - retrained model: None\n", "[flaml.automl: 07-22 07:35:19] {2742} INFO - fit succeeded\n", "[flaml.automl: 07-22 07:35:19] {2743} INFO - Time taken to find the best model: 1118.247492313385\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "{'train_runtime': 463.5873, 'train_samples_per_second': 158.374, 'train_steps_per_second': 1.24, 'train_loss': 0.20362980179164722, 'epoch': 5.0}\n" ] } ], "source": [ "from flaml import AutoML\n", "import ray\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "ray.init(num_cpus=4, num_gpus=4, ignore_reinit_error=True)\n", "\n", "df = pd.read_csv('/data/xliu127/projects/hyperopt/FLAML/data/spooky-author-identification.csv')\n", "X, y = df.drop('author', axis=1), df['author']\n", "\n", "X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=123)\n", "\n", "\n", "print(len(X_train), len(X_val))\n", "automl_model = AutoML()\n", "\n", "automl_settings = {\n", " \"time_budget\": 1800, \n", " \"task\": \"seq-classification\", \n", " \"fit_kwargs_by_estimator\": {\n", " \"transformer\": {\n", " \"output_dir\": \"data/output/\", \n", " \"model_path\": \"bert-base-uncased\", \n", " }\n", " },\n", " \"metric\": \"accuracy\",\n", " \"gpu_per_trial\": 1, \n", " \"log_file_name\": \"spooky_bert.log\", \n", " \"log_type\": \"all\", \n", " \"use_ray\": {\"local_dir\": \"data/output/\"}, # set whether to use Ray\n", " \"n_concurrent_trials\": 4,\n", " \"keep_search_state\": True, # keeping the search state\n", "}\n", "\n", 
"automl_model.fit(X_train=X_train, y_train=y_train,X_val=X_val, y_val=y_val, **automl_settings)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the best loss for spooky author identification: 0.11133810010214507\n" ] } ], "source": [ "print(\"the best loss for spooky author identification: {}\".format(automl_model.best_loss))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "== Status ==
Current time: 2022-07-21 21:21:15 (running for 00:30:10.30)
Memory usage on this node: 20.5/376.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/4 GPUs, 0.0/252.62 GiB heap, 0.0/112.26 GiB objects (0.0/1.0 accelerator_type:V100)
Current best trial: 84d3be85 with val_loss=0.12951991828396325 and parameters={'learning_rate': 4.486769916716146e-05, 'num_train_epochs': 4, 'per_device_train_batch_size': 8, 'seed': 28, 'global_max_steps': 9223372036854775807, 'learner': 'transformer'}
Result logdir: /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05
Number of trials: 12/1000000 (12 TERMINATED)

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m {'eval_loss': 0.7418951392173767, 'eval_automl_metric': 0.1284984678243105, 'eval_runtime': 37.3935, 'eval_samples_per_second': 130.905, 'eval_steps_per_second': 130.905, 'epoch': 4.0}\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m {'train_runtime': 565.7729, 'train_samples_per_second': 103.816, 'train_steps_per_second': 6.49, 'train_loss': 0.2802804773409642, 'epoch': 4.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m The following columns in the test set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: __index_level_0__. If __index_level_0__ are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m Num examples = 4895\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m Batch size = 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m {'eval_loss': 1.0893423557281494, 'eval_automl_metric': 0.6024514811031665, 'eval_runtime': 39.7178, 'eval_samples_per_second': 123.245, 'eval_steps_per_second': 123.245, 'epoch': 3.0}\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'loss': 0.2369, 'learning_rate': 1.4090340380281214e-05, 'epoch': 2.72}\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m {'train_runtime': 566.9953, 'train_samples_per_second': 77.694, 'train_steps_per_second': 9.714, 'train_loss': 1.0928592581461545, 'epoch': 3.0}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'eval_loss': 1.092341661453247, 'eval_automl_metric': 0.6024514811031665, 'eval_runtime': 38.0057, 'eval_samples_per_second': 128.797, 'eval_steps_per_second': 
128.797, 'epoch': 3.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m The following columns in the test set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: __index_level_0__. If __index_level_0__ are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m Num examples = 4895\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m Batch size = 1\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_60247332_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-21_21-11-36/checkpoint-3672/added_tokens.json. We won't load it.\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_60247332_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-21_21-11-36/checkpoint-3672/vocab.json\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_60247332_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-21_21-11-36/checkpoint-3672/merges.txt\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_60247332_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-21_21-11-36/checkpoint-3672/tokenizer.json\n", 
"\u001b[2m\u001b[36m(train pid=50245)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_60247332_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-21_21-11-36/checkpoint-3672/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=50245)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_60247332_10_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=4,per_device_trai_2022-07-21_21-11-36/checkpoint-3672/tokenizer_config.json\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'loss': 1.0896, 'learning_rate': 1.5104688589428795e-05, 'epoch': 3.13}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_6861ba34_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=3,per_device_trai_2022-07-21_21-11-51/checkpoint-3672/added_tokens.json. 
We won't load it.\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_6861ba34_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=3,per_device_trai_2022-07-21_21-11-51/checkpoint-3672/vocab.json\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_6861ba34_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=3,per_device_trai_2022-07-21_21-11-51/checkpoint-3672/merges.txt\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_6861ba34_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=3,per_device_trai_2022-07-21_21-11-51/checkpoint-3672/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_6861ba34_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=3,per_device_trai_2022-07-21_21-11-51/checkpoint-3672/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=50412)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_6861ba34_11_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=3,per_device_trai_2022-07-21_21-11-51/checkpoint-3672/tokenizer_config.json\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'loss': 0.2195, 'learning_rate': 1.2404892966371977e-05, 'epoch': 3.0}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'loss': 1.0907, 
'learning_rate': 1.2732721160184323e-05, 'epoch': 3.27}\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'loss': 0.1252, 'learning_rate': 1.0719445552462741e-05, 'epoch': 3.27}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'loss': 1.0926, 'learning_rate': 1.0360753730939852e-05, 'epoch': 3.41}\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'loss': 0.1093, 'learning_rate': 9.033998138553504e-06, 'epoch': 3.54}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'loss': 1.0908, 'learning_rate': 7.988786301695379e-06, 'epoch': 3.54}\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'loss': 0.1166, 'learning_rate': 7.348550724644269e-06, 'epoch': 3.81}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'loss': 1.0899, 'learning_rate': 5.616818872450909e-06, 'epoch': 3.68}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'loss': 1.0923, 'learning_rate': 3.244851443206437e-06, 'epoch': 3.81}\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'eval_loss': 0.7831101417541504, 'eval_automl_metric': 0.13462717058222673, 'eval_runtime': 37.9679, 'eval_samples_per_second': 128.925, 'eval_steps_per_second': 128.925, 'epoch': 4.0}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'loss': 1.0862, 'learning_rate': 8.728840139619655e-07, 'epoch': 3.95}\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'loss': 0.1164, 'learning_rate': 5.663103310735033e-06, 'epoch': 4.08}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'eval_loss': 1.0893481969833374, 'eval_automl_metric': 0.6024514811031665, 'eval_runtime': 36.2865, 'eval_samples_per_second': 134.899, 'eval_steps_per_second': 134.899, 'epoch': 4.0}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m {'train_runtime': 1069.9104, 'train_samples_per_second': 54.898, 'train_steps_per_second': 13.725, 'train_loss': 1.0960205283875493, 'epoch': 4.0}\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m \n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m 
The following columns in the test set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: __index_level_0__. If __index_level_0__ are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m Num examples = 4895\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m Batch size = 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'loss': 0.0542, 'learning_rate': 3.977655896825797e-06, 'epoch': 4.36}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_ebe7d3ee_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=4,per_device_train_2022-07-21_21-08-22/checkpoint-11013/added_tokens.json. 
We won't load it.\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_ebe7d3ee_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=4,per_device_train_2022-07-21_21-08-22/checkpoint-11013/vocab.json\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_ebe7d3ee_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=4,per_device_train_2022-07-21_21-08-22/checkpoint-11013/merges.txt\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_ebe7d3ee_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=4,per_device_train_2022-07-21_21-08-22/checkpoint-11013/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_ebe7d3ee_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=4,per_device_train_2022-07-21_21-08-22/checkpoint-11013/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=49988)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_ebe7d3ee_9_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0001,num_train_epochs=4,per_device_train_2022-07-21_21-08-22/checkpoint-11013/tokenizer_config.json\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'loss': 0.0618, 'learning_rate': 2.2922084829165607e-06, 'epoch': 4.63}\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'loss': 
0.0494, 'learning_rate': 6.06761069007325e-07, 'epoch': 4.9}\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'eval_loss': 0.88468998670578, 'eval_automl_metric': 0.12972420837589382, 'eval_runtime': 37.9519, 'eval_samples_per_second': 128.979, 'eval_steps_per_second': 128.979, 'epoch': 5.0}\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m {'train_runtime': 873.0679, 'train_samples_per_second': 84.094, 'train_steps_per_second': 10.515, 'train_loss': 0.27977710040306475, 'epoch': 5.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m The following columns in the test set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: __index_level_0__. If __index_level_0__ are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m Num examples = 4895\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m Batch size = 1\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m Didn't find file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_bd71ed64_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-21_21-14-13/checkpoint-9180/added_tokens.json. 
We won't load it.\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_bd71ed64_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-21_21-14-13/checkpoint-9180/vocab.json\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_bd71ed64_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-21_21-14-13/checkpoint-9180/merges.txt\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_bd71ed64_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-21_21-14-13/checkpoint-9180/tokenizer.json\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m loading file None\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_bd71ed64_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-21_21-14-13/checkpoint-9180/special_tokens_map.json\n", "\u001b[2m\u001b[36m(train pid=50658)\u001b[0m loading file /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-07-21_20-51-05/train_bd71ed64_12_global_max_steps=9223372036854775807,learner=transformer,learning_rate=0.0000,num_train_epochs=5,per_device_trai_2022-07-21_21-14-13/checkpoint-9180/tokenizer_config.json\n", "2022-07-21 21:29:43,228\tINFO tune.py:747 -- Total run time: 2317.81 seconds (1801.93 seconds for the tuning loop).\n", "[flaml.automl: 07-21 21:29:46] {3314} INFO - selected model: None\n", 
"/data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", " warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "{'loss': 0.5742, 'learning_rate': 3.264882684494973e-05, 'epoch': 1.09}\n" ] } ], "source": [ "automl_settings[\"fit_kwargs_by_estimator\"][\"transformer\"][\"model_path\"] = \"roberta-base\"\n", "automl_settings[\"log_file_name\"] = \"spooky_roberta.log\"\n", "automl_model.fit(X_train=X_train, y_train=y_train,X_val=X_val, y_val=y_val, **automl_settings)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8\n", "8\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from flaml.data import get_output_from_log\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "\n", "for each_file_name in ['bert', 'roberta']:\n", " time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \\\n", " get_output_from_log(filename='spooky_' + each_file_name + '.log', time_budget=3000)\n", " print(len(valid_loss_history))\n", " plt.scatter(time_history, 1 - np.array(valid_loss_history))\n", " plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')\n", "\n", "plt.legend(['bert', 'roberta'])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Other Tasks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides sequence classification, FLAML currently also supports four other tasks (more tasks are to be supported, which can be found on FLAML's documentation website https://microsoft.github.io/FLAML/docs/Examples/AutoML-NLP):\n", "\n", "- sequence regression: predicting a float number from the input sequence, e.g., predicting the rating of a hotel review based on the text content;\n", "- token classification: predicting the label of each token in a sequence, e.g., named entity recognition;\n", "- multiple choice: predicting the best second half of a sentence that comes next to the first part of a sentence based on common sensen reasoning. An example is seen below;\n", "- (abstractive) summarization: generating the textual summarization of an input paragraph;\n", "\n", "For each task, you only have to change the \"Load data and preprocess\" with the corresponding data loading process. 
For example:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.1 Multiple Choice Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Multiple choice is the task of predicting the best second half of a sentence that follows the first half, based on commonsense reasoning. An example of a multiple-choice classification problem is:\n", "\n", "On stage, a woman takes a seat at the piano. She\n", "a) sits on a bench as her sister plays with the doll.\n", "b) smiles with someone as the music plays.\n", "c) is in the crowd, watching the dancers.\n", "d) *nervously sets her fingers on the keys*." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "No config specified, defaulting to: swag/regular\n", "Reusing dataset swag (/home/xliu127/.cache/huggingface/datasets/swag/regular/0.0.0/9640de08cdba6a1469ed3834fcab4b8ad8e38caf5d1ba5e7436d8b1fd067ad4c)\n", "No config specified, defaulting to: swag/regular\n", "Reusing dataset swag (/home/xliu127/.cache/huggingface/datasets/swag/regular/0.0.0/9640de08cdba6a1469ed3834fcab4b8ad8e38caf5d1ba5e7436d8b1fd067ad4c)\n", "No config specified, defaulting to: swag/regular\n", "Reusing dataset swag (/home/xliu127/.cache/huggingface/datasets/swag/regular/0.0.0/9640de08cdba6a1469ed3834fcab4b8ad8e38caf5d1ba5e7436d8b1fd067ad4c)\n" ] } ], "source": [ "from datasets import load_dataset\n", "\n", "train_dataset = load_dataset(\"swag\", split=\"train\").to_pandas().iloc[:10000]\n", "dev_dataset = load_dataset(\"swag\", split=\"validation\").to_pandas().iloc[:10000]\n", "test_dataset = load_dataset(\"swag\", split=\"test\").to_pandas()\n", "\n", "custom_sent_keys = [\n", "    \"sent1\",\n", "    \"sent2\",\n", "    \"ending0\",\n", "    \"ending1\",\n", "    \"ending2\",\n", "    \"ending3\",\n", "    \"gold-source\",\n", "    \"video-id\",\n", "    \"startphrase\",\n", "    \"fold-ind\",\n", "]  # specify the column names of the input sentences\n", "label_key = 
\"label\" # specify the column name of the label\n", "\n", "X_train, y_train = train_dataset[custom_sent_keys], train_dataset[label_key]\n", "X_val, y_val = dev_dataset[custom_sent_keys], dev_dataset[label_key]\n", "X_test = test_dataset[custom_sent_keys]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Members of the procession walk down the street holding small horn brass instruments.'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_dataset.iloc[0][\"sent1\"]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "== Status ==
Current time: 2022-03-19 14:39:29 (running for 00:08:29.94)
Memory usage on this node: 33.0/376.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/96 CPUs, 0/4 GPUs, 0.0/250.17 GiB heap, 0.0/111.21 GiB objects (0.0/1.0 accelerator_type:V100)
Current best trial: de45e672 with val_loss=0.18300000000000005 and parameters={'learning_rate': 6.104513714676502e-06, 'num_train_epochs': 2.3743291981165893, 'per_device_train_batch_size': 8, 'warmup_ratio': 0.23610846764298543, 'weight_decay': 0.20205904544254147, 'adam_epsilon': 5.752964074991208e-08, 'seed': 41, 'global_max_steps': 9223372036854775807, 'learner': 'transformer'}
Result logdir: /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-03-19_14-30-59
Number of trials: 10/1000000 (10 TERMINATED)

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86157)\u001b[0m {'eval_loss': 0.6315866112709045, 'eval_automl_metric': 0.18779999999999997, 'eval_runtime': 15.4883, 'eval_samples_per_second': 645.648, 'eval_steps_per_second': 40.353, 'epoch': 1.66}\n", "\u001b[2m\u001b[36m(train pid=86157)\u001b[0m {'train_runtime': 190.7625, 'train_samples_per_second': 87.254, 'train_steps_per_second': 10.909, 'train_loss': 0.5091343906738046, 'epoch': 1.66}\n", "\u001b[2m\u001b[36m(train pid=86249)\u001b[0m {'eval_loss': 1.2118068933486938, 'eval_automl_metric': 0.2015, 'eval_runtime': 15.2585, 'eval_samples_per_second': 655.374, 'eval_steps_per_second': 40.961, 'epoch': 2.87}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86157)\u001b[0m Using amp half precision backend\n", "\u001b[2m\u001b[36m(train pid=86157)\u001b[0m The following columns in the test set don't have a corresponding argument in `RobertaForMultipleChoice.forward` and have been ignored: ending3, ending1, video-id, sent1, ending0, sent2, fold-ind, ending2, startphrase, gold-source. 
If ending3, ending1, video-id, sent1, ending0, sent2, fold-ind, ending2, startphrase, gold-source are not expected by `RobertaForMultipleChoice.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=86157)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=86157)\u001b[0m Num examples = 10000\n", "\u001b[2m\u001b[36m(train pid=86157)\u001b[0m Batch size = 16\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86249)\u001b[0m {'eval_loss': 1.2118068933486938, 'eval_automl_metric': 0.2015, 'eval_runtime': 15.1369, 'eval_samples_per_second': 660.639, 'eval_steps_per_second': 41.29, 'epoch': 2.87}\n", "\u001b[2m\u001b[36m(train pid=86249)\u001b[0m {'train_runtime': 546.3809, 'train_samples_per_second': 156.658, 'train_steps_per_second': 39.165, 'train_loss': 0.5030154804349909, 'epoch': 2.87}\n", "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m {'loss': 0.4854, 'learning_rate': 1.3592147782116173e-06, 'epoch': 2.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86249)\u001b[0m Using amp half precision backend\n", "\u001b[2m\u001b[36m(train pid=86249)\u001b[0m The following columns in the test set don't have a corresponding argument in `RobertaForMultipleChoice.forward` and have been ignored: fold-ind, sent2, gold-source, ending1, startphrase, sent1, ending0, video-id, ending2, ending3. 
If fold-ind, sent2, gold-source, ending1, startphrase, sent1, ending0, video-id, ending2, ending3 are not expected by `RobertaForMultipleChoice.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=86249)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=86249)\u001b[0m Num examples = 10000\n", "\u001b[2m\u001b[36m(train pid=86249)\u001b[0m Batch size = 16\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m {'eval_loss': 0.49709731340408325, 'eval_automl_metric': 0.17600000000000005, 'eval_runtime': 15.4983, 'eval_samples_per_second': 645.232, 'eval_steps_per_second': 40.327, 'epoch': 2.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-03-19 14:41:56,719\tWARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. Consider making `stop` a faster operation.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m {'eval_loss': 0.5254333019256592, 'eval_automl_metric': 0.17800000000000005, 'eval_runtime': 15.45, 'eval_samples_per_second': 647.251, 'eval_steps_per_second': 40.453, 'epoch': 3.0}\n", "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m {'loss': 0.3989, 'learning_rate': 3.8051750127352887e-07, 'epoch': 3.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-03-19 14:42:56,729\tWARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. 
Consider making `stop` a faster operation.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m {'eval_loss': 0.5254867076873779, 'eval_automl_metric': 0.17789999999999995, 'eval_runtime': 15.424, 'eval_samples_per_second': 648.341, 'eval_steps_per_second': 40.521, 'epoch': 3.0}\n", "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m {'eval_loss': 0.5332269072532654, 'eval_automl_metric': 0.17830000000000001, 'eval_runtime': 15.4452, 'eval_samples_per_second': 647.45, 'eval_steps_per_second': 40.466, 'epoch': 3.39}\n", "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m {'train_runtime': 382.2827, 'train_samples_per_second': 88.597, 'train_steps_per_second': 11.076, 'train_loss': 0.5299136270370808, 'epoch': 3.39}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-03-19 14:43:56,739\tWARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. Consider making `stop` a faster operation.\n", "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m Using amp half precision backend\n", "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m The following columns in the test set don't have a corresponding argument in `RobertaForMultipleChoice.forward` and have been ignored: ending2, sent1, ending0, sent2, ending3, video-id, gold-source, ending1, startphrase, fold-ind. 
If ending2, sent1, ending0, sent2, ending3, video-id, gold-source, ending1, startphrase, fold-ind are not expected by `RobertaForMultipleChoice.forward`, you can safely ignore this message.\n", "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m Num examples = 10000\n", "\u001b[2m\u001b[36m(train pid=86195)\u001b[0m Batch size = 16\n", "2022-03-19 14:44:14,271\tINFO tune.py:639 -- Total run time: 795.18 seconds (504.18 seconds for the tuning loop).\n", "[flaml.automl: 03-19 14:44:19] {2837} INFO - selected model: None\n", "/data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", " warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "{'loss': 0.6603, 'learning_rate': 4.631567529441369e-06, 'epoch': 1.0}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[flaml.automl: 03-19 14:46:08] {2947} INFO - retrain transformer for 109.2s\n", "[flaml.automl: 03-19 14:46:08] {2954} INFO - retrained model: None\n", "[flaml.automl: 03-19 14:46:08] {2283} INFO - fit succeeded\n", "[flaml.automl: 03-19 14:46:08] {2284} INFO - Time taken to find the best model: 319.927033662796\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "{'train_runtime': 96.899, 'train_samples_per_second': 245.031, 'train_steps_per_second': 30.63, 'train_loss': 0.6602518278346073, 'epoch': 1.0}\n" ] } ], "source": [ "''' import AutoML class from flaml package '''\n", "from flaml import AutoML\n", "automl = AutoML()\n", "\n", "if not ray.is_initialized():\n", " ray.init()\n", "\n", "automl_settings = {\n", " \"time_budget\": 500, # setting the time budget\n", " \"task\": \"multichoice-classification\", # setting the task 
as multiplechoice-classification\n", " \"fit_kwargs_by_estimator\": { # if model_path is not set, the default model is facebook/muppet-roberta-base: https://huggingface.co/facebook/muppet-roberta-base\n", " \"transformer\": {\n", " \"output_dir\": \"data/output/\", # setting the output directory\n", " \"per_device_eval_batch_size\": 16, # the batch size for validation (inference)\n", " }\n", " },\n", " \"gpu_per_trial\": 1, # set to 0 if no GPU is available\n", " \"log_file_name\": \"seqclass.log\", # set the file to save the log for HPO\n", " \"log_type\": \"all\", # the log type for trials: \"all\" if logging all the trials, \"better\" if only keeping the better trials\n", " \"use_ray\": {\"local_dir\": \"data/output/\"}, # set whether to use Ray\n", " \"n_concurrent_trials\": 4\n", "}\n", "\n", "'''The main flaml automl API'''\n", "automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 0.00021956991427751982, 'num_train_epochs': 0.3549576494055084, 'per_device_train_batch_size': 8, 'warmup_ratio': 0.07425273520338253, 'weight_decay': 0.03879221030529465, 'adam_epsilon': 3.7880482987985576e-08, 'seed': 43, 'global_max_steps': 444, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 0.00021956991427751982, 'num_train_epochs': 0.3549576494055084, 'per_device_train_batch_size': 8, 'warmup_ratio': 0.07425273520338253, 'weight_decay': 0.03879221030529465, 'adam_epsilon': 3.7880482987985576e-08, 'seed': 43, 'global_max_steps': 444, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 
32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 313, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 313, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 1.3241899893349513e-06, 'num_train_epochs': 0.4379128434860086, 'per_device_train_batch_size': 16, 'warmup_ratio': 0.257055208282222, 'weight_decay': 0.012652183020312091, 'adam_epsilon': 1.0189125195705357e-07, 'seed': 43, 'global_max_steps': 274, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 313, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 0.0002562922748967212, 'num_train_epochs': 0.1802995999606059, 'per_device_train_batch_size': 4, 'warmup_ratio': 0.1809477882684876, 'weight_decay': 0.10305626005953175, 'adam_epsilon': 5.536776887412208e-08, 'seed': 42, 'global_max_steps': 451, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 313, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 6.104513714676502e-06, 'num_train_epochs': 2.3743291981165893, 'per_device_train_batch_size': 8, 'warmup_ratio': 
0.23610846764298543, 'weight_decay': 0.20205904544254147, 'adam_epsilon': 5.752964074991208e-08, 'seed': 41, 'global_max_steps': 1251, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 6.104513714676502e-06, 'num_train_epochs': 2.3743291981165893, 'per_device_train_batch_size': 8, 'warmup_ratio': 0.23610846764298543, 'weight_decay': 0.20205904544254147, 'adam_epsilon': 5.752964074991208e-08, 'seed': 41, 'global_max_steps': 1251, 'learner': 'transformer'}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 9.306519250357542e-06, 'num_train_epochs': 0.4664878701006166, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 5.931759315303309e-07, 'seed': 43, 'global_max_steps': 147, 'learner': 'transformer'}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 6.104513714676502e-06, 'num_train_epochs': 2.3743291981165893, 'per_device_train_batch_size': 8, 'warmup_ratio': 0.23610846764298543, 'weight_decay': 0.20205904544254147, 'adam_epsilon': 5.752964074991208e-08, 'seed': 41, 'global_max_steps': 1251, 'learner': 'transformer'}}\n", "6\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from flaml.data import get_output_from_log\n", "time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \\\n", " get_output_from_log(filename=automl_settings['log_file_name'], time_budget=3000)\n", "for config in config_history:\n", " print(config)\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "plt.title('Learning Curve')\n", "plt.xlabel('Wall Clock Time (s)')\n", "plt.ylabel('Validation Accuracy')\n", "print(len(valid_loss_history))\n", "plt.scatter(time_history, 1 - np.array(valid_loss_history))\n", "plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2 Text Summarization Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The text summarization task summarizes a long text into a short sentence. For example:\n", "\n", "- Document: Army explosives experts were called out to deal with a suspect package at the offices on the Newtownards Road on Friday night. Roads were sealed off and traffic diverted as a controlled explosion was carried out. The premises, used by East Belfast MP Naomi Long, have been targeted a number of times. Most recently, petrol bomb attacks were carried out on the offices on consecutive nights in April and May. The attacks began following a Belfast City Council vote in December 2012 restricting the flying of the union flag at the City Hall. Condemning the latest hoax, Alliance MLA Chris Lyttle said: \"It is a serious incident for the local area, it causes serious disruption, it puts people's lives at risk, it can prevent emergency services reaching the area. 
\"Ultimately we need people with information to share that with the police in order for them to do their job and bring these people to justice.\n", "\n", "- Summary: A suspicious package left outside an Alliance Party office in east Belfast has been declared a hoax.\n", "\n", "In this example, we use FLAML to perform *abstractive summarization* using the t5-small language model, i.e., the summary is generated word-by-word. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using custom data configuration default\n", "Reusing dataset xsum (/home/xliu127/.cache/huggingface/datasets/xsum/default/1.2.0/32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "204045\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Using custom data configuration default\n", "Reusing dataset xsum (/home/xliu127/.cache/huggingface/datasets/xsum/default/1.2.0/32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934)\n", "Using custom data configuration default\n", "Reusing dataset xsum (/home/xliu127/.cache/huggingface/datasets/xsum/default/1.2.0/32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934)\n" ] } ], "source": [ "from datasets import load_dataset\n", "\n", "train_dataset = load_dataset(\"xsum\", split=\"train\").to_pandas()\n", "print(len(train_dataset))\n", "dev_dataset = load_dataset(\"xsum\", split=\"validation\").to_pandas()\n", "test_dataset = load_dataset(\"xsum\", split=\"test\").to_pandas()\n", "\n", "custom_sent_keys = [\"document\"] # specify the column names of the input sentences\n", "label_key = \"summary\" # specify the column name of the label \n", "\n", "X_train, y_train = train_dataset[custom_sent_keys], train_dataset[label_key]\n", "X_val, y_val = dev_dataset[custom_sent_keys], dev_dataset[label_key]\n", "X_test = test_dataset[custom_sent_keys]" ] }, { "cell_type": 
"code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "== Status ==
Current time: 2022-03-19 14:55:00 (running for 00:08:31.38)
Memory usage on this node: 23.1/376.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/96 CPUs, 0/4 GPUs, 0.0/250.17 GiB heap, 0.0/111.21 GiB objects (0.0/1.0 accelerator_type:V100)
Current best trial: 08b6571c with val_loss=0.8569452656271894 and parameters={'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 9223372036854775807, 'learner': 'transformer', 'FLAML_sample_size': 10000}
Result logdir: /data/xliu127/projects/hyperopt/FLAML/notebook/data/output/train_2022-03-19_14-46-29
Number of trials: 8/1000000 (8 TERMINATED)

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m /data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m {'loss': 8.7635, 'learning_rate': 1.2308416834153697e-05, 'epoch': 0.11}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m /data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m warnings.warn(\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m /data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m warnings.warn(\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m /data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. 
Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m warnings.warn(\n", "2022-03-19 14:56:00,679\tWARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. Consider making `stop` a faster operation.\n", "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m [nltk_data] Downloading package punkt to /home/xliu127/nltk_data...\n", "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m [nltk_data] Package punkt is already up-to-date!\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m {'eval_loss': 6.893245697021484, 'eval_automl_metric': 0.8537338408275918, 'eval_runtime': 102.2734, 'eval_samples_per_second': 110.801, 'eval_steps_per_second': 6.932, 'epoch': 0.11}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-03-19 14:57:00,687\tWARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. 
Consider making `stop` a faster operation.\n", "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m [nltk_data] Downloading package punkt to /home/xliu127/nltk_data...\n", "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m [nltk_data] Package punkt is already up-to-date!\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m {'eval_loss': 7.381210803985596, 'eval_automl_metric': 0.8475751825208984, 'eval_runtime': 107.4032, 'eval_samples_per_second': 105.509, 'eval_steps_per_second': 6.601, 'epoch': 0.16}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m [nltk_data] Downloading package punkt to /home/xliu127/nltk_data...\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m [nltk_data] Package punkt is already up-to-date!\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m [nltk_data] Downloading package punkt to /home/xliu127/nltk_data...\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m [nltk_data] Package punkt is already up-to-date!\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m {'eval_loss': 10.150897979736328, 'eval_automl_metric': 0.8566791839938478, 'eval_runtime': 108.2143, 'eval_samples_per_second': 104.718, 'eval_steps_per_second': 6.552, 'epoch': 0.36}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-03-19 14:58:00,697\tWARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. 
Consider making `stop` a faster operation.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m {'eval_loss': 11.665904998779297, 'eval_automl_metric': 0.858011676038827, 'eval_runtime': 109.4667, 'eval_samples_per_second': 103.52, 'eval_steps_per_second': 6.477, 'epoch': 0.38}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m [nltk_data] Downloading package punkt to /home/xliu127/nltk_data...\n", "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m [nltk_data] Package punkt is already up-to-date!\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m {'eval_loss': 6.893245697021484, 'eval_automl_metric': 0.8537338408275918, 'eval_runtime': 110.7246, 'eval_samples_per_second': 102.344, 'eval_steps_per_second': 6.403, 'epoch': 0.11}\n", "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m {'train_runtime': 220.8946, 'train_samples_per_second': 4.648, 'train_steps_per_second': 0.149, 'train_loss': 8.763471198804451, 'epoch': 0.11}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-03-19 14:59:00,706\tWARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. 
Consider making `stop` a faster operation.\n", "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m Using amp half precision backend\n", "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m Num examples = 11332\n", "\u001b[2m\u001b[36m(train pid=86232)\u001b[0m Batch size = 16\n", "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m [nltk_data] Downloading package punkt to /home/xliu127/nltk_data...\n", "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m [nltk_data] Package punkt is already up-to-date!\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m [nltk_data] Downloading package punkt to /home/xliu127/nltk_data...\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m [nltk_data] Package punkt is already up-to-date!\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m {'eval_loss': 7.381210803985596, 'eval_automl_metric': 0.8475751825208984, 'eval_runtime': 109.1975, 'eval_samples_per_second': 103.775, 'eval_steps_per_second': 6.493, 'epoch': 0.16}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m [nltk_data] Downloading package punkt to /home/xliu127/nltk_data...\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m [nltk_data] Package punkt is already up-to-date!\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m {'train_runtime': 232.9303, 'train_samples_per_second': 10.067, 'train_steps_per_second': 1.262, 'train_loss': 9.880440506280637, 'epoch': 0.16}\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m {'eval_loss': 10.150897979736328, 'eval_automl_metric': 0.8566791839938478, 'eval_runtime': 108.3182, 'eval_samples_per_second': 104.618, 'eval_steps_per_second': 6.546, 'epoch': 0.36}\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m {'train_runtime': 232.4568, 'train_samples_per_second': 92.218, 'train_steps_per_second': 2.887, 
'train_loss': 11.215172903878349, 'epoch': 0.36}\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m {'eval_loss': 11.665904998779297, 'eval_automl_metric': 0.858011676038827, 'eval_runtime': 110.526, 'eval_samples_per_second': 102.528, 'eval_steps_per_second': 6.415, 'epoch': 0.38}\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m {'train_runtime': 236.6253, 'train_samples_per_second': 19.714, 'train_steps_per_second': 0.621, 'train_loss': 11.549961930614407, 'epoch': 0.38}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-03-19 15:00:00,942\tWARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. Consider making `stop` a faster operation.\n", "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m Using amp half precision backend\n", "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m Num examples = 11332\n", "\u001b[2m\u001b[36m(train pid=86184)\u001b[0m Batch size = 16\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m Using amp half precision backend\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m Num examples = 11332\n", "\u001b[2m\u001b[36m(train pid=86160)\u001b[0m Batch size = 16\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m Using amp half precision backend\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m ***** Running Prediction *****\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m Num examples = 11332\n", "\u001b[2m\u001b[36m(train pid=86225)\u001b[0m Batch size = 16\n", "2022-03-19 15:01:00,948\tWARNING ray_trial_executor.py:146 -- Skipping cleanup - trainable.stop did not return in time. 
Consider making `stop` a faster operation.\n", "2022-03-19 15:02:20,150\tINFO tune.py:639 -- Total run time: 950.87 seconds (500.36 seconds for the tuning loop).\n", "[flaml.automl: 03-19 15:02:25] {2837} INFO - selected model: None\n", "/data/installation/anaconda3/envs/tmp/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n", " warnings.warn(\n", "[flaml.automl: 03-19 15:14:54] {2947} INFO - retrain transformer for 748.2s\n", "[flaml.automl: 03-19 15:14:54] {2954} INFO - retrained model: None\n", "[flaml.automl: 03-19 15:14:54] {2283} INFO - fit succeeded\n", "[flaml.automl: 03-19 15:14:54] {2284} INFO - Time taken to find the best model: 472.3055913448334\n", "[flaml.automl: 03-19 15:14:54] {2295} WARNING - Time taken to find the best model is 94% of the provided time budget and not all estimators' hyperparameter search converged. 
Consider increasing the time budget.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "{'train_runtime': 14.6848, 'train_samples_per_second': 13894.959, 'train_steps_per_second': 434.258, 'train_loss': 10.199760437011719, 'epoch': 0.02}\n" ] } ], "source": [ "# Import the AutoML class from the flaml package\n", "from flaml import AutoML\n", "automl = AutoML()\n", "\n", "# Start Ray once; skip if it is already running in this session\n", "import ray\n", "if not ray.is_initialized():\n", "    ray.init()\n", "\n", "automl_settings = {\n", "    \"time_budget\": 500, # time budget for the search, in seconds\n", "    \"task\": \"summarization\", # setting the task as summarization\n", "    \"fit_kwargs_by_estimator\": { # if model_path is not set, the default model is t5-small: https://huggingface.co/t5-small\n", "        \"transformer\": {\n", "            \"output_dir\": \"data/output/\", # setting the output directory\n", "            \"model_path\": \"t5-small\",\n", "            \"per_device_eval_batch_size\": 16, # the batch size for validation (inference)\n", "        }\n", "    },\n", "    \"gpu_per_trial\": 1, # set to 0 if no GPU is available\n", "    \"log_file_name\": \"seqclass.log\", # set the file to save the log for HPO\n", "    \"log_type\": \"all\", # the log type for trials: \"all\" if logging all the trials, \"better\" if only keeping the better trials\n", "    \"use_ray\": {\"local_dir\": \"data/output/\"}, # use Ray for the trials, with data/output/ as Ray's local directory\n", "    \"metric\": \"rouge1\", # the metric to optimize: ROUGE-1\n", "    \"n_concurrent_trials\": 4, # sample: False # if the time is sufficient (e.g., longer than one trial's running time), you can set \n", "}\n", "\n", "# Run the hyperparameter search through the main FLAML AutoML API\n", "automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 3.6439277745413994e-06, 'num_train_epochs': 0.454119690781029, 'per_device_train_batch_size': 32, 'warmup_ratio': 
0.04654549348562217, 'weight_decay': 0.06669806327326033, 'adam_epsilon': 2.5833461668835812e-08, 'seed': 42, 'global_max_steps': 125, 'learner': 'transformer', 'FLAML_sample_size': 10000}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 3.6439277745413994e-06, 'num_train_epochs': 0.454119690781029, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.04654549348562217, 'weight_decay': 0.06669806327326033, 'adam_epsilon': 2.5833461668835812e-08, 'seed': 42, 'global_max_steps': 125, 'learner': 'transformer', 'FLAML_sample_size': 10000}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 112, 'learner': 'transformer', 'FLAML_sample_size': 10000}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 112, 'learner': 'transformer', 'FLAML_sample_size': 10000}}\n", "{'Current Learner': 'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 3.4236378229097798e-06, 'num_train_epochs': 8.919336644807531, 'per_device_train_batch_size': 4, 'warmup_ratio': 0.022492820063166875, 'weight_decay': 0.27013721375576616, 'adam_epsilon': 6.366959214432801e-08, 'seed': 43, 'global_max_steps': 180, 'learner': 'transformer', 'FLAML_sample_size': 10000}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 112, 'learner': 'transformer', 'FLAML_sample_size': 10000}}\n", "{'Current Learner': 
'transformer', 'Current Sample': 10000, 'Current Hyper-parameters': {'learning_rate': 2.83823390666728e-06, 'num_train_epochs': 1.6667827812145841, 'per_device_train_batch_size': 16, 'warmup_ratio': 0.04013366246992448, 'weight_decay': 0.2945152447208819, 'adam_epsilon': 4.694476379503266e-08, 'seed': 43, 'global_max_steps': 163, 'learner': 'transformer', 'FLAML_sample_size': 10000}, 'Best Learner': 'transformer', 'Best Hyper-parameters': {'learning_rate': 1.0000000000000003e-05, 'num_train_epochs': 1.0, 'per_device_train_batch_size': 32, 'warmup_ratio': 0.0, 'weight_decay': 0.0, 'adam_epsilon': 1e-06, 'seed': 42, 'global_max_steps': 112, 'learner': 'transformer', 'FLAML_sample_size': 10000}}\n", "4\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "\n", "from flaml.data import get_output_from_log\n", "time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \\\n", " get_output_from_log(filename=automl_settings['log_file_name'], time_budget=3000)\n", "for config in config_history:\n", " print(config)\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "plt.title('Learning Curve')\n", "plt.xlabel('Wall Clock Time (s)')\n", "plt.ylabel('Rouge 1')\n", "print(len(valid_loss_history))\n", "plt.scatter(time_history, 1 - np.array(valid_loss_history))\n", "plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')\n", "plt.show()" ] } ], "metadata": { "interpreter": { "hash": "e9d36fc5b7c3dd4177ff1b60184dd696c0acc18150a44682abca4d769811bd46" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 2 }