autogen/notebook/flaml_demo.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Copyright (c) 2020-2021 Microsoft Corporation. All rights reserved. \n",
    "\n",
    "Licensed under the MIT License.\n",
    "\n",
    "# Demo of AutoML with FLAML Library\n",
    "\n",
    "\n",
    "## 1. Introduction\n",
    "\n",
    "FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
    "with low computational cost. It is fast and cheap. The simple and lightweight design makes it easy \n",
    "to use and extend, such as adding new learners. FLAML can \n",
    "- serve as an economical AutoML engine,\n",
    "- be used as a fast hyperparameter tuning tool, or \n",
    "- be embedded in self-tuning software that requires low latency & resource in repetitive\n",
    "   tuning tasks.\n",
    "\n",
    "In this notebook, we use one real data example (binary classification) to showcase how to ues FLAML library.\n",
    "\n",
    "FLAML requires `Python>=3.6`. To run this notebook example, please install flaml with the `notebook` option:\n",
    "```bash\n",
    "pip install flaml[notebook]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Collecting flaml[notebook]\n",
      "  Downloading FLAML-0.2.3-py3-none-any.whl (77 kB)\n",
      "Requirement already satisfied: scipy>=1.4.1 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from flaml[notebook]) (1.4.1)  WARNING: The script optuna.exe is installed in 'C:\\Users\\chiw\\Miniconda3\\envs\\flaml\\Scripts' which is not on PATH.\n",
      "  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\n",
      "\n",
      "Processing c:\\users\\chiw\\appdata\\local\\pip\\cache\\wheels\\38\\61\\9e\\955ab1890f6cab231b1d756db63f36c711968a324296e0b649\\optuna-2.3.0-py3-none-any.whl\n",
      "Requirement already satisfied: xgboost>=0.90 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from flaml[notebook]) (1.3.3)\n",
      "Requirement already satisfied: catboost>=0.23 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from flaml[notebook]) (0.23.2)\n",
      "Requirement already satisfied: NumPy>=1.16.2 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from flaml[notebook]) (1.18.4)\n",
      "Requirement already satisfied: scikit-learn>=0.23.2 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from flaml[notebook]) (0.23.2)\n",
      "Requirement already satisfied: lightgbm>=2.3.1 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from flaml[notebook]) (2.3.1)\n",
      "Requirement already satisfied: matplotlib==3.2.0; extra == \"notebook\" in c:\\users\\chiw\\appdata\\roaming\\python\\python37\\site-packages (from flaml[notebook]) (3.2.0)\n",
      "Requirement already satisfied: rgf-python; extra == \"notebook\" in c:\\users\\chiw\\appdata\\roaming\\python\\python37\\site-packages (from flaml[notebook]) (3.9.0)\n",
      "Requirement already satisfied: openml==0.10.2; extra == \"notebook\" in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from flaml[notebook]) (0.10.2)\n",
      "Requirement already satisfied: jupyter; extra == \"notebook\" in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from flaml[notebook]) (1.0.0)\n",
      "Requirement already satisfied: packaging>=20.0 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from optuna==2.3.0->flaml[notebook]) (20.4)\n",
      "Requirement already satisfied: alembic in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from optuna==2.3.0->flaml[notebook]) (1.4.1)\n",
      "Requirement already satisfied: sqlalchemy>=1.1.0 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from optuna==2.3.0->flaml[notebook]) (1.3.20)\n",
      "Requirement already satisfied: tqdm in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from optuna==2.3.0->flaml[notebook]) (4.56.1)\n",
      "Requirement already satisfied: cliff in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from optuna==2.3.0->flaml[notebook]) (3.5.0)\n",
      "Requirement already satisfied: joblib in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from optuna==2.3.0->flaml[notebook]) (0.14.1)\n",
      "Requirement already satisfied: cmaes>=0.6.0 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from optuna==2.3.0->flaml[notebook]) (0.7.0)\n",
      "Requirement already satisfied: colorlog in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from optuna==2.3.0->flaml[notebook]) (4.6.2)\n",
      "Requirement already satisfied: graphviz in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from catboost>=0.23->flaml[notebook]) (0.14.1)\n",
      "Requirement already satisfied: plotly in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from catboost>=0.23->flaml[notebook]) (4.9.0)\n",
      "Requirement already satisfied: pandas>=0.24.0 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from catboost>=0.23->flaml[notebook]) (0.24.2)\n",
      "Requirement already satisfied: six in c:\\users\\chiw\\appdata\\roaming\\python\\python37\\site-packages (from catboost>=0.23->flaml[notebook]) (1.14.0)\n",
      "Requirement already satisfied: threadpoolctl>=2.0.0 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from scikit-learn>=0.23.2->flaml[notebook]) (2.0.0)\n",
      "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from matplotlib==3.2.0; extra == \"notebook\"->flaml[notebook]) (2.4.7)\n",
      "Requirement already satisfied: cycler>=0.10 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from matplotlib==3.2.0; extra == \"notebook\"->flaml[notebook]) (0.10.0)\n",
      "Requirement already satisfied: kiwisolver>=1.0.1 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from matplotlib==3.2.0; extra == \"notebook\"->flaml[notebook]) (1.2.0)\n",
      "Requirement already satisfied: python-dateutil>=2.1 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from matplotlib==3.2.0; extra == \"notebook\"->flaml[notebook]) (2.8.1)\n",
      "Requirement already satisfied: requests in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from openml==0.10.2; extra == \"notebook\"->flaml[notebook]) (2.25.0)\n",
      "Requirement already satisfied: xmltodict in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from openml==0.10.2; extra == \"notebook\"->flaml[notebook]) (0.12.0)\n",
      "Requirement already satisfied: liac-arff>=2.4.0 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from openml==0.10.2; extra == \"notebook\"->flaml[notebook]) (2.4.0)\n",
      "Requirement already satisfied: qtconsole in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from jupyter; extra == \"notebook\"->flaml[notebook]) (4.7.7)\n",
      "Requirement already satisfied: notebook in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from jupyter; extra == \"notebook\"->flaml[notebook]) (6.1.3)\n",
      "Requirement already satisfied: nbconvert in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from jupyter; extra == \"notebook\"->flaml[notebook]) (5.6.1)\n",
      "Requirement already satisfied: ipykernel in c:\\users\\chiw\\appdata\\roaming\\python\\python37\\site-packages (from jupyter; extra == \"notebook\"->flaml[notebook]) (5.3.4)\n",
      "Requirement already satisfied: jupyter-console in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from jupyter; extra == \"notebook\"->flaml[notebook]) (6.2.0)\n",
      "Requirement already satisfied: ipywidgets in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from jupyter; extra == \"notebook\"->flaml[notebook]) (7.5.1)\n",
      "Requirement already satisfied: python-editor>=0.3 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from alembic->optuna==2.3.0->flaml[notebook]) (1.0.4)\n",
      "Requirement already satisfied: Mako in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from alembic->optuna==2.3.0->flaml[notebook]) (1.1.3)\n",
      "Requirement already satisfied: cmd2!=0.8.3,>=0.8.0 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from cliff->optuna==2.3.0->flaml[notebook]) (1.4.0)\n",
      "Requirement already satisfied: PrettyTable<0.8,>=0.7.2 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from cliff->optuna==2.3.0->flaml[notebook]) (0.7.2)\n",
      "Requirement already satisfied: stevedore>=2.0.1 in c:\\users\\chiw\\miniconda3\\envs\\flaml\\lib\\site-packages (from cliff->optuna==2.3.0->flaml[notebook]) (3.2.2)"
     ]
    }
   ],
   "source": [
    "!pip install flaml[notebook];"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 2. Real Data Example\n",
    "### Load data and preprocess\n",
    "\n",
    "Download [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "load dataset from ./openml_ds1169.pkl\nDataset name: airlines\nX_train.shape: (404537, 7), y_train.shape: (404537,);\nX_test.shape: (134846, 7), y_test.shape: (134846,)\n"
     ]
    }
   ],
   "source": [
    "from flaml.data import load_openml_dataset\n",
    "X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id = 1169, data_dir = './')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Run FLAML\n",
    "In the FLAML automl run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, resampling strategy type, and so on. All these arguments have default values which will be used if users do not provide them. For example, the default ML learners of FLAML are `['lgbm', 'xgboost', 'catboost', 'rf', 'extra_tree', 'lrl1']`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "''' import AutoML class from flaml package '''\n",
    "from flaml import AutoML\n",
    "automl = AutoML()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "settings = {\n",
    "    \"time_budget\": 300, # total running time in seconds\n",
    "    \"metric\": 'accuracy', # primary metrics can be chosen from: ['accuracy','roc_auc','f1','log_loss','mae','mse','r2']\n",
    "    # \"estimator_list\": ['lgbm', 'rf', 'xgboost'], # list of ML learners\n",
    "    \"task\": 'classification', # task type    \n",
    "    # \"sample\": False, # whether to subsample training data\n",
    "    \"log_file_name\": 'airlines_experiment.log', # cache directory of flaml log files \n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": [
     "outputPrepend"
    ]
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stderr",
     "text": [
      "error=0.3600,\tbest catboost's error=0.3600\n",
      "[flaml.automl: 02-17 13:53:08] {939} INFO - iteration 22  current learner catboost\n",
      "INFO - iteration 22  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:10] {1093} INFO -  at 11.5s,\tbest catboost's error=0.3600,\tbest catboost's error=0.3600\n",
      "INFO -  at 11.5s,\tbest catboost's error=0.3600,\tbest catboost's error=0.3600\n",
      "[flaml.automl: 02-17 13:53:10] {939} INFO - iteration 23  current learner rf\n",
      "INFO - iteration 23  current learner rf\n",
      "[flaml.automl: 02-17 13:53:10] {1093} INFO -  at 12.0s,\tbest rf's error=0.4000,\tbest catboost's error=0.3600\n",
      "INFO -  at 12.0s,\tbest rf's error=0.4000,\tbest catboost's error=0.3600\n",
      "[flaml.automl: 02-17 13:53:10] {939} INFO - iteration 24  current learner catboost\n",
      "INFO - iteration 24  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:11] {1093} INFO -  at 12.7s,\tbest catboost's error=0.3599,\tbest catboost's error=0.3599\n",
      "INFO -  at 12.7s,\tbest catboost's error=0.3599,\tbest catboost's error=0.3599\n",
      "[flaml.automl: 02-17 13:53:11] {939} INFO - iteration 25  current learner xgboost\n",
      "INFO - iteration 25  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:11] {1093} INFO -  at 12.9s,\tbest xgboost's error=0.3787,\tbest catboost's error=0.3599\n",
      "INFO -  at 12.9s,\tbest xgboost's error=0.3787,\tbest catboost's error=0.3599\n",
      "[flaml.automl: 02-17 13:53:11] {939} INFO - iteration 26  current learner extra_tree\n",
      "INFO - iteration 26  current learner extra_tree\n",
      "[flaml.automl: 02-17 13:53:12] {1093} INFO -  at 13.4s,\tbest extra_tree's error=0.3967,\tbest catboost's error=0.3599\n",
      "INFO -  at 13.4s,\tbest extra_tree's error=0.3967,\tbest catboost's error=0.3599\n",
      "[flaml.automl: 02-17 13:53:12] {939} INFO - iteration 27  current learner catboost\n",
      "INFO - iteration 27  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:13] {1093} INFO -  at 14.2s,\tbest catboost's error=0.3598,\tbest catboost's error=0.3598\n",
      "INFO -  at 14.2s,\tbest catboost's error=0.3598,\tbest catboost's error=0.3598\n",
      "[flaml.automl: 02-17 13:53:13] {939} INFO - iteration 28  current learner xgboost\n",
      "INFO - iteration 28  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:13] {1093} INFO -  at 14.4s,\tbest xgboost's error=0.3757,\tbest catboost's error=0.3598\n",
      "INFO -  at 14.4s,\tbest xgboost's error=0.3757,\tbest catboost's error=0.3598\n",
      "[flaml.automl: 02-17 13:53:13] {939} INFO - iteration 29  current learner xgboost\n",
      "INFO - iteration 29  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:13] {1093} INFO -  at 14.4s,\tbest xgboost's error=0.3756,\tbest catboost's error=0.3598\n",
      "INFO -  at 14.4s,\tbest xgboost's error=0.3756,\tbest catboost's error=0.3598\n",
      "[flaml.automl: 02-17 13:53:13] {939} INFO - iteration 30  current learner catboost\n",
      "INFO - iteration 30  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:13] {1093} INFO -  at 15.1s,\tbest catboost's error=0.3598,\tbest catboost's error=0.3598\n",
      "INFO -  at 15.1s,\tbest catboost's error=0.3598,\tbest catboost's error=0.3598\n",
      "[flaml.automl: 02-17 13:53:13] {939} INFO - iteration 31  current learner lgbm\n",
      "INFO - iteration 31  current learner lgbm\n",
      "[flaml.automl: 02-17 13:53:14] {1093} INFO -  at 16.0s,\tbest lgbm's error=0.3618,\tbest catboost's error=0.3598\n",
      "INFO -  at 16.0s,\tbest lgbm's error=0.3618,\tbest catboost's error=0.3598\n",
      "[flaml.automl: 02-17 13:53:14] {939} INFO - iteration 32  current learner catboost\n",
      "INFO - iteration 32  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:15] {1093} INFO -  at 17.2s,\tbest catboost's error=0.3598,\tbest catboost's error=0.3598\n",
      "INFO -  at 17.2s,\tbest catboost's error=0.3598,\tbest catboost's error=0.3598\n",
      "[flaml.automl: 02-17 13:53:15] {939} INFO - iteration 33  current learner catboost\n",
      "INFO - iteration 33  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:17] {1093} INFO -  at 19.0s,\tbest catboost's error=0.3592,\tbest catboost's error=0.3592\n",
      "INFO -  at 19.0s,\tbest catboost's error=0.3592,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:17] {939} INFO - iteration 34  current learner xgboost\n",
      "INFO - iteration 34  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:17] {1093} INFO -  at 19.2s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "INFO -  at 19.2s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:17] {939} INFO - iteration 35  current learner xgboost\n",
      "INFO - iteration 35  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:18] {1093} INFO -  at 19.4s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "INFO -  at 19.4s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:18] {939} INFO - iteration 36  current learner xgboost\n",
      "INFO - iteration 36  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:18] {1093} INFO -  at 19.5s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "INFO -  at 19.5s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:18] {939} INFO - iteration 37  current learner xgboost\n",
      "INFO - iteration 37  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:18] {1093} INFO -  at 19.7s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "INFO -  at 19.7s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:18] {939} INFO - iteration 38  current learner xgboost\n",
      "INFO - iteration 38  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:18] {1093} INFO -  at 19.9s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "INFO -  at 19.9s,\tbest xgboost's error=0.3620,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:18] {939} INFO - iteration 39  current learner xgboost\n",
      "INFO - iteration 39  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:18] {1093} INFO -  at 20.2s,\tbest xgboost's error=0.3598,\tbest catboost's error=0.3592\n",
      "INFO -  at 20.2s,\tbest xgboost's error=0.3598,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:18] {939} INFO - iteration 40  current learner xgboost\n",
      "INFO - iteration 40  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:19] {1093} INFO -  at 20.4s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "INFO -  at 20.4s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:19] {939} INFO - iteration 41  current learner xgboost\n",
      "INFO - iteration 41  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:19] {1093} INFO -  at 20.6s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "INFO -  at 20.6s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:19] {939} INFO - iteration 42  current learner xgboost\n",
      "INFO - iteration 42  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:19] {1093} INFO -  at 21.0s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "INFO -  at 21.0s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:19] {939} INFO - iteration 43  current learner catboost\n",
      "INFO - iteration 43  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:20] {1093} INFO -  at 22.1s,\tbest catboost's error=0.3592,\tbest catboost's error=0.3592\n",
      "INFO -  at 22.1s,\tbest catboost's error=0.3592,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:20] {939} INFO - iteration 44  current learner xgboost\n",
      "INFO - iteration 44  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:21] {1093} INFO -  at 22.3s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "INFO -  at 22.3s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:21] {939} INFO - iteration 45  current learner extra_tree\n",
      "INFO - iteration 45  current learner extra_tree\n",
      "[flaml.automl: 02-17 13:53:21] {1093} INFO -  at 22.8s,\tbest extra_tree's error=0.3915,\tbest catboost's error=0.3592\n",
      "INFO -  at 22.8s,\tbest extra_tree's error=0.3915,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:21] {939} INFO - iteration 46  current learner xgboost\n",
      "INFO - iteration 46  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:21] {1093} INFO -  at 23.1s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "INFO -  at 23.1s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:21] {939} INFO - iteration 47  current learner xgboost\n",
      "INFO - iteration 47  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:23] {1093} INFO -  at 24.3s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "INFO -  at 24.3s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:23] {939} INFO - iteration 48  current learner xgboost\n",
      "INFO - iteration 48  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:24] {1093} INFO -  at 25.6s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "INFO -  at 25.6s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:24] {939} INFO - iteration 49  current learner catboost\n",
      "INFO - iteration 49  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:25] {1093} INFO -  at 26.8s,\tbest catboost's error=0.3592,\tbest catboost's error=0.3592\n",
      "INFO -  at 26.8s,\tbest catboost's error=0.3592,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:25] {939} INFO - iteration 50  current learner xgboost\n",
      "INFO - iteration 50  current learner xgboost\n",
      "[flaml.automl: 02-17 13:53:26] {1093} INFO -  at 27.6s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "INFO -  at 27.6s,\tbest xgboost's error=0.3593,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:26] {939} INFO - iteration 51  current learner extra_tree\n",
      "INFO - iteration 51  current learner extra_tree\n",
      "[flaml.automl: 02-17 13:53:26] {1093} INFO -  at 28.2s,\tbest extra_tree's error=0.3910,\tbest catboost's error=0.3592\n",
      "INFO -  at 28.2s,\tbest extra_tree's error=0.3910,\tbest catboost's error=0.3592\n",
      "[flaml.automl: 02-17 13:53:26] {939} INFO - iteration 52  current learner catboost\n",
      "INFO - iteration 52  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:32] {1093} INFO -  at 34.1s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "INFO -  at 34.1s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:53:32] {939} INFO - iteration 53  current learner catboost\n",
      "INFO - iteration 53  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:34] {1093} INFO -  at 36.0s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "INFO -  at 36.0s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:53:34] {939} INFO - iteration 54  current learner catboost\n",
      "INFO - iteration 54  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:42] {1093} INFO -  at 43.7s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "INFO -  at 43.7s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:53:42] {939} INFO - iteration 55  current learner lrl1\n",
      "INFO - iteration 55  current learner lrl1\n",
      "[flaml.automl: 02-17 13:53:42] {1093} INFO -  at 44.1s,\tbest lrl1's error=0.4338,\tbest catboost's error=0.3553\n",
      "INFO -  at 44.1s,\tbest lrl1's error=0.4338,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:53:42] {939} INFO - iteration 56  current learner catboost\n",
      "INFO - iteration 56  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:47] {1093} INFO -  at 48.3s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "INFO -  at 48.3s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:53:47] {939} INFO - iteration 57  current learner lrl1\n",
      "INFO - iteration 57  current learner lrl1\n",
      "[flaml.automl: 02-17 13:53:47] {1093} INFO -  at 48.7s,\tbest lrl1's error=0.4338,\tbest catboost's error=0.3553\n",
      "INFO -  at 48.7s,\tbest lrl1's error=0.4338,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:53:47] {939} INFO - iteration 58  current learner lrl1\n",
      "INFO - iteration 58  current learner lrl1\n",
      "[flaml.automl: 02-17 13:53:47] {1093} INFO -  at 49.0s,\tbest lrl1's error=0.4338,\tbest catboost's error=0.3553\n",
      "INFO -  at 49.0s,\tbest lrl1's error=0.4338,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:53:47] {939} INFO - iteration 59  current learner catboost\n",
      "INFO - iteration 59  current learner catboost\n",
      "[flaml.automl: 02-17 13:53:54] {1093} INFO -  at 55.4s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "INFO -  at 55.4s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:53:54] {939} INFO - iteration 60  current learner catboost\n",
      "INFO - iteration 60  current learner catboost\n",
      "[flaml.automl: 02-17 13:54:00] {1093} INFO -  at 61.8s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "INFO -  at 61.8s,\tbest catboost's error=0.3553,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:54:00] {939} INFO - iteration 61  current learner lgbm\n",
      "INFO - iteration 61  current learner lgbm\n",
      "[flaml.automl: 02-17 13:54:01] {1093} INFO -  at 62.6s,\tbest lgbm's error=0.3618,\tbest catboost's error=0.3553\n",
      "INFO -  at 62.6s,\tbest lgbm's error=0.3618,\tbest catboost's error=0.3553\n",
      "[flaml.automl: 02-17 13:54:01] {939} INFO - iteration 62  current learner catboost\n",
      "INFO - iteration 62  current learner catboost\n",
      "[flaml.automl: 02-17 13:54:40] {1093} INFO -  at 101.8s,\tbest catboost's error=0.3476,\tbest catboost's error=0.3476\n",
      "INFO -  at 101.8s,\tbest catboost's error=0.3476,\tbest catboost's error=0.3476\n",
      "[flaml.automl: 02-17 13:54:40] {939} INFO - iteration 63  current learner catboost\n",
      "INFO - iteration 63  current learner catboost\n",
      "[flaml.automl: 02-17 13:54:48] {1093} INFO -  at 109.9s,\tbest catboost's error=0.3476,\tbest catboost's error=0.3476\n",
      "INFO -  at 109.9s,\tbest catboost's error=0.3476,\tbest catboost's error=0.3476\n",
      "[flaml.automl: 02-17 13:54:48] {939} INFO - iteration 64  current learner xgboost\n",
      "INFO - iteration 64  current learner xgboost\n",
      "[flaml.automl: 02-17 13:54:50] {1093} INFO -  at 112.0s,\tbest xgboost's error=0.3424,\tbest xgboost's error=0.3424\n",
      "INFO -  at 112.0s,\tbest xgboost's error=0.3424,\tbest xgboost's error=0.3424\n",
      "[flaml.automl: 02-17 13:54:50] {939} INFO - iteration 65  current learner xgboost\n",
      "INFO - iteration 65  current learner xgboost\n",
      "[flaml.automl: 02-17 13:54:56] {1093} INFO -  at 117.6s,\tbest xgboost's error=0.3424,\tbest xgboost's error=0.3424\n",
      "INFO -  at 117.6s,\tbest xgboost's error=0.3424,\tbest xgboost's error=0.3424\n",
      "[flaml.automl: 02-17 13:54:56] {939} INFO - iteration 66  current learner xgboost\n",
      "INFO - iteration 66  current learner xgboost\n",
      "[flaml.automl: 02-17 13:55:03] {1093} INFO -  at 125.1s,\tbest xgboost's error=0.3400,\tbest xgboost's error=0.3400\n",
      "INFO -  at 125.1s,\tbest xgboost's error=0.3400,\tbest xgboost's error=0.3400\n",
      "[flaml.automl: 02-17 13:55:03] {939} INFO - iteration 67  current learner xgboost\n",
      "INFO - iteration 67  current learner xgboost\n",
      "[flaml.automl: 02-17 13:55:06] {1093} INFO -  at 127.4s,\tbest xgboost's error=0.3400,\tbest xgboost's error=0.3400\n",
      "INFO -  at 127.4s,\tbest xgboost's error=0.3400,\tbest xgboost's error=0.3400\n",
      "[flaml.automl: 02-17 13:55:06] {939} INFO - iteration 68  current learner xgboost\n",
      "INFO - iteration 68  current learner xgboost\n",
      "[flaml.automl: 02-17 13:55:20] {1093} INFO -  at 141.8s,\tbest xgboost's error=0.3366,\tbest xgboost's error=0.3366\n",
      "INFO -  at 141.8s,\tbest xgboost's error=0.3366,\tbest xgboost's error=0.3366\n",
      "[flaml.automl: 02-17 13:55:20] {939} INFO - iteration 69  current learner xgboost\n",
      "INFO - iteration 69  current learner xgboost\n",
      "[flaml.automl: 02-17 13:55:25] {1093} INFO -  at 147.0s,\tbest xgboost's error=0.3366,\tbest xgboost's error=0.3366\n",
      "INFO -  at 147.0s,\tbest xgboost's error=0.3366,\tbest xgboost's error=0.3366\n",
      "[flaml.automl: 02-17 13:55:25] {939} INFO - iteration 70  current learner catboost\n",
      "INFO - iteration 70  current learner catboost\n",
      "[flaml.automl: 02-17 13:56:11] {1093} INFO -  at 192.7s,\tbest catboost's error=0.3476,\tbest xgboost's error=0.3366\n",
      "INFO -  at 192.7s,\tbest catboost's error=0.3476,\tbest xgboost's error=0.3366\n",
      "[flaml.automl: 02-17 13:56:11] {939} INFO - iteration 71  current learner xgboost\n",
      "INFO - iteration 71  current learner xgboost\n",
      "[flaml.automl: 02-17 13:56:29] {1093} INFO -  at 210.7s,\tbest xgboost's error=0.3317,\tbest xgboost's error=0.3317\n",
      "INFO -  at 210.7s,\tbest xgboost's error=0.3317,\tbest xgboost's error=0.3317\n",
      "[flaml.automl: 02-17 13:56:29] {939} INFO - iteration 72  current learner xgboost\n",
      "INFO - iteration 72  current learner xgboost\n",
      "[flaml.automl: 02-17 13:56:59] {1093} INFO -  at 240.5s,\tbest xgboost's error=0.3268,\tbest xgboost's error=0.3268\n",
      "INFO -  at 240.5s,\tbest xgboost's error=0.3268,\tbest xgboost's error=0.3268\n",
      "[flaml.automl: 02-17 13:56:59] {939} INFO - iteration 73  current learner xgboost\n",
      "INFO - iteration 73  current learner xgboost\n",
      "[flaml.automl: 02-17 13:57:14] {1093} INFO -  at 255.9s,\tbest xgboost's error=0.3268,\tbest xgboost's error=0.3268\n",
      "INFO -  at 255.9s,\tbest xgboost's error=0.3268,\tbest xgboost's error=0.3268\n",
      "[flaml.automl: 02-17 13:57:32] {1109} INFO - retrain xgboost for 18.0s\n",
      "INFO - retrain xgboost for 18.0s\n",
      "[flaml.automl: 02-17 13:57:32] {939} INFO - iteration 74  current learner extra_tree\n",
      "INFO - iteration 74  current learner extra_tree\n",
      "[flaml.automl: 02-17 13:57:32] {1093} INFO -  at 274.2s,\tbest extra_tree's error=0.3910,\tbest xgboost's error=0.3268\n",
      "INFO -  at 274.2s,\tbest extra_tree's error=0.3910,\tbest xgboost's error=0.3268\n",
      "[flaml.automl: 02-17 13:57:46] {1109} INFO - retrain extra_tree for 13.2s\n",
      "INFO - retrain extra_tree for 13.2s\n",
      "[flaml.automl: 02-17 13:57:46] {939} INFO - iteration 75  current learner extra_tree\n",
      "INFO - iteration 75  current learner extra_tree\n",
      "[flaml.automl: 02-17 13:57:46] {1093} INFO -  at 287.8s,\tbest extra_tree's error=0.3910,\tbest xgboost's error=0.3268\n",
      "INFO -  at 287.8s,\tbest extra_tree's error=0.3910,\tbest xgboost's error=0.3268\n",
      "[flaml.automl: 02-17 13:57:52] {1109} INFO - retrain extra_tree for 5.9s\n",
      "INFO - retrain extra_tree for 5.9s\n",
      "[flaml.automl: 02-17 13:57:52] {939} INFO - iteration 76  current learner extra_tree\n",
      "INFO - iteration 76  current learner extra_tree\n",
      "[flaml.automl: 02-17 13:57:52] {1093} INFO -  at 293.9s,\tbest extra_tree's error=0.3910,\tbest xgboost's error=0.3268\n",
      "INFO -  at 293.9s,\tbest extra_tree's error=0.3910,\tbest xgboost's error=0.3268\n",
      "[flaml.automl: 02-17 13:57:56] {1109} INFO - retrain extra_tree for 3.8s\n",
      "INFO - retrain extra_tree for 3.8s\n",
      "[flaml.automl: 02-17 13:57:56] {939} INFO - iteration 77  current learner lgbm\n",
      "INFO - iteration 77  current learner lgbm\n",
      "[flaml.automl: 02-17 13:57:57] {1093} INFO -  at 299.0s,\tbest lgbm's error=0.3563,\tbest xgboost's error=0.3268\n",
      "INFO -  at 299.0s,\tbest lgbm's error=0.3563,\tbest xgboost's error=0.3268\n",
      "[flaml.automl: 02-17 13:57:58] {1109} INFO - retrain lgbm for 0.9s\n",
      "INFO - retrain lgbm for 0.9s\n",
      "[flaml.automl: 02-17 13:57:58] {1133} INFO - selected model: XGBClassifier(base_score=0.5, booster='gbtree',\n",
      "              colsample_bylevel=0.8909660754557278, colsample_bynode=1,\n",
      "              colsample_bytree=0.9330310727361396, gamma=0, gpu_id=-1,\n",
      "              grow_policy='lossguide', importance_type='gain',\n",
      "              interaction_constraints='', learning_rate=0.16464534671449255,\n",
      "              max_delta_step=0, max_depth=0, max_leaves=28,\n",
      "              min_child_weight=20.0, missing=nan, monotone_constraints='()',\n",
      "              n_estimators=1221, n_jobs=-1, num_parallel_tree=1, random_state=0,\n",
      "              reg_alpha=1e-10, reg_lambda=0.003747467958239166,\n",
      "              scale_pos_weight=1, subsample=1.0, tree_method='hist',\n",
      "              validate_parameters=1, verbosity=0)\n",
      "INFO - selected model: XGBClassifier(base_score=0.5, booster='gbtree',\n",
      "              colsample_bylevel=0.8909660754557278, colsample_bynode=1,\n",
      "              colsample_bytree=0.9330310727361396, gamma=0, gpu_id=-1,\n",
      "              grow_policy='lossguide', importance_type='gain',\n",
      "              interaction_constraints='', learning_rate=0.16464534671449255,\n",
      "              max_delta_step=0, max_depth=0, max_leaves=28,\n",
      "              min_child_weight=20.0, missing=nan, monotone_constraints='()',\n",
      "              n_estimators=1221, n_jobs=-1, num_parallel_tree=1, random_state=0,\n",
      "              reg_alpha=1e-10, reg_lambda=0.003747467958239166,\n",
      "              scale_pos_weight=1, subsample=1.0, tree_method='hist',\n",
      "              validate_parameters=1, verbosity=0)\n",
      "[flaml.automl: 02-17 13:57:58] {894} INFO - fit succeeded\n",
      "INFO - fit succeeded\n"
     ]
    }
   ],
   "source": [
    "'''The main flaml automl API'''\n",
    "automl.fit(X_train = X_train, y_train = y_train, **settings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Best model and metric"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Best ML leaner: xgboost\nBest hyperparmeter config: {'n_estimators': 1389.0, 'max_leaves': 28.0, 'min_child_weight': 20.0, 'learning_rate': 0.16464534671449255, 'subsample': 1.0, 'colsample_bylevel': 0.8909660754557278, 'colsample_bytree': 0.9330310727361396, 'reg_alpha': 1e-10, 'reg_lambda': 0.003747467958239166, 'FLAML_sample_size': 364083}\nBest accuracy on validation data: 0.6732\nTraining duration of best run: 29.74 s\n"
     ]
    }
   ],
   "source": [
    "''' retrieve best config and best learner'''\n",
    "print('Best ML leaner:', automl.best_estimator)\n",
    "print('Best hyperparmeter config:', automl.best_config)\n",
    "print('Best accuracy on validation data: {0:.4g}'.format(1-automl.best_loss))\n",
    "print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "XGBClassifier(base_score=0.5, booster='gbtree',\n",
       "              colsample_bylevel=0.8909660754557278, colsample_bynode=1,\n",
       "              colsample_bytree=0.9330310727361396, gamma=0, gpu_id=-1,\n",
       "              grow_policy='lossguide', importance_type='gain',\n",
       "              interaction_constraints='', learning_rate=0.16464534671449255,\n",
       "              max_delta_step=0, max_depth=0, max_leaves=28,\n",
       "              min_child_weight=20.0, missing=nan, monotone_constraints='()',\n",
       "              n_estimators=1221, n_jobs=-1, num_parallel_tree=1, random_state=0,\n",
       "              reg_alpha=1e-10, reg_lambda=0.003747467958239166,\n",
       "              scale_pos_weight=1, subsample=1.0, tree_method='hist',\n",
       "              validate_parameters=1, verbosity=0)"
      ]
     },
     "metadata": {},
     "execution_count": 7
    }
   ],
   "source": [
    "automl.model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "''' pickle and save the best model '''\n",
    "import pickle\n",
    "with open('best_model.pkl', 'wb') as f:\n",
    "    pickle.dump(automl.model, f, pickle.HIGHEST_PROTOCOL)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Predicted labels [1 0 1 ... 1 0 0]\nTrue labels [0 0 0 ... 0 1 0]\n"
     ]
    }
   ],
   "source": [
    "''' compute predictions of testing dataset ''' \n",
    "y_pred = automl.predict(X_test)\n",
    "print('Predicted labels', y_pred)\n",
    "print('True labels', y_test)\n",
    "y_pred_proba = automl.predict_proba(X_test)[:,1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "accuracy = 0.6721222728149148\nroc_auc = 0.7252473500166565\nlog_loss = 0.6035663268278709\nf1 = 0.5905710872605036\n"
     ]
    }
   ],
   "source": [
    "''' compute different metric values on testing dataset'''\n",
    "from flaml.ml import sklearn_metric_loss_score\n",
    "print('accuracy', '=', 1 - sklearn_metric_loss_score('accuracy', y_pred, y_test))\n",
    "print('roc_auc', '=', 1 - sklearn_metric_loss_score('roc_auc', y_pred_proba, y_test))\n",
    "print('log_loss', '=', sklearn_metric_loss_score('log_loss', y_pred_proba, y_test))\n",
    "print('f1', '=', 1 - sklearn_metric_loss_score('f1', y_pred, y_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "See Section 4 for an accuracy comparison with default LightGBM and XGBoost.\n",
    "\n",
    "### Log history"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "{'Current Learner': 'lgbm', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 20.0, 'learning_rate': 0.1, 'subsample': 1.0, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 1e-10, 'reg_lambda': 1.0, 'FLAML_sample_size': 10000}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 20.0, 'learning_rate': 0.1, 'subsample': 1.0, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 1e-10, 'reg_lambda': 1.0, 'FLAML_sample_size': 10000}}\n{'Current Learner': 'lgbm', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 20.0, 'learning_rate': 0.46335414315327306, 'subsample': 0.9339389930838808, 'log_max_bin': 10.0, 'colsample_bytree': 0.9904286645657556, 'reg_alpha': 2.841147337412889e-10, 'reg_lambda': 0.12000833497054482, 'FLAML_sample_size': 10000}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 20.0, 'learning_rate': 0.46335414315327306, 'subsample': 0.9339389930838808, 'log_max_bin': 10.0, 'colsample_bytree': 0.9904286645657556, 'reg_alpha': 2.841147337412889e-10, 'reg_lambda': 0.12000833497054482, 'FLAML_sample_size': 10000}}\n{'Current Learner': 'lgbm', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 23.0, 'max_leaves': 4.0, 'min_child_weight': 20.0, 'learning_rate': 1.0, 'subsample': 0.9917683183663918, 'log_max_bin': 10.0, 'colsample_bytree': 0.9858892907525497, 'reg_alpha': 3.8783982645515837e-10, 'reg_lambda': 0.36607431863072826, 'FLAML_sample_size': 10000}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 23.0, 'max_leaves': 4.0, 'min_child_weight': 20.0, 'learning_rate': 1.0, 'subsample': 0.9917683183663918, 'log_max_bin': 10.0, 'colsample_bytree': 0.9858892907525497, 'reg_alpha': 3.8783982645515837e-10, 'reg_lambda': 0.36607431863072826, 'FLAML_sample_size': 10000}}\n{'Current Learner': 'lgbm', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 11.0, 'max_leaves': 17.0, 'min_child_weight': 14.947587304572773, 'learning_rate': 0.6092558236172073, 'subsample': 0.9659256891661986, 'log_max_bin': 10.0, 'colsample_bytree': 1.0, 'reg_alpha': 3.816590663384559e-08, 'reg_lambda': 0.4482946615262561, 'FLAML_sample_size': 10000}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 11.0, 'max_leaves': 17.0, 'min_child_weight': 14.947587304572773, 'learning_rate': 0.6092558236172073, 'subsample': 0.9659256891661986, 'log_max_bin': 10.0, 'colsample_bytree': 1.0, 'reg_alpha': 3.816590663384559e-08, 'reg_lambda': 0.4482946615262561, 'FLAML_sample_size': 10000}}\n{'Current Learner': 'lgbm', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 6.0, 'max_leaves': 4.0, 'min_child_weight': 2.776007506782275, 'learning_rate': 0.7179196339383696, 'subsample': 0.8746997476758036, 'log_max_bin': 9.0, 'colsample_bytree': 1.0, 'reg_alpha': 9.69511928836042e-10, 'reg_lambda': 0.17744769739709204, 'FLAML_sample_size': 10000}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 6.0, 'max_leaves': 4.0, 'min_child_weight': 2.776007506782275, 'learning_rate': 0.7179196339383696, 'subsample': 0.8746997476758036, 'log_max_bin': 9.0, 'colsample_bytree': 1.0, 'reg_alpha': 9.69511928836042e-10, 'reg_lambda': 0.17744769739709204, 'FLAML_sample_size': 10000}}\n{'Current Learner': 'catboost', 'Current Sample': 10000, 'Current Hyper-parameters': {'early_stopping_rounds': 10, 'learning_rate': 0.1, 'FLAML_sample_size': 10000}, 'Best Learner': 'catboost', 'Best Hyper-parameters': {'early_stopping_rounds': 10, 'learning_rate': 0.1, 'FLAML_sample_size': 10000}}\n{'Current Learner': 'catboost', 'Current Sample': 10000, 'Current Hyper-parameters': {'early_stopping_rounds': 11.0, 'learning_rate': 0.2, 'FLAML_sample_size': 10000}, 'Best Learner': 'catboost', 'Best Hyper-parameters': {'early_stopping_rounds': 11.0, 'learning_rate': 0.2, 'FLAML_sample_size': 10000}}\n{'Current Learner': 'catboost', 'Current Sample
     ]
    }
   ],
   "source": [
    "from flaml.data import get_output_from_log\n",
    "time_history, best_valid_loss_history, valid_loss_history, config_history, train_loss_history = \\\n",
    "    get_output_from_log(filename = settings['log_file_name'], time_budget = 60)\n",
    "\n",
    "for config in config_history:\n",
    "    print(config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "output_type": "display_data",
     "data": {
      "text/plain": "<Figure size 432x288 with 1 Axes>",
      "image/svg+xml": "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\r\n<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n  \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n<!-- Created with matplotlib (https://matplotlib.org/) -->\r\n<svg height=\"277.314375pt\" version=\"1.1\" viewBox=\"0 0 398.50625 277.314375\" width=\"398.50625pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n <defs>\r\n  <style type=\"text/css\">\r\n*{stroke-linecap:butt;stroke-linejoin:round;}\r\n  </style>\r\n </defs>\r\n <g id=\"figure_1\">\r\n  <g id=\"patch_1\">\r\n   <path d=\"M 0 277.314375 \r\nL 398.50625 277.314375 \r\nL 398.50625 0 \r\nL 0 0 \r\nz\r\n\" style=\"fill:none;\"/>\r\n  </g>\r\n  <g id=\"axes_1\">\r\n   <g id=\"patch_2\">\r\n    <path d=\"M 56.50625 239.758125 \r\nL 391.30625 239.758125 \r\nL 391.30625 22.318125 \r\nL 56.50625 22.318125 \r\nz\r\n\" style=\"fill:#ffffff;\"/>\r\n   </g>\r\n   <g id=\"PathCollection_1\">\r\n    <defs>\r\n     <path d=\"M 0 3 \r\nC 0.795609 3 1.55874 2.683901 2.12132 2.12132 \r\nC 2.683901 1.55874 3 0.795609 3 0 \r\nC 3 -0.795609 2.683901 -1.55874 2.12132 -2.12132 \r\nC 1.55874 -2.683901 0.795609 -3 0 -3 \r\nC -0.795609 -3 -1.55874 -2.683901 -2.12132 -2.12132 \r\nC -2.683901 -1.55874 -3 -0.795609 -3 0 \r\nC -3 0.795609 -2.683901 1.55874 -2.12132 2.12132 \r\nC -1.55874 2.683901 -0.795609 3 0 3 \r\nz\r\n\" id=\"m2d37a8094e\" style=\"stroke:#1f77b4;\"/>\r\n    </defs>\r\n    <g clip-path=\"url(#p6482469d98)\">\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"71.724432\" xlink:href=\"#m2d37a8094e\" y=\"229.874489\"/>\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"75.639105\" xlink:href=\"#m2d37a8094e\" y=\"134.747216\"/>\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"99.750161\" xlink:href=\"#m2d37a8094e\" y=\"128.419943\"/>\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"102.492259\" xlink:href=\"#m2d37a8094e\" y=\"105.510852\"/>\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"105.153178\" xlink:href=\"#m2d37a8094e\" y=\"92.638125\"/>\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"155.053739\" xlink:href=\"#m2d37a8094e\" y=\"73.874489\"/>\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"182.521396\" xlink:href=\"#m2d37a8094e\" y=\"72.78358\"/>\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"196.247719\" xlink:href=\"#m2d37a8094e\" y=\"72.347216\"/>\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"239.272443\" xlink:href=\"#m2d37a8094e\" y=\"67.110852\"/>\r\n     <use style=\"fill:#1f77b4;stroke:#1f77b4;\" x=\"376.088068\" xlink:href=\"#m2d37a8094e\" y=\"32.201761\"/>\r\n    </g>\r\n   </g>\r\n   <g id=\"matplotlib.axis_1\">\r\n    <g id=\"xtick_1\">\r\n     <g id=\"line2d_1\">\r\n      <defs>\r\n       <path d=\"M 0 0 \r\nL 0 3.5 \r\n\" id=\"m4b9c2c4fdc\" style=\"stroke:#000000;stroke-width:0.8;\"/>\r\n      </defs>\r\n      <g>\r\n       <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"68.065112\" xlink:href=\"#m4b9c2c4fdc\" y=\"239.758125\"/>\r\n      </g>\r\n     </g>\r\n     <g id=\"text_1\">\r\n      <!-- 0 -->\r\n      <defs>\r\n       <path d=\"M 31.78125 66.40625 \r\nQ 24.171875 66.40625 20.328125 58.90625 \r\nQ 16.5 51.421875 16.5 36.375 \r\nQ 16.5 21.390625 20.328125 13.890625 \r\nQ 24.171875 6.390625 31.78125 6.390625 \r\nQ 39.453125 6.390625 43.28125 13.890625 \r\nQ 47.125 21.390625 47.125 36.375 \r\nQ 47.125 51.421875 43.28125 58.90625 \r\nQ 39.453125 66.40625 31.78125 66.40625 \r\nz\r\nM 31.78125 74.21875 \r\nQ 44.046875 74.21875 50.515625 64.515625 \r\nQ 56.984375 54.828125 56.984375 36.375 \r\nQ 56.984375 17.96875 50.515625 8.265625 \r\nQ 44.046875 -1.421875 31.78125 -1.421875 \r\nQ 19.53125 -1.421875 13.0625 8.265625 \r\nQ 6.59375 17.96875 6.59375 36.375 \r\nQ 6.59375 54.828125 13.0625 64.515625 \r\nQ 19.53125 74.21875 31.78125 74.21875 \r\nz\r\n\" id=\"DejaVuSans-48\"/>\r\n      </defs>\r\n      <g transform=\"translate(64.883862 254.356562)scale(0.1 -0.1)\">\r\n       <use xlink:href=\"#DejaVuSans-48\"/>\r\n      </g>\r\n     </g>\r\n    </g>
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEWCAYAAABxMXBSAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8GearUAAAgAElEQVR4nO3de5xdVX338c83k4EMQkiQYENCDCqJopVEUqjiBag2EdQE7/B4Ka0CbWmptMFgiwV96Cs14u0RyQOUCq0oF2OIGokUTCgI5GJCLsTYSDFkkpJEiEAcCSS//rHXSU4OZ87sncyeOTPzfb9e85qz1157n9/ZhPObtdfaaykiMDMzy2tQbwdgZmZ9ixOHmZkV4sRhZmaFOHGYmVkhThxmZlaIE4eZmRXixGHWzSS9RdK63o7DrCxOHNavSHpM0tt7M4aI+M+IGF/W+SVNlnSvpGckbZW0SNJ7yno/s1pOHGYFSWrpxfd+P3AbcBMwGngZ8Fng3ftxLknyd4AV5n80NiBIGiRphqRfSvq1pFslHVG1/zZJ/yPpN+mv+ddW7fumpGskzZe0AzgttWz+TtLKdMwtkoak+qdK2lh1fKd10/5LJG2WtEnSJySFpFfV+QwCvgR8PiKuj4jfRMTuiFgUEZ9MdS6X9O9Vx4xN5xucthdKulLS/cBvgc9IWlrzPp+SNC+9PljSFyVtkPSEpNmS2g7wP4f1cU4cNlD8NTANeBtwNPAUcHXV/h8BxwFHAT8DvlVz/DnAlcBhwH2p7IPAFOBY4PXAnzR4/7p1JU0BLgbeDrwqxdeZ8cAxwO0N6uTxUeA8ss/y/4Dxko6r2n8OcHN6/c/AOGBCim8UWQvHBjAnDhsozgf+PiI2RsRzwOXA+yt/iUfEDRHxTNW+EyQdXnX8HRFxf/oL/3ep7GsRsSkingS+T/bl2pnO6n4Q+NeIWBMRvwWuaHCOl6bfm3N/6vq+md7vhYj4DXAHcDZASiCvBualFs4ngU9FxJMR8QzwT8CHD/D9rY9z4rCB4uXA9yRtl7QdWAvsAl4mqUXSzHQb62ngsXTMkVXHP17nnP9T9fq3wKEN3r+zukfXnLve+1T8Ov0e2aBOHrXvcTMpcZC1NuamJDYCOARYVnXd7kzlNoA5cdhA8TjwzogYVvUzJCLayb4sp5LdLjocGJuOUdXxZU0jvZmsk7vimAZ115F9jvc1qLOD7Mu+4vfq1Kn9LD8GjpQ0gSyBVG5TbQM6gNdWXbPDI6JRgrQBwInD+qNWSUOqfgYDs4ErJb0cQNIISVNT/cOA58j+oj+E7HZMT7kVOFfSayQdQoP+g8jWQLgYuEzSuZKGpk7/N0u6NlVbAbxV0ph0q+3SrgKIiBfI+k1mAUcAd6Xy3cB1wJclHQUgaZSkyfv9aa1fcOKw/mg+2V/KlZ/Lga8C84AfS3oGeBA4OdW/CfgV0A48kvb1iIj4EfA14CfAeuCBtOu5TurfDnwI+FNgE/AE8H/J+imIiLuAW4CVwDLgBzlDuZmsxXVbSiQVn05xPZhu4/0HWSe9DWDyQk5mzUPSa4DVwME1X+BmTcMtDrNeJuksSQdJGk42/PX7ThrWzJw4zHrf+cBW4JdkI73+vHfDMWvMt6rMzKwQtzjMzKyQwb0dQE848sgjY+zYsb0dhplZn7Js2bJtEfGiBz4HROIYO3YsS5cu7bqimZntIelX9cp9q8rMzApx4jAzs0KcOMzMrJBSE4ekKZLWSVovaUYndU6VtELSGkmLava1SFou6QdVZZdLak/HrJB0RpmfwczM9lVa53haXvNq4B3ARmCJpHkR8UhVnWHAN4ApEbGhMpFalYvIpr8eWlP+5Yj4Ylmxm5lZ58pscZwErI+IRyNiJ/Adsqmrq50DzImIDQARsaWyQ9Jo4Ezg+hJjNDPrl+Yub+eUmfdw7IwfcsrMe5i7vL3bzl1m4hjFvgvGbExl1cYBw9M6yMskfaxq31eAS4Dddc59YVq/+YY0v8+LSDpP0lJJS7du3XoAH8PMrG+Zu7ydS+eson17BwG0b+/g0jmrui15lJk4VKesdn6TwcCJZC2LyWTrDIyT9C5gS0Qsq3OOa4BXki29uRm4qt6bR8S1ETEpIiaNGOEFy8xs4Ji1YB0dz+/ap6zj+V3MWrCuW85f5gOAG9l3NbPRZOsH1NbZFhE7gB2S7gVOAN4AvCd1fA8Bhkr694j4SEQ8UTlY0nXkX2/AzGxA2LS9o1B5UWW2OJYAx0k6VtJBZAvcz6upcwfwFkmD0+pnJwNrI+LSiBgdEWPTcfdExEcAJFWvt3wW2doFZmaWHD2srVB5UaUljrSewIXAArKRUbdGxBpJF0i6INVZC9xJtlrZYuD6iOgqEXxB0ipJK4HTgE+V9RnMzPqi6ZPH09bask9ZW2sL0yd3z+KNA2Ja9UmTJoXnqjKzgWTu8nYuuX0lO3ftZtSwNqZPHs+0ibXjkxqTtCwiJtWWD4hJDs3MBpppE0fx7cUbALjl/Dd267k95YiZmRXixGFmZoU4cZiZWSFOHGZmVogTh5mZFeLEYWZmhThxmJlZIU4cZmZWiBOHmZkV4sRhZmaFOHGYmVkhThxmZlaIE4eZmRXixGFmZoU4cZiZWSFOHGZmVogTh5mZFeLEYWZmhThxmJlZIU4cZmZWiBOHmZkV4sRhZmaFOHGYmVkhThxmZlaIE4eZmRXixGFmZoU4cZiZWSGDezsAMytu7vJ2Zi1Yx6btHRw9rI3pk8czbeKo3g7LBggnDrM+Zu7ydi6ds4qO53cB0L69g0vnrAJw8rAe4cRh1sfMWrBuT9Ko6Hh+F5fcvpJvL97QS1FZM3pk89McP3Jot5/XfRxmfcym7R11y3fu2t3DkVizO37kUKZO6P5WqFscZn3M0cPaaK+TPEYNa+OW89/YCxHZQOMWh1kyd3k7p8y8h2Nn/JBTZt7D3OXtvR1SXdMnj6ettWWfsrbWFqZPHt9LEdlA4xaH9Tv7M+KoL3U4V+K55PaV7Ny1m1EeVWU9rNTEIWkK8FWgBbg+ImbWqXMq8BWgFdgWEW+r2tcCLAXaI+JdqewI4BZgLPAY8MGIeKrMz2EHpieHju5vAuiLHc4Htw5i4phhvj1lPa60xJG+9K8G3gFsBJZImhcRj1TVGQZ8A5gSERskHVVzmouAtUD1sIAZwN0RMVPSjLT96bI+hx2Ynv5Lfn8TQL0+A2juDueyOj7NulJmi+MkYH1EPAog6TvAVOCRqjrnAHMiYgNARGyp7JA0GjgTuBK4uOqYqcCp6fWNwEKcOJpWT/8lv78J4KCWQXXruMPZ7MXKTByjgMertjcCJ9fUGQe0SloIHAZ8NSJuSvu+AlySyqu9LCI2A0TE5jqtFAAknQecBzBmzJgD+Bh2IHp66Oj+JoDalhG4w9msM2UmDtUpizrvfyLwR0Ab8ICkB8kSypaIWJb6QAqLiGuBawEmTZpU+77WQ3p66Oj+JoDKbTNP42HWtTITx0bgmKrt0cCmOnW2RcQOYIeke4ETgDcA75F0BjAEGCrp3yPiI8ATkkam1sZIYAvWtKZPHt+jf8kfSAKYNnGUE4VZDooo549xSYOBX5C1JtqBJcA5EbGmqs5rgK8Dk4GDgMXAhyNidVWdU4G/qxpVNQv4dVXn+BERcUmjWCZNmhRLly7tzo9nBcxd3u6ho2Z9kKRlETGptry0FkdEvCDpQmAB2XDcGyJijaQL0v7ZEbFW0p3ASmA32ZDd1Z2fFYCZwK2S/gzYAHygrM9g3WPaxFF7OsLd0WzW95X6HEdEzAfm15TNrtmeBcxqcI6FZCOnKtu/JmvFmJlZL/CUI2ZmVogTh5mZFeLEYWZmhXSZONLcUGZmZkC+FsdDkm6TdIakeg/1mZnZAJIncYwjewL7o8B6Sf8kaVy5YZmZWbPqMnFE5q6IOBv4BPBxYLGkRZI8KN/MbIDp8jkOSS8FPkLW4ngC+CtgHjABuA04tswAzcysueR5APAB4N+AaRGxsap8qaTZnRxjZmb9VJ7EMT46mdAqIv65m+MxM7Mml6dz/MdppT4AJA2XtKDEmKz
     },
     "metadata": {
      "needs_background": "light"
     }
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "plt.title('Learning Curve')\n",
    "plt.xlabel('Wall Clock Time (s)')\n",
    "plt.ylabel('Validation Accuracy')\n",
    "plt.scatter(time_history, 1-np.array(valid_loss_history))\n",
    "plt.step(time_history, 1-np.array(best_valid_loss_history), where='post')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 3. Customized Learner"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Some experienced automl users may have a preferred model to tune or may already have a reasonably by-hand-tuned model before launching the automl experiment. They need to select optimal configurations for the customized model mixed with standard built-in learners. \n",
    "\n",
    "FLAML can easily incorporate customized/new learners (preferably with sklearn API) provided by users in a real-time manner, as demonstrated below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Example of Regularized Greedy Forest\n",
    "\n",
    "[Regularized Greedy Forest](https://arxiv.org/abs/1109.0887) (RGF) is a machine learning method currently not included in FLAML. The RGF has many tuning parameters, the most critical of which are: `[max_leaf, n_iter, n_tree_search, opt_interval, min_samples_leaf]`. To run a customized/new learner, the user needs to provide the following information:\n",
    "* an implementation of the customized/new learner\n",
    "* a list of hyperparameter names and types\n",
    "* rough ranges of hyperparameters (i.e., upper/lower bounds)\n",
    "* choose initial value corresponding to low cost for cost-related hyperparameters (e.g., initial value for max_leaf and n_iter should be small)\n",
    "\n",
    "In this example, the above information for RGF is wrapped in a python class called *MyRegularizedGreedyForest* that exposes the hyperparameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "''' SKLearnEstimator is the super class for a sklearn learner '''\n",
    "from flaml.model import SKLearnEstimator\n",
    "from flaml import tune\n",
    "from rgf.sklearn import RGFClassifier, RGFRegressor\n",
    "\n",
    "\n",
    "class MyRegularizedGreedyForest(SKLearnEstimator):\n",
    "\n",
    "\n",
    "    def __init__(self, task = 'binary:logistic', n_jobs = 1, **params):\n",
    "        '''Constructor\n",
    "        \n",
    "        Args:\n",
    "            task: A string of the task type, one of\n",
    "                'binary:logistic', 'multi:softmax', 'regression'\n",
    "            n_jobs: An integer of the number of parallel threads\n",
    "            params: A dictionary of the hyperparameter names and values\n",
    "        '''\n",
    "\n",
    "        super().__init__(task, **params)\n",
    "\n",
    "        '''task=regression for RGFRegressor; \n",
    "        binary:logistic and multiclass:softmax for RGFClassifier'''\n",
    "        if 'regression' in task:\n",
    "            self.estimator_class = RGFRegressor\n",
    "        else:\n",
    "            self.estimator_class = RGFClassifier\n",
    "\n",
    "        # convert to int for integer hyperparameters\n",
    "        self.params = {\n",
    "            \"n_jobs\": n_jobs,\n",
    "            'max_leaf': int(params['max_leaf']),\n",
    "            'n_iter': int(params['n_iter']),\n",
    "            'n_tree_search': int(params['n_tree_search']),\n",
    "            'opt_interval': int(params['opt_interval']),\n",
    "            'learning_rate': params['learning_rate'],\n",
    "            'min_samples_leaf':int(params['min_samples_leaf'])\n",
    "        }    \n",
    "\n",
    "    @classmethod\n",
    "    def search_space(cls, data_size, task):\n",
    "        '''[required method] search space\n",
    "\n",
    "        Returns:\n",
    "            A dictionary of the search space. \n",
    "            Each key is the name of a hyperparameter, and value is a dict with\n",
    "                its domain and init_value (optional), cat_hp_cost (optional) \n",
    "                e.g., \n",
    "                {'domain': tune.randint(lower=1, upper=10), 'init_value': 1}\n",
    "        '''\n",
    "        space = {        \n",
    "        'max_leaf': {'domain': tune.qloguniform(lower = 4, upper = data_size, q = 1), 'init_value': 4},\n",
    "        'n_iter': {'domain': tune.qloguniform(lower = 1, upper = data_size, q = 1), 'init_value': 1},\n",
    "        'n_tree_search': {'domain': tune.qloguniform(lower = 1, upper = 32768, q = 1), 'init_value': 1},\n",
    "        'opt_interval': {'domain': tune.qloguniform(lower = 1, upper = 10000, q = 1), 'init_value': 100},\n",
    "        'learning_rate': {'domain': tune.loguniform(lower = 0.01, upper = 20.0)},\n",
    "        'min_samples_leaf': {'domain': tune.qloguniform(lower = 1, upper = 20, q = 1), 'init_value': 20},\n",
    "        }\n",
    "        return space\n",
    "\n",
    "    @classmethod\n",
    "    def size(cls, config):\n",
    "        '''[optional method] memory size of the estimator in bytes\n",
    "        \n",
    "        Args:\n",
    "            config - the dict of the hyperparameter config\n",
    "\n",
    "        Returns:\n",
    "            A float of the memory size required by the estimator to train the\n",
    "            given config\n",
    "        '''\n",
    "        max_leaves = int(round(config['max_leaf']))\n",
    "        n_estimators = int(round(config['n_iter']))\n",
    "        return (max_leaves*3 + (max_leaves-1)*4 + 1.0)*n_estimators*8\n",
    "\n",
    "    @classmethod\n",
    "    def cost_relative2lgbm(cls):\n",
    "        '''[optional method] relative cost compared to lightgbm\n",
    "        '''\n",
    "        return 1.0\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Add Customized Learner and Run FLAML AutoML\n",
    "\n",
    "After adding RGF into the list of learners, we run automl by tuning hyperpameters of RGF as well as the default learners. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "automl = AutoML()\n",
    "automl.add_learner(learner_name = 'RGF', learner_class = MyRegularizedGreedyForest)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "outputs": [
    {
     "output_type": "stream",
     "name": "stderr",
     "text": [
      "[flaml.automl: 02-17 13:58:01] {839} INFO - Evaluation method: holdout\n",
      "INFO - Evaluation method: holdout\n",
      "[flaml.automl: 02-17 13:58:01] {564} INFO - Using StratifiedKFold\n",
      "INFO - Using StratifiedKFold\n",
      "[flaml.automl: 02-17 13:58:01] {860} INFO - Minimizing error metric: 1-accuracy\n",
      "INFO - Minimizing error metric: 1-accuracy\n",
      "[flaml.automl: 02-17 13:58:01] {880} INFO - List of ML learners in AutoML Run: ['RGF', 'lgbm', 'rf', 'xgboost']\n",
      "INFO - List of ML learners in AutoML Run: ['RGF', 'lgbm', 'rf', 'xgboost']\n",
      "[flaml.automl: 02-17 13:58:01] {939} INFO - iteration 0  current learner RGF\n",
      "INFO - iteration 0  current learner RGF\n",
      "[flaml.automl: 02-17 13:58:02] {1093} INFO -  at 1.4s,\tbest RGF's error=0.3787,\tbest RGF's error=0.3787\n",
      "INFO -  at 1.4s,\tbest RGF's error=0.3787,\tbest RGF's error=0.3787\n",
      "[flaml.automl: 02-17 13:58:02] {939} INFO - iteration 1  current learner RGF\n",
      "INFO - iteration 1  current learner RGF\n",
      "[flaml.automl: 02-17 13:58:04] {1093} INFO -  at 2.9s,\tbest RGF's error=0.3787,\tbest RGF's error=0.3787\n",
      "INFO -  at 2.9s,\tbest RGF's error=0.3787,\tbest RGF's error=0.3787\n",
      "[flaml.automl: 02-17 13:58:04] {939} INFO - iteration 2  current learner lgbm\n",
      "INFO - iteration 2  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:04] {1093} INFO -  at 3.1s,\tbest lgbm's error=0.3777,\tbest lgbm's error=0.3777\n",
      "INFO -  at 3.1s,\tbest lgbm's error=0.3777,\tbest lgbm's error=0.3777\n",
      "[flaml.automl: 02-17 13:58:04] {939} INFO - iteration 3  current learner lgbm\n",
      "INFO - iteration 3  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:04] {1093} INFO -  at 3.3s,\tbest lgbm's error=0.3777,\tbest lgbm's error=0.3777\n",
      "INFO -  at 3.3s,\tbest lgbm's error=0.3777,\tbest lgbm's error=0.3777\n",
      "[flaml.automl: 02-17 13:58:04] {939} INFO - iteration 4  current learner RGF\n",
      "INFO - iteration 4  current learner RGF\n",
      "[flaml.automl: 02-17 13:58:06] {1093} INFO -  at 4.8s,\tbest RGF's error=0.3787,\tbest lgbm's error=0.3777\n",
      "INFO -  at 4.8s,\tbest RGF's error=0.3787,\tbest lgbm's error=0.3777\n",
      "[flaml.automl: 02-17 13:58:06] {939} INFO - iteration 5  current learner lgbm\n",
      "INFO - iteration 5  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:06] {1093} INFO -  at 5.0s,\tbest lgbm's error=0.3669,\tbest lgbm's error=0.3669\n",
      "INFO -  at 5.0s,\tbest lgbm's error=0.3669,\tbest lgbm's error=0.3669\n",
      "[flaml.automl: 02-17 13:58:06] {939} INFO - iteration 6  current learner lgbm\n",
      "INFO - iteration 6  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:06] {1093} INFO -  at 5.2s,\tbest lgbm's error=0.3669,\tbest lgbm's error=0.3669\n",
      "INFO -  at 5.2s,\tbest lgbm's error=0.3669,\tbest lgbm's error=0.3669\n",
      "[flaml.automl: 02-17 13:58:06] {939} INFO - iteration 7  current learner lgbm\n",
      "INFO - iteration 7  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:06] {1093} INFO -  at 5.3s,\tbest lgbm's error=0.3662,\tbest lgbm's error=0.3662\n",
      "INFO -  at 5.3s,\tbest lgbm's error=0.3662,\tbest lgbm's error=0.3662\n",
      "[flaml.automl: 02-17 13:58:06] {939} INFO - iteration 8  current learner lgbm\n",
      "INFO - iteration 8  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:07] {1093} INFO -  at 5.7s,\tbest lgbm's error=0.3636,\tbest lgbm's error=0.3636\n",
      "INFO -  at 5.7s,\tbest lgbm's error=0.3636,\tbest lgbm's error=0.3636\n",
      "[flaml.automl: 02-17 13:58:07] {939} INFO - iteration 9  current learner lgbm\n",
      "INFO - iteration 9  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:07] {1093} INFO -  at 5.9s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "INFO -  at 5.9s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "[flaml.automl: 02-17 13:58:07] {939} INFO - iteration 10  current learner lgbm\n",
      "INFO - iteration 10  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:07] {1093} INFO -  at 6.2s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "INFO -  at 6.2s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "[flaml.automl: 02-17 13:58:07] {939} INFO - iteration 11  current learner lgbm\n",
      "INFO - iteration 11  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:07] {1093} INFO -  at 6.3s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "INFO -  at 6.3s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "[flaml.automl: 02-17 13:58:07] {939} INFO - iteration 12  current learner lgbm\n",
      "INFO - iteration 12  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:07] {1093} INFO -  at 6.4s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "INFO -  at 6.4s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "[flaml.automl: 02-17 13:58:07] {939} INFO - iteration 13  current learner lgbm\n",
      "INFO - iteration 13  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:07] {1093} INFO -  at 6.6s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "INFO -  at 6.6s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "[flaml.automl: 02-17 13:58:07] {939} INFO - iteration 14  current learner lgbm\n",
      "INFO - iteration 14  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:08] {1093} INFO -  at 6.9s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "INFO -  at 6.9s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "[flaml.automl: 02-17 13:58:08] {939} INFO - iteration 15  current learner lgbm\n",
      "INFO - iteration 15  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:09] {1093} INFO -  at 8.6s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "INFO -  at 8.6s,\tbest lgbm's error=0.3621,\tbest lgbm's error=0.3621\n",
      "[flaml.automl: 02-17 13:58:09] {939} INFO - iteration 16  current learner xgboost\n",
      "INFO - iteration 16  current learner xgboost\n",
      "[flaml.automl: 02-17 13:58:10] {1093} INFO -  at 8.7s,\tbest xgboost's error=0.3787,\tbest lgbm's error=0.3621\n",
      "INFO -  at 8.7s,\tbest xgboost's error=0.3787,\tbest lgbm's error=0.3621\n",
      "[flaml.automl: 02-17 13:58:10] {939} INFO - iteration 17  current learner lgbm\n",
      "INFO - iteration 17  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:12] {1093} INFO -  at 10.7s,\tbest lgbm's error=0.3611,\tbest lgbm's error=0.3611\n",
      "INFO -  at 10.7s,\tbest lgbm's error=0.3611,\tbest lgbm's error=0.3611\n",
      "[flaml.automl: 02-17 13:58:12] {939} INFO - iteration 18  current learner xgboost\n",
      "INFO - iteration 18  current learner xgboost\n",
      "[flaml.automl: 02-17 13:58:12] {1093} INFO -  at 10.9s,\tbest xgboost's error=0.3787,\tbest lgbm's error=0.3611\n",
      "INFO -  at 10.9s,\tbest xgboost's error=0.3787,\tbest lgbm's error=0.3611\n",
      "[flaml.automl: 02-17 13:58:12] {939} INFO - iteration 19  current learner xgboost\n",
      "INFO - iteration 19  current learner xgboost\n",
      "[flaml.automl: 02-17 13:58:12] {1093} INFO -  at 11.0s,\tbest xgboost's error=0.3757,\tbest lgbm's error=0.3611\n",
      "INFO -  at 11.0s,\tbest xgboost's error=0.3757,\tbest lgbm's error=0.3611\n",
      "[flaml.automl: 02-17 13:58:12] {939} INFO - iteration 20  current learner xgboost\n",
      "INFO - iteration 20  current learner xgboost\n",
      "[flaml.automl: 02-17 13:58:12] {1093} INFO -  at 11.1s,\tbest xgboost's error=0.3756,\tbest lgbm's error=0.3611\n",
      "INFO -  at 11.1s,\tbest xgboost's error=0.3756,\tbest lgbm's error=0.3611\n",
      "[flaml.automl: 02-17 13:58:12] {939} INFO - iteration 21  current learner rf\n",
      "INFO - iteration 21  current learner rf\n",
      "[flaml.automl: 02-17 13:58:13] {1093} INFO -  at 11.8s,\tbest rf's error=0.4012,\tbest lgbm's error=0.3611\n",
      "INFO -  at 11.8s,\tbest rf's error=0.4012,\tbest lgbm's error=0.3611\n",
      "[flaml.automl: 02-17 13:58:13] {939} INFO - iteration 22  current learner RGF\n",
      "INFO - iteration 22  current learner RGF\n",
      "[flaml.automl: 02-17 13:58:14] {1093} INFO -  at 13.2s,\tbest RGF's error=0.3674,\tbest lgbm's error=0.3611\n",
      "INFO -  at 13.2s,\tbest RGF's error=0.3674,\tbest lgbm's error=0.3611\n",
      "[flaml.automl: 02-17 13:58:14] {939} INFO - iteration 23  current learner lgbm\n",
      "INFO - iteration 23  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:16] {1093} INFO -  at 14.7s,\tbest lgbm's error=0.3585,\tbest lgbm's error=0.3585\n",
      "INFO -  at 14.7s,\tbest lgbm's error=0.3585,\tbest lgbm's error=0.3585\n",
      "[flaml.automl: 02-17 13:58:16] {939} INFO - iteration 24  current learner rf\n",
      "INFO - iteration 24  current learner rf\n",
      "[flaml.automl: 02-17 13:58:16] {1093} INFO -  at 15.3s,\tbest rf's error=0.3977,\tbest lgbm's error=0.3585\n",
      "INFO -  at 15.3s,\tbest rf's error=0.3977,\tbest lgbm's error=0.3585\n",
      "[flaml.automl: 02-17 13:58:16] {939} INFO - iteration 25  current learner xgboost\n",
      "INFO - iteration 25  current learner xgboost\n",
      "[flaml.automl: 02-17 13:58:16] {1093} INFO -  at 15.5s,\tbest xgboost's error=0.3756,\tbest lgbm's error=0.3585\n",
      "INFO -  at 15.5s,\tbest xgboost's error=0.3756,\tbest lgbm's error=0.3585\n",
      "[flaml.automl: 02-17 13:58:16] {939} INFO - iteration 26  current learner lgbm\n",
      "INFO - iteration 26  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:18] {1093} INFO -  at 16.9s,\tbest lgbm's error=0.3585,\tbest lgbm's error=0.3585\n",
      "INFO -  at 16.9s,\tbest lgbm's error=0.3585,\tbest lgbm's error=0.3585\n",
      "[flaml.automl: 02-17 13:58:18] {939} INFO - iteration 27  current learner lgbm\n",
      "INFO - iteration 27  current learner lgbm\n",
      "[flaml.automl: 02-17 13:58:21] {1093} INFO -  at 19.6s,\tbest lgbm's error=0.3531,\tbest lgbm's error=0.3531\n",
      "INFO -  at 19.6s,\tbest lgbm's error=0.3531,\tbest lgbm's error=0.3531\n",
      "[flaml.automl: 02-17 13:58:21] {939} INFO - iteration 28  current learner rf\n",
      "INFO - iteration 28  current learner rf\n",
      "[flaml.automl: 02-17 13:58:21] {1093} INFO -  at 20.3s,\tbest rf's error=0.3977,\tbest lgbm's error=0.3531\n",
      "INFO -  at 20.3s,\tbest rf's error=0.3977,\tbest lgbm's error=0.3531\n",
      "[flaml.automl: 02-17 13:58:21] {939} INFO - iteration 29  current learner rf\n",
      "INFO - iteration 29  current learner rf\n",
      "[flaml.automl: 02-17 13:58:22] {1093} INFO -  at 20.9s,\tbest rf's error=0.3977,\tbest lgbm's error=0.3531\n",
      "INFO -  at 20.9s,\tbest rf's error=0.3977,\tbest lgbm's error=0.3531\n",
      "[flaml.automl: 02-17 13:58:22] {939} INFO - iteration 30  current learner RGF\n",
      "INFO - iteration 30  current learner RGF\n",
      "[flaml.automl: 02-17 13:58:23] {1093} INFO -  at 21.9s,\tbest RGF's error=0.3674,\tbest lgbm's error=0.3531\n",
      "INFO -  at 21.9s,\tbest RGF's error=0.3674,\tbest lgbm's error=0.3531\n",
      "[flaml.automl: 02-17 13:58:23] {939} INFO - iteration 31  current learner RGF\n",
      "INFO - iteration 31  current learner RGF\n",
      "[flaml.automl: 02-17 13:58:24] {1093} INFO -  at 23.3s,\tbest RGF's error=0.3674,\tbest lgbm's error=0.3531\n",
      "INFO -  at 23.3s,\tbest RGF's error=0.3674,\tbest lgbm's error=0.3531\n",
      "[flaml.automl: 02-17 13:58:24] {939} INFO - iteration 32  current learner RGF\n",
      "INFO - iteration 32  current learner RGF\n",
      "[flaml.automl: 02-17 13:59:08] {1093} INFO -  at 67.1s,\tbest RGF's error=0.3674,\tbest lgbm's error=0.3531\n",
      "INFO -  at 67.1s,\tbest RGF's error=0.3674,\tbest lgbm's error=0.3531\n",
      "[flaml.automl: 02-17 13:59:08] {1133} INFO - selected model: LGBMClassifier(learning_rate=0.1564464373197609, max_bin=511,\n",
      "               min_child_weight=1.4188300323104601, n_estimators=12,\n",
      "               num_leaves=45, objective='binary',\n",
      "               reg_alpha=3.209664512322882e-10, reg_lambda=0.8927146483558472,\n",
      "               subsample=0.96058565726185)\n",
      "INFO - selected model: LGBMClassifier(learning_rate=0.1564464373197609, max_bin=511,\n",
      "               min_child_weight=1.4188300323104601, n_estimators=12,\n",
      "               num_leaves=45, objective='binary',\n",
      "               reg_alpha=3.209664512322882e-10, reg_lambda=0.8927146483558472,\n",
      "               subsample=0.96058565726185)\n",
      "[flaml.automl: 02-17 13:59:08] {894} INFO - fit succeeded\n",
      "INFO - fit succeeded\n"
     ]
    }
   ],
   "source": [
    "settings = {\n",
    "    \"time_budget\": 60, # total running time in seconds\n",
    "    \"metric\": 'accuracy', \n",
    "    \"estimator_list\": ['RGF', 'lgbm', 'rf', 'xgboost'], # list of ML learners\n",
    "    \"task\": 'classification', # task type    \n",
    "    \"sample\": True, # whether to subsample training data\n",
    "    \"log_file_name\": 'airlines_experiment.log', # cache directory of flaml log files \n",
    "    \"log_training_metric\": True, # whether to log training metric\n",
    "}\n",
    "\n",
    "'''The main flaml automl API'''\n",
    "automl.fit(X_train = X_train, y_train = y_train, **settings)"
   ]
  },
  {
   "source": [
    "## 4. Comparison with alternatives\n",
    "\n",
    "### FLAML's accuracy"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "flaml accuracy = 0.6721222728149148\n"
     ]
    }
   ],
   "source": [
    "print('flaml accuracy', '=', 1 - sklearn_metric_loss_score('accuracy', y_pred, y_test))"
   ]
  },
  {
   "source": [
    "### Default LightGBM"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lightgbm import LGBMClassifier\n",
    "lgbm = LGBMClassifier()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "LGBMClassifier()"
      ]
     },
     "metadata": {},
     "execution_count": 18
    }
   ],
   "source": [
    "lgbm.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "default lgbm accuracy = 0.6602123904305652\n"
     ]
    }
   ],
   "source": [
    "y_pred = lgbm.predict(X_test)\n",
    "from flaml.ml import sklearn_metric_loss_score\n",
    "print('default lgbm accuracy', '=', 1 - sklearn_metric_loss_score('accuracy', y_pred, y_test))"
   ]
  },
  {
   "source": [
    "### Default XGBoost"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "from xgboost import XGBClassifier\n",
    "xgb = XGBClassifier()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
       "              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n",
       "              importance_type='gain', interaction_constraints='',\n",
       "              learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n",
       "              min_child_weight=1, missing=nan, monotone_constraints='()',\n",
       "              n_estimators=100, n_jobs=8, num_parallel_tree=1, random_state=0,\n",
       "              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,\n",
       "              tree_method='exact', validate_parameters=1, verbosity=None)"
      ]
     },
     "metadata": {},
     "execution_count": 21
    }
   ],
   "source": [
    "xgb.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "default xgboost accuracy = 0.6676060098186078\n"
     ]
    }
   ],
   "source": [
    "y_pred = xgb.predict(X_test)\n",
    "from flaml.ml import sklearn_metric_loss_score\n",
    "print('default xgboost accuracy', '=', 1 - sklearn_metric_loss_score('accuracy', y_pred, y_test))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3",
   "language": "python"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7-final"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}