autogen/notebook/flaml_automl.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Copyright (c) 2020-2021 Microsoft Corporation. All rights reserved. \n",
    "\n",
    "Licensed under the MIT License.\n",
    "\n",
    "# AutoML with FLAML Library\n",
    "\n",
    "\n",
    "## 1. Introduction\n",
    "\n",
    "FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
    "at low computational cost. Its simple, lightweight design makes it easy \n",
    "to use and to extend, for example by adding new learners. FLAML can \n",
    "- serve as an economical AutoML engine,\n",
    "- be used as a fast hyperparameter tuning tool, or \n",
    "- be embedded in self-tuning software that requires low latency and low resource usage in repetitive\n",
    "   tuning tasks.\n",
    "\n",
    "In this notebook, we use one real-data example (binary classification) to show how to use the FLAML library.\n",
    "\n",
    "FLAML requires `Python>=3.6`. To run this notebook example, please install flaml with the `notebook` option:\n",
    "```bash\n",
    "pip install flaml[notebook]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install flaml[notebook];"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 2. Classification Example\n",
    "### Load data and preprocess\n",
    "\n",
    "Download the [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given information about its scheduled departure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "load dataset from ./openml_ds1169.pkl\n",
      "Dataset name: airlines\n",
      "X_train.shape: (404537, 7), y_train.shape: (404537,);\n",
      "X_test.shape: (134846, 7), y_test.shape: (134846,)\n"
     ]
    }
   ],
   "source": [
    "from flaml.data import load_openml_dataset\n",
    "X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir='./')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Run FLAML\n",
    "In the FLAML AutoML run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, the resampling strategy, and so on. Each argument has a default value that is used when the user does not provide one. For example, the default ML learners of FLAML are `['lgbm', 'xgboost', 'catboost', 'rf', 'extra_tree', 'lrl1']`."
   ]
  },
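  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the options above concrete, here is a minimal sketch of a fuller settings dictionary. The keyword names `estimator_list`, `sample`, and `eval_method` are assumptions based on our reading of the `AutoML.fit` signature and may differ across flaml versions; the values are illustrative, not recommendations.\n",
    "\n",
    "```python\n",
    "# Illustrative settings only; key names are assumed to match AutoML.fit keyword arguments.\n",
    "example_settings = {\n",
    "    'time_budget': 60,                      # total running time in seconds\n",
    "    'metric': 'accuracy',                   # error metric to optimize\n",
    "    'task': 'classification',               # task type\n",
    "    'estimator_list': ['lgbm', 'xgboost'],  # restrict the learner list\n",
    "    'sample': True,                         # allow subsampling of the training data\n",
    "    'eval_method': 'holdout',               # resampling strategy\n",
    "}\n",
    "# Any key left out falls back to the library default.\n",
    "```\n",
    "\n",
    "Such a dictionary would be passed as `automl.fit(X_train=X_train, y_train=y_train, **example_settings)`."
   ]
  },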
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "''' import AutoML class from flaml package '''\n",
    "from flaml import AutoML\n",
    "automl = AutoML()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "settings = {\n",
    "    \"time_budget\": 300,  # total running time in seconds\n",
    "    \"metric\": 'accuracy',  # primary metrics can be chosen from: ['accuracy','roc_auc','roc_auc_ovr','roc_auc_ovo','f1','log_loss','mae','mse','r2']\n",
    "    \"task\": 'classification',  # task type    \n",
    "    \"log_file_name\": 'airlines_experiment.log',  # flaml log file\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": [
     "outputPrepend"
    ]
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "or=0.3575,\tbest xgboost's error=0.3575\n",
      "[flaml.automl: 07-06 10:20:09] {1012} INFO - iteration 27, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:09] {1160} INFO -  at 2.4s,\tbest extra_tree's error=0.4013,\tbest xgboost's error=0.3575\n",
      "[flaml.automl: 07-06 10:20:09] {1012} INFO - iteration 28, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:09] {1160} INFO -  at 2.4s,\tbest extra_tree's error=0.4013,\tbest xgboost's error=0.3575\n",
      "[flaml.automl: 07-06 10:20:09] {1012} INFO - iteration 29, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:09] {1160} INFO -  at 2.5s,\tbest extra_tree's error=0.4013,\tbest xgboost's error=0.3575\n",
      "[flaml.automl: 07-06 10:20:09] {1012} INFO - iteration 30, current learner xgboost\n",
      "[flaml.automl: 07-06 10:20:09] {1160} INFO -  at 2.7s,\tbest xgboost's error=0.3575,\tbest xgboost's error=0.3575\n",
      "[flaml.automl: 07-06 10:20:09] {1012} INFO - iteration 31, current learner xgboost\n",
      "[flaml.automl: 07-06 10:20:09] {1160} INFO -  at 3.0s,\tbest xgboost's error=0.3567,\tbest xgboost's error=0.3567\n",
      "[flaml.automl: 07-06 10:20:09] {1012} INFO - iteration 32, current learner xgboost\n",
      "[flaml.automl: 07-06 10:20:10] {1160} INFO -  at 3.3s,\tbest xgboost's error=0.3567,\tbest xgboost's error=0.3567\n",
      "[flaml.automl: 07-06 10:20:10] {1012} INFO - iteration 33, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:10] {1160} INFO -  at 3.4s,\tbest extra_tree's error=0.3918,\tbest xgboost's error=0.3567\n",
      "[flaml.automl: 07-06 10:20:10] {1012} INFO - iteration 34, current learner xgboost\n",
      "[flaml.automl: 07-06 10:20:10] {1160} INFO -  at 3.9s,\tbest xgboost's error=0.3505,\tbest xgboost's error=0.3505\n",
      "[flaml.automl: 07-06 10:20:10] {1012} INFO - iteration 35, current learner catboost\n",
      "[flaml.automl: 07-06 10:20:11] {1160} INFO -  at 4.6s,\tbest catboost's error=0.3624,\tbest xgboost's error=0.3505\n",
      "[flaml.automl: 07-06 10:20:11] {1012} INFO - iteration 36, current learner catboost\n",
      "[flaml.automl: 07-06 10:20:12] {1160} INFO -  at 5.6s,\tbest catboost's error=0.3624,\tbest xgboost's error=0.3505\n",
      "[flaml.automl: 07-06 10:20:12] {1012} INFO - iteration 37, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:12] {1160} INFO -  at 5.7s,\tbest extra_tree's error=0.3918,\tbest xgboost's error=0.3505\n",
      "[flaml.automl: 07-06 10:20:12] {1012} INFO - iteration 38, current learner xgboost\n",
      "[flaml.automl: 07-06 10:20:12] {1160} INFO -  at 6.0s,\tbest xgboost's error=0.3505,\tbest xgboost's error=0.3505\n",
      "[flaml.automl: 07-06 10:20:12] {1012} INFO - iteration 39, current learner xgboost\n",
      "[flaml.automl: 07-06 10:20:14] {1160} INFO -  at 8.0s,\tbest xgboost's error=0.3504,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:14] {1012} INFO - iteration 40, current learner catboost\n",
      "[flaml.automl: 07-06 10:20:15] {1160} INFO -  at 8.2s,\tbest catboost's error=0.3614,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:15] {1012} INFO - iteration 41, current learner xgboost\n",
      "[flaml.automl: 07-06 10:20:15] {1160} INFO -  at 9.0s,\tbest xgboost's error=0.3504,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:15] {1012} INFO - iteration 42, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:15] {1160} INFO -  at 9.0s,\tbest extra_tree's error=0.3918,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:15] {1012} INFO - iteration 43, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:16] {1160} INFO -  at 9.2s,\tbest lgbm's error=0.3681,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:16] {1012} INFO - iteration 44, current learner xgboost\n",
      "[flaml.automl: 07-06 10:20:22] {1160} INFO -  at 15.9s,\tbest xgboost's error=0.3504,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:22] {1012} INFO - iteration 45, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:22] {1160} INFO -  at 16.1s,\tbest extra_tree's error=0.3883,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:22] {1012} INFO - iteration 46, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:23] {1160} INFO -  at 16.2s,\tbest lgbm's error=0.3681,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:23] {1012} INFO - iteration 47, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:23] {1160} INFO -  at 16.4s,\tbest lgbm's error=0.3607,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:23] {1012} INFO - iteration 48, current learner rf\n",
      "[flaml.automl: 07-06 10:20:23] {1160} INFO -  at 16.5s,\tbest rf's error=0.4019,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:23] {1012} INFO - iteration 49, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:23] {1160} INFO -  at 16.7s,\tbest lgbm's error=0.3607,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:23] {1012} INFO - iteration 50, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:23] {1160} INFO -  at 16.8s,\tbest lgbm's error=0.3607,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:23] {1012} INFO - iteration 51, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:23] {1160} INFO -  at 16.9s,\tbest extra_tree's error=0.3883,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:23] {1012} INFO - iteration 52, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:23] {1160} INFO -  at 17.1s,\tbest lgbm's error=0.3591,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:23] {1012} INFO - iteration 53, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:24] {1160} INFO -  at 17.3s,\tbest lgbm's error=0.3591,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:24] {1012} INFO - iteration 54, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:24] {1160} INFO -  at 17.6s,\tbest lgbm's error=0.3591,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:24] {1012} INFO - iteration 55, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:24] {1160} INFO -  at 17.7s,\tbest lgbm's error=0.3591,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:24] {1012} INFO - iteration 56, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:24] {1160} INFO -  at 18.0s,\tbest extra_tree's error=0.3877,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:24] {1012} INFO - iteration 57, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:25] {1160} INFO -  at 18.3s,\tbest lgbm's error=0.3532,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:25] {1012} INFO - iteration 58, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:25] {1160} INFO -  at 18.4s,\tbest lgbm's error=0.3532,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:25] {1012} INFO - iteration 59, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:25] {1160} INFO -  at 18.9s,\tbest lgbm's error=0.3532,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:25] {1012} INFO - iteration 60, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:25] {1160} INFO -  at 19.1s,\tbest lgbm's error=0.3532,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:25] {1012} INFO - iteration 61, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:26] {1160} INFO -  at 19.9s,\tbest lgbm's error=0.3532,\tbest xgboost's error=0.3504\n",
      "[flaml.automl: 07-06 10:20:26] {1012} INFO - iteration 62, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:28] {1160} INFO -  at 21.5s,\tbest lgbm's error=0.3476,\tbest lgbm's error=0.3476\n",
      "[flaml.automl: 07-06 10:20:28] {1012} INFO - iteration 63, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:29] {1160} INFO -  at 22.9s,\tbest lgbm's error=0.3476,\tbest lgbm's error=0.3476\n",
      "[flaml.automl: 07-06 10:20:29] {1012} INFO - iteration 64, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:31] {1160} INFO -  at 24.8s,\tbest lgbm's error=0.3470,\tbest lgbm's error=0.3470\n",
      "[flaml.automl: 07-06 10:20:31] {1012} INFO - iteration 65, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:33] {1160} INFO -  at 26.2s,\tbest lgbm's error=0.3470,\tbest lgbm's error=0.3470\n",
      "[flaml.automl: 07-06 10:20:33] {1012} INFO - iteration 66, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:35] {1160} INFO -  at 28.7s,\tbest lgbm's error=0.3470,\tbest lgbm's error=0.3470\n",
      "[flaml.automl: 07-06 10:20:35] {1012} INFO - iteration 67, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:36] {1160} INFO -  at 29.8s,\tbest lgbm's error=0.3470,\tbest lgbm's error=0.3470\n",
      "[flaml.automl: 07-06 10:20:36] {1012} INFO - iteration 68, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:42] {1160} INFO -  at 35.3s,\tbest lgbm's error=0.3321,\tbest lgbm's error=0.3321\n",
      "[flaml.automl: 07-06 10:20:42] {1012} INFO - iteration 69, current learner lrl1\n",
      "No low-cost partial config given to the search algorithm. For cost-frugal search, consider providing low-cost values for cost-related hps via 'low_cost_partial_config'.\n",
      "/Users/qingyun/miniconda3/envs/py38/lib/python3.8/site-packages/sklearn/linear_model/_sag.py:328: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n",
      "  warnings.warn(\"The max_iter was reached which means \"\n",
      "[flaml.automl: 07-06 10:20:42] {1160} INFO -  at 35.5s,\tbest lrl1's error=0.4338,\tbest lgbm's error=0.3321\n",
      "[flaml.automl: 07-06 10:20:42] {1012} INFO - iteration 70, current learner lrl1\n",
      "/Users/qingyun/miniconda3/envs/py38/lib/python3.8/site-packages/sklearn/linear_model/_sag.py:328: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n",
      "  warnings.warn(\"The max_iter was reached which means \"\n",
      "[flaml.automl: 07-06 10:20:42] {1160} INFO -  at 35.6s,\tbest lrl1's error=0.4338,\tbest lgbm's error=0.3321\n",
      "[flaml.automl: 07-06 10:20:42] {1012} INFO - iteration 71, current learner lrl1\n",
      "/Users/qingyun/miniconda3/envs/py38/lib/python3.8/site-packages/sklearn/linear_model/_sag.py:328: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n",
      "  warnings.warn(\"The max_iter was reached which means \"\n",
      "[flaml.automl: 07-06 10:20:42] {1160} INFO -  at 35.8s,\tbest lrl1's error=0.4338,\tbest lgbm's error=0.3321\n",
      "[flaml.automl: 07-06 10:20:42] {1012} INFO - iteration 72, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:46] {1160} INFO -  at 40.1s,\tbest lgbm's error=0.3321,\tbest lgbm's error=0.3321\n",
      "[flaml.automl: 07-06 10:20:46] {1012} INFO - iteration 73, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:55] {1160} INFO -  at 48.6s,\tbest lgbm's error=0.3281,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:20:55] {1012} INFO - iteration 74, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:20:55] {1160} INFO -  at 48.8s,\tbest extra_tree's error=0.3875,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:20:55] {1012} INFO - iteration 75, current learner lgbm\n",
      "[flaml.automl: 07-06 10:20:58] {1160} INFO -  at 51.1s,\tbest lgbm's error=0.3281,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:20:58] {1012} INFO - iteration 76, current learner catboost\n",
      "[flaml.automl: 07-06 10:20:58] {1160} INFO -  at 52.0s,\tbest catboost's error=0.3614,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:20:58] {1012} INFO - iteration 77, current learner lgbm\n",
      "[flaml.automl: 07-06 10:21:34] {1160} INFO -  at 88.0s,\tbest lgbm's error=0.3281,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:21:34] {1012} INFO - iteration 78, current learner lgbm\n",
      "[flaml.automl: 07-06 10:21:43] {1160} INFO -  at 96.5s,\tbest lgbm's error=0.3281,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:21:43] {1012} INFO - iteration 79, current learner catboost\n",
      "[flaml.automl: 07-06 10:21:44] {1160} INFO -  at 97.3s,\tbest catboost's error=0.3550,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:21:44] {1012} INFO - iteration 80, current learner catboost\n",
      "[flaml.automl: 07-06 10:21:48] {1160} INFO -  at 101.7s,\tbest catboost's error=0.3550,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:21:48] {1012} INFO - iteration 81, current learner lgbm\n",
      "[flaml.automl: 07-06 10:21:54] {1160} INFO -  at 107.7s,\tbest lgbm's error=0.3281,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:21:54] {1012} INFO - iteration 82, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:21:54] {1160} INFO -  at 108.0s,\tbest extra_tree's error=0.3875,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:21:54] {1012} INFO - iteration 83, current learner rf\n",
      "[flaml.automl: 07-06 10:21:54] {1160} INFO -  at 108.0s,\tbest rf's error=0.4019,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:21:54] {1012} INFO - iteration 84, current learner catboost\n",
      "[flaml.automl: 07-06 10:21:58] {1160} INFO -  at 111.4s,\tbest catboost's error=0.3488,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:21:58] {1012} INFO - iteration 85, current learner catboost\n",
      "[flaml.automl: 07-06 10:22:39] {1160} INFO -  at 152.8s,\tbest catboost's error=0.3488,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:39] {1012} INFO - iteration 86, current learner catboost\n",
      "[flaml.automl: 07-06 10:22:44] {1160} INFO -  at 157.5s,\tbest catboost's error=0.3472,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:44] {1012} INFO - iteration 87, current learner rf\n",
      "[flaml.automl: 07-06 10:22:44] {1160} INFO -  at 157.7s,\tbest rf's error=0.4019,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:44] {1012} INFO - iteration 88, current learner rf\n",
      "[flaml.automl: 07-06 10:22:45] {1160} INFO -  at 158.1s,\tbest rf's error=0.3922,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:45] {1012} INFO - iteration 89, current learner rf\n",
      "[flaml.automl: 07-06 10:22:45] {1160} INFO -  at 158.5s,\tbest rf's error=0.3922,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:45] {1012} INFO - iteration 90, current learner rf\n",
      "[flaml.automl: 07-06 10:22:45] {1160} INFO -  at 158.9s,\tbest rf's error=0.3922,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:45] {1012} INFO - iteration 91, current learner rf\n",
      "[flaml.automl: 07-06 10:22:46] {1160} INFO -  at 159.6s,\tbest rf's error=0.3851,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:46] {1012} INFO - iteration 92, current learner rf\n",
      "[flaml.automl: 07-06 10:22:46] {1160} INFO -  at 159.9s,\tbest rf's error=0.3851,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:46] {1012} INFO - iteration 93, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:22:46] {1160} INFO -  at 160.0s,\tbest extra_tree's error=0.3875,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:46] {1012} INFO - iteration 94, current learner rf\n",
      "[flaml.automl: 07-06 10:22:47] {1160} INFO -  at 160.5s,\tbest rf's error=0.3851,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:47] {1012} INFO - iteration 95, current learner rf\n",
      "[flaml.automl: 07-06 10:22:48] {1160} INFO -  at 161.2s,\tbest rf's error=0.3844,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:22:48] {1012} INFO - iteration 96, current learner lgbm\n",
      "[flaml.automl: 07-06 10:23:04] {1160} INFO -  at 178.0s,\tbest lgbm's error=0.3281,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:04] {1012} INFO - iteration 97, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:23:05] {1160} INFO -  at 178.4s,\tbest extra_tree's error=0.3860,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:05] {1012} INFO - iteration 98, current learner rf\n",
      "[flaml.automl: 07-06 10:23:05] {1160} INFO -  at 178.8s,\tbest rf's error=0.3844,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:05] {1012} INFO - iteration 99, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:23:06] {1160} INFO -  at 179.6s,\tbest extra_tree's error=0.3824,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:06] {1012} INFO - iteration 100, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:23:06] {1160} INFO -  at 180.0s,\tbest extra_tree's error=0.3824,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:06] {1012} INFO - iteration 101, current learner lgbm\n",
      "[flaml.automl: 07-06 10:23:11] {1160} INFO -  at 184.2s,\tbest lgbm's error=0.3281,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:11] {1012} INFO - iteration 102, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:23:12] {1160} INFO -  at 185.7s,\tbest extra_tree's error=0.3824,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:12] {1012} INFO - iteration 103, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:23:13] {1160} INFO -  at 186.2s,\tbest extra_tree's error=0.3824,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:13] {1012} INFO - iteration 104, current learner lgbm\n",
      "[flaml.automl: 07-06 10:23:55] {1160} INFO -  at 228.7s,\tbest lgbm's error=0.3281,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:55] {1012} INFO - iteration 105, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:23:55] {1160} INFO -  at 229.1s,\tbest extra_tree's error=0.3824,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:55] {1012} INFO - iteration 106, current learner lgbm\n",
      "[flaml.automl: 07-06 10:23:58] {1160} INFO -  at 231.2s,\tbest lgbm's error=0.3281,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:23:58] {1012} INFO - iteration 107, current learner catboost\n",
      "[flaml.automl: 07-06 10:24:00] {1160} INFO -  at 233.9s,\tbest catboost's error=0.3472,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:24:00] {1012} INFO - iteration 108, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:24:01] {1160} INFO -  at 234.1s,\tbest extra_tree's error=0.3824,\tbest lgbm's error=0.3281\n",
      "[flaml.automl: 07-06 10:24:01] {1012} INFO - iteration 109, current learner lgbm\n",
      "[flaml.automl: 07-06 10:24:09] {1160} INFO -  at 242.2s,\tbest lgbm's error=0.3261,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:09] {1012} INFO - iteration 110, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:24:09] {1160} INFO -  at 243.0s,\tbest extra_tree's error=0.3824,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:09] {1012} INFO - iteration 111, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:24:12] {1160} INFO -  at 245.6s,\tbest extra_tree's error=0.3813,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:12] {1012} INFO - iteration 112, current learner lgbm\n",
      "[flaml.automl: 07-06 10:24:21] {1160} INFO -  at 254.3s,\tbest lgbm's error=0.3261,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:21] {1012} INFO - iteration 113, current learner extra_tree\n",
      "[flaml.automl: 07-06 10:24:25] {1160} INFO -  at 258.7s,\tbest extra_tree's error=0.3813,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:25] {1012} INFO - iteration 114, current learner rf\n",
      "[flaml.automl: 07-06 10:24:26] {1160} INFO -  at 259.7s,\tbest rf's error=0.3821,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:26] {1012} INFO - iteration 115, current learner lrl1\n",
      "/Users/qingyun/miniconda3/envs/py38/lib/python3.8/site-packages/sklearn/linear_model/_sag.py:328: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n",
      "  warnings.warn(\"The max_iter was reached which means \"\n",
      "[flaml.automl: 07-06 10:24:26] {1160} INFO -  at 259.9s,\tbest lrl1's error=0.4338,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:26] {1012} INFO - iteration 116, current learner lgbm\n",
      "[flaml.automl: 07-06 10:24:36] {1160} INFO -  at 269.8s,\tbest lgbm's error=0.3261,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:36] {1012} INFO - iteration 117, current learner lgbm\n",
      "[flaml.automl: 07-06 10:24:42] {1160} INFO -  at 276.0s,\tbest lgbm's error=0.3261,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:42] {1012} INFO - iteration 118, current learner catboost\n",
      "[flaml.automl: 07-06 10:24:46] {1160} INFO -  at 279.5s,\tbest catboost's error=0.3472,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:46] {1012} INFO - iteration 119, current learner rf\n",
      "[flaml.automl: 07-06 10:24:47] {1160} INFO -  at 280.3s,\tbest rf's error=0.3815,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:47] {1012} INFO - iteration 120, current learner lgbm\n",
      "[flaml.automl: 07-06 10:24:51] {1160} INFO -  at 284.2s,\tbest lgbm's error=0.3261,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:24:58] {1183} INFO - retrain lgbm for 7.5s\n",
      "[flaml.automl: 07-06 10:24:58] {1012} INFO - iteration 121, current learner rf\n",
      "[flaml.automl: 07-06 10:24:59] {1160} INFO -  at 292.4s,\tbest rf's error=0.3815,\tbest lgbm's error=0.3261\n",
      "[flaml.automl: 07-06 10:25:06] {1183} INFO - retrain rf for 7.5s\n",
      "[flaml.automl: 07-06 10:25:06] {1206} INFO - selected model: LGBMClassifier(colsample_bytree=0.7264845266978395,\n",
      "               learning_rate=0.19101023272120005, max_bin=256,\n",
      "               min_child_samples=38, n_estimators=96, num_leaves=1176,\n",
      "               objective='binary', reg_alpha=0.23464496750365973,\n",
      "               reg_lambda=381.05540209167094, subsample=0.8560685526719122)\n",
      "[flaml.automl: 07-06 10:25:06] {963} INFO - fit succeeded\n"
     ]
    }
   ],
   "source": [
    "'''The main flaml automl API'''\n",
    "automl.fit(X_train=X_train, y_train=y_train, **settings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Best model and metric"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Best ML learner: lgbm\n",
      "Best hyperparameter config: {'n_estimators': 96.0, 'num_leaves': 1176.0, 'min_child_samples': 38.0, 'learning_rate': 0.19101023272120005, 'subsample': 0.8560685526719122, 'log_max_bin': 9.0, 'colsample_bytree': 0.7264845266978395, 'reg_alpha': 0.23464496750365973, 'reg_lambda': 381.05540209167094, 'FLAML_sample_size': 364083}\n",
      "Best accuracy on validation data: 0.6739\n",
      "Training duration of best run: 8.084 s\n"
     ]
    }
   ],
   "source": [
    "''' retrieve best config and best learner'''\n",
    "print('Best ML learner:', automl.best_estimator)\n",
    "print('Best hyperparameter config:', automl.best_config)\n",
    "print('Best accuracy on validation data: {0:.4g}'.format(1-automl.best_loss))\n",
    "print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LGBMClassifier(colsample_bytree=0.6204654035998071,\n",
       "               learning_rate=0.17783122919583272, max_bin=16,\n",
       "               min_child_samples=17, n_estimators=197, num_leaves=340,\n",
       "               objective='binary', reg_alpha=0.07967521254431058,\n",
       "               reg_lambda=6.332908973055842, subsample=0.8413048297641477)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "automl.model.estimator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "''' pickle and save the automl object '''\n",
    "import pickle\n",
    "with open('automl.pkl', 'wb') as f:\n",
    "    pickle.dump(automl, f, pickle.HIGHEST_PROTOCOL)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predicted labels [1 0 1 ... 1 0 0]\n",
      "True labels [0 0 0 ... 0 1 0]\n"
     ]
    }
   ],
   "source": [
    "''' compute predictions of testing dataset ''' \n",
    "y_pred = automl.predict(X_test)\n",
    "print('Predicted labels', y_pred)\n",
    "print('True labels', y_test)\n",
    "y_pred_proba = automl.predict_proba(X_test)[:,1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "accuracy = 0.6715957462586951\n",
      "roc_auc = 0.7253027586499301\n",
      "log_loss = 0.6034784793498795\n",
      "f1 = 0.5884386617100371\n"
     ]
    }
   ],
   "source": [
    "''' compute different metric values on testing dataset'''\n",
    "from flaml.ml import sklearn_metric_loss_score\n",
    "print('accuracy', '=', 1 - sklearn_metric_loss_score('accuracy', y_pred, y_test))\n",
    "print('roc_auc', '=', 1 - sklearn_metric_loss_score('roc_auc', y_pred_proba, y_test))\n",
    "print('log_loss', '=', sklearn_metric_loss_score('log_loss', y_pred_proba, y_test))\n",
    "print('f1', '=', 1 - sklearn_metric_loss_score('f1', y_pred, y_test))"
   ]
  },
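  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `1 - ...` in the cell above is there because, as we understand it, `sklearn_metric_loss_score` returns a loss to be minimized: for score-like metrics such as accuracy, `roc_auc`, and `f1` the loss is one minus the score, while `log_loss` is already a loss and is reported unchanged. The toy example below illustrates the conversion in plain Python; the helper `accuracy_loss` and the labels are hypothetical and not part of flaml.\n",
    "\n",
    "```python\n",
    "def accuracy_loss(y_pred, y_true):\n",
    "    # Hypothetical helper: loss = 1 - accuracy (fraction of mismatched labels).\n",
    "    correct = sum(p == t for p, t in zip(y_pred, y_true))\n",
    "    return 1 - correct / len(y_true)\n",
    "\n",
    "toy_pred = [0, 1, 0, 0]\n",
    "toy_true = [0, 1, 1, 0]\n",
    "loss = accuracy_loss(toy_pred, toy_true)\n",
    "print('loss =', loss, 'accuracy =', 1 - loss)\n",
    "```\n",
    "\n",
    "The same convention explains why the validation accuracy earlier in this notebook is printed as `1-automl.best_loss`."
   ]
  },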
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "See Section 4 for an accuracy comparison with default LightGBM and XGBoost.\n",
    "\n",
    "### Log history"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'Current Learner': 'lgbm', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20, 'learning_rate': 0.1, 'subsample': 1.0, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0, 'FLAML_sample_size': 10000}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20, 'learning_rate': 0.1, 'subsample': 1.0, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0, 'FLAML_sample_size': 10000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 3.8156120279609143, 'learning_rate': 0.03859136192132085, 'subsample': 1.0, 'colsample_bylevel': 0.8148474110627004, 'colsample_bytree': 0.9777234800442423, 'reg_alpha': 0.0009765625, 'reg_lambda': 5.525802807180917, 'FLAML_sample_size': 10000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 3.8156120279609143, 'learning_rate': 0.03859136192132085, 'subsample': 1.0, 'colsample_bylevel': 0.8148474110627004, 'colsample_bytree': 0.9777234800442423, 'reg_alpha': 0.0009765625, 'reg_lambda': 5.525802807180917, 'FLAML_sample_size': 10000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 0.9999999999999981, 'learning_rate': 0.09999999999999995, 'subsample': 0.9266743941610592, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0013933617380144255, 'reg_lambda': 0.9999999999999992, 'FLAML_sample_size': 10000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 0.9999999999999981, 'learning_rate': 0.09999999999999995, 'subsample': 0.9266743941610592, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0013933617380144255, 'reg_lambda': 0.9999999999999992, 'FLAML_sample_size': 10000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 7.108570598095146, 'learning_rate': 0.3879619372390862, 'subsample': 0.8513627344387318, 'colsample_bylevel': 1.0, 'colsample_bytree': 0.946138073111236, 'reg_alpha': 0.0018311776973217071, 'reg_lambda': 1.5417906668008217, 'FLAML_sample_size': 10000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 7.108570598095146, 'learning_rate': 0.3879619372390862, 'subsample': 0.8513627344387318, 'colsample_bylevel': 1.0, 'colsample_bytree': 0.946138073111236, 'reg_alpha': 0.0018311776973217071, 'reg_lambda': 1.5417906668008217, 'FLAML_sample_size': 10000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 8.0, 'min_child_weight': 0.9999999999999981, 'learning_rate': 0.09999999999999995, 'subsample': 0.9266743941610592, 'colsample_bylevel': 0.9168331919232143, 'colsample_bytree': 1.0, 'reg_alpha': 0.0013933617380144255, 'reg_lambda': 0.9999999999999984, 'FLAML_sample_size': 10000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 8.0, 'min_child_weight': 0.9999999999999981, 'learning_rate': 0.09999999999999995, 'subsample': 0.9266743941610592, 'colsample_bylevel': 0.9168331919232143, 'colsample_bytree': 1.0, 'reg_alpha': 0.0013933617380144255, 'reg_lambda': 0.9999999999999984, 'FLAML_sample_size': 10000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 12.0, 'max_leaves': 8.0, 'min_child_weight': 3.1718521304832716, 'learning_rate': 0.18850082505120708, 'subsample': 0.9647550813352507, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0010352743615901622, 'reg_lambda': 0.4380234559597813, 'FLAML_sample_size': 10000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 12.0, 'max_leaves': 8.0, 'min_child_weight': 3.1718521304832716, 'learning_rate': 0.18850082505120708, 'subsample': 0.9647550813352507, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0010352743615901622, 'reg_lambda': 0.4380234559597813, 'FLAML_sample_size': 10000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 10000, 'Current Hyper-parameters': {'n_estimators': 23.0, 'max_leaves': 6.0, 'min_child_weight': 6.460451502502143, 'learning_rate': 0.4839966785164543, 'subsample': 1.0, 'colsample_bylevel': 0.8811171114303163, 'colsample_bytree': 0.8499027725496043, 'reg_alpha': 0.0016804960453779686, 'reg_lambda': 1.9570976003429221, 'FLAML_sample_size': 10000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 23.0, 'max_leaves': 6.0, 'min_child_weight': 6.460451502502143, 'learning_rate': 0.4839966785164543, 'subsample': 1.0, 'colsample_bylevel': 0.8811171114303163, 'colsample_bytree': 0.8499027725496043, 'reg_alpha': 0.0016804960453779686, 'reg_lambda': 1.9570976003429221, 'FLAML_sample_size': 10000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 40000, 'Current Hyper-parameters': {'n_estimators': 23.0, 'max_leaves': 6.0, 'min_child_weight': 6.460451502502143, 'learning_rate': 0.4839966785164543, 'subsample': 1.0, 'colsample_bylevel': 0.8811171114303163, 'colsample_bytree': 0.8499027725496043, 'reg_alpha': 0.0016804960453779686, 'reg_lambda': 1.9570976003429221, 'FLAML_sample_size': 40000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 23.0, 'max_leaves': 6.0, 'min_child_weight': 6.460451502502143, 'learning_rate': 0.4839966785164543, 'subsample': 1.0, 'colsample_bylevel': 0.8811171114303163, 'colsample_bytree': 0.8499027725496043, 'reg_alpha': 0.0016804960453779686, 'reg_lambda': 1.9570976003429221, 'FLAML_sample_size': 40000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 40000, 'Current Hyper-parameters': {'n_estimators': 74.0, 'max_leaves': 4.0, 'min_child_weight': 7.678451859748732, 'learning_rate': 0.17743258768982648, 'subsample': 1.0, 'colsample_bylevel': 0.6993908476086765, 'colsample_bytree': 0.804982542436943, 'reg_alpha': 0.0009765625, 'reg_lambda': 3.547311998768567, 'FLAML_sample_size': 40000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 74.0, 'max_leaves': 4.0, 'min_child_weight': 7.678451859748732, 'learning_rate': 0.17743258768982648, 'subsample': 1.0, 'colsample_bylevel': 0.6993908476086765, 'colsample_bytree': 0.804982542436943, 'reg_alpha': 0.0009765625, 'reg_lambda': 3.547311998768567, 'FLAML_sample_size': 40000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 40000, 'Current Hyper-parameters': {'n_estimators': 135.0, 'max_leaves': 7.0, 'min_child_weight': 1.1024151666996367, 'learning_rate': 0.29597808772418305, 'subsample': 1.0, 'colsample_bylevel': 0.508550359279992, 'colsample_bytree': 0.7208090706891741, 'reg_alpha': 0.0017607866203119683, 'reg_lambda': 1.8488863473486097, 'FLAML_sample_size': 40000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 135.0, 'max_leaves': 7.0, 'min_child_weight': 1.1024151666996367, 'learning_rate': 0.29597808772418305, 'subsample': 1.0, 'colsample_bylevel': 0.508550359279992, 'colsample_bytree': 0.7208090706891741, 'reg_alpha': 0.0017607866203119683, 'reg_lambda': 1.8488863473486097, 'FLAML_sample_size': 40000}}\n",
      "{'Current Learner': 'xgboost', 'Current Sample': 40000, 'Current Hyper-parameters': {'n_estimators': 292.0, 'max_leaves': 16.0, 'min_child_weight': 0.8072004842817196, 'learning_rate': 0.09228694613650908, 'subsample': 0.8895588746662894, 'colsample_bylevel': 0.35630670144162413, 'colsample_bytree': 0.6863451794740817, 'reg_alpha': 0.0027488949929569983, 'reg_lambda': 0.7489028833779001, 'FLAML_sample_size': 40000}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 292.0, 'max_leaves': 16.0, 'min_child_weight': 0.8072004842817196, 'learning_rate': 0.09228694613650908, 'subsample': 0.8895588746662894, 'colsample_bylevel': 0.35630670144162413, 'colsample_bytree': 0.6863451794740817, 'reg_alpha': 0.0027488949929569983, 'reg_lambda': 0.7489028833779001, 'FLAML_sample_size': 40000}}\n",
      "{'Current Learner': 'lgbm', 'Current Sample': 364083, 'Current Hyper-parameters': {'n_estimators': 29.0, 'num_leaves': 30.0, 'min_child_samples': 27.0, 'learning_rate': 0.3345600006903613, 'subsample': 1.0, 'log_max_bin': 6.0, 'colsample_bytree': 0.6138481769580465, 'reg_alpha': 0.02608844295136239, 'reg_lambda': 4.068656226566239, 'FLAML_sample_size': 364083}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 29.0, 'num_leaves': 30.0, 'min_child_samples': 27.0, 'learning_rate': 0.3345600006903613, 'subsample': 1.0, 'log_max_bin': 6.0, 'colsample_bytree': 0.6138481769580465, 'reg_alpha': 0.02608844295136239, 'reg_lambda': 4.068656226566239, 'FLAML_sample_size': 364083}}\n",
      "{'Current Learner': 'lgbm', 'Current Sample': 364083, 'Current Hyper-parameters': {'n_estimators': 32.0, 'num_leaves': 66.0, 'min_child_samples': 30.0, 'learning_rate': 0.12647892799791985, 'subsample': 0.9860465287537004, 'log_max_bin': 6.0, 'colsample_bytree': 0.6645176750515542, 'reg_alpha': 0.0018225057315840252, 'reg_lambda': 30.9118880488899, 'FLAML_sample_size': 364083}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 32.0, 'num_leaves': 66.0, 'min_child_samples': 30.0, 'learning_rate': 0.12647892799791985, 'subsample': 0.9860465287537004, 'log_max_bin': 6.0, 'colsample_bytree': 0.6645176750515542, 'reg_alpha': 0.0018225057315840252, 'reg_lambda': 30.9118880488899, 'FLAML_sample_size': 364083}}\n",
      "{'Current Learner': 'lgbm', 'Current Sample': 364083, 'Current Hyper-parameters': {'n_estimators': 125.0, 'num_leaves': 186.0, 'min_child_samples': 50.0, 'learning_rate': 0.0951684364825494, 'subsample': 1.0, 'log_max_bin': 7.0, 'colsample_bytree': 0.6606135030668829, 'reg_alpha': 0.01077083294762061, 'reg_lambda': 74.25759126075202, 'FLAML_sample_size': 364083}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 125.0, 'num_leaves': 186.0, 'min_child_samples': 50.0, 'learning_rate': 0.0951684364825494, 'subsample': 1.0, 'log_max_bin': 7.0, 'colsample_bytree': 0.6606135030668829, 'reg_alpha': 0.01077083294762061, 'reg_lambda': 74.25759126075202, 'FLAML_sample_size': 364083}}\n",
      "{'Current Learner': 'lgbm', 'Current Sample': 364083, 'Current Hyper-parameters': {'n_estimators': 164.0, 'num_leaves': 304.0, 'min_child_samples': 75.0, 'learning_rate': 0.21886405778268478, 'subsample': 0.9048064340763577, 'log_max_bin': 9.0, 'colsample_bytree': 0.632220807242231, 'reg_alpha': 0.03154355161993957, 'reg_lambda': 190.9985711118577, 'FLAML_sample_size': 364083}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 164.0, 'num_leaves': 304.0, 'min_child_samples': 75.0, 'learning_rate': 0.21886405778268478, 'subsample': 0.9048064340763577, 'log_max_bin': 9.0, 'colsample_bytree': 0.632220807242231, 'reg_alpha': 0.03154355161993957, 'reg_lambda': 190.9985711118577, 'FLAML_sample_size': 364083}}\n"
     ]
    }
   ],
   "source": [
    "from flaml.data import get_output_from_log\n",
    "time_history, best_valid_loss_history, valid_loss_history, config_history, train_loss_history = \\\n",
    "    get_output_from_log(filename=settings['log_file_name'], time_budget=60)\n",
    "\n",
    "for config in config_history:\n",
    "    print(config)"
   ]
  },
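  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The histories returned by `get_output_from_log` can be related to one another. The following is a minimal sketch, assuming (this is our reading, not something confirmed by the log format) that `best_valid_loss_history` is the running minimum of `valid_loss_history`. The loss values are made up for illustration and the variable names are ours.\n",
    "\n",
    "```python\n",
    "from itertools import accumulate\n",
    "\n",
    "# Hypothetical validation losses, in the order the trials were logged.\n",
    "valid_loss = [0.384, 0.378, 0.380, 0.352, 0.360, 0.337]\n",
    "\n",
    "# Best-so-far loss after each trial: a running minimum.\n",
    "best_so_far = list(accumulate(valid_loss, min))\n",
    "\n",
    "# Index of the trial that achieved the overall best loss.\n",
    "best_trial = min(range(len(valid_loss)), key=valid_loss.__getitem__)\n",
    "\n",
    "print(best_so_far)\n",
    "print('best trial:', best_trial, 'accuracy:', 1 - valid_loss[best_trial])\n",
    "```\n",
    "\n",
    "Plotting one minus the best-so-far loss against time gives a step curve like the one drawn with `plt.step` in this notebook."
   ]
  },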
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "plt.title('Learning Curve')\n",
    "plt.xlabel('Wall Clock Time (s)')\n",
    "plt.ylabel('Validation Accuracy')\n",
    "plt.scatter(time_history, 1 - np.array(valid_loss_history))\n",
    "plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 3. Customized Learner"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Experienced AutoML users may have a preferred model to tune, or may already have a reasonable hand-tuned model before launching the AutoML experiment. They then need to search for good configurations of that customized model alongside the standard built-in learners.\n",
    "\n",
    "FLAML can incorporate customized or new learners provided by users (preferably ones that follow the scikit-learn API) at run time, as demonstrated below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Example of Regularized Greedy Forest\n",
    "\n",
    "[Regularized Greedy Forest](https://arxiv.org/abs/1109.0887) (RGF) is a machine learning method currently not included in FLAML. RGF has many tuning parameters, the most critical of which are `max_leaf`, `n_iter`, `n_tree_search`, `opt_interval`, and `min_samples_leaf`. To run a customized/new learner, the user needs to provide the following information:\n",
    "* an implementation of the customized/new learner\n",
    "* a list of hyperparameter names and types\n",
    "* rough ranges of hyperparameters (i.e., upper/lower bounds)\n",
    "* low-cost initial values for cost-related hyperparameters (e.g., the initial values for `max_leaf` and `n_iter` should be small)\n",
    "\n",
    "In this example, the above information for RGF is wrapped in a Python class called `MyRegularizedGreedyForest` that exposes the hyperparameters."
   ]
  },
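  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "To make the role of a log-scaled integer range and a low-cost initial value concrete, here is a minimal sketch in plain Python. The helper `sample_log_int` is hypothetical: it only mimics, under our assumptions, the kind of log-uniform integer domain that `tune.lograndint` describes, and it is not FLAML's implementation. The bounds are made-up example values.\n",
    "\n",
    "```python\n",
    "import math\n",
    "import random\n",
    "\n",
    "\n",
    "def sample_log_int(lower, upper, rng):\n",
    "    '''Draw an integer in [lower, upper), uniform on a log scale (hypothetical helper).'''\n",
    "    value = math.exp(rng.uniform(math.log(lower), math.log(upper)))\n",
    "    # Clamp to guard against floating-point rounding at the edges.\n",
    "    return min(max(int(value), lower), upper - 1)\n",
    "\n",
    "\n",
    "rng = random.Random(0)\n",
    "# Hypothetical range for a cost-related hyperparameter such as max_leaf.\n",
    "lower, upper, init_value = 4, 32768, 4\n",
    "samples = [sample_log_int(lower, upper, rng) for _ in range(1000)]\n",
    "\n",
    "# On a log scale, about half of the draws fall below the geometric mean of the bounds.\n",
    "geo_mean = math.sqrt(lower * upper)\n",
    "share_small = sum(s < geo_mean for s in samples) / len(samples)\n",
    "print('start the search at', init_value)\n",
    "print('share of samples below', round(geo_mean), 'is', share_small)\n",
    "```\n",
    "\n",
    "A log scale spends as many draws between 4 and 8 as between 16384 and 32768, which suits parameters whose cost grows with their value. Starting at the lower bound keeps the first trials cheap."
   ]
  },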
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# SKLearnEstimator is the superclass for a scikit-learn-style learner\n",
    "from flaml.model import SKLearnEstimator\n",
    "from flaml import tune\n",
    "from rgf.sklearn import RGFClassifier, RGFRegressor\n",
    "\n",
    "\n",
    "class MyRegularizedGreedyForest(SKLearnEstimator):\n",
    "\n",
    "\n",
    "    def __init__(self, task='binary:logistic', n_jobs=1, **params):\n",
    "        '''Constructor\n",
    "        \n",
    "        Args:\n",
    "            task: A string of the task type, one of\n",
    "                'binary:logistic', 'multi:softmax', 'regression'\n",
    "            n_jobs: An integer of the number of parallel threads\n",
    "            params: A dictionary of the hyperparameter names and values\n",
    "        '''\n",
    "\n",
    "        super().__init__(task, **params)\n",
    "\n",
    "        # a task containing 'regression' selects RGFRegressor;\n",
    "        # 'binary:logistic' and 'multi:softmax' select RGFClassifier\n",
    "        if 'regression' in task:\n",
    "            self.estimator_class = RGFRegressor\n",
    "        else:\n",
    "            self.estimator_class = RGFClassifier\n",
    "\n",
    "        # convert to int for integer hyperparameters\n",
    "        self.params = {\n",
    "            \"n_jobs\": n_jobs,\n",
    "            'max_leaf': int(params['max_leaf']),\n",
    "            'n_iter': int(params['n_iter']),\n",
    "            'n_tree_search': int(params['n_tree_search']),\n",
    "            'opt_interval': int(params['opt_interval']),\n",
    "            'learning_rate': params['learning_rate'],\n",
    "            'min_samples_leaf': int(params['min_samples_leaf'])\n",
    "        }    \n",
    "\n",
    "    @classmethod\n",
    "    def search_space(cls, data_size, task):\n",
    "        '''[required method] search space\n",
    "\n",
    "        Returns:\n",
    "            A dictionary of the search space. \n",
    "            Each key is the name of a hyperparameter, and value is a dict with\n",
    "                its domain and init_value (optional), cat_hp_cost (optional) \n",
    "                e.g., \n",
    "                {'domain': tune.randint(lower=1, upper=10), 'init_value': 1}\n",
    "        '''\n",
    "        space = {        \n",
    "            'max_leaf': {'domain': tune.lograndint(lower=4, upper=data_size), 'init_value': 4, 'low_cost_init_value': 4},\n",
    "            'n_iter': {'domain': tune.lograndint(lower=1, upper=data_size), 'init_value': 1, 'low_cost_init_value': 1},\n",
    "            'n_tree_search': {'domain': tune.lograndint(lower=1, upper=32768), 'init_value': 1, 'low_cost_init_value': 1},\n",
    "            'opt_interval': {'domain': tune.lograndint(lower=1, upper=10000), 'init_value': 100},\n",
    "            'learning_rate': {'domain': tune.loguniform(lower=0.01, upper=20.0)},\n",
    "            'min_samples_leaf': {'domain': tune.lograndint(lower=1, upper=20), 'init_value': 20},\n",
    "        }\n",
    "        return space\n",
    "\n",
    "    @classmethod\n",
    "    def size(cls, config):\n",
    "        '''[optional method] memory size of the estimator in bytes\n",
    "        \n",
    "        Args:\n",
    "            config - the dict of the hyperparameter config\n",
    "\n",
    "        Returns:\n",
    "            A float of the memory size required by the estimator to train the\n",
    "            given config\n",
    "        '''\n",
    "        max_leaves = int(round(config['max_leaf']))\n",
    "        n_estimators = int(round(config['n_iter']))\n",
    "        return (max_leaves * 3 + (max_leaves - 1) * 4 + 1.0) * n_estimators * 8\n",
    "\n",
    "    @classmethod\n",
    "    def cost_relative2lgbm(cls):\n",
    "        '''[optional method] relative cost compared to lightgbm\n",
    "        '''\n",
    "        return 1.0\n"
   ]
  },
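  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "As a quick check of the `size` estimate, the same arithmetic can be reproduced outside the class. This sketch copies the formula from `MyRegularizedGreedyForest.size`; the standalone function name `estimate_size_bytes` and the second example configuration are ours and purely illustrative.\n",
    "\n",
    "```python\n",
    "def estimate_size_bytes(config):\n",
    "    '''Same formula as MyRegularizedGreedyForest.size, as a standalone function.'''\n",
    "    max_leaves = int(round(config['max_leaf']))\n",
    "    n_estimators = int(round(config['n_iter']))\n",
    "    return (max_leaves * 3 + (max_leaves - 1) * 4 + 1.0) * n_estimators * 8\n",
    "\n",
    "\n",
    "# The low-cost starting point from the search space: 4 leaves, 1 iteration.\n",
    "small = estimate_size_bytes({'max_leaf': 4, 'n_iter': 1})\n",
    "# A hypothetical larger configuration, for comparison.\n",
    "large = estimate_size_bytes({'max_leaf': 1000, 'n_iter': 500})\n",
    "print(small, large)\n",
    "```\n",
    "\n",
    "For the starting point the estimate is (4 * 3 + 3 * 4 + 1) * 1 * 8 = 200 bytes. The estimate grows linearly in both `max_leaf` and `n_iter`, which is why small initial values for these two hyperparameters keep the early trials cheap."
   ]
  },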
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Add Customized Learner and Run FLAML AutoML\n",
    "\n",
    "After adding RGF to the list of learners, we run AutoML, tuning the hyperparameters of RGF as well as those of the default learners. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "automl = AutoML()\n",
    "automl.add_learner(learner_name='RGF', learner_class=MyRegularizedGreedyForest)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[flaml.automl: 07-06 10:25:22] {908} INFO - Evaluation method: holdout\n",
      "[flaml.automl: 07-06 10:25:22] {607} INFO - Using StratifiedKFold\n",
      "[flaml.automl: 07-06 10:25:22] {929} INFO - Minimizing error metric: 1-accuracy\n",
      "[flaml.automl: 07-06 10:25:22] {948} INFO - List of ML learners in AutoML Run: ['RGF', 'lgbm', 'rf', 'xgboost']\n",
      "[flaml.automl: 07-06 10:25:22] {1012} INFO - iteration 0, current learner RGF\n",
      "/Users/qingyun/miniconda3/envs/py38/lib/python3.8/site-packages/rgf/utils.py:224: UserWarning: Cannot find FastRGF executable files. FastRGF estimators will be unavailable for usage.\n",
      "  warnings.warn(\"Cannot find FastRGF executable files. \"\n",
      "[flaml.automl: 07-06 10:25:23] {1160} INFO -  at 1.3s,\tbest RGF's error=0.3840,\tbest RGF's error=0.3840\n",
      "[flaml.automl: 07-06 10:25:23] {1012} INFO - iteration 1, current learner RGF\n",
      "[flaml.automl: 07-06 10:25:24] {1160} INFO -  at 1.9s,\tbest RGF's error=0.3840,\tbest RGF's error=0.3840\n",
      "[flaml.automl: 07-06 10:25:24] {1012} INFO - iteration 2, current learner RGF\n",
      "[flaml.automl: 07-06 10:25:24] {1160} INFO -  at 2.5s,\tbest RGF's error=0.3840,\tbest RGF's error=0.3840\n",
      "[flaml.automl: 07-06 10:25:24] {1012} INFO - iteration 3, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:24] {1160} INFO -  at 2.5s,\tbest lgbm's error=0.3777,\tbest lgbm's error=0.3777\n",
      "[flaml.automl: 07-06 10:25:24] {1012} INFO - iteration 4, current learner RGF\n",
      "[flaml.automl: 07-06 10:25:25] {1160} INFO -  at 3.1s,\tbest RGF's error=0.3840,\tbest lgbm's error=0.3777\n",
      "[flaml.automl: 07-06 10:25:25] {1012} INFO - iteration 5, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:25] {1160} INFO -  at 3.2s,\tbest lgbm's error=0.3777,\tbest lgbm's error=0.3777\n",
      "[flaml.automl: 07-06 10:25:25] {1012} INFO - iteration 6, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:25] {1160} INFO -  at 3.2s,\tbest lgbm's error=0.3777,\tbest lgbm's error=0.3777\n",
      "[flaml.automl: 07-06 10:25:25] {1012} INFO - iteration 7, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:25] {1160} INFO -  at 3.3s,\tbest lgbm's error=0.3777,\tbest lgbm's error=0.3777\n",
      "[flaml.automl: 07-06 10:25:25] {1012} INFO - iteration 8, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:25] {1160} INFO -  at 3.3s,\tbest lgbm's error=0.3777,\tbest lgbm's error=0.3777\n",
      "[flaml.automl: 07-06 10:25:25] {1012} INFO - iteration 9, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:25] {1160} INFO -  at 3.5s,\tbest lgbm's error=0.3765,\tbest lgbm's error=0.3765\n",
      "[flaml.automl: 07-06 10:25:25] {1012} INFO - iteration 10, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:26] {1160} INFO -  at 3.6s,\tbest lgbm's error=0.3765,\tbest lgbm's error=0.3765\n",
      "[flaml.automl: 07-06 10:25:26] {1012} INFO - iteration 11, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:26] {1160} INFO -  at 3.8s,\tbest lgbm's error=0.3752,\tbest lgbm's error=0.3752\n",
      "[flaml.automl: 07-06 10:25:26] {1012} INFO - iteration 12, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:26] {1160} INFO -  at 4.0s,\tbest lgbm's error=0.3587,\tbest lgbm's error=0.3587\n",
      "[flaml.automl: 07-06 10:25:26] {1012} INFO - iteration 13, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:26] {1160} INFO -  at 4.2s,\tbest lgbm's error=0.3587,\tbest lgbm's error=0.3587\n",
      "[flaml.automl: 07-06 10:25:26] {1012} INFO - iteration 14, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:26] {1160} INFO -  at 4.5s,\tbest lgbm's error=0.3519,\tbest lgbm's error=0.3519\n",
      "[flaml.automl: 07-06 10:25:26] {1012} INFO - iteration 15, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:27] {1160} INFO -  at 4.7s,\tbest lgbm's error=0.3519,\tbest lgbm's error=0.3519\n",
      "[flaml.automl: 07-06 10:25:27] {1012} INFO - iteration 16, current learner RGF\n",
      "[flaml.automl: 07-06 10:25:27] {1160} INFO -  at 5.3s,\tbest RGF's error=0.3840,\tbest lgbm's error=0.3519\n",
      "[flaml.automl: 07-06 10:25:27] {1012} INFO - iteration 17, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:27] {1160} INFO -  at 5.5s,\tbest lgbm's error=0.3519,\tbest lgbm's error=0.3519\n",
      "[flaml.automl: 07-06 10:25:27] {1012} INFO - iteration 18, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:28] {1160} INFO -  at 5.9s,\tbest lgbm's error=0.3519,\tbest lgbm's error=0.3519\n",
      "[flaml.automl: 07-06 10:25:28] {1012} INFO - iteration 19, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:28] {1160} INFO -  at 6.2s,\tbest lgbm's error=0.3519,\tbest lgbm's error=0.3519\n",
      "[flaml.automl: 07-06 10:25:28] {1012} INFO - iteration 20, current learner RGF\n",
      "[flaml.automl: 07-06 10:25:29] {1160} INFO -  at 6.8s,\tbest RGF's error=0.3762,\tbest lgbm's error=0.3519\n",
      "[flaml.automl: 07-06 10:25:29] {1012} INFO - iteration 21, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:31] {1160} INFO -  at 8.6s,\tbest lgbm's error=0.3500,\tbest lgbm's error=0.3500\n",
      "[flaml.automl: 07-06 10:25:31] {1012} INFO - iteration 22, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:31] {1160} INFO -  at 8.7s,\tbest xgboost's error=0.3787,\tbest lgbm's error=0.3500\n",
      "[flaml.automl: 07-06 10:25:31] {1012} INFO - iteration 23, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:31] {1160} INFO -  at 8.7s,\tbest xgboost's error=0.3766,\tbest lgbm's error=0.3500\n",
      "[flaml.automl: 07-06 10:25:31] {1012} INFO - iteration 24, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:31] {1160} INFO -  at 8.8s,\tbest xgboost's error=0.3765,\tbest lgbm's error=0.3500\n",
      "[flaml.automl: 07-06 10:25:31] {1012} INFO - iteration 25, current learner rf\n",
      "[flaml.automl: 07-06 10:25:31] {1160} INFO -  at 8.9s,\tbest rf's error=0.4032,\tbest lgbm's error=0.3500\n",
      "[flaml.automl: 07-06 10:25:31] {1012} INFO - iteration 26, current learner rf\n",
      "[flaml.automl: 07-06 10:25:31] {1160} INFO -  at 9.0s,\tbest rf's error=0.4032,\tbest lgbm's error=0.3500\n",
      "[flaml.automl: 07-06 10:25:31] {1012} INFO - iteration 27, current learner rf\n",
      "[flaml.automl: 07-06 10:25:31] {1160} INFO -  at 9.0s,\tbest rf's error=0.4028,\tbest lgbm's error=0.3500\n",
      "[flaml.automl: 07-06 10:25:31] {1012} INFO - iteration 28, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:32] {1160} INFO -  at 10.5s,\tbest lgbm's error=0.3500,\tbest lgbm's error=0.3500\n",
      "[flaml.automl: 07-06 10:25:32] {1012} INFO - iteration 29, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:35] {1160} INFO -  at 13.0s,\tbest lgbm's error=0.3440,\tbest lgbm's error=0.3440\n",
      "[flaml.automl: 07-06 10:25:35] {1012} INFO - iteration 30, current learner RGF\n",
      "[flaml.automl: 07-06 10:25:36] {1160} INFO -  at 13.6s,\tbest RGF's error=0.3762,\tbest lgbm's error=0.3440\n",
      "[flaml.automl: 07-06 10:25:36] {1012} INFO - iteration 31, current learner RGF\n",
      "[flaml.automl: 07-06 10:25:36] {1160} INFO -  at 14.2s,\tbest RGF's error=0.3762,\tbest lgbm's error=0.3440\n",
      "[flaml.automl: 07-06 10:25:36] {1012} INFO - iteration 32, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:38] {1160} INFO -  at 15.8s,\tbest lgbm's error=0.3440,\tbest lgbm's error=0.3440\n",
      "[flaml.automl: 07-06 10:25:38] {1012} INFO - iteration 33, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:42] {1160} INFO -  at 19.9s,\tbest lgbm's error=0.3374,\tbest lgbm's error=0.3374\n",
      "[flaml.automl: 07-06 10:25:42] {1012} INFO - iteration 34, current learner RGF\n",
      "[flaml.automl: 07-06 10:25:43] {1160} INFO -  at 20.8s,\tbest RGF's error=0.3759,\tbest lgbm's error=0.3374\n",
      "[flaml.automl: 07-06 10:25:43] {1012} INFO - iteration 35, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:45] {1160} INFO -  at 22.7s,\tbest lgbm's error=0.3374,\tbest lgbm's error=0.3374\n",
      "[flaml.automl: 07-06 10:25:45] {1012} INFO - iteration 36, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:45] {1160} INFO -  at 22.8s,\tbest xgboost's error=0.3757,\tbest lgbm's error=0.3374\n",
      "[flaml.automl: 07-06 10:25:45] {1012} INFO - iteration 37, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:45] {1160} INFO -  at 22.8s,\tbest xgboost's error=0.3693,\tbest lgbm's error=0.3374\n",
      "[flaml.automl: 07-06 10:25:45] {1012} INFO - iteration 38, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:45] {1160} INFO -  at 22.9s,\tbest xgboost's error=0.3693,\tbest lgbm's error=0.3374\n",
      "[flaml.automl: 07-06 10:25:45] {1012} INFO - iteration 39, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:45] {1160} INFO -  at 23.0s,\tbest xgboost's error=0.3617,\tbest lgbm's error=0.3374\n",
      "[flaml.automl: 07-06 10:25:45] {1012} INFO - iteration 40, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:45] {1160} INFO -  at 23.0s,\tbest xgboost's error=0.3589,\tbest lgbm's error=0.3374\n",
      "[flaml.automl: 07-06 10:25:45] {1012} INFO - iteration 41, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:45] {1160} INFO -  at 23.1s,\tbest xgboost's error=0.3589,\tbest lgbm's error=0.3374\n",
      "[flaml.automl: 07-06 10:25:45] {1012} INFO - iteration 42, current learner lgbm\n",
      "[flaml.automl: 07-06 10:25:59] {1160} INFO -  at 37.3s,\tbest lgbm's error=0.3344,\tbest lgbm's error=0.3344\n",
      "[flaml.automl: 07-06 10:25:59] {1012} INFO - iteration 43, current learner xgboost\n",
      "[flaml.automl: 07-06 10:25:59] {1160} INFO -  at 37.4s,\tbest xgboost's error=0.3589,\tbest lgbm's error=0.3344\n",
      "[flaml.automl: 07-06 10:26:05] {1183} INFO - retrain xgboost for 6.1s\n",
      "[flaml.automl: 07-06 10:26:05] {1012} INFO - iteration 44, current learner xgboost\n",
      "[flaml.automl: 07-06 10:26:06] {1160} INFO -  at 43.6s,\tbest xgboost's error=0.3589,\tbest lgbm's error=0.3344\n",
      "[flaml.automl: 07-06 10:26:06] {1012} INFO - iteration 45, current learner xgboost\n",
      "[flaml.automl: 07-06 10:26:06] {1160} INFO -  at 43.7s,\tbest xgboost's error=0.3589,\tbest lgbm's error=0.3344\n",
      "[flaml.automl: 07-06 10:26:06] {1012} INFO - iteration 46, current learner lgbm\n",
      "[flaml.automl: 07-06 10:26:15] {1160} INFO -  at 53.2s,\tbest lgbm's error=0.3344,\tbest lgbm's error=0.3344\n",
      "[flaml.automl: 07-06 10:26:21] {1183} INFO - retrain lgbm for 5.5s\n",
      "[flaml.automl: 07-06 10:26:21] {1012} INFO - iteration 47, current learner xgboost\n",
      "[flaml.automl: 07-06 10:26:21] {1160} INFO -  at 58.9s,\tbest xgboost's error=0.3589,\tbest lgbm's error=0.3344\n",
      "[flaml.automl: 07-06 10:26:22] {1183} INFO - retrain xgboost for 1.1s\n",
      "[flaml.automl: 07-06 10:26:22] {1206} INFO - selected model: LGBMClassifier(colsample_bytree=0.6204654035998071,\n",
      "               learning_rate=0.17783122919583272, max_bin=16,\n",
      "               min_child_samples=17, n_estimators=197, num_leaves=340,\n",
      "               objective='binary', reg_alpha=0.07967521254431058,\n",
      "               reg_lambda=6.332908973055842, subsample=0.8413048297641477)\n",
      "[flaml.automl: 07-06 10:26:22] {963} INFO - fit succeeded\n"
     ]
    }
   ],
   "source": [
    "settings = {\n",
    "    \"time_budget\": 60,  # total running time in seconds\n",
    "    \"metric\": 'accuracy',\n",
    "    \"estimator_list\": ['RGF', 'lgbm', 'rf', 'xgboost'],  # list of ML learners\n",
    "    \"task\": 'classification',  # task type\n",
    "    \"log_file_name\": 'airlines_experiment_custom.log',  # flaml log file\n",
    "    \"log_training_metric\": True,  # whether to log training metric\n",
    "}\n",
    "\n",
    "# The main flaml automl API\n",
    "automl.fit(X_train=X_train, y_train=y_train, **settings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Comparison with alternatives\n",
    "\n",
    "### FLAML's accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "flaml accuracy = 0.6715957462586951\n"
     ]
    }
   ],
   "source": [
    "print('flaml accuracy', '=', 1 - sklearn_metric_loss_score('accuracy', y_pred, y_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Default LightGBM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lightgbm import LGBMClassifier\n",
    "lgbm = LGBMClassifier()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LGBMClassifier()"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lgbm.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "default lgbm accuracy = 0.6602346380315323\n"
     ]
    }
   ],
   "source": [
    "from flaml.ml import sklearn_metric_loss_score\n",
    "\n",
    "y_pred = lgbm.predict(X_test)\n",
    "print('default lgbm accuracy', '=', 1 - sklearn_metric_loss_score('accuracy', y_pred, y_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Default XGBoost"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "from xgboost import XGBClassifier\n",
    "xgb = XGBClassifier()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
       "              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n",
       "              importance_type='gain', interaction_constraints='',\n",
       "              learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n",
       "              min_child_weight=1, missing=nan, monotone_constraints='()',\n",
       "              n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=0,\n",
       "              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,\n",
       "              tree_method='exact', validate_parameters=1, verbosity=None)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xgb.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "default xgboost accuracy = 0.6676060098186078\n"
     ]
    }
   ],
   "source": [
    "from flaml.ml import sklearn_metric_loss_score\n",
    "\n",
    "y_pred = xgb.predict(X_test)\n",
    "print('default xgboost accuracy', '=', 1 - sklearn_metric_loss_score('accuracy', y_pred, y_test))"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "4502d015faca2560a557f35a41b6dd402f7fdfc08e843ae17a9c41947939f10c"
  },
  "kernelspec": {
   "display_name": "Python 3.8.10 64-bit ('py38': conda)",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}