{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Copyright (c) 2020-2021 Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License.\n",
"\n",
"# Tune LightGBM with FLAML Library\n",
"\n",
"\n",
"## 1. Introduction\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
"at low computational cost. It is fast and economical. Its simple, lightweight design makes it easy \n",
"to use and extend, for example by adding new learners. FLAML can \n",
"- serve as an economical AutoML engine,\n",
"- be used as a fast hyperparameter tuning tool, or \n",
"- be embedded in self-tuning software that requires low latency and low resource consumption in repetitive\n",
" tuning tasks.\n",
"\n",
"In this notebook, we demonstrate how to use the FLAML library to tune the hyperparameters of LightGBM on a regression example.\n",
"\n",
"FLAML requires `Python>=3.6`. To run this notebook example, please install flaml with the `notebook` option:\n",
"```bash\n",
"pip install flaml[notebook]\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install flaml[notebook];"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 2. Regression Example\n",
"### Load data and preprocess\n",
"\n",
"Download the [houses dataset](https://www.openml.org/d/537) from OpenML. The task is to predict the median house price in a region based on the region's demographic composition and the state of its housing market."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "load dataset from./openml_ds537.pkl\nDataset name:houses\nX_train.shape: (15480, 8), y_train.shape: (15480,);\nX_test.shape: (5160, 8), y_test.shape: (5160,)\n"
}
],
"source": [
"from flaml.data import load_openml_dataset\n",
"X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir='./')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Run FLAML\n",
"In the FLAML AutoML run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, the resampling strategy, and so on. Each of these arguments has a default value that is used when the user does not provide one. "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [],
"source": [
"''' import AutoML class from flaml package '''\n",
"from flaml import AutoML\n",
"automl = AutoML()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"settings = {\n",
" \"time_budget\": 120, # total running time in seconds\n",
" \"metric\": 'r2', # primary metrics for regression can be chosen from: ['mae','mse','r2']\n",
" \"estimator_list\": ['lgbm'], # list of ML learners; we tune lightgbm in this example\n",
" \"task\": 'regression', # task type \n",
" \"log_file_name\": 'houses_experiment.log', # flaml log file\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": "[flaml.automl: 04-09 19:57:58] {890} INFO - Evaluation method: cv\n[flaml.automl: 04-09 19:57:58] {606} INFO - Using RepeatedKFold\n[flaml.automl: 04-09 19:57:58] {911} INFO - Minimizing error metric: 1-r2\n[flaml.automl: 04-09 19:57:58] {930} INFO - List of ML learners in AutoML Run: ['lgbm']\n[flaml.automl: 04-09 19:57:58] {994} INFO - iteration 0, current learner lgbm\n[flaml.automl: 04-09 19:57:58] {1147} INFO - at 0.2s,\tbest lgbm's error=0.7383,\tbest lgbm's error=0.7383\n[flaml.automl: 04-09 19:57:58] {994} INFO - iteration 1, current learner lgbm\n[flaml.automl: 04-09 19:57:58] {1147} INFO - at 0.3s,\tbest lgbm's error=0.7383,\tbest lgbm's error=0.7383\n[flaml.automl: 04-09 19:57:58] {994} INFO - iteration 2, current learner lgbm\n[flaml.automl: 04-09 19:57:59] {1147} INFO - at 0.3s,\tbest lgbm's error=0.3888,\tbest lgbm's error=0.3888\n[flaml.automl: 04-09 19:57:59] {994} INFO - iteration 3, current learner lgbm\n[flaml.automl: 04-09 19:57:59] {1147} INFO - at 0.4s,\tbest lgbm's error=0.3888,\tbest lgbm's error=0.3888\n[flaml.automl: 04-09 19:57:59] {994} INFO - iteration 4, current learner lgbm\n[flaml.automl: 04-09 19:57:59] {1147} INFO - at 0.6s,\tbest lgbm's error=0.2657,\tbest lgbm's error=0.2657\n[flaml.automl: 04-09 19:57:59] {994} INFO - iteration 5, current learner lgbm\n[flaml.automl: 04-09 19:57:59] {1147} INFO - at 0.8s,\tbest lgbm's error=0.2256,\tbest lgbm's error=0.2256\n[flaml.automl: 04-09 19:57:59] {994} INFO - iteration 6, current learner lgbm\n[flaml.automl: 04-09 19:57:59] {1147} INFO - at 0.9s,\tbest lgbm's error=0.2256,\tbest lgbm's error=0.2256\n[flaml.automl: 04-09 19:57:59] {994} INFO - iteration 7, current learner lgbm\n[flaml.automl: 04-09 19:57:59] {1147} INFO - at 1.1s,\tbest lgbm's error=0.2256,\tbest lgbm's error=0.2256\n[flaml.automl: 04-09 19:57:59] {994} INFO - iteration 8, current learner lgbm\n[flaml.automl: 04-09 19:57:59] {1147} INFO - at 1.2s,\tbest lgbm's error=0.2256,\tbest lgbm's 
error=0.2256\n[flaml.automl: 04-09 19:57:59] {994} INFO - iteration 9, current learner lgbm\n[flaml.automl: 04-09 19:58:00] {1147} INFO - at 1.4s,\tbest lgbm's error=0.2256,\tbest lgbm's error=0.2256\n[flaml.automl: 04-09 19:58:00] {994} INFO - iteration 10, current learner lgbm\n[flaml.automl: 04-09 19:58:00] {1147} INFO - at 1.5s,\tbest lgbm's error=0.2256,\tbest lgbm's error=0.2256\n[flaml.automl: 04-09 19:58:00] {994} INFO - iteration 11, current learner lgbm\n[flaml.automl: 04-09 19:58:00] {1147} INFO - at 2.0s,\tbest lgbm's error=0.2099,\tbest lgbm's error=0.2099\n[flaml.automl: 04-09 19:58:00] {994} INFO - iteration 12, current learner lgbm\n[flaml.automl: 04-09 19:58:01] {1147} INFO - at 2.9s,\tbest lgbm's error=0.2099,\tbest lgbm's error=0.2099\n[flaml.automl: 04-09 19:58:01] {994} INFO - iteration 13, current learner lgbm\n[flaml.automl: 04-09 19:58:01] {1147} INFO - at 3.0s,\tbest lgbm's error=0.2099,\tbest lgbm's error=0.2099\n[flaml.automl: 04-09 19:58:01] {994} INFO - iteration 14, current learner lgbm\n[flaml.automl: 04-09 19:58:03] {1147} INFO - at 4.7s,\tbest lgbm's error=0.1644,\tbest lgbm's error=0.1644\n[flaml.automl: 04-09 19:58:03] {994} INFO - iteration 15, current learner lgbm\n[flaml.automl: 04-09 19:58:04] {1147} INFO - at 5.3s,\tbest lgbm's error=0.1644,\tbest lgbm's error=0.1644\n[flaml.automl: 04-09 19:58:04] {994} INFO - iteration 16, current learner lgbm\n[flaml.automl: 04-09 19:58:13] {1147} INFO - at 14.6s,\tbest lgbm's error=0.1644,\tbest lgbm's error=0.1644\n[flaml.automl: 04-09 19:58:13] {994} INFO - iteration 17, current learner lgbm\n[flaml.automl: 04-09 19:58:14] {1147} INFO - at 15.4s,\tbest lgbm's error=0.1644,\tbest lgbm's error=0.1644\n[flaml.automl: 04-09 19:58:14] {994} INFO - iteration 18, current learner lgbm\n[flaml.automl: 04-09 19:58:18] {1147} INFO - at 20.0s,\tbest lgbm's error=0.1644,\tbest lgbm's error=0.1644\n[flaml.automl: 04-09 19:58:18] {994} INFO - iteration 19, current learner lgbm\n[flaml.automl: 04-09 
19:58:19] {1147} INFO - at 20.7s,\tbest lgbm's error=0.1644,\tbest lgbm's error=0.1644\n[flaml.auto
}
],
"source": [
"'''The main flaml automl API'''\n",
"automl.fit(X_train=X_train, y_train=y_train, **settings)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Best model and metric"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Best hyperparmeter config:{'n_estimators': 95.0, 'num_leaves': 254.0, 'min_child_samples': 21.0, 'learning_rate': 0.10418050364992694, 'subsample': 0.9097941662911945, 'log_max_bin': 7.0, 'colsample_bytree': 0.7586723794764185, 'reg_alpha': 0.09228337080759572, 'reg_lambda': 0.46673178167010676}\nBest r2 on validation data: 0.8396\nTraining duration of best run: 7.868 s\n"
}
],
"source": [
"''' retrieve best config'''\n",
"print('Best hyperparameter config:', automl.best_config)\n",
"print('Best r2 on validation data: {0:.4g}'.format(1-automl.best_loss))\n",
"print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "LGBMRegressor(colsample_bytree=0.7586723794764185,\n learning_rate=0.10418050364992694, max_bin=127,\n min_child_samples=21, n_estimators=95, num_leaves=254,\n objective='regression', reg_alpha=0.09228337080759572,\n reg_lambda=0.46673178167010676, subsample=0.9097941662911945)"
},
"metadata": {},
"execution_count": 9
}
],
"source": [
"automl.model"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"''' pickle and save the automl object '''\n",
"import pickle\n",
"with open('automl.pkl', 'wb') as f:\n",
"    pickle.dump(automl, f, pickle.HIGHEST_PROTOCOL)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Predicted labels[150367.25556214 263353.37798151 136897.76625025 ... 190606.68038356\n 237816.02972335 263063.11183796]\nTrue labels[136900. 241300. 200700. ... 160900. 227300. 265600.]\n"
}
],
"source": [
"''' compute predictions of testing dataset ''' \n",
"y_pred = automl.predict(X_test)\n",
"print('Predicted labels', y_pred)\n",
"print('True labels', y_test)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "r2=0.8500929784828137\nmse=1981546944.5284543\nmae=29485.579651356835\n"
}
],
"source": [
"''' compute different metric values on testing dataset'''\n",
"from flaml.ml import sklearn_metric_loss_score\n",
"print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))\n",
"print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))\n",
"print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20, 'learning_rate': 0.1, 'subsample': 1.0, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20, 'learning_rate': 0.1, 'subsample': 1.0, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 4.0, 'num_leaves': 4.0, 'min_child_samples': 25.0, 'learning_rate': 1.0, 'subsample': 0.8513627344387318, 'log_max_bin': 10.0, 'colsample_bytree': 0.9684145930669938, 'reg_alpha': 0.001831177697321707, 'reg_lambda': 0.2790165919053839}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 4.0, 'num_leaves': 4.0, 'min_child_samples': 25.0, 'learning_rate': 1.0, 'subsample': 0.8513627344387318, 'log_max_bin': 10.0, 'colsample_bytree': 0.9684145930669938, 'reg_alpha': 0.001831177697321707, 'reg_lambda': 0.2790165919053839}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 20.0, 'num_leaves': 4.0, 'min_child_samples': 48.0, 'learning_rate': 1.0, 'subsample': 0.9814787163243813, 'log_max_bin': 10.0, 'colsample_bytree': 0.9534346594834143, 'reg_alpha': 0.002208534076096185, 'reg_lambda': 0.5460627024738886}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 20.0, 'num_leaves': 4.0, 'min_child_samples': 48.0, 'learning_rate': 1.0, 'subsample': 0.9814787163243813, 'log_max_bin': 10.0, 'colsample_bytree': 0.9534346594834143, 'reg_alpha': 0.002208534076096185, 'reg_lambda': 0.5460627024738886}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 11.0, 'num_leaves': 15.0, 'min_child_samples': 42.0, 'learning_rate': 0.4743416464891248, 
'subsample': 0.9233328006239466, 'log_max_bin': 10.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.034996420228767956, 'reg_lambda': 0.6169079461473814}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 11.0, 'num_leaves': 15.0, 'min_child_samples': 42.0, 'learning_rate': 0.4743416464891248, 'subsample': 0.9233328006239466, 'log_max_bin': 10.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.034996420228767956, 'reg_lambda': 0.6169079461473814}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 22.0, 'num_leaves': 44.0, 'min_child_samples': 33.0, 'learning_rate': 0.7277554644304967, 'subsample': 0.8890322269681047, 'log_max_bin': 9.0, 'colsample_bytree': 0.8917187085424868, 'reg_alpha': 0.3477637978466495, 'reg_lambda': 0.24655709710146537}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 22.0, 'num_leaves': 44.0, 'min_child_samples': 33.0, 'learning_rate': 0.7277554644304967, 'subsample': 0.8890322269681047, 'log_max_bin': 9.0, 'colsample_bytree': 0.8917187085424868, 'reg_alpha': 0.3477637978466495, 'reg_lambda': 0.24655709710146537}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 60.0, 'num_leaves': 72.0, 'min_child_samples': 37.0, 'learning_rate': 0.23811059538783155, 'subsample': 1.0, 'log_max_bin': 8.0, 'colsample_bytree': 0.9162072323824675, 'reg_alpha': 0.7017839907881602, 'reg_lambda': 0.23027329389914142}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 60.0, 'num_leaves': 72.0, 'min_child_samples': 37.0, 'learning_rate': 0.23811059538783155, 'subsample': 1.0, 'log_max_bin': 8.0, 'colsample_bytree': 0.9162072323824675, 'reg_alpha': 0.7017839907881602, 'reg_lambda': 0.23027329389914142}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 95.0, 'num_leaves': 254.0, 'min_child_samples': 21.0, 'learning_rate': 0.10418050364992694, 'subsample': 0.9097941662911945, 
'log_max_bin': 7.0, 'colsample_bytree': 0.7586723794764185, 'reg_alpha': 0.09228337080759572, 'reg_lambda': 0.
}
],
"source": [
"from flaml.data import get_output_from_log\n",
"time_history, best_valid_loss_history, valid_loss_history, config_history, train_loss_history = \\\n",
"    get_output_from_log(filename=settings['log_file_name'], time_budget=60)\n",
"\n",
"for config in config_history:\n",
"    print(config)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"plt.title('Learning Curve')\n",
"plt.xlabel('Wall Clock Time (s)')\n",
"plt.ylabel('Validation r2')\n",
"plt.scatter(time_history, 1 - np.array(valid_loss_history))\n",
"plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')\n",
"plt.show()"
]
},
{
"source": [
"## 3. Comparison with alternatives\n",
"\n",
"### FLAML's accuracy"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "flaml r2=0.8500929784828137\n"
}
],
"source": [
"print('flaml r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))"
]
},
{
"source": [
"### Default LightGBM"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"from lightgbm import LGBMRegressor\n",
"lgbm = LGBMRegressor()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "LGBMRegressor()"
},
"metadata": {},
"execution_count": 17
}
],
"source": [
"lgbm.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "default lgbm r2=0.8296179648694404\n"
}
],
"source": [
"y_pred = lgbm.predict(X_test)\n",
"from flaml.ml import sklearn_metric_loss_score\n",
"print('default lgbm r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))"
]
},
{
"source": [
"### Optuna LightGBM Tuner"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"# !pip install optuna==2.5.0;"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"train_x, val_x, train_y, val_y = train_test_split(X_train, y_train, test_size=0.1)\n",
"import optuna.integration.lightgbm as lgb\n",
"dtrain = lgb.Dataset(train_x, label=train_y)\n",
"dval = lgb.Dataset(val_x, label=val_y)\n",
"params = {\n",
" \"objective\": \"regression\",\n",
" \"metric\": \"regression\",\n",
" \"verbosity\": -1,\n",
"}\n"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": "\u001b[32m[I 2021-04-09 19:56:13,788]\u001b[0m A new study created in memory with name: no-name-be796674-63fe-4736-9436-82e0a952f36b\u001b[0m\nfeature_fraction, val_score: 2001137767.143790: 14%|#4 | 1/7 [00:02<00:13, 2.30s/it]\u001b[32m[I 2021-04-09 19:56:16,095]\u001b[0m Trial 0 finished with value: 2001137767.14379 and parameters: {'feature_fraction': 0.7}. Best is trial 0 with value: 2001137767.14379.\u001b[0m\nfeature_fraction, val_score: 2001137767.143790: 29%|##8 | 2/7 [00:04<00:11, 2.24s/it]\u001b[32m[I 2021-04-09 19:56:18,289]\u001b[0m Trial 1 finished with value: 2009099143.533758 and parameters: {'feature_fraction': 0.6}. Best is trial 0 with value: 2001137767.14379.\u001b[0m\nfeature_fraction, val_score: 2001137767.143790: 43%|####2 | 3/7 [00:06<00:09, 2.27s/it]\u001b[32m[I 2021-04-09 19:56:20,588]\u001b[0m Trial 2 finished with value: 2001137767.14379 and parameters: {'feature_fraction': 0.8}. Best is trial 0 with value: 2001137767.14379.\u001b[0m\nfeature_fraction, val_score: 2001137767.143790: 57%|#####7 | 4/7 [00:09<00:07, 2.38s/it]\u001b[32m[I 2021-04-09 19:56:23,148]\u001b[0m Trial 3 finished with value: 2017941196.0559783 and parameters: {'feature_fraction': 1.0}. Best is trial 0 with value: 2001137767.14379.\u001b[0m\nfeature_fraction, val_score: 1977065482.707781: 71%|#######1 | 5/7 [00:11<00:04, 2.27s/it]\u001b[32m[I 2021-04-09 19:56:25,222]\u001b[0m Trial 4 finished with value: 1977065482.7077813 and parameters: {'feature_fraction': 0.5}. Best is trial 4 with value: 1977065482.7077813.\u001b[0m\nfeature_fraction, val_score: 1977065482.707781: 71%|#######1 | 5/7 [00:11<00:04, 2.27s/it]"
}
],
"source": [
"%%time\n",
"model = lgb.train(params, dtrain, valid_sets=[dtrain, dval], verbose_eval=10000) \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"y_pred = model.predict(X_test)\n",
"from flaml.ml import sklearn_metric_loss_score\n",
"print('Optuna LightGBM Tuner r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Add a customized LightGBM learner in FLAML\n",
"The native API of LightGBM allows one to specify a custom objective function in the model constructor. You can easily enable it by adding a customized LightGBM learner in FLAML. In the following example, we show how to add such a customized LightGBM learner with a custom objective function."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a customized LightGBM learner with a custom objective function"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"\n",
"import numpy as np \n",
"\n",
"''' define your customized objective function '''\n",
"def my_loss_obj(y_true, y_pred):\n",
"    c = 0.5\n",
"    residual = y_pred - y_true\n",
"    grad = c * residual / (np.abs(residual) + c)\n",
"    hess = c ** 2 / (np.abs(residual) + c) ** 2\n",
"    # rmse grad and hess\n",
"    grad_rmse = residual\n",
"    hess_rmse = 1.0\n",
"    \n",
"    # mae grad and hess\n",
"    grad_mae = np.array(residual)\n",
"    grad_mae[grad_mae > 0] = 1.\n",
"    grad_mae[grad_mae <= 0] = -1.\n",
"    hess_mae = 1.0\n",
"\n",
"    coef = [0.4, 0.3, 0.3]\n",
"    return coef[0] * grad + coef[1] * grad_rmse + coef[2] * grad_mae, \\\n",
"        coef[0] * hess + coef[1] * hess_rmse + coef[2] * hess_mae\n",
"\n",
"\n",
"from flaml.model import LGBMEstimator\n",
"\n",
"''' create a customized LightGBM learner class with your objective function '''\n",
"class MyLGBM(LGBMEstimator):\n",
"    '''LGBMEstimator with my_loss_obj as the objective function\n",
"    '''\n",
"\n",
"    def __init__(self, **params):\n",
"        super().__init__(objective=my_loss_obj, **params)"
]
},
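{
"cell_type": "markdown",
"metadata": {},
"source": [
"The gradient and Hessian returned by `my_loss_obj` above can be read as a weighted sum of the derivatives of three per-sample losses of the residual $r = \\hat{y} - y$. The derivation below is a sketch for orientation only: the symbols $\\ell$, $g$ and $h$ and the name *Fair loss* are our labels, not identifiers from the code, and the identification with the Fair loss is our reading of the formulas.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\ell_{\\mathrm{fair}}(r) &= c\\,|r| - c^2 \\ln\\left(1 + \\frac{|r|}{c}\\right), & g_{\\mathrm{fair}} &= \\frac{c\\,r}{|r| + c}, & h_{\\mathrm{fair}} &= \\frac{c^2}{(|r| + c)^2} \\\\\n",
"\\ell_{\\mathrm{sq}}(r) &= \\tfrac{1}{2} r^2, & g_{\\mathrm{sq}} &= r, & h_{\\mathrm{sq}} &= 1 \\\\\n",
"\\ell_{\\mathrm{abs}}(r) &= |r|, & g_{\\mathrm{abs}} &= \\operatorname{sign}(r), & h_{\\mathrm{abs}} &:= 1\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Because $\\partial r / \\partial \\hat{y} = 1$, these are also the derivatives with respect to the prediction. With $c = 0.5$ and the weights in `coef`, the function returns\n",
"\n",
"$$\n",
"g = 0.4\\, g_{\\mathrm{fair}} + 0.3\\, g_{\\mathrm{sq}} + 0.3\\, g_{\\mathrm{abs}}, \\qquad h = 0.4\\, h_{\\mathrm{fair}} + 0.3\\, h_{\\mathrm{sq}} + 0.3\\, h_{\\mathrm{abs}}.\n",
"$$\n",
"\n",
"Two details of the code differ from the textbook derivatives: the exact second derivative of $|r|$ is zero for $r \\neq 0$, so `hess_mae = 1.0` is a positive stand-in rather than a true Hessian, and a residual of exactly zero is assigned a gradient of $-1$. The terms labelled `rmse` in the code are the derivatives of the half squared error $\\tfrac{1}{2} r^2$."
]
},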
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add the customized learner in FLAML"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": "[flaml.automl: 04-09 21:16:36] {890} INFO - Evaluation method: cv\n[flaml.automl: 04-09 21:16:36] {606} INFO - Using RepeatedKFold\n[flaml.automl: 04-09 21:16:36] {911} INFO - Minimizing error metric: 1-r2\n[flaml.automl: 04-09 21:16:36] {930} INFO - List of ML learners in AutoML Run: ['my_lgbm']\n[flaml.automl: 04-09 21:16:36] {994} INFO - iteration 0, current learner my_lgbm\n[flaml.automl: 04-09 21:16:37] {1147} INFO - at 0.2s,\tbest my_lgbm's error=2.9883,\tbest my_lgbm's error=2.9883\n[flaml.automl: 04-09 21:16:37] {994} INFO - iteration 1, current learner my_lgbm\n[flaml.automl: 04-09 21:16:37] {1147} INFO - at 0.3s,\tbest my_lgbm's error=2.9883,\tbest my_lgbm's error=2.9883\n[flaml.automl: 04-09 21:16:37] {994} INFO - iteration 2, current learner my_lgbm\n[flaml.automl: 04-09 21:16:37] {1147} INFO - at 0.4s,\tbest my_lgbm's error=0.4472,\tbest my_lgbm's error=0.4472\n[flaml.automl: 04-09 21:16:37] {994} INFO - iteration 3, current learner my_lgbm\n[flaml.automl: 04-09 21:16:37] {1147} INFO - at 0.5s,\tbest my_lgbm's error=0.4472,\tbest my_lgbm's error=0.4472\n[flaml.automl: 04-09 21:16:37] {994} INFO - iteration 4, current learner my_lgbm\n[flaml.automl: 04-09 21:16:37] {1147} INFO - at 0.7s,\tbest my_lgbm's error=0.2682,\tbest my_lgbm's error=0.2682\n[flaml.automl: 04-09 21:16:37] {994} INFO - iteration 5, current learner my_lgbm\n[flaml.automl: 04-09 21:16:37] {1147} INFO - at 0.9s,\tbest my_lgbm's error=0.2682,\tbest my_lgbm's error=0.2682\n[flaml.automl: 04-09 21:16:37] {994} INFO - iteration 6, current learner my_lgbm\n[flaml.automl: 04-09 21:16:37] {1147} INFO - at 1.1s,\tbest my_lgbm's error=0.2682,\tbest my_lgbm's error=0.2682\n[flaml.automl: 04-09 21:16:37] {994} INFO - iteration 7, current learner my_lgbm\n[flaml.automl: 04-09 21:16:38] {1147} INFO - at 1.3s,\tbest my_lgbm's error=0.2256,\tbest my_lgbm's error=0.2256\n[flaml.automl: 04-09 21:16:38] {994} INFO - iteration 8, current learner my_lgbm\n[flaml.automl: 04-09 21:16:38] {1147} INFO - at 1.5s,\tbest my_lgbm's error=0.2256,\tbest my_lgbm's error=0.2256\n[flaml.automl: 04-09 21:16:38] {994} INFO - iteration 9, current learner my_lgbm\n[flaml.automl: 04-09 21:16:38] {1147} INFO - at 1.6s,\tbest my_lgbm's error=0.2256,\tbest my_lgbm's error=0.2256\n[flaml.automl: 04-09 21:16:38] {994} INFO - iteration 10, current learner my_lgbm\n[flaml.automl: 04-09 21:16:38] {1147} INFO - at 1.8s,\tbest my_lgbm's error=0.2256,\tbest my_lgbm's error=0.2256\n[flaml.automl: 04-09 21:16:38] {994} INFO - iteration 11, current learner my_lgbm\n[flaml.automl: 04-09 21:16:39] {1147} INFO - at 2.3s,\tbest my_lgbm's error=0.1866,\tbest my_lgbm's error=0.1866\n[flaml.automl: 04-09 21:16:39] {994} INFO - iteration 12, current learner my_lgbm\n[flaml.automl: 04-09 21:16:39] {1147} INFO - at 2.9s,\tbest my_lgbm's error=0.1866,\tbest my_lgbm's error=0.1866\n[flaml.automl: 04-09 21:16:39] {994} INFO - iteration 13, current learner my_lgbm\n[flaml.automl: 04-09 21:16:39] {1147} INFO - at 3.1s,\tbest my_lgbm's error=0.1866,\tbest my_lgbm's error=0.1866\n[flaml.automl: 04-09 21:16:39] {994} INFO - iteration 14, current learner my_lgbm\n[flaml.automl: 04-09 21:16:41] {1147} INFO - at 5.0s,\tbest my_lgbm's error=0.1639,\tbest my_lgbm's error=0.1639\n[flaml.automl: 04-09 21:16:41] {994} INFO - iteration 15, current learner my_lgbm\n[flaml.automl: 04-09 21:16:42] {1147} INFO - at 5.6s,\tbest my_lgbm's error=0.1639,\tbest my_lgbm's error=0.1639\n[flaml.automl: 04-09 21:16:42] {994} INFO - iteration 16, current learner my_lgbm\n[flaml.automl: 04-09 21:16:48] {1147} INFO - at 11.9s,\tbest my_lgbm's error=0.1639,\tbest my_lgbm's error=0.1639\n[flaml.automl: 04-09 21:16:48] {994} INFO - iteration 17, current learner my_lgbm\n[flaml.automl: 04-09 21:16:49] {1147} INFO - at 13.1s,\tbest my_lgbm's error=0.1639,\tbest my_lgbm's error=0.1639\n[flaml.automl: 04-09 21:16:49] {994} INFO - iteration 18, current learner my_lgbm\n[flaml.automl: 04-09 21:16:54] {1147} INFO - at 17.7s,\tbest my_lgbm's error=0.1639,\tbest my_lgbm's error=0.1639\n[flaml.automl: 04-09 21:16:"
}
],
"source": [
"automl = AutoML()\n",
"automl.add_learner(learner_name='my_lgbm', learner_class=MyLGBM)\n",
"settings = {\n",
"    \"time_budget\": 120,  # total running time in seconds\n",
"    \"metric\": 'r2',  # primary metrics for regression can be chosen from: ['mae','mse','r2']\n",
"    \"estimator_list\": ['my_lgbm'],  # list of ML learners; we tune the customized lightgbm learner in this example\n",
"    \"task\": 'regression',  # task type\n",
"    \"log_file_name\": 'houses_experiment_my_lgbm.log',  # flaml log file\n",
"}\n",
"automl.fit(X_train=X_train, y_train=y_train, **settings)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Best hyperparmeter config:{'n_estimators': 287.0, 'num_leaves': 247.0, 'min_child_samples': 81.0, 'learning_rate': 0.06283686776885493, 'subsample': 0.7669214501226506, 'log_max_bin': 10.0, 'colsample_bytree': 0.613734331916688, 'reg_alpha': 0.006495889833184046, 'reg_lambda': 0.005049036990045567}\nBest r2 on validation data: 0.839\nTraining duration of best run: 13.51 s\nPredicted labels[136183.28410995 260302.1656523 136575.03214257 ... 213737.94780122\n 248465.64921701 275744.71459095]\nTrue labels[136900. 241300. 200700. ... 160900. 227300. 265600.]\nr2=0.8449104679441721\nmse=2050051993.9844227\nmae=30061.65329294407\n"
}
],
"source": [
"print('Best hyperparameter config:', automl.best_config)\n",
"print('Best r2 on validation data: {0:.4g}'.format(1-automl.best_loss))\n",
"print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))\n",
"\n",
"y_pred = automl.predict(X_test)\n",
"print('Predicted labels', y_pred)\n",
"print('True labels', y_test)\n",
"\n",
"from flaml.ml import sklearn_metric_loss_score\n",
"print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))\n",
"print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))\n",
"print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))"
]
}
],
"metadata": {
"kernelspec": {
"name": "python37764bitbsconda5b158f6acec0414d8c5c2401992dd9e1",
"display_name": "Python 3.7.7 64-bit ('bs': conda)",
"metadata": {
"interpreter": {
"hash": "0cfea3304185a9579d09e0953576b57c8581e46e6ebc6dfeb681bc5a511f7544"
}
}
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
}