autogen/notebook/flaml_lightgbm.ipynb


{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Copyright (c) 2020-2021 Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License.\n",
"\n",
"# Tune LightGBM with FLAML Library\n",
"\n",
"\n",
"## 1. Introduction\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
"at low computational cost. Its simple and lightweight design makes it easy \n",
"to use and extend, for example by adding new learners. FLAML can \n",
"- serve as an economical AutoML engine,\n",
"- be used as a fast hyperparameter tuning tool, or \n",
"- be embedded in self-tuning software that requires low latency and low resource usage in repetitive tuning tasks.\n",
"\n",
"In this notebook, we demonstrate how to use the FLAML library to tune the hyperparameters of LightGBM on a regression example.\n",
"\n",
"FLAML requires `Python>=3.6`. To run this notebook example, please install flaml with the `notebook` option:\n",
"```bash\n",
"pip install flaml[notebook]\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install flaml[notebook];"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 2. Regression Example\n",
"### Load data and preprocess\n",
"\n",
"Download the [houses dataset](https://www.openml.org/d/537) from OpenML. The task is to predict the median house price in a region based on the region's demographic composition and the state of its housing market."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"load dataset from ./openml_ds537.pkl\nDataset name: houses\nX_train.shape: (15480, 8), y_train.shape: (15480,);\nX_test.shape: (5160, 8), y_test.shape: (5160,)\n"
]
}
],
"source": [
"from flaml.data import load_openml_dataset\n",
"X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id = 537, data_dir = './')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Run FLAML\n",
"In the FLAML AutoML run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, the resampling strategy, and so on. All of these arguments have default values, which are used if the user does not provide them."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"''' import AutoML class from flaml package '''\n",
"from flaml import AutoML\n",
"automl = AutoML()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"settings = {\n",
" \"time_budget\": 120, # total running time in seconds\n",
" \"metric\": 'r2', # primary metrics for regression can be chosen from: ['mae','mse','r2']\n",
" \"estimator_list\": ['lgbm'], # list of ML learners; we tune lightgbm in this example\n",
" \"task\": 'regression', # task type \n",
" \"log_file_name\": 'houses_experiment.log', # flaml log file\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"[flaml.automl: 02-22 14:37:41] {844} INFO - Evaluation method: cv\n",
"[flaml.automl: 02-22 14:37:41] {573} INFO - Using RepeatedKFold\n",
"[flaml.automl: 02-22 14:37:41] {865} INFO - Minimizing error metric: 1-r2\n",
"[flaml.automl: 02-22 14:37:41] {885} INFO - List of ML learners in AutoML Run: ['lgbm']\n",
"[flaml.automl: 02-22 14:37:41] {944} INFO - iteration 0 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:41] {1098} INFO - at 0.2s,\tbest lgbm's error=0.7383,\tbest lgbm's error=0.7383\n",
"[flaml.automl: 02-22 14:37:41] {944} INFO - iteration 1 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:41] {1098} INFO - at 0.3s,\tbest lgbm's error=0.7383,\tbest lgbm's error=0.7383\n",
"[flaml.automl: 02-22 14:37:41] {944} INFO - iteration 2 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:41] {1098} INFO - at 0.4s,\tbest lgbm's error=0.4578,\tbest lgbm's error=0.4578\n",
"[flaml.automl: 02-22 14:37:41] {944} INFO - iteration 3 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:41] {1098} INFO - at 0.5s,\tbest lgbm's error=0.4578,\tbest lgbm's error=0.4578\n",
"[flaml.automl: 02-22 14:37:41] {944} INFO - iteration 4 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:42] {1098} INFO - at 0.9s,\tbest lgbm's error=0.2637,\tbest lgbm's error=0.2637\n",
"[flaml.automl: 02-22 14:37:42] {944} INFO - iteration 5 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:42] {1098} INFO - at 1.2s,\tbest lgbm's error=0.2284,\tbest lgbm's error=0.2284\n",
"[flaml.automl: 02-22 14:37:42] {944} INFO - iteration 6 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:42] {1098} INFO - at 1.3s,\tbest lgbm's error=0.2284,\tbest lgbm's error=0.2284\n",
"[flaml.automl: 02-22 14:37:42] {944} INFO - iteration 7 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:42] {1098} INFO - at 1.6s,\tbest lgbm's error=0.2284,\tbest lgbm's error=0.2284\n",
"[flaml.automl: 02-22 14:37:42] {944} INFO - iteration 8 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:42] {1098} INFO - at 1.8s,\tbest lgbm's error=0.2284,\tbest lgbm's error=0.2284\n",
"[flaml.automl: 02-22 14:37:42] {944} INFO - iteration 9 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:43] {1098} INFO - at 1.9s,\tbest lgbm's error=0.2284,\tbest lgbm's error=0.2284\n",
"[flaml.automl: 02-22 14:37:43] {944} INFO - iteration 10 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:43] {1098} INFO - at 2.1s,\tbest lgbm's error=0.2284,\tbest lgbm's error=0.2284\n",
"[flaml.automl: 02-22 14:37:43] {944} INFO - iteration 11 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:43] {1098} INFO - at 2.6s,\tbest lgbm's error=0.2262,\tbest lgbm's error=0.2262\n",
"[flaml.automl: 02-22 14:37:43] {944} INFO - iteration 12 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:44] {1098} INFO - at 3.7s,\tbest lgbm's error=0.2009,\tbest lgbm's error=0.2009\n",
"[flaml.automl: 02-22 14:37:44] {944} INFO - iteration 13 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:45] {1098} INFO - at 3.9s,\tbest lgbm's error=0.2009,\tbest lgbm's error=0.2009\n",
"[flaml.automl: 02-22 14:37:45] {944} INFO - iteration 14 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:47] {1098} INFO - at 6.0s,\tbest lgbm's error=0.1854,\tbest lgbm's error=0.1854\n",
"[flaml.automl: 02-22 14:37:47] {944} INFO - iteration 15 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:48] {1098} INFO - at 7.0s,\tbest lgbm's error=0.1854,\tbest lgbm's error=0.1854\n",
"[flaml.automl: 02-22 14:37:48] {944} INFO - iteration 16 current learner lgbm\n",
"[flaml.automl: 02-22 14:37:59] {1098} INFO - at 18.0s,\tbest lgbm's error=0.1761,\tbest lgbm's error=0.1761\n",
"[flaml.automl: 02-22 14:37:59] {944} INFO - iteration 17 current learner lgbm\n",
"[flaml.automl: 02-22 14:38:35] {1098} INFO - at 53.9s,\tbest lgbm's error=0.1725,\tbest lgbm's error=0.1725\n",
"[flaml.automl: 02-22 14:38:35] {944} INFO - iteration 18 current learner lgbm\n",
"[flaml.automl: 02-22 14:39:10] {1098} INFO - at 88.9s,\tbest lgbm's error=0.1725,\tbest lgbm's error=0.1725\n",
"[flaml.automl: 02-22 14:39:10] {944} INFO - iteration 19 current learner lgbm\n",
"[flaml.automl: 02-22 14:39:14] {1098} INFO - at 92.9s,\tbest lgbm's error=0.1725,\tbest lgbm's error=0.1725\n",
"[flaml.automl: 02-22 14:39:14] {944} INFO - iteration 20 current learner lgbm\n",
"[flaml.automl: 02-22 14:39:31] {1098} INFO - at 110.7s,\tbest lgbm's error=0.1563,\tbest lgbm's error=0.1563\n",
"[flaml.automl: 02-22 14:39:31] {1139} INFO - selected model: LGBMRegressor(colsample_bytree=0.9046814915274195,\n",
" learning_rate=0.025065630491840726, max_bin=255,\n",
" min_child_weight=20.0, n_estimators=451, num_leaves=113,\n",
" objective='regression', reg_alpha=8.352751749829367e-10,\n",
" reg_lambda=0.13991138691596908)\n",
"[flaml.automl: 02-22 14:39:31] {899} INFO - fit succeeded\n"
]
}
],
"source": [
"'''The main flaml automl API'''\n",
"automl.fit(X_train = X_train, y_train = y_train, **settings)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Best model and metric"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best hyperparameter config: {'n_estimators': 451.0, 'max_leaves': 113.0, 'min_child_weight': 20.0, 'learning_rate': 0.025065630491840726, 'subsample': 1.0, 'log_max_bin': 8.0, 'colsample_bytree': 0.9046814915274195, 'reg_alpha': 8.352751749829367e-10, 'reg_lambda': 0.13991138691596908}\nBest r2 on validation data: 0.8437\nTraining duration of best run: 17.86 s\n"
]
}
],
"source": [
"''' retrieve best config'''\n",
"print('Best hyperparameter config:', automl.best_config)\n",
"print('Best r2 on validation data: {0:.4g}'.format(1-automl.best_loss))\n",
"print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"LGBMRegressor(colsample_bytree=0.9046814915274195,\n",
" learning_rate=0.025065630491840726, max_bin=255,\n",
" min_child_weight=20.0, n_estimators=451, num_leaves=113,\n",
" objective='regression', reg_alpha=8.352751749829367e-10,\n",
" reg_lambda=0.13991138691596908)"
]
},
"metadata": {},
"execution_count": 10
}
],
"source": [
"automl.model"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"''' pickle and save the best model '''\n",
"import pickle\n",
"with open('best_model.pkl', 'wb') as f:\n",
" pickle.dump(automl.model, f, pickle.HIGHEST_PROTOCOL)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Predicted labels [147056.672508 246591.18821626 155253.69332074 ... 196516.76693923\n 235571.37776252 270133.77185961]\nTrue labels [136900. 241300. 200700. ... 160900. 227300. 265600.]\n"
]
}
],
"source": [
"''' compute predictions on the test dataset '''\n",
"y_pred = automl.predict(X_test)\n",
"print('Predicted labels', y_pred)\n",
"print('True labels', y_test)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"r2 = 0.8503723727607084\nmse = 1977853769.4384706\nmae = 29258.487121555943\n"
]
}
],
"source": [
"''' compute different metric values on the test dataset '''\n",
"from flaml.ml import sklearn_metric_loss_score\n",
"print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))\n",
"print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))\n",
"print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))"
]
},
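{
"cell_type": "markdown",
"metadata": {},
"source": [
"The metric values above can be cross-checked with scikit-learn directly. The following is a minimal sketch on hypothetical toy arrays (not the houses data). It assumes, as the `1 - ...` expression in the cell above suggests, that the loss minimized for the `'r2'` metric is `1 - r2`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n",
"\n",
"# Hypothetical toy values, not taken from the houses dataset.\n",
"y_true = np.array([3.0, 5.0, 7.0, 9.0])\n",
"y_hat = np.array([2.5, 5.5, 7.0, 8.0])\n",
"\n",
"r2 = r2_score(y_true, y_hat)\n",
"r2_loss = 1 - r2  # the quantity a minimizer would see for the 'r2' metric\n",
"mse = mean_squared_error(y_true, y_hat)\n",
"mae = mean_absolute_error(y_true, y_hat)\n",
"\n",
"print('r2 =', r2, 'loss =', r2_loss)\n",
"print('mse =', mse)\n",
"print('mae =', mae)\n",
"```\n",
"\n",
"Note that `mse` and `mae` are in the units of the target (squared, for `mse`), which is why the values reported for house prices above are so large, while `r2` is unitless."
]
},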
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 20.0, 'learning_rate': 0.1, 'subsample': 1.0, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 1e-10, 'reg_lambda': 1.0}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 20.0, 'learning_rate': 0.1, 'subsample': 1.0, 'log_max_bin': 8, 'colsample_bytree': 1.0, 'reg_alpha': 1e-10, 'reg_lambda': 1.0}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 20.0, 'learning_rate': 0.46335414315327306, 'subsample': 0.9339389930838808, 'log_max_bin': 10.0, 'colsample_bytree': 0.9904286645657556, 'reg_alpha': 2.841147337412889e-10, 'reg_lambda': 0.12000833497054482}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 4.0, 'max_leaves': 4.0, 'min_child_weight': 20.0, 'learning_rate': 0.46335414315327306, 'subsample': 0.9339389930838808, 'log_max_bin': 10.0, 'colsample_bytree': 0.9904286645657556, 'reg_alpha': 2.841147337412889e-10, 'reg_lambda': 0.12000833497054482}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 20.0, 'max_leaves': 4.0, 'min_child_weight': 20.0, 'learning_rate': 1.0, 'subsample': 0.9917683183663918, 'log_max_bin': 10.0, 'colsample_bytree': 0.9858892907525497, 'reg_alpha': 3.8783982645515837e-10, 'reg_lambda': 0.36607431863072826}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 20.0, 'max_leaves': 4.0, 'min_child_weight': 20.0, 'learning_rate': 1.0, 'subsample': 0.9917683183663918, 'log_max_bin': 10.0, 'colsample_bytree': 0.9858892907525497, 'reg_alpha': 3.8783982645515837e-10, 'reg_lambda': 0.36607431863072826}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 11.0, 'max_leaves': 15.0, 'min_child_weight': 14.947587304572773, 
'learning_rate': 0.6092558236172073, 'subsample': 0.9659256891661986, 'log_max_bin': 10.0, 'colsample_bytree': 1.0, 'reg_alpha': 3.816590663384559e-08, 'reg_lambda': 0.4482946615262561}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 11.0, 'max_leaves': 15.0, 'min_child_weight': 14.947587304572773, 'learning_rate': 0.6092558236172073, 'subsample': 0.9659256891661986, 'log_max_bin': 10.0, 'colsample_bytree': 1.0, 'reg_alpha': 3.816590663384559e-08, 'reg_lambda': 0.4482946615262561}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 22.0, 'max_leaves': 44.0, 'min_child_weight': 8.295709769360025, 'learning_rate': 0.8096645680737932, 'subsample': 0.9506809897636022, 'log_max_bin': 9.0, 'colsample_bytree': 0.9671874874371171, 'reg_alpha': 1.7301741960564346e-06, 'reg_lambda': 0.0977230117487556}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 22.0, 'max_leaves': 44.0, 'min_child_weight': 8.295709769360025, 'learning_rate': 0.8096645680737932, 'subsample': 0.9506809897636022, 'log_max_bin': 9.0, 'colsample_bytree': 0.9671874874371171, 'reg_alpha': 1.7301741960564346e-06, 'reg_lambda': 0.0977230117487556}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 9.0, 'max_leaves': 245.0, 'min_child_weight': 4.208492943400939, 'learning_rate': 0.26609407333531715, 'subsample': 0.9704154473613615, 'log_max_bin': 10.0, 'colsample_bytree': 1.0, 'reg_alpha': 1.6843847487782941e-09, 'reg_lambda': 1.0}, 'Best Learner': 'lgbm', 'Best Hyper-parameters': {'n_estimators': 9.0, 'max_leaves': 245.0, 'min_child_weight': 4.208492943400939, 'learning_rate': 0.26609407333531715, 'subsample': 0.9704154473613615, 'log_max_bin': 10.0, 'colsample_bytree': 1.0, 'reg_alpha': 1.6843847487782941e-09, 'reg_lambda': 1.0}}\n{'Current Learner': 'lgbm', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 25.0, 'max_leaves': 403.0, 'min_child_weight': 
5.55091634143324, 'learning_rate': 0.12666913709918798, 'subsample': 1.0, 'log_max_bin': 9.0, 'colsample_bytr
]
}
],
"source": [
"from flaml.data import get_output_from_log\n",
"time_history, best_valid_loss_history, valid_loss_history, config_history, train_loss_history = \\\n",
" get_output_from_log(filename = settings['log_file_name'], time_budget = 60)\n",
"\n",
"for config in config_history:\n",
" print(config)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"plt.title('Learning Curve')\n",
"plt.xlabel('Wall Clock Time (s)')\n",
"plt.ylabel('Validation r2')\n",
"plt.scatter(time_history, 1-np.array(valid_loss_history))\n",
"plt.step(time_history, 1-np.array(best_valid_loss_history), where='post')\n",
"plt.show()"
]
},
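{
"cell_type": "markdown",
"metadata": {},
"source": [
"The step line in the learning curve tracks the best validation score found so far. As a rough sketch of that idea, assuming hypothetical loss values rather than the logged ones, a running minimum over the per-trial losses gives the same kind of curve:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical validation losses (1 - r2), in the order the trials finished.\n",
"valid_loss = np.array([0.74, 0.46, 0.50, 0.26, 0.30, 0.23])\n",
"\n",
"# Running minimum: the best loss seen up to and including each trial.\n",
"best_loss_so_far = np.minimum.accumulate(valid_loss)\n",
"\n",
"# Convert losses back to r2 so they share an axis with the scatter points.\n",
"best_r2_so_far = 1 - best_loss_so_far\n",
"print(best_r2_so_far)\n",
"```\n",
"\n",
"Because the running minimum never increases, the corresponding r2 curve never decreases, which matches the staircase shape drawn with `where='post'`."
]
},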
{
"source": [
"## 3. Comparison with alternatives\n",
"\n",
"### FLAML's accuracy"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"flaml r2 = 0.8503723727607084\n"
]
}
],
"source": [
"print('flaml r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))"
]
},
{
"source": [
"### Default LightGBM"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from lightgbm import LGBMRegressor\n",
"lgbm = LGBMRegressor()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"LGBMRegressor()"
]
},
"metadata": {},
"execution_count": 18
}
],
"source": [
"lgbm.fit(X_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"default lgbm r2 = 0.8296179648694404\n"
]
}
],
"source": [
"y_pred = lgbm.predict(X_test)\n",
"from flaml.ml import sklearn_metric_loss_score\n",
"print('default lgbm r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))"
]
},
{
"source": [
"### Optuna LightGBM Tuner"
],
"cell_type": "markdown",
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install optuna==2.5.0;"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"train_x, val_x, train_y, val_y = train_test_split(X_train, y_train, test_size=0.1)\n",
"import optuna.integration.lightgbm as lgb\n",
"dtrain = lgb.Dataset(train_x, label=train_y)\n",
"dval = lgb.Dataset(val_x, label=val_y)\n",
"params = {\n",
" \"objective\": \"regression\",\n",
" \"metric\": \"regression\",\n",
" \"verbosity\": -1,\n",
"}\n"
]
},
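{
"cell_type": "markdown",
"metadata": {},
"source": [
"The hold-out split above is not seeded, so the validation set (and therefore the tuner's scores) can change from run to run. A minimal sketch of a reproducible split, using synthetic arrays as a stand-in for `X_train` and `y_train` and an arbitrary `random_state` value (both assumptions):\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Synthetic stand-in for X_train / y_train (hypothetical data).\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(100, 8))\n",
"y = rng.normal(size=100)\n",
"\n",
"# random_state pins the shuffle, so repeated calls give the same split.\n",
"tr_x, va_x, tr_y, va_y = train_test_split(X, y, test_size=0.1, random_state=42)\n",
"tr_x2, va_x2, _, _ = train_test_split(X, y, test_size=0.1, random_state=42)\n",
"\n",
"print(tr_x.shape, va_x.shape)\n",
"print('same validation rows:', np.array_equal(va_x, va_x2))\n",
"```\n",
"\n",
"Seeding the split makes comparisons between tuners easier to repeat, at the cost of tying the result to one particular validation set."
]
},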
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"tags": [
"outputPrepend"
]
},
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"875128.6166067 and parameters: {'feature_fraction': 0.5}. Best is trial 0 with value: 2131729337.83384.\u001b[0m\n",
"feature_fraction, val_score: 1949307059.499325: 43%|####2 | 3/7 [00:06<00:08, 2.23s/it]\u001b[32m[I 2021-02-22 14:39:47,252]\u001b[0m Trial 2 finished with value: 1949307059.499325 and parameters: {'feature_fraction': 0.8999999999999999}. Best is trial 2 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction, val_score: 1949307059.499325: 57%|#####7 | 4/7 [00:09<00:06, 2.26s/it]\u001b[32m[I 2021-02-22 14:39:49,566]\u001b[0m Trial 3 finished with value: 1991236553.1444218 and parameters: {'feature_fraction': 0.7}. Best is trial 2 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction, val_score: 1949307059.499325: 71%|#######1 | 5/7 [00:11<00:04, 2.21s/it]\u001b[32m[I 2021-02-22 14:39:51,687]\u001b[0m Trial 4 finished with value: 1988181425.985298 and parameters: {'feature_fraction': 1.0}. Best is trial 2 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction, val_score: 1949307059.499325: 86%|########5 | 6/7 [00:13<00:02, 2.13s/it]\u001b[32m[I 2021-02-22 14:39:53,649]\u001b[0m Trial 5 finished with value: 1991236553.1444218 and parameters: {'feature_fraction': 0.8}. Best is trial 2 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction, val_score: 1949307059.499325: 100%|##########| 7/7 [00:14<00:00, 2.04s/it]\u001b[32m[I 2021-02-22 14:39:55,505]\u001b[0m Trial 6 finished with value: 1985494931.14108 and parameters: {'feature_fraction': 0.6}. Best is trial 2 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction, val_score: 1949307059.499325: 100%|##########| 7/7 [00:15<00:00, 2.14s/it]\n",
"num_leaves, val_score: 1949307059.499325: 5%|5 | 1/20 [00:00<00:13, 1.37it/s]\u001b[32m[I 2021-02-22 14:39:56,251]\u001b[0m Trial 7 finished with value: 2193886138.5860405 and parameters: {'num_leaves': 5}. Best is trial 7 with value: 2193886138.5860405.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 10%|# | 2/20 [00:10<01:48, 6.04s/it]\u001b[32m[I 2021-02-22 14:40:06,007]\u001b[0m Trial 8 finished with value: 2027098939.2209685 and parameters: {'num_leaves': 200}. Best is trial 8 with value: 2027098939.2209685.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 15%|#5 | 3/20 [00:20<02:10, 7.70s/it]\u001b[32m[I 2021-02-22 14:40:15,684]\u001b[0m Trial 9 finished with value: 2035885641.3675568 and parameters: {'num_leaves': 205}. Best is trial 8 with value: 2027098939.2209685.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 20%|## | 4/20 [00:21<01:21, 5.10s/it]\u001b[32m[I 2021-02-22 14:40:16,785]\u001b[0m Trial 10 finished with value: 2004366052.4067948 and parameters: {'num_leaves': 16}. Best is trial 10 with value: 2004366052.4067948.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 25%|##5 | 5/20 [00:21<00:52, 3.50s/it]\u001b[32m[I 2021-02-22 14:40:17,456]\u001b[0m Trial 11 finished with value: 2193886138.5860405 and parameters: {'num_leaves': 5}. Best is trial 10 with value: 2004366052.4067948.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 30%|### | 6/20 [00:23<00:41, 2.98s/it]\u001b[32m[I 2021-02-22 14:40:19,430]\u001b[0m Trial 12 finished with value: 2002345051.114594 and parameters: {'num_leaves': 42}. Best is trial 12 with value: 2002345051.114594.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 35%|###5 | 7/20 [00:33<01:05, 5.06s/it]\u001b[32m[I 2021-02-22 14:40:28,770]\u001b[0m Trial 13 finished with value: 2056130771.9329934 and parameters: {'num_leaves': 256}. Best is trial 12 with value: 2002345051.114594.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 40%|#### | 8/20 [00:44<01:22, 6.87s/it]\u001b[32m[I 2021-02-22 14:40:39,523]\u001b[0m Trial 14 finished with value: 2071021341.6650958 and parameters: {'num_leaves': 242}. Best is trial 12 with value: 2002345051.114594.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 45%|####5 | 9/20 [00:50<01:13, 6.64s/it]\u001b[32m[I 2021-02-22 14:40:45,668]\u001b[0m Trial 15 finished with value: 1987944577.546844 and parameters: {'num_leaves': 121}. Best is trial 15 with value: 1987944577.546844.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 50%|##### | 10/20 [00:59<01:15, 7.57s/it]\u001b[32m[I 2021-02-22 14:40:55,309]\u001b[0m Trial 16 finished with value: 2043161650.035548 and parameters: {'num_leaves': 222}. Best is trial 15 with value: 1987944577.546844.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 55%|#####5 | 11/20 [01:03<00:57, 6.43s/it]\u001b[32m[I 2021-02-22 14:40:59,172]\u001b[0m Trial 17 finished with value: 2029987579.182447 and parameters: {'num_leaves': 104}. Best is trial 15 with value: 1987944577.546844.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 60%|###### | 12/20 [01:07<00:45, 5.63s/it]\u001b[32m[I 2021-02-22 14:41:02,967]\u001b[0m Trial 18 finished with value: 2012583295.5343304 and parameters: {'num_leaves': 84}. Best is trial 15 with value: 1987944577.546844.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 65%|######5 | 13/20 [01:09<00:32, 4.64s/it]\u001b[32m[I 2021-02-22 14:41:05,314]\u001b[0m Trial 19 finished with value: 1981788985.8686044 and parameters: {'num_leaves': 56}. Best is trial 19 with value: 1981788985.8686044.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 70%|####### | 14/20 [01:15<00:30, 5.10s/it]\u001b[32m[I 2021-02-22 14:41:11,497]\u001b[0m Trial 20 finished with value: 2023959326.6503484 and parameters: {'num_leaves': 155}. Best is trial 19 with value: 1981788985.8686044.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 75%|#######5 | 15/20 [01:18<00:22, 4.43s/it]\u001b[32m[I 2021-02-22 14:41:14,365]\u001b[0m Trial 21 finished with value: 2017760258.9167733 and parameters: {'num_leaves': 69}. Best is trial 19 with value: 1981788985.8686044.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 80%|######## | 16/20 [01:24<00:18, 4.66s/it]\u001b[32m[I 2021-02-22 14:41:19,569]\u001b[0m Trial 22 finished with value: 2001975542.06975 and parameters: {'num_leaves': 141}. Best is trial 19 with value: 1981788985.8686044.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 85%|########5 | 17/20 [01:28<00:13, 4.46s/it]\u001b[32m[I 2021-02-22 14:41:23,567]\u001b[0m Trial 23 finished with value: 2003714379.9130254 and parameters: {'num_leaves': 111}. Best is trial 19 with value: 1981788985.8686044.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 90%|######### | 18/20 [01:33<00:09, 4.88s/it]\u001b[32m[I 2021-02-22 14:41:29,431]\u001b[0m Trial 24 finished with value: 2049456748.8392146 and parameters: {'num_leaves': 171}. Best is trial 19 with value: 1981788985.8686044.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 95%|#########5| 19/20 [01:36<00:04, 4.20s/it]\u001b[32m[I 2021-02-22 14:41:32,036]\u001b[0m Trial 25 finished with value: 1956603615.1475646 and parameters: {'num_leaves': 50}. Best is trial 25 with value: 1956603615.1475646.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 100%|##########| 20/20 [01:40<00:00, 4.03s/it]\u001b[32m[I 2021-02-22 14:41:35,682]\u001b[0m Trial 26 finished with value: 1956603615.1475646 and parameters: {'num_leaves': 50}. Best is trial 25 with value: 1956603615.1475646.\u001b[0m\n",
"num_leaves, val_score: 1949307059.499325: 100%|##########| 20/20 [01:40<00:00, 5.01s/it]\n",
"bagging, val_score: 1949307059.499325: 10%|# | 1/10 [00:02<00:25, 2.89s/it]\u001b[32m[I 2021-02-22 14:41:38,583]\u001b[0m Trial 27 finished with value: 2058040724.0398781 and parameters: {'bagging_fraction': 0.6789607888490847, 'bagging_freq': 1}. Best is trial 27 with value: 2058040724.0398781.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 20%|## | 2/10 [00:05<00:22, 2.77s/it]\u001b[32m[I 2021-02-22 14:41:41,271]\u001b[0m Trial 28 finished with value: 2058412265.7710927 and parameters: {'bagging_fraction': 0.6382384586067118, 'bagging_freq': 2}. Best is trial 27 with value: 2058040724.0398781.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 30%|### | 3/10 [00:08<00:20, 2.87s/it]\u001b[32m[I 2021-02-22 14:41:44,265]\u001b[0m Trial 29 finished with value: 2022566859.4286194 and parameters: {'bagging_fraction': 0.9689975648970396, 'bagging_freq': 3}. Best is trial 29 with value: 2022566859.4286194.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 40%|#### | 4/10 [00:11<00:17, 2.94s/it]\u001b[32m[I 2021-02-22 14:41:47,318]\u001b[0m Trial 30 finished with value: 2019125904.4126759 and parameters: {'bagging_fraction': 0.7947264008869785, 'bagging_freq': 5}. Best is trial 30 with value: 2019125904.4126759.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 50%|##### | 5/10 [00:14<00:14, 2.87s/it]\u001b[32m[I 2021-02-22 14:41:50,057]\u001b[0m Trial 31 finished with value: 1987043205.8620193 and parameters: {'bagging_fraction': 0.8589564136583515, 'bagging_freq': 7}. Best is trial 31 with value: 1987043205.8620193.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 60%|###### | 6/10 [00:16<00:11, 2.77s/it]\u001b[32m[I 2021-02-22 14:41:52,636]\u001b[0m Trial 32 finished with value: 1995986447.1995134 and parameters: {'bagging_fraction': 0.7967369529423736, 'bagging_freq': 1}. Best is trial 31 with value: 1987043205.8620193.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 70%|####### | 7/10 [00:19<00:08, 2.67s/it]\u001b[32m[I 2021-02-22 14:41:55,113]\u001b[0m Trial 33 finished with value: 1955997102.255711 and parameters: {'bagging_fraction': 0.8942338061809765, 'bagging_freq': 6}. Best is trial 33 with value: 1955997102.255711.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 80%|######## | 8/10 [00:22<00:05, 2.66s/it]\u001b[32m[I 2021-02-22 14:41:57,747]\u001b[0m Trial 34 finished with value: 2048654658.4711766 and parameters: {'bagging_fraction': 0.5987476817554606, 'bagging_freq': 2}. Best is trial 33 with value: 1955997102.255711.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 90%|######### | 9/10 [00:24<00:02, 2.66s/it]\u001b[32m[I 2021-02-22 14:42:00,415]\u001b[0m Trial 35 finished with value: 1972462964.5654855 and parameters: {'bagging_fraction': 0.9885365644176932, 'bagging_freq': 3}. Best is trial 33 with value: 1955997102.255711.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 100%|##########| 10/10 [00:27<00:00, 2.64s/it]\u001b[32m[I 2021-02-22 14:42:02,999]\u001b[0m Trial 36 finished with value: 2017973500.1911747 and parameters: {'bagging_fraction': 0.7252191625113514, 'bagging_freq': 3}. Best is trial 33 with value: 1955997102.255711.\u001b[0m\n",
"bagging, val_score: 1949307059.499325: 100%|##########| 10/10 [00:27<00:00, 2.73s/it]\n",
"feature_fraction_stage2, val_score: 1949307059.499325: 17%|#6 | 1/6 [00:02<00:11, 2.20s/it]\u001b[32m[I 2021-02-22 14:42:05,220]\u001b[0m Trial 37 finished with value: 1949307059.499325 and parameters: {'feature_fraction': 0.9159999999999999}. Best is trial 37 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction_stage2, val_score: 1949307059.499325: 33%|###3 | 2/6 [00:04<00:08, 2.16s/it]\u001b[32m[I 2021-02-22 14:42:07,344]\u001b[0m Trial 38 finished with value: 1949307059.499325 and parameters: {'feature_fraction': 0.8839999999999999}. Best is trial 37 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction_stage2, val_score: 1949307059.499325: 50%|##### | 3/6 [00:06<00:06, 2.30s/it]\u001b[32m[I 2021-02-22 14:42:09,820]\u001b[0m Trial 39 finished with value: 1988181425.985298 and parameters: {'feature_fraction': 0.948}. Best is trial 37 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction_stage2, val_score: 1949307059.499325: 67%|######6 | 4/6 [00:09<00:04, 2.30s/it]\u001b[32m[I 2021-02-22 14:42:12,114]\u001b[0m Trial 40 finished with value: 1988181425.985298 and parameters: {'feature_fraction': 0.9799999999999999}. Best is trial 37 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction_stage2, val_score: 1949307059.499325: 83%|########3 | 5/6 [00:11<00:02, 2.33s/it]\u001b[32m[I 2021-02-22 14:42:14,504]\u001b[0m Trial 41 finished with value: 1949307059.499325 and parameters: {'feature_fraction': 0.852}. Best is trial 37 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction_stage2, val_score: 1949307059.499325: 100%|##########| 6/6 [00:13<00:00, 2.28s/it]\u001b[32m[I 2021-02-22 14:42:16,674]\u001b[0m Trial 42 finished with value: 1949307059.499325 and parameters: {'feature_fraction': 0.82}. Best is trial 37 with value: 1949307059.499325.\u001b[0m\n",
"feature_fraction_stage2, val_score: 1949307059.499325: 100%|##########| 6/6 [00:13<00:00, 2.28s/it]\n",
"regularization_factors, val_score: 1949307038.475713: 5%|5 | 1/20 [00:02<00:41, 2.18s/it]\u001b[32m[I 2021-02-22 14:42:18,861]\u001b[0m Trial 43 finished with value: 1949307038.475713 and parameters: {'lambda_l1': 0.09101144524819704, 'lambda_l2': 1.7704703083488795e-08}. Best is trial 43 with value: 1949307038.475713.\u001b[0m\n",
"regularization_factors, val_score: 1949307038.475713: 10%|# | 2/20 [00:04<00:39, 2.21s/it]\u001b[32m[I 2021-02-22 14:42:21,099]\u001b[0m Trial 44 finished with value: 1995873001.2074113 and parameters: {'lambda_l1': 2.6557868173633803e-07, 'lambda_l2': 5.183750661759111}. Best is trial 43 with value: 1949307038.475713.\u001b[0m\n",
"regularization_factors, val_score: 1949307038.475713: 15%|#5 | 3/20 [00:06<00:37, 2.21s/it]\u001b[32m[I 2021-02-22 14:42:23,296]\u001b[0m Trial 45 finished with value: 1965414514.3746212 and parameters: {'lambda_l1': 0.7263427156399412, 'lambda_l2': 0.0013813379781067797}. Best is trial 43 with value: 1949307038.475713.\u001b[0m\n",
"regularization_factors, val_score: 1949307038.475713: 20%|## | 4/20 [00:08<00:34, 2.18s/it]\u001b[32m[I 2021-02-22 14:42:25,439]\u001b[0m Trial 46 finished with value: 1954274712.97174 and parameters: {'lambda_l1': 7.578182163631005e-05, 'lambda_l2': 0.00021084910200530688}. Best is trial 43 with value: 1949307038.475713.\u001b[0m\n",
"regularization_factors, val_score: 1949307038.475713: 25%|##5 | 5/20 [00:10<00:32, 2.16s/it]\u001b[32m[I 2021-02-22 14:42:27,566]\u001b[0m Trial 47 finished with value: 1965414460.2095408 and parameters: {'lambda_l1': 7.072150718391867e-06, 'lambda_l2': 0.0014472465664563316}. Best is trial 43 with value: 1949307038.475713.\u001b[0m\n",
"regularization_factors, val_score: 1949307038.475713: 30%|### | 6/20 [00:12<00:29, 2.14s/it]\u001b[32m[I 2021-02-22 14:42:29,671]\u001b[0m Trial 48 finished with value: 1949307057.598466 and parameters: {'lambda_l1': 4.951731906814475e-06, 'lambda_l2': 5.544032280720154e-07}. Best is trial 43 with value: 1949307038.475713.\u001b[0m\n",
"regularization_factors, val_score: 1949307038.475713: 35%|###5 | 7/20 [00:15<00:27, 2.14s/it]\u001b[32m[I 2021-02-22 14:42:31,794]\u001b[0m Trial 49 finished with value: 2021008326.2503476 and parameters: {'lambda_l1': 0.002348352296820926, 'lambda_l2': 0.143195505557829}. Best is trial 43 with value: 1949307038.475713.\u001b[0m\n",
"regularization_factors, val_score: 1949307038.475713: 40%|#### | 8/20 [00:17<00:25, 2.16s/it]\u001b[32m[I 2021-02-22 14:42:34,011]\u001b[0m Trial 50 finished with value: 1957224797.2084112 and parameters: {'lambda_l1': 1.5908622169872778e-07, 'lambda_l2': 0.7751981347764194}. Best is trial 43 with value: 1949307038.475713.\u001b[0m\n",
"regularization_factors, val_score: 1949307038.475713: 45%|####5 | 9/20 [00:19<00:23, 2.16s/it]\u001b[32m[I 2021-02-22 14:42:36,177]\u001b[0m Trial 51 finished with value: 1958710530.5581698 and parameters: {'lambda_l1': 1.997720402454698e-06, 'lambda_l2': 0.3916021711591694}. Best is trial 43 with value: 1949307038.475713.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 50%|##### | 10/20 [00:21<00:21, 2.16s/it]\u001b[32m[I 2021-02-22 14:42:38,329]\u001b[0m Trial 52 finished with value: 1949011993.9609785 and parameters: {'lambda_l1': 1.9741002172068045e-06, 'lambda_l2': 0.0009225859572515588}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 55%|#####5 | 11/20 [00:25<00:23, 2.64s/it]\u001b[32m[I 2021-02-22 14:42:42,063]\u001b[0m Trial 53 finished with value: 1949306996.8346448 and parameters: {'lambda_l1': 1.4927989272260208e-08, 'lambda_l2': 1.815720293048345e-05}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 60%|###### | 12/20 [00:27<00:20, 2.52s/it]\u001b[32m[I 2021-02-22 14:42:44,299]\u001b[0m Trial 54 finished with value: 1949306999.1035354 and parameters: {'lambda_l1': 1.40977148038626e-08, 'lambda_l2': 1.7685279853730492e-05}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 65%|######5 | 13/20 [00:29<00:17, 2.46s/it]\u001b[32m[I 2021-02-22 14:42:46,634]\u001b[0m Trial 55 finished with value: 1949307044.7417297 and parameters: {'lambda_l1': 1.5463768519498943e-08, 'lambda_l2': 4.306626722644303e-06}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 70%|####### | 14/20 [00:32<00:14, 2.39s/it]\u001b[32m[I 2021-02-22 14:42:48,870]\u001b[0m Trial 56 finished with value: 1989218170.003291 and parameters: {'lambda_l1': 1.0022923369342419e-08, 'lambda_l2': 0.0045395189861971285}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 75%|#######5 | 15/20 [00:34<00:11, 2.34s/it]\u001b[32m[I 2021-02-22 14:42:51,095]\u001b[0m Trial 57 finished with value: 1949306886.7297008 and parameters: {'lambda_l1': 0.0002408003105743025, 'lambda_l2': 5.00826403625466e-05}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 80%|######## | 16/20 [00:36<00:08, 2.17s/it]\u001b[32m[I 2021-02-22 14:42:52,863]\u001b[0m Trial 58 finished with value: 1976046665.7503545 and parameters: {'lambda_l1': 0.0012739301862614585, 'lambda_l2': 0.015123687061631018}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 85%|########5 | 17/20 [00:37<00:05, 1.96s/it]\u001b[32m[I 2021-02-22 14:42:54,331]\u001b[0m Trial 59 finished with value: 1949307059.0756855 and parameters: {'lambda_l1': 0.000111924254335766, 'lambda_l2': 1.2760174203841509e-07}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 90%|######### | 18/20 [00:39<00:03, 1.80s/it]\u001b[32m[I 2021-02-22 14:42:55,768]\u001b[0m Trial 60 finished with value: 1954274496.2320116 and parameters: {'lambda_l1': 0.011232645128874983, 'lambda_l2': 0.00027341062705489946}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 95%|#########5| 19/20 [00:40<00:01, 1.76s/it]\u001b[32m[I 2021-02-22 14:42:57,426]\u001b[0m Trial 61 finished with value: 1949306916.0588307 and parameters: {'lambda_l1': 3.406557565523097e-05, 'lambda_l2': 4.1565414801737324e-05}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 100%|##########| 20/20 [00:42<00:00, 1.78s/it]\u001b[32m[I 2021-02-22 14:42:59,261]\u001b[0m Trial 62 finished with value: 1949868047.0339825 and parameters: {'lambda_l1': 7.935752792484713, 'lambda_l2': 1.1227100513172882e-06}. Best is trial 52 with value: 1949011993.9609785.\u001b[0m\n",
"regularization_factors, val_score: 1949011993.960979: 100%|##########| 20/20 [00:42<00:00, 2.13s/it]\n",
"min_data_in_leaf, val_score: 1949011993.960979: 20%|## | 1/5 [00:01<00:06, 1.62s/it]\u001b[32m[I 2021-02-22 14:43:00,901]\u001b[0m Trial 63 finished with value: 1989752975.1835222 and parameters: {'min_child_samples': 5}. Best is trial 63 with value: 1989752975.1835222.\u001b[0m\n",
"min_data_in_leaf, val_score: 1949011993.960979: 40%|#### | 2/5 [00:03<00:04, 1.64s/it]\u001b[32m[I 2021-02-22 14:43:02,562]\u001b[0m Trial 64 finished with value: 1970804604.9243653 and parameters: {'min_child_samples': 50}. Best is trial 64 with value: 1970804604.9243653.\u001b[0m\n",
"min_data_in_leaf, val_score: 1949011993.960979: 60%|###### | 3/5 [00:04<00:03, 1.63s/it]\u001b[32m[I 2021-02-22 14:43:04,166]\u001b[0m Trial 65 finished with value: 1998195068.5676072 and parameters: {'min_child_samples': 25}. Best is trial 64 with value: 1970804604.9243653.\u001b[0m\n",
"min_data_in_leaf, val_score: 1949011993.960979: 80%|######## | 4/5 [00:06<00:01, 1.68s/it]\u001b[32m[I 2021-02-22 14:43:05,928]\u001b[0m Trial 66 finished with value: 2013164530.6381345 and parameters: {'min_child_samples': 100}. Best is trial 64 with value: 1970804604.9243653.\u001b[0m\n",
"min_data_in_leaf, val_score: 1915185243.192194: 100%|##########| 5/5 [00:08<00:00, 1.61s/it]\u001b[32m[I 2021-02-22 14:43:07,426]\u001b[0m Trial 67 finished with value: 1915185243.1921945 and parameters: {'min_child_samples': 10}. Best is trial 67 with value: 1915185243.1921945.\u001b[0m\n",
"min_data_in_leaf, val_score: 1915185243.192194: 100%|##########| 5/5 [00:08<00:00, 1.63s/it]Wall time: 3min 26s\n",
"\n"
]
}
],
"source": [
"%%time\n",
"model = lgb.train(params, dtrain, valid_sets=[dtrain, dval], verbose_eval=10000)\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Optuna LightGBM Tuner r2 = 0.8476245395516778\n"
]
}
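],
"source": [
"from flaml.ml import sklearn_metric_loss_score\n",
"\n",
"# sklearn_metric_loss_score returns 1 - r2 for the 'r2' metric, so subtract from 1 to recover r2\n",
"y_pred = model.predict(X_test)\n",
"print('Optuna LightGBM Tuner r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))"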
]
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3.7.7 64-bit ('flaml': conda)",
"metadata": {
"interpreter": {
"hash": "bfcd9a6a9254a5e160761a1fd7a9e444f011592c6770d9f4180dde058a9df5dd"
}
}
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7-final"
}
},
"nbformat": 4,
"nbformat_minor": 2
}