autogen/notebook/flaml_xgboost.ipynb


{
"cells": [
{
"cell_type": "markdown",
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License.\n",
"\n",
"# Tune XGBoost with FLAML Library\n",
"\n",
"\n",
"## 1. Introduction\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
"with low computational cost. It is fast and economical. The simple and lightweight design makes it easy \n",
"to use and extend, for example by adding new learners. FLAML can \n",
"- serve as an economical AutoML engine,\n",
"- be used as a fast hyperparameter tuning tool, or \n",
"- be embedded in self-tuning software that requires low latency and low resource usage in repetitive\n",
" tuning tasks.\n",
"\n",
"In this notebook, we demonstrate how to use the FLAML library to tune the hyperparameters of XGBoost with a regression example.\n",
"\n",
"FLAML requires `Python>=3.6`. To run this notebook example, please install flaml with the `notebook` option:\n",
"```bash\n",
"pip install flaml[notebook]\n",
"```"
],
"metadata": {
"slideshow": {
"slide_type": "slide"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"!pip install flaml[notebook];"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## 2. Regression Example\n",
"### Load data and preprocess\n",
"\n",
"Download the [houses dataset](https://www.openml.org/d/537) from OpenML. The task is to predict the median house price in a region from the region's demographic composition and the state of its housing market."
],
"metadata": {
"slideshow": {
"slide_type": "slide"
}
}
},
{
"cell_type": "code",
"execution_count": 1,
"source": [
"from flaml.data import load_openml_dataset\n",
"X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir='./')"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"load dataset from ./openml_ds537.pkl\n",
"Dataset name: houses\n",
"X_train.shape: (15480, 8), y_train.shape: (15480,);\n",
"X_test.shape: (5160, 8), y_test.shape: (5160,)\n"
]
}
],
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
}
},
{
"cell_type": "markdown",
"source": [
"### Run FLAML\n",
"In the FLAML automl run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, resampling strategy type, and so on. All these arguments have default values which will be used if users do not provide them. "
],
"metadata": {
"slideshow": {
"slide_type": "slide"
}
}
},
{
"cell_type": "code",
"execution_count": 2,
"source": [
"''' import AutoML class from flaml package '''\n",
"from flaml import AutoML\n",
"automl = AutoML()"
],
"outputs": [],
"metadata": {
"slideshow": {
"slide_type": "slide"
}
}
},
{
"cell_type": "code",
"execution_count": 3,
"source": [
"settings = {\n",
" \"time_budget\": 60, # total running time in seconds\n",
" \"metric\": 'r2', # primary metrics for regression can be chosen from: ['mae','mse','r2']\n",
" \"estimator_list\": ['xgboost'], # list of ML learners; we tune xgboost in this example\n",
" \"task\": 'regression', # task type \n",
" \"log_file_name\": 'houses_experiment.log', # flaml log file\n",
"}"
],
"outputs": [],
"metadata": {
"slideshow": {
"slide_type": "slide"
}
}
},
{
"cell_type": "code",
"execution_count": 4,
"source": [
"'''The main flaml automl API'''\n",
"automl.fit(X_train=X_train, y_train=y_train, **settings)"
],
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"[flaml.automl: 09-29 23:06:46] {1446} INFO - Data split method: uniform\n",
"[flaml.automl: 09-29 23:06:46] {1450} INFO - Evaluation method: cv\n",
"[flaml.automl: 09-29 23:06:46] {1496} INFO - Minimizing error metric: 1-r2\n",
"[flaml.automl: 09-29 23:06:46] {1533} INFO - List of ML learners in AutoML Run: ['xgboost']\n",
"[flaml.automl: 09-29 23:06:46] {1763} INFO - iteration 0, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:47] {1880} INFO - Estimated sufficient time budget=2621s. Estimated necessary time budget=3s.\n",
"[flaml.automl: 09-29 23:06:47] {1952} INFO - at 0.3s,\testimator xgboost's best error=2.1267,\tbest estimator xgboost's best error=2.1267\n",
"[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 1, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:47] {1952} INFO - at 0.5s,\testimator xgboost's best error=2.1267,\tbest estimator xgboost's best error=2.1267\n",
"[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 2, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:47] {1952} INFO - at 0.6s,\testimator xgboost's best error=0.8485,\tbest estimator xgboost's best error=0.8485\n",
"[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 3, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:47] {1952} INFO - at 0.8s,\testimator xgboost's best error=0.3799,\tbest estimator xgboost's best error=0.3799\n",
"[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 4, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:47] {1952} INFO - at 1.0s,\testimator xgboost's best error=0.3799,\tbest estimator xgboost's best error=0.3799\n",
"[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 5, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:47] {1952} INFO - at 1.2s,\testimator xgboost's best error=0.3799,\tbest estimator xgboost's best error=0.3799\n",
"[flaml.automl: 09-29 23:06:47] {1763} INFO - iteration 6, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:48] {1952} INFO - at 1.5s,\testimator xgboost's best error=0.2992,\tbest estimator xgboost's best error=0.2992\n",
"[flaml.automl: 09-29 23:06:48] {1763} INFO - iteration 7, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:48] {1952} INFO - at 1.9s,\testimator xgboost's best error=0.2992,\tbest estimator xgboost's best error=0.2992\n",
"[flaml.automl: 09-29 23:06:48] {1763} INFO - iteration 8, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:49] {1952} INFO - at 2.2s,\testimator xgboost's best error=0.2992,\tbest estimator xgboost's best error=0.2992\n",
"[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 9, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:49] {1952} INFO - at 2.5s,\testimator xgboost's best error=0.2513,\tbest estimator xgboost's best error=0.2513\n",
"[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 10, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:49] {1952} INFO - at 2.8s,\testimator xgboost's best error=0.2513,\tbest estimator xgboost's best error=0.2513\n",
"[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 11, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:49] {1952} INFO - at 3.0s,\testimator xgboost's best error=0.2513,\tbest estimator xgboost's best error=0.2513\n",
"[flaml.automl: 09-29 23:06:49] {1763} INFO - iteration 12, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:50] {1952} INFO - at 3.3s,\testimator xgboost's best error=0.2113,\tbest estimator xgboost's best error=0.2113\n",
"[flaml.automl: 09-29 23:06:50] {1763} INFO - iteration 13, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:50] {1952} INFO - at 3.5s,\testimator xgboost's best error=0.2113,\tbest estimator xgboost's best error=0.2113\n",
"[flaml.automl: 09-29 23:06:50] {1763} INFO - iteration 14, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:50] {1952} INFO - at 4.0s,\testimator xgboost's best error=0.2090,\tbest estimator xgboost's best error=0.2090\n",
"[flaml.automl: 09-29 23:06:50] {1763} INFO - iteration 15, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:51] {1952} INFO - at 4.5s,\testimator xgboost's best error=0.2090,\tbest estimator xgboost's best error=0.2090\n",
"[flaml.automl: 09-29 23:06:51] {1763} INFO - iteration 16, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:51] {1952} INFO - at 5.2s,\testimator xgboost's best error=0.1919,\tbest estimator xgboost's best error=0.1919\n",
"[flaml.automl: 09-29 23:06:51] {1763} INFO - iteration 17, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:52] {1952} INFO - at 5.5s,\testimator xgboost's best error=0.1919,\tbest estimator xgboost's best error=0.1919\n",
"[flaml.automl: 09-29 23:06:52] {1763} INFO - iteration 18, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:54] {1952} INFO - at 8.0s,\testimator xgboost's best error=0.1797,\tbest estimator xgboost's best error=0.1797\n",
"[flaml.automl: 09-29 23:06:54] {1763} INFO - iteration 19, current learner xgboost\n",
"[flaml.automl: 09-29 23:06:55] {1952} INFO - at 9.0s,\testimator xgboost's best error=0.1797,\tbest estimator xgboost's best error=0.1797\n",
"[flaml.automl: 09-29 23:06:55] {1763} INFO - iteration 20, current learner xgboost\n",
"[flaml.automl: 09-29 23:07:08] {1952} INFO - at 21.8s,\testimator xgboost's best error=0.1797,\tbest estimator xgboost's best error=0.1797\n",
"[flaml.automl: 09-29 23:07:08] {1763} INFO - iteration 21, current learner xgboost\n",
"[flaml.automl: 09-29 23:07:11] {1952} INFO - at 24.4s,\testimator xgboost's best error=0.1797,\tbest estimator xgboost's best error=0.1797\n",
"[flaml.automl: 09-29 23:07:11] {1763} INFO - iteration 22, current learner xgboost\n",
"[flaml.automl: 09-29 23:07:16] {1952} INFO - at 30.0s,\testimator xgboost's best error=0.1782,\tbest estimator xgboost's best error=0.1782\n",
"[flaml.automl: 09-29 23:07:16] {1763} INFO - iteration 23, current learner xgboost\n",
"[flaml.automl: 09-29 23:07:20] {1952} INFO - at 33.5s,\testimator xgboost's best error=0.1782,\tbest estimator xgboost's best error=0.1782\n",
"[flaml.automl: 09-29 23:07:20] {1763} INFO - iteration 24, current learner xgboost\n",
"[flaml.automl: 09-29 23:07:29] {1952} INFO - at 42.3s,\testimator xgboost's best error=0.1782,\tbest estimator xgboost's best error=0.1782\n",
"[flaml.automl: 09-29 23:07:29] {1763} INFO - iteration 25, current learner xgboost\n",
"[flaml.automl: 09-29 23:07:30] {1952} INFO - at 43.2s,\testimator xgboost's best error=0.1782,\tbest estimator xgboost's best error=0.1782\n",
"[flaml.automl: 09-29 23:07:30] {1763} INFO - iteration 26, current learner xgboost\n",
"[flaml.automl: 09-29 23:07:50] {1952} INFO - at 63.4s,\testimator xgboost's best error=0.1663,\tbest estimator xgboost's best error=0.1663\n",
"[flaml.automl: 09-29 23:07:50] {2059} INFO - selected model: <xgboost.core.Booster object at 0x7f6399005910>\n",
"[flaml.automl: 09-29 23:07:55] {2122} INFO - retrain xgboost for 5.4s\n",
"[flaml.automl: 09-29 23:07:55] {2128} INFO - retrained model: <xgboost.core.Booster object at 0x7f6398fc0eb0>\n",
"[flaml.automl: 09-29 23:07:55] {1557} INFO - fit succeeded\n",
"[flaml.automl: 09-29 23:07:55] {1558} INFO - Time taken to find the best model: 63.427649974823\n",
"[flaml.automl: 09-29 23:07:55] {1569} WARNING - Time taken to find the best model is 106% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.\n"
]
}
],
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
}
},
{
"cell_type": "markdown",
"source": [
"### Best model and metric"
],
"metadata": {
"slideshow": {
"slide_type": "slide"
}
}
},
{
"cell_type": "code",
"execution_count": 5,
"source": [
"''' retrieve best config'''\n",
"print('Best hyperparameter config:', automl.best_config)\n",
"print('Best r2 on validation data: {0:.4g}'.format(1 - automl.best_loss))\n",
"print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best hyperparameter config: {'n_estimators': 776, 'max_leaves': 160, 'min_child_weight': 32.57408640781376, 'learning_rate': 0.03478685333241491, 'subsample': 0.9152991332236934, 'colsample_bylevel': 0.5656764254642628, 'colsample_bytree': 0.7313266091895249, 'reg_alpha': 0.005771390107656191, 'reg_lambda': 1.4912667278658753}\n",
"Best r2 on validation data: 0.8337\n",
"Training duration of best run: 20.25 s\n"
]
}
],
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
}
},
{
"cell_type": "code",
"execution_count": 6,
"source": [
"automl.model.estimator"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<xgboost.core.Booster at 0x7f6398fc0eb0>"
]
},
"metadata": {},
"execution_count": 6
}
],
"metadata": {
"slideshow": {
"slide_type": "slide"
}
}
},
{
"cell_type": "code",
"execution_count": 7,
"source": [
"''' pickle and save the automl object '''\n",
"import pickle\n",
"with open('automl.pkl', 'wb') as f:\n",
" pickle.dump(automl, f, pickle.HIGHEST_PROTOCOL)"
],
"outputs": [],
"metadata": {
"slideshow": {
"slide_type": "slide"
}
}
},
{
"cell_type": "code",
"execution_count": 8,
"source": [
"''' compute predictions of testing dataset ''' \n",
"y_pred = automl.predict(X_test)\n",
"print('Predicted labels', y_pred)\n",
"print('True labels', y_test)"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Predicted labels [137582.95 255519.23 139866.06 ... 185638.95 202493.78 269308.22]\n",
"True labels 14740 136900.0\n",
"10101 241300.0\n",
"20566 200700.0\n",
"2670 72500.0\n",
"15709 460000.0\n",
" ... \n",
"13132 121200.0\n",
"8228 137500.0\n",
"3948 160900.0\n",
"8522 227300.0\n",
"16798 265600.0\n",
"Name: median_house_value, Length: 5160, dtype: float64\n"
]
}
],
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
}
},
{
"cell_type": "code",
"execution_count": 9,
"source": [
"''' compute different metric values on testing dataset'''\n",
"from flaml.ml import sklearn_metric_loss_score\n",
"print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))\n",
"print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))\n",
"print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"r2 = 0.8439648010832427\n",
"mse = 2062552297.5716143\n",
"mae = 30303.196008584666\n"
]
}
],
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
}
},
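{
"cell_type": "markdown",
"source": [
"The metric names used above follow a loss convention: the run log reports `Minimizing error metric: 1-r2`, which is why the cell above recovers r2 as one minus the returned score, while `mse` and `mae` are printed as returned. The block below is a minimal pure-Python sketch of the three metrics, for illustration only. The helper names (`mse`, `mae`, `r2`, `r2_loss`) and the toy values are assumptions and are not part of FLAML.\n",
"\n",
"```python\n",
"# Illustrative re-implementation of the regression metrics (not FLAML code).\n",
"def mse(y_true, y_pred):\n",
"    # mean squared error\n",
"    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)\n",
"\n",
"def mae(y_true, y_pred):\n",
"    # mean absolute error\n",
"    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)\n",
"\n",
"def r2(y_true, y_pred):\n",
"    # coefficient of determination: 1 - residual sum of squares / total sum of squares\n",
"    mean_true = sum(y_true) / len(y_true)\n",
"    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))\n",
"    ss_tot = sum((t - mean_true) ** 2 for t in y_true)\n",
"    return 1.0 - ss_res / ss_tot\n",
"\n",
"def r2_loss(y_true, y_pred):\n",
"    # loss form that a minimizer can use: lower is better\n",
"    return 1.0 - r2(y_true, y_pred)\n",
"\n",
"y_true = [3.0, -0.5, 2.0, 7.0]  # toy values, not taken from the houses dataset\n",
"y_pred = [2.5, 0.0, 2.0, 8.0]\n",
"print('r2', '=', 1 - r2_loss(y_true, y_pred))\n",
"print('mse', '=', mse(y_true, y_pred))\n",
"print('mae', '=', mae(y_true, y_pred))\n",
"```\n",
"\n",
"In this sketch, `1 - r2_loss(...)` plays the same role as `1 - sklearn_metric_loss_score('r2', ...)` in the cell above."
],
"metadata": {}
},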
{
"cell_type": "code",
"execution_count": 10,
"source": [
"from flaml.data import get_output_from_log\n",
"time_history, best_valid_loss_history, valid_loss_history, config_history, metric_history = \\\n",
" get_output_from_log(filename=settings['log_file_name'], time_budget=60)\n",
"\n",
"for config in config_history:\n",
" print(config)"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 0.9999999999999993, 'learning_rate': 0.09999999999999995, 'subsample': 1.0, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 0.9999999999999993, 'learning_rate': 0.09999999999999995, 'subsample': 1.0, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0009765625, 'reg_lambda': 1.0}}\n",
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 0.26208115308159446, 'learning_rate': 0.25912534572860507, 'subsample': 0.9266743941610592, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0013933617380144255, 'reg_lambda': 0.18096917948292954}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 0.26208115308159446, 'learning_rate': 0.25912534572860507, 'subsample': 0.9266743941610592, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0013933617380144255, 'reg_lambda': 0.18096917948292954}}\n",
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 1.8630223791106992, 'learning_rate': 1.0, 'subsample': 0.8513627344387318, 'colsample_bylevel': 1.0, 'colsample_bytree': 0.946138073111236, 'reg_alpha': 0.0018311776973217071, 'reg_lambda': 0.27901659190538414}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 4, 'max_leaves': 4, 'min_child_weight': 1.8630223791106992, 'learning_rate': 1.0, 'subsample': 0.8513627344387318, 'colsample_bylevel': 1.0, 'colsample_bytree': 0.946138073111236, 'reg_alpha': 0.0018311776973217071, 'reg_lambda': 0.27901659190538414}}\n",
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 11, 'max_leaves': 4, 'min_child_weight': 5.909231502320296, 'learning_rate': 1.0, 'subsample': 0.8894434216129232, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0013605736901132325, 'reg_lambda': 0.1222158118565165}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 11, 'max_leaves': 4, 'min_child_weight': 5.909231502320296, 'learning_rate': 1.0, 'subsample': 0.8894434216129232, 'colsample_bylevel': 1.0, 'colsample_bytree': 1.0, 'reg_alpha': 0.0013605736901132325, 'reg_lambda': 0.1222158118565165}}\n",
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 11, 'max_leaves': 11, 'min_child_weight': 8.517629386811171, 'learning_rate': 1.0, 'subsample': 0.9233328006239466, 'colsample_bylevel': 1.0, 'colsample_bytree': 0.9468117873770695, 'reg_alpha': 0.034996420228767956, 'reg_lambda': 0.6169079461473819}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 11, 'max_leaves': 11, 'min_child_weight': 8.517629386811171, 'learning_rate': 1.0, 'subsample': 0.9233328006239466, 'colsample_bylevel': 1.0, 'colsample_bytree': 0.9468117873770695, 'reg_alpha': 0.034996420228767956, 'reg_lambda': 0.6169079461473819}}\n",
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 20, 'max_leaves': 15, 'min_child_weight': 43.62419686983011, 'learning_rate': 0.6413547778096401, 'subsample': 1.0, 'colsample_bylevel': 1.0, 'colsample_bytree': 0.8481188761562112, 'reg_alpha': 0.01241885232679939, 'reg_lambda': 0.21352682817916652}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 20, 'max_leaves': 15, 'min_child_weight': 43.62419686983011, 'learning_rate': 0.6413547778096401, 'subsample': 1.0, 'colsample_bylevel': 1.0, 'colsample_bytree': 0.8481188761562112, 'reg_alpha': 0.01241885232679939, 'reg_lambda': 0.21352682817916652}}\n",
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 58, 'max_leaves': 8, 'min_child_weight': 51.84874392377363, 'learning_rate': 0.23511987355535005, 'subsample': 1.0, 'colsample_bylevel': 0.8182737361783602, 'colsample_bytree': 0.8031986460435498, 'reg_alpha': 0.00400039941928546, 'reg_lambda': 0.3870252968100477}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 58, 'max_leaves': 8, 'min_child_weight': 51.84874392377363, 'learning_rate': 0.23511987355535005, 'subsample': 1.0, 'colsample_bylevel': 0.8182737361783602, 'colsample_bytree': 0.8031986460435498, 'reg_alpha': 0.00400039941928546, 'reg_lambda': 0.3870252968100477}}\n",
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 101, 'max_leaves': 14, 'min_child_weight': 7.444058088783045, 'learning_rate': 0.39220715578198356, 'subsample': 1.0, 'colsample_bylevel': 0.6274332478496758, 'colsample_bytree': 0.7190251742957809, 'reg_alpha': 0.007212902167942765, 'reg_lambda': 0.20172056689658158}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 101, 'max_leaves': 14, 'min_child_weight': 7.444058088783045, 'learning_rate': 0.39220715578198356, 'subsample': 1.0, 'colsample_bylevel': 0.6274332478496758, 'colsample_bytree': 0.7190251742957809, 'reg_alpha': 0.007212902167942765, 'reg_lambda': 0.20172056689658158}}\n",
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 205, 'max_leaves': 30, 'min_child_weight': 5.450621032615104, 'learning_rate': 0.12229148765139466, 'subsample': 0.8895588746662894, 'colsample_bylevel': 0.47518959001130784, 'colsample_bytree': 0.6845612830806885, 'reg_alpha': 0.01126059820390593, 'reg_lambda': 0.08170816686602438}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 205, 'max_leaves': 30, 'min_child_weight': 5.450621032615104, 'learning_rate': 0.12229148765139466, 'subsample': 0.8895588746662894, 'colsample_bylevel': 0.47518959001130784, 'colsample_bytree': 0.6845612830806885, 'reg_alpha': 0.01126059820390593, 'reg_lambda': 0.08170816686602438}}\n",
"{'Current Learner': 'xgboost', 'Current Sample': 15480, 'Current Hyper-parameters': {'n_estimators': 222, 'max_leaves': 62, 'min_child_weight': 7.5054716192185795, 'learning_rate': 0.04623175582706431, 'subsample': 0.8756054034199897, 'colsample_bylevel': 0.44768367042684304, 'colsample_bytree': 0.7352307811741962, 'reg_alpha': 0.0009765625, 'reg_lambda': 0.6207832675443758}, 'Best Learner': 'xgboost', 'Best Hyper-parameters': {'n_estimators': 222, 'max_leaves': 62, 'min_child_weight': 7.5054716192185795, 'learning_rate': 0.04623175582706431, 'subsample': 0.8756054034199897, 'colsample_bylevel': 0.44768367042684304, 'colsample_bytree': 0.7352307811741962, 'reg_alpha': 0.0009765625, 'reg_lambda': 0.6207832675443758}}\n"
]
}
],
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
}
},
{
"cell_type": "code",
"execution_count": 11,
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"plt.title('Learning Curve')\n",
"plt.xlabel('Wall Clock Time (s)')\n",
"plt.ylabel('Validation r2')\n",
"plt.scatter(time_history, 1 - np.array(valid_loss_history))\n",
"plt.step(time_history, 1 - np.array(best_valid_loss_history), where='post')\n",
"plt.show()"
],
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
}
}
],
"metadata": {
"slideshow": {
"slide_type": "slide"
}
}
},
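{
"cell_type": "markdown",
"source": [
"The learning curve above draws every trial as a scatter point and the best result so far as a step line. A minimal sketch of how such a best-so-far series can be derived from per-trial losses is shown below. It assumes that `best_valid_loss_history` is the running minimum of `valid_loss_history`, which is inferred from the variable names rather than confirmed by the FLAML source, and the loss values are hypothetical.\n",
"\n",
"```python\n",
"from itertools import accumulate\n",
"\n",
"# Hypothetical per-trial validation losses (1 - r2), in trial order.\n",
"valid_loss = [2.13, 2.40, 0.85, 0.38, 0.52, 0.30]\n",
"\n",
"# Running minimum: the best loss observed up to and including each trial.\n",
"best_so_far = list(accumulate(valid_loss, min))\n",
"\n",
"# Convert losses back to r2 for plotting, as the plotting cell does.\n",
"valid_r2 = [1 - v for v in valid_loss]\n",
"best_r2 = [1 - v for v in best_so_far]\n",
"\n",
"for trial, (cur, best) in enumerate(zip(valid_r2, best_r2)):\n",
"    print('trial {0}: r2 = {1:.2f}, best r2 so far = {2:.2f}'.format(trial, cur, best))\n",
"```\n",
"\n",
"Because the running minimum never increases, the corresponding r2 step line never decreases, even when an individual trial is worse than its predecessors."
],
"metadata": {}
},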
{
"cell_type": "markdown",
"source": [
"## 3. Comparison with untuned XGBoost\n",
"\n",
"### FLAML's accuracy"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 12,
"source": [
"print('flaml (60s) r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"flaml (60s) r2 = 0.8439648010832427\n"
]
}
],
"metadata": {
"tags": []
}
},
{
"cell_type": "markdown",
"source": [
"### Default XGBoost"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 13,
"source": [
"from xgboost import XGBRegressor\n",
"xgb = XGBRegressor()"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 14,
"source": [
"xgb.fit(X_train, y_train)"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n",
" colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n",
" importance_type='gain', interaction_constraints='',\n",
" learning_rate=0.300000012, max_delta_step=0, max_depth=6,\n",
" min_child_weight=1, missing=nan, monotone_constraints='()',\n",
" n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=0,\n",
" reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,\n",
" tree_method='exact', validate_parameters=1, verbosity=None)"
]
},
"metadata": {},
"execution_count": 14
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 15,
"source": [
"y_pred = xgb.predict(X_test)\n",
"from flaml.ml import sklearn_metric_loss_score\n",
"print('default xgboost r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"default xgboost r2 = 0.8265451174596482\n"
]
}
],
"metadata": {
"tags": []
}
},
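{
"cell_type": "markdown",
"source": [
"To put the two test scores side by side, the short sketch below compares them both as an absolute r2 gain and as a relative reduction in the unexplained variance `1 - r2`. The two constants are copied from the outputs printed above; the variable names are illustrative.\n",
"\n",
"```python\n",
"r2_flaml = 0.8439648010832427    # printed above for the 60 s FLAML run\n",
"r2_default = 0.8265451174596482  # printed above for the default XGBRegressor\n",
"\n",
"# Unexplained variance fraction for each model.\n",
"err_flaml = 1 - r2_flaml\n",
"err_default = 1 - r2_default\n",
"\n",
"absolute_gain = r2_flaml - r2_default\n",
"relative_reduction = (err_default - err_flaml) / err_default\n",
"\n",
"print('absolute r2 gain = {0:.4f}'.format(absolute_gain))\n",
"print('relative reduction in 1-r2 = {0:.1%}'.format(relative_reduction))\n",
"```\n",
"\n",
"On these numbers the tuned model removes roughly a tenth of the error left by the default model. This is a single train/test split, so the gap should be read as indicative rather than as a statistically tested result."
],
"metadata": {}
},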
{
"cell_type": "markdown",
"source": [
"## 4. Add customized XGBoost learners in FLAML\n",
"You can easily enable a custom objective function by adding a customized XGBoost learner (XGBoostEstimator for regression tasks, and XGBoostSklearnEstimator for classification tasks) in FLAML. In the following example, we show how to add such a customized XGBoostEstimator with a custom objective function. "
],
"metadata": {}
},
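{
"cell_type": "markdown",
"source": [
"A custom XGBoost objective returns the first and second derivatives of the loss with respect to the raw prediction. The `logregobj` function in the next cell uses the logistic-loss derivatives, `grad = sigmoid(raw) - label` and `hess = sigmoid(raw) * (1 - sigmoid(raw))`. The sketch below checks those two formulas against finite differences in pure Python. The helper names are illustrative and not part of FLAML or XGBoost. Note that the logistic loss assumes labels in [0, 1], so on house prices this objective is probably best read as a demonstration of the API rather than a well-matched loss.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + math.exp(-z))\n",
"\n",
"def logloss(raw, label):\n",
"    # logistic loss as a function of the raw (pre-sigmoid) prediction\n",
"    p = sigmoid(raw)\n",
"    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))\n",
"\n",
"def grad_hess(raw, label):\n",
"    # analytic derivatives, matching the formulas used in logregobj\n",
"    p = sigmoid(raw)\n",
"    return p - label, p * (1.0 - p)\n",
"\n",
"raw, label, eps = 0.3, 1.0, 1e-4\n",
"grad, hess = grad_hess(raw, label)\n",
"\n",
"# Central finite differences for the first and second derivative.\n",
"num_grad = (logloss(raw + eps, label) - logloss(raw - eps, label)) / (2 * eps)\n",
"num_hess = (logloss(raw + eps, label) - 2 * logloss(raw, label)\n",
"            + logloss(raw - eps, label)) / eps ** 2\n",
"\n",
"print('gradient matches:', abs(grad - num_grad) < 1e-6)\n",
"print('hessian matches:', abs(hess - num_hess) < 1e-4)\n",
"```\n",
"\n",
"The same kind of check can be applied to any other custom objective before passing it to a learner, since a wrong sign or a missing factor in `grad` or `hess` usually shows up immediately as a mismatch."
],
"metadata": {}
},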
{
"cell_type": "code",
"execution_count": 16,
"source": [
"import numpy as np \n",
"\n",
"''' define your customized objective function '''\n",
"def logregobj(preds, dtrain):\n",
"    labels = dtrain.get_label()\n",
"    preds = 1.0 / (1.0 + np.exp(-preds)) # transform raw leaf weight with the sigmoid\n",
"    grad = preds - labels  # first derivative of the logistic loss w.r.t. the raw prediction\n",
"    hess = preds * (1.0 - preds)  # second derivative of the logistic loss\n",
"    return grad, hess\n",
"\n",
"''' create customized XGBoost learners class with your objective function '''\n",
"from flaml.model import XGBoostEstimator\n",
"\n",
"\n",
"class MyXGB1(XGBoostEstimator):\n",
"    '''XGBoostEstimator with the logregobj function as the objective function\n",
"    '''\n",
"\n",
"    def __init__(self, **config):\n",
"        super().__init__(objective=logregobj, **config) \n",
"\n",
"\n",
"class MyXGB2(XGBoostEstimator):\n",
"    '''XGBoostEstimator with 'reg:gamma' as the objective function\n",
"    '''\n",
"\n",
"    def __init__(self, **config):\n",
"        super().__init__(objective='reg:gamma', **config)\n",
"\n",
"\n",
"from flaml import AutoML\n",
"automl = AutoML()\n",
"automl.add_learner(learner_name='my_xgb1', learner_class=MyXGB1)\n",
"automl.add_learner(learner_name='my_xgb2', learner_class=MyXGB2)\n",
"settings = {\n",
"    \"time_budget\": 30,  # total running time in seconds\n",
"    \"metric\": 'r2',  # primary metrics for regression can be chosen from: ['mae','mse','r2']\n",
"    \"estimator_list\": ['my_xgb1', 'my_xgb2'],  # list of ML learners; we tune the two custom XGBoost learners in this example\n",
" \"task\": 'regression', # task type \n",
" \"log_file_name\": 'houses_experiment_my_xgb.log', # flaml log file\n",
"}\n",
"automl.fit(X_train=X_train, y_train=y_train, **settings)"
],
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"[flaml.automl: 09-29 23:08:00] {1446} INFO - Data split method: uniform\n",
"[flaml.automl: 09-29 23:08:00] {1450} INFO - Evaluation method: holdout\n",
"[flaml.automl: 09-29 23:08:00] {1496} INFO - Minimizing error metric: 1-r2\n",
"[flaml.automl: 09-29 23:08:00] {1533} INFO - List of ML learners in AutoML Run: ['my_xgb1', 'my_xgb2']\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 0, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:00] {1880} INFO - Estimated sufficient time budget=443s. Estimated necessary time budget=0s.\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.1s,\testimator my_xgb1's best error=53750617.1059,\tbest estimator my_xgb1's best error=53750617.1059\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 1, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.1s,\testimator my_xgb1's best error=260718.5183,\tbest estimator my_xgb1's best error=260718.5183\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 2, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.2s,\testimator my_xgb2's best error=4.1611,\tbest estimator my_xgb2's best error=4.1611\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 3, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.2s,\testimator my_xgb2's best error=4.1611,\tbest estimator my_xgb2's best error=4.1611\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 4, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.3s,\testimator my_xgb1's best error=260718.5183,\tbest estimator my_xgb2's best error=4.1611\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 5, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.3s,\testimator my_xgb1's best error=260718.5183,\tbest estimator my_xgb2's best error=4.1611\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 6, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.4s,\testimator my_xgb1's best error=40726.5769,\tbest estimator my_xgb2's best error=4.1611\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 7, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.4s,\testimator my_xgb1's best error=1918.9637,\tbest estimator my_xgb2's best error=4.1611\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 8, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.5s,\testimator my_xgb1's best error=1918.9637,\tbest estimator my_xgb2's best error=4.1611\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 9, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.5s,\testimator my_xgb1's best error=1918.9637,\tbest estimator my_xgb2's best error=4.1611\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 10, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.6s,\testimator my_xgb2's best error=4.1611,\tbest estimator my_xgb2's best error=4.1611\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 11, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.6s,\testimator my_xgb2's best error=4.1603,\tbest estimator my_xgb2's best error=4.1603\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 12, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.7s,\testimator my_xgb2's best error=4.1603,\tbest estimator my_xgb2's best error=4.1603\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 13, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:00] {1952} INFO - at 0.7s,\testimator my_xgb2's best error=4.1603,\tbest estimator my_xgb2's best error=4.1603\n",
"[flaml.automl: 09-29 23:08:00] {1763} INFO - iteration 14, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 0.8s,\testimator my_xgb1's best error=1918.9637,\tbest estimator my_xgb2's best error=4.1603\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 15, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 0.8s,\testimator my_xgb2's best error=3.8476,\tbest estimator my_xgb2's best error=3.8476\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 16, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 0.9s,\testimator my_xgb1's best error=93.9115,\tbest estimator my_xgb2's best error=3.8476\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 17, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.0s,\testimator my_xgb2's best error=0.3645,\tbest estimator my_xgb2's best error=0.3645\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 18, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.1s,\testimator my_xgb2's best error=0.3645,\tbest estimator my_xgb2's best error=0.3645\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 19, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.2s,\testimator my_xgb2's best error=0.3139,\tbest estimator my_xgb2's best error=0.3139\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 20, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.2s,\testimator my_xgb1's best error=93.9115,\tbest estimator my_xgb2's best error=0.3139\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 21, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.3s,\testimator my_xgb1's best error=12.3445,\tbest estimator my_xgb2's best error=0.3139\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 22, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.4s,\testimator my_xgb2's best error=0.3139,\tbest estimator my_xgb2's best error=0.3139\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 23, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.4s,\testimator my_xgb2's best error=0.3139,\tbest estimator my_xgb2's best error=0.3139\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 24, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.5s,\testimator my_xgb1's best error=12.3445,\tbest estimator my_xgb2's best error=0.3139\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 25, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.6s,\testimator my_xgb2's best error=0.2254,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 26, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:01] {1952} INFO - at 1.7s,\testimator my_xgb2's best error=0.2254,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:01] {1763} INFO - iteration 27, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 1.9s,\testimator my_xgb2's best error=0.2254,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 28, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 1.9s,\testimator my_xgb1's best error=12.3445,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 29, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 2.0s,\testimator my_xgb1's best error=4.1558,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 30, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 2.0s,\testimator my_xgb1's best error=2.4948,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 31, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 2.2s,\testimator my_xgb2's best error=0.2254,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 32, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 2.2s,\testimator my_xgb1's best error=2.4948,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 33, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 2.3s,\testimator my_xgb1's best error=2.4948,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 34, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 2.5s,\testimator my_xgb2's best error=0.2254,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 35, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 2.6s,\testimator my_xgb1's best error=1.4151,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 36, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 2.6s,\testimator my_xgb2's best error=0.2254,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 37, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:02] {1952} INFO - at 2.7s,\testimator my_xgb1's best error=1.4151,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:02] {1763} INFO - iteration 38, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:03] {1952} INFO - at 3.0s,\testimator my_xgb2's best error=0.2254,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:03] {1763} INFO - iteration 39, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:03] {1952} INFO - at 3.1s,\testimator my_xgb2's best error=0.2254,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:03] {1763} INFO - iteration 40, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:03] {1952} INFO - at 3.1s,\testimator my_xgb1's best error=1.4151,\tbest estimator my_xgb2's best error=0.2254\n",
"[flaml.automl: 09-29 23:08:03] {1763} INFO - iteration 41, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:03] {1952} INFO - at 3.6s,\testimator my_xgb2's best error=0.1900,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:03] {1763} INFO - iteration 42, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:04] {1952} INFO - at 4.0s,\testimator my_xgb2's best error=0.1900,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:04] {1763} INFO - iteration 43, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:04] {1952} INFO - at 4.2s,\testimator my_xgb2's best error=0.1900,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:04] {1763} INFO - iteration 44, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:04] {1952} INFO - at 4.3s,\testimator my_xgb1's best error=1.4151,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:04] {1763} INFO - iteration 45, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:04] {1952} INFO - at 4.3s,\testimator my_xgb2's best error=0.1900,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:04] {1763} INFO - iteration 46, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:04] {1952} INFO - at 4.4s,\testimator my_xgb1's best error=1.4151,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:04] {1763} INFO - iteration 47, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:04] {1952} INFO - at 4.4s,\testimator my_xgb1's best error=1.4151,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:04] {1763} INFO - iteration 48, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:05] {1952} INFO - at 5.2s,\testimator my_xgb2's best error=0.1900,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:05] {1763} INFO - iteration 49, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:05] {1952} INFO - at 5.3s,\testimator my_xgb1's best error=1.4151,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:05] {1763} INFO - iteration 50, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:05] {1952} INFO - at 5.3s,\testimator my_xgb1's best error=1.4151,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:05] {1763} INFO - iteration 51, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:05] {1952} INFO - at 5.4s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:05] {1763} INFO - iteration 52, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:05] {1952} INFO - at 5.5s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:05] {1763} INFO - iteration 53, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:05] {1952} INFO - at 5.6s,\testimator my_xgb2's best error=0.1900,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:05] {1763} INFO - iteration 54, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:05] {1952} INFO - at 5.7s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:05] {1763} INFO - iteration 55, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:05] {1952} INFO - at 5.7s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1900\n",
"[flaml.automl: 09-29 23:08:05] {1763} INFO - iteration 56, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:07] {1952} INFO - at 7.1s,\testimator my_xgb2's best error=0.1865,\tbest estimator my_xgb2's best error=0.1865\n",
"[flaml.automl: 09-29 23:08:07] {1763} INFO - iteration 57, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:07] {1952} INFO - at 7.4s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1865\n",
"[flaml.automl: 09-29 23:08:07] {1763} INFO - iteration 58, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:08] {1952} INFO - at 7.9s,\testimator my_xgb2's best error=0.1790,\tbest estimator my_xgb2's best error=0.1790\n",
"[flaml.automl: 09-29 23:08:08] {1763} INFO - iteration 59, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:09] {1952} INFO - at 9.1s,\testimator my_xgb2's best error=0.1790,\tbest estimator my_xgb2's best error=0.1790\n",
"[flaml.automl: 09-29 23:08:09] {1763} INFO - iteration 60, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:09] {1952} INFO - at 9.2s,\testimator my_xgb2's best error=0.1790,\tbest estimator my_xgb2's best error=0.1790\n",
"[flaml.automl: 09-29 23:08:09] {1763} INFO - iteration 61, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:13] {1952} INFO - at 12.8s,\testimator my_xgb2's best error=0.1707,\tbest estimator my_xgb2's best error=0.1707\n",
"[flaml.automl: 09-29 23:08:13] {1763} INFO - iteration 62, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:13] {1952} INFO - at 12.9s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1707\n",
"[flaml.automl: 09-29 23:08:13] {1763} INFO - iteration 63, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:13] {1952} INFO - at 13.0s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1707\n",
"[flaml.automl: 09-29 23:08:13] {1763} INFO - iteration 64, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:14] {1952} INFO - at 14.5s,\testimator my_xgb2's best error=0.1707,\tbest estimator my_xgb2's best error=0.1707\n",
"[flaml.automl: 09-29 23:08:14] {1763} INFO - iteration 65, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:14] {1952} INFO - at 14.7s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1707\n",
"[flaml.automl: 09-29 23:08:14] {1763} INFO - iteration 66, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:14] {1952} INFO - at 14.7s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1707\n",
"[flaml.automl: 09-29 23:08:14] {1763} INFO - iteration 67, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:18] {1952} INFO - at 18.5s,\testimator my_xgb2's best error=0.1707,\tbest estimator my_xgb2's best error=0.1707\n",
"[flaml.automl: 09-29 23:08:18] {1763} INFO - iteration 68, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:22] {1952} INFO - at 22.7s,\testimator my_xgb2's best error=0.1699,\tbest estimator my_xgb2's best error=0.1699\n",
"[flaml.automl: 09-29 23:08:22] {1763} INFO - iteration 69, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:23] {1952} INFO - at 23.0s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1699\n",
"[flaml.automl: 09-29 23:08:23] {1763} INFO - iteration 70, current learner my_xgb2\n",
"[flaml.automl: 09-29 23:08:28] {1952} INFO - at 28.1s,\testimator my_xgb2's best error=0.1685,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:28] {1763} INFO - iteration 71, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:28] {1952} INFO - at 28.1s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:28] {1763} INFO - iteration 72, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:28] {1952} INFO - at 28.2s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:28] {1763} INFO - iteration 73, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:28] {1952} INFO - at 28.4s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:28] {1763} INFO - iteration 74, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:28] {1952} INFO - at 28.5s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:28] {1763} INFO - iteration 75, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:28] {1952} INFO - at 28.6s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:28] {1763} INFO - iteration 76, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:28] {1952} INFO - at 28.7s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:28] {1763} INFO - iteration 77, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:29] {1952} INFO - at 28.8s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:29] {1763} INFO - iteration 78, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:29] {1952} INFO - at 28.9s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:29] {1763} INFO - iteration 79, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:29] {1952} INFO - at 29.0s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:29] {1763} INFO - iteration 80, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:29] {1952} INFO - at 29.1s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:29] {1763} INFO - iteration 81, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:29] {1952} INFO - at 29.2s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:29] {1763} INFO - iteration 82, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:29] {1952} INFO - at 29.3s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:29] {1763} INFO - iteration 83, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:29] {1952} INFO - at 29.4s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:29] {1763} INFO - iteration 84, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:29] {1952} INFO - at 29.6s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:29] {1763} INFO - iteration 85, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:29] {1952} INFO - at 29.7s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:29] {1763} INFO - iteration 86, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:30] {1952} INFO - at 29.8s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:30] {1763} INFO - iteration 87, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:30] {1952} INFO - at 29.9s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:30] {1763} INFO - iteration 88, current learner my_xgb1\n",
"[flaml.automl: 09-29 23:08:30] {1952} INFO - at 30.0s,\testimator my_xgb1's best error=1.0011,\tbest estimator my_xgb2's best error=0.1685\n",
"[flaml.automl: 09-29 23:08:30] {2059} INFO - selected model: <xgboost.core.Booster object at 0x7f6314f51c40>\n",
"[flaml.automl: 09-29 23:08:35] {2122} INFO - retrain my_xgb2 for 4.9s\n",
"[flaml.automl: 09-29 23:08:35] {2128} INFO - retrained model: <xgboost.core.Booster object at 0x7f6314f0cee0>\n",
"[flaml.automl: 09-29 23:08:35] {1557} INFO - fit succeeded\n",
"[flaml.automl: 09-29 23:08:35] {1558} INFO - Time taken to find the best model: 28.05234169960022\n",
"[flaml.automl: 09-29 23:08:35] {1569} WARNING - Time taken to find the best model is 94% of the provided time budget and not all estimators' hyperparameter search converged. Consider increasing the time budget.\n"
]
}
],
"metadata": {
"tags": []
}
},
{
"cell_type": "code",
"execution_count": 17,
"source": [
"print('Best hyperparameter config:', automl.best_config)\n",
"print('Best r2 on validation data: {0:.4g}'.format(1-automl.best_loss))\n",
"print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))\n",
"\n",
"y_pred = automl.predict(X_test)\n",
"print('Predicted labels', y_pred)\n",
"print('True labels', y_test)\n",
"\n",
"from flaml.ml import sklearn_metric_loss_score\n",
"print('r2', '=', 1 - sklearn_metric_loss_score('r2', y_pred, y_test))\n",
"print('mse', '=', sklearn_metric_loss_score('mse', y_pred, y_test))\n",
"print('mae', '=', sklearn_metric_loss_score('mae', y_pred, y_test))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Best hyperparameter config: {'n_estimators': 810, 'max_leaves': 148, 'min_child_weight': 30.65305732414229, 'learning_rate': 0.05793074143079172, 'subsample': 0.9452642648281835, 'colsample_bylevel': 0.8662229421401874, 'colsample_bytree': 0.7851677398738949, 'reg_alpha': 0.00738292823760415, 'reg_lambda': 1.2202619267865558}\n",
"Best r2 on validation data: 0.8315\n",
"Training duration of best run: 5.028 s\n",
"Predicted labels [146309.06 253975.23 148795.17 ... 192561.88 182641.44 270495.53]\n",
"True labels 14740 136900.0\n",
"10101 241300.0\n",
"20566 200700.0\n",
"2670 72500.0\n",
"15709 460000.0\n",
" ... \n",
"13132 121200.0\n",
"8228 137500.0\n",
"3948 160900.0\n",
"8522 227300.0\n",
"16798 265600.0\n",
"Name: median_house_value, Length: 5160, dtype: float64\n",
"r2 = 0.8483896546182459\n",
"mse = 2004062342.1743872\n",
"mae = 28633.257468053536\n"
]
}
],
"metadata": {
"tags": []
}
},
{
"cell_type": "code",
"execution_count": null,
"source": [],
"outputs": [],
"metadata": {}
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3.8.0 64-bit ('blend': conda)"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
},
"interpreter": {
"hash": "0cfea3304185a9579d09e0953576b57c8581e46e6ebc6dfeb681bc5a511f7544"
}
},
"nbformat": 4,
"nbformat_minor": 2
}