{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved. \n", "\n", "Licensed under the MIT License.\n", "\n", "# AutoML with FLAML Library\n", "\n", "\n", "## 1. Introduction\n", "\n", "FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n", "with low computational cost. It is fast and economical. The simple and lightweight design makes it easy to use and extend, such as adding new learners. FLAML can \n", "- serve as an economical AutoML engine,\n", "- be used as a fast hyperparameter tuning tool, or \n", "- be embedded in self-tuning software that requires low latency & resource in repetitive\n", " tuning tasks.\n", "\n", "In this notebook, we use one real data example (binary classification) to showcase how to use FLAML library.\n", "\n", "FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the `automl` option (this option is introduced from version 2, for version 1 it is installed by default):\n", "```bash\n", "pip install flaml[automl]\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# %pip install flaml[automl] matplotlib openml" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 2. Classification Example\n", "### Load data and preprocess\n", "\n", "Download [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "download dataset from openml\n", "Dataset name: airlines\n", "X_train.shape: (404537, 7), y_train.shape: (404537,);\n", "X_test.shape: (134846, 7), y_test.shape: (134846,)\n" ] } ], "source": [ "from flaml.data import load_openml_dataset\n", "X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir='./')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Airline | \n", "Flight | \n", "AirportFrom | \n", "AirportTo | \n", "DayOfWeek | \n", "Time | \n", "Length | \n", "
---|---|---|---|---|---|---|---|
249392 | \n", "EV | \n", "5309.0 | \n", "MDT | \n", "ATL | \n", "3 | \n", "794.0 | \n", "131.0 | \n", "
166918 | \n", "CO | \n", "1079.0 | \n", "IAH | \n", "SAT | \n", "5 | \n", "900.0 | \n", "60.0 | \n", "
89110 | \n", "US | \n", "1636.0 | \n", "CLE | \n", "CLT | \n", "1 | \n", "530.0 | \n", "103.0 | \n", "
70258 | \n", "WN | \n", "928.0 | \n", "CMH | \n", "LAS | \n", "7 | \n", "480.0 | \n", "280.0 | \n", "
492985 | \n", "WN | \n", "729.0 | \n", "GEG | \n", "LAS | \n", "3 | \n", "630.0 | \n", "140.0 | \n", "
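As a rough guide to how results like the ones below are produced, a FLAML AutoML run on this data can be set up as in the following sketch. The `flaml.AutoML` class and its `fit` arguments are FLAML's standard API; the specific `time_budget`, `metric`, `log_file_name`, and `seed` values are illustrative assumptions.

```python
from flaml import AutoML

automl = AutoML()
settings = {
    "time_budget": 600,                          # total tuning time in seconds (illustrative)
    "metric": "accuracy",                        # optimization metric for this binary classification task
    "task": "classification",                    # task type
    "log_file_name": "airlines_experiment.log",  # where to record the search history (illustrative)
    "seed": 7654321,                             # random seed for reproducibility (illustrative)
}
automl.fit(X_train=X_train, y_train=y_train, **settings)
```

After `fit` returns, the best model found within the time budget is available as `automl.model.estimator`.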
The tuned `LGBMClassifier`:

```
LGBMClassifier(colsample_bytree=0.763983850698587,
               learning_rate=0.087493667994037, max_bin=127,
               min_child_samples=128, n_estimators=302, num_leaves=466,
               reg_alpha=0.09968008477303378, reg_lambda=23.227419343318914,
               verbose=-1)
```
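Once the search finishes, the fitted `automl` object can be queried for the best learner and configuration, and used directly for prediction. A short sketch using FLAML's standard attributes and its `sklearn_metric_loss_score` helper (the printed numbers depend on the actual run):

```python
from flaml.ml import sklearn_metric_loss_score

# Inspect the best learner, its hyperparameter configuration, and validation accuracy
print("best ML learner:", automl.best_estimator)
print("best hyperparameter config:", automl.best_config)
print("best accuracy on validation data: {:.4g}".format(1 - automl.best_loss))

# Predict on the held-out test set and report test accuracy
y_pred = automl.predict(X_test)
print("test accuracy =", 1 - sklearn_metric_loss_score("accuracy", y_pred, y_test))
```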
An untuned `LGBMClassifier` with default parameters:

```
LGBMClassifier()
```
An untuned `XGBClassifier` with default parameters:

```
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              gamma=None, gpu_id=None, grow_policy=None, importance_type=None,
              interaction_constraints=None, learning_rate=None, max_bin=None,
              max_cat_threshold=None, max_cat_to_onehot=None,
              max_delta_step=None, max_depth=None, max_leaves=None,
              min_child_weight=None, missing=nan, monotone_constraints=None,
              n_estimators=100, n_jobs=None, num_parallel_tree=None,
              predictor=None, random_state=None, ...)
```
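These default estimators appear to serve as untuned baselines for comparison with the tuned model. A minimal sketch of such a comparison follows; the `encode_features` helper is hypothetical (not part of FLAML) and is only there because the default `XGBClassifier` does not accept raw categorical columns.

```python
import pandas as pd
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: encode categorical/object columns as integer codes."""
    out = df.copy()
    for col in out.select_dtypes(include=["category", "object"]).columns:
        out[col] = out[col].astype("category").cat.codes
    return out

X_tr, X_te = encode_features(X_train), encode_features(X_test)
# Encode labels as integers as well (assumes train and test share the same label set).
y_tr = pd.Series(y_train).astype("category").cat.codes
y_te = pd.Series(y_test).astype("category").cat.codes

for name, model in [("untuned LGBMClassifier", LGBMClassifier()),
                    ("untuned XGBClassifier", XGBClassifier())]:
    model.fit(X_tr, y_tr)
    print(name, "test accuracy =", accuracy_score(y_te, model.predict(X_te)))
```

Comparing these baseline scores with the tuned model's test accuracy shows what the hyperparameter search buys within the given time budget.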