autogen/notebook/integrate_azureml.ipynb
Shreyas 3b3b0bfa8e
roc_auc_weighted metric addition (#827)
* Pending changes exported from your codespace

* Update flaml/automl.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update flaml/automl.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update flaml/ml.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update flaml/ml.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update website/docs/Examples/Integrate - Scikit-learn Pipeline.md

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* added documentation for new metric

* Update flaml/ml.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* minor notebook changes

* Update Integrate - Scikit-learn Pipeline.md

* Update notebook/automl_classification.ipynb

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update integrate_azureml.ipynb

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2022-12-02 19:27:32 -08:00

231 lines
5.9 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License.\n",
"\n",
"# Run FLAML in AzureML\n",
"\n",
"\n",
"## 1. Introduction\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models \n",
"with low computational cost. It is fast and economical. The simple and lightweight design makes it easy \n",
"to use and extend, such as adding new learners. FLAML can \n",
"- serve as an economical AutoML engine,\n",
"- be used as a fast hyperparameter tuning tool, or \n",
"- be embedded in self-tuning software that requires low latency & resource in repetitive\n",
" tuning tasks.\n",
"\n",
"In this notebook, we use one real data example (binary classification) to showcase how to use FLAML library together with AzureML.\n",
"\n",
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [azureml] option:\n",
"```bash\n",
"pip install flaml[azureml]\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install flaml[azureml]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Enable mlflow in AzureML workspace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import mlflow\n",
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 2. Classification Example\n",
"### Load data and preprocess\n",
"\n",
"Download [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [],
"source": [
"from flaml.data import load_openml_dataset\n",
"X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir='./')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Run FLAML\n",
"In the FLAML automl run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, resampling strategy type, and so on. All these arguments have default values which will be used if users do not provide them. For example, the default ML learners of FLAML are `['lgbm', 'xgboost', 'catboost', 'rf', 'extra_tree', 'lrl1']`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"''' import AutoML class from flaml package '''\n",
"from flaml import AutoML\n",
"automl = AutoML()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"settings = {\n",
" \"time_budget\": 60, # total running time in seconds\n",
" \"metric\": 'accuracy', \n",
" # check the documentation for options of metrics (https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric)\n",
" \"estimator_list\": ['lgbm', 'rf', 'xgboost'], # list of ML learners\n",
" \"task\": 'classification', # task type \n",
" \"sample\": False, # whether to subsample training data\n",
" \"log_file_name\": 'airlines_experiment.log', # flaml log file\n",
"}\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [],
"source": [
"experiment = mlflow.set_experiment(\"flaml\")\n",
"with mlflow.start_run() as run:\n",
" automl.fit(X_train=X_train, y_train=y_train, **settings)\n",
" # log the model\n",
" mlflow.sklearn.log_model(automl, \"automl\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl = mlflow.sklearn.load_model(f\"{run.info.artifact_uri}/automl\")\n",
"print(automl.predict_proba(X_test))\n",
"print(automl.predict(X_test))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Retrieve logs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [],
"source": [
"mlflow.search_runs(experiment_ids=[experiment.experiment_id], filter_string=\"params.learner = 'xgboost'\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
},
"vscode": {
"interpreter": {
"hash": "e811209110f5aa4d8c2189eeb3ff7b9b4d146931cb9189ef6041ff71605c541d"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}