Because FLAML's `AutoML` class can be used as the final estimator in a scikit-learn `Pipeline`, we get all the benefits of pipelines.

### Load data

```python
from flaml.data import load_openml_dataset

# Download the Airlines dataset (https://www.openml.org/d/1169) from OpenML.
# The task is to predict whether a given flight will be delayed,
# given the information of the scheduled departure.
X_train, X_test, y_train, y_test = load_openml_dataset(
    dataset_id=1169, data_dir='./', random_state=1234, dataset_format='array')
```

### Create a pipeline

```python
from sklearn import set_config
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from flaml import AutoML

# Render pipelines as diagrams when displayed in a notebook
set_config(display='diagram')

imputer = SimpleImputer()
standardizer = StandardScaler()
automl = AutoML()

# AutoML is the last step, so it receives the imputed and standardized features
automl_pipeline = Pipeline([
    ("imputer", imputer),
    ("standardizer", standardizer),
    ("automl", automl)
])
automl_pipeline
```

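
To make the two preprocessing steps concrete, here is a minimal pure-Python sketch of what mean imputation followed by standardization does to a single feature column. It assumes the default settings used above: `SimpleImputer()` fills missing values with the column mean, and `StandardScaler()` subtracts the mean and divides by the population standard deviation. The helper names `impute_mean` and `standardize` are hypothetical and are not part of scikit-learn or FLAML.

```python
import math
from statistics import fmean, pstdev


def impute_mean(column):
    """Replace missing entries (None or NaN) with the mean of the observed ones."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in column if not is_missing(v)]
    mean = fmean(observed)
    return [mean if is_missing(v) else v for v in column]


def standardize(column):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mean = fmean(column)
    std = pstdev(column)
    # Assumption: a constant column is only centered, to avoid dividing by zero.
    scale = std if std > 0 else 1.0
    return [(v - mean) / scale for v in column]


raw = [1.0, None, 3.0]      # toy feature column with one missing value
imputed = impute_mean(raw)  # the gap is filled with the mean of 1.0 and 3.0
scaled = standardize(imputed)
print(imputed)
print(scaled)
```

In the real pipeline these statistics are learned from the training data during `fit` and then reused when predicting on new data.
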
### Run AutoML in the pipeline

```python
settings = {
    "time_budget": 60,  # total running time in seconds
    "metric": 'accuracy',  # primary metrics can be chosen from: ['accuracy','roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'f1','log_loss','mae','mse','r2']
    "task": 'classification',  # task type
    "estimator_list": ['xgboost', 'catboost', 'lgbm'],  # learners to search over
    "log_file_name": 'airlines_experiment.log',  # flaml log file
}
# Prefix each setting with the step name so the pipeline forwards it to AutoML.fit()
pipeline_settings = {f"automl__{key}": value for key, value in settings.items()}
automl_pipeline.fit(X_train, y_train, **pipeline_settings)
```

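
The `automl__` prefix is how the pipeline knows which step a fit argument belongs to: the part before the double underscore names the step, and the rest is the argument passed to that step's `fit`. The following is a simplified pure-Python illustration of that naming convention, not scikit-learn's actual implementation; `route_fit_params` is a hypothetical helper.

```python
def route_fit_params(step_names, fit_params):
    """Group `step__param` keyword arguments by pipeline step name."""
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        # Split only on the first "__" so the parameter name stays intact.
        step, sep, param = key.partition("__")
        if not sep or step not in routed:
            raise ValueError(f"cannot route fit parameter {key!r}")
        routed[step][param] = value
    return routed


# Settings as they would be written for AutoML.fit() directly.
settings = {"time_budget": 60, "metric": "accuracy", "task": "classification"}

# Prefix each key with the step name used in the pipeline ("automl").
pipeline_settings = {f"automl__{key}": value for key, value in settings.items()}

routed = route_fit_params(["imputer", "standardizer", "automl"], pipeline_settings)
print(routed["automl"])
```

Building the prefixed dictionary from a plain settings dictionary keeps the same settings reusable for a standalone `AutoML.fit()` call.
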
### Get the automl object from the pipeline

```python
automl = automl_pipeline.steps[2][1]
# Get the best config and best learner
print('Best ML learner:', automl.best_estimator)
print('Best hyperparameter config:', automl.best_config)
print('Best accuracy on validation data: {0:.4g}'.format(1 - automl.best_loss))
print('Training duration of best run: {0:.4g} s'.format(automl.best_config_train_time))
```

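
`automl_pipeline.steps` is a list of `(name, estimator)` pairs, so indexing with `steps[2][1]` depends on the position of the AutoML step. A name-based lookup keeps working if preprocessing steps are added or removed. The sketch below uses placeholder strings instead of real estimators so that it runs without scikit-learn or FLAML; `get_step` is a hypothetical helper. On a real pipeline, `automl_pipeline.named_steps["automl"]` gives the same kind of access.

```python
def get_step(steps, name):
    """Return the object registered under `name` in a list of (name, obj) pairs."""
    for step_name, step in steps:
        if step_name == name:
            return step
    raise KeyError(f"no step named {name!r}")


# Stand-ins for the (name, estimator) pairs held in `Pipeline.steps`.
steps = [
    ("imputer", "SimpleImputer()"),
    ("standardizer", "StandardScaler()"),
    ("automl", "AutoML()"),
]

by_position = steps[2][1]            # breaks if the step order changes
by_name = get_step(steps, "automl")  # independent of position
print(by_position == by_name)
```
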
[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_sklearn.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_sklearn.ipynb)