[](https://badge.fury.io/py/FLAML)
[](https://github.com/microsoft/FLAML/actions/workflows/python-package.yml)

[](https://pepy.tech/project/flaml)
[](https://gitter.im/FLAMLer/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
# FLAML - Fast and Lightweight AutoML

<p align="center">
    <img src="https://github.com/microsoft/FLAML/blob/main/docs/images/FLAML.png" width=200>
    <br>
</p>
FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently and economically. It frees users from selecting learners and hyperparameters for each learner. The simple and lightweight design makes it easy to extend, such as adding customized learners or metrics. FLAML is powered by a new, [cost-effective hyperparameter optimization](https://github.com/microsoft/FLAML/tree/main/flaml/tune) and learner selection method invented by Microsoft Research.

FLAML leverages the structure of the search space to choose a search order optimized for both cost and error. For example, the system tends to propose cheap configurations at the beginning stage of the search, but quickly moves to configurations with high model complexity and large sample size when needed in the later stage of the search. For another example, it favors cheap learners in the beginning but penalizes them later if the error improvement is slow. The cost-bounded search and cost-based prioritization make a big difference in search efficiency under budget constraints.
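The cost-based prioritization just described can be sketched as a toy search loop. This is an illustrative simplification under assumed names (`cost_ordered_search` and its parameters are hypothetical), not FLAML's actual algorithm:

```python
# Illustrative sketch (not FLAML's implementation) of cost-based
# prioritization: candidate configurations are evaluated in order of
# estimated cost, so cheap ones come first and expensive ones are
# skipped when the remaining budget cannot cover them.
def cost_ordered_search(candidates, evaluate, budget):
    """candidates: list of (estimated_cost, config); returns (best_loss, best_config)."""
    best = (float("inf"), None)
    spent = 0.0
    for cost, config in sorted(candidates, key=lambda c: c[0]):
        if spent + cost > budget:
            break  # budget exhausted before this (more expensive) config
        spent += cost
        loss = evaluate(config)
        if loss < best[0]:
            best = (loss, config)
    return best

# Toy usage: smaller models are cheaper; loss shrinks with model size.
candidates = [(cost, {"n_estimators": n}) for n, cost in [(10, 1), (100, 5), (1000, 50)]]
best_loss, best_cfg = cost_ordered_search(
    candidates, lambda c: 1.0 / c["n_estimators"], budget=10)
```

With a budget of 10, the search tries the configurations costing 1 and 5 and skips the one costing 50, returning the best affordable configuration.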
FLAML also has a .NET implementation in [ML.NET Model Builder](https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet/model-builder). This [ML.NET blog](https://devblogs.microsoft.com/dotnet/ml-net-june-updates/#new-and-improved-automl) describes the improvement brought by FLAML.
## Installation

FLAML requires **Python version >= 3.6**. It can be installed from pip:

```bash
pip install flaml
```

To run the [notebook examples](https://github.com/microsoft/FLAML/tree/main/notebook), install flaml with the `[notebook]` option:

```bash
pip install flaml[notebook]
```
## Quickstart

* With three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.

```python
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
```
* You can restrict the learners and use FLAML as a fast hyperparameter tuning tool for XGBoost, LightGBM, Random Forest etc., or a customized learner.

```python
automl.fit(X_train, y_train, task="classification", estimator_list=["lgbm"])
```
* You can also run generic ray-tune style hyperparameter tuning for a custom function.

```python
from flaml import tune
tune.run(train_with_config, config={…}, low_cost_partial_config={…}, time_budget_s=3600)
```
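As a hedged illustration of the kind of function `tune.run` evaluates, here is a toy `train_with_config`: it receives a config dict and returns a dict containing the metric. The quadratic objective and the keys `"x"` and `"y"` are made up for illustration, not part of the FLAML API:

```python
# Toy stand-in for the `train_with_config` above: takes a config dict,
# returns a dict with the metric to optimize. The objective and the
# parameter names "x" and "y" are illustrative assumptions.
def train_with_config(config):
    x, y = config["x"], config["y"]
    loss = (x - 3.0) ** 2 + (y + 1.0) ** 2  # minimized at x=3, y=-1
    return {"loss": loss}

print(train_with_config({"x": 3.0, "y": -1.0}))  # {'loss': 0.0}
```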
## Advantages

* For common machine learning tasks like classification and regression, it finds quality models with small computational resources.
* Users can choose their desired customizability: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code).
* It allows human guidance in hyperparameter tuning to respect prior knowledge of certain subspaces while still being able to explore others. Read more about the hyperparameter optimization methods in FLAML [here](https://github.com/microsoft/FLAML/tree/main/flaml/tune). They can be used beyond the AutoML context, and in distributed HPO frameworks such as ray tune or nni.
* It supports online AutoML: automatic hyperparameter tuning for online learning algorithms. Read more about the online AutoML method in FLAML [here](https://github.com/microsoft/FLAML/tree/main/flaml/onlineml).
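For the medium-customization path, a metric can be supplied as a callable. The sketch below shows the general shape of such a callable; the exact signature FLAML expects varies by version, so treat the parameter list and the `ConstantEstimator` stub as assumptions for illustration:

```python
# Hedged sketch of a custom metric callable. The parameter list is an
# assumption; check the FLAML docs for the exact interface.
def custom_mae(X_val, y_val, estimator, *args, **kwargs):
    y_pred = estimator.predict(X_val)
    mae = sum(abs(p - t) for p, t in zip(y_pred, y_val)) / len(y_val)
    return mae, {"mae": mae}  # (value to minimize, extra info)

# Stub estimator used only to exercise the metric; purely illustrative.
class ConstantEstimator:
    def predict(self, X):
        return [1.0] * len(X)

mae, info = custom_mae([[0], [1], [2]], [1.0, 2.0, 0.0], ConstantEstimator())
```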
## Examples

* A basic classification example.

```python
from flaml import AutoML
from sklearn.datasets import load_iris

# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'accuracy',
    "task": 'classification',
    "log_file_name": "iris.log",
}
X_train, y_train = load_iris(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict_proba(X_train))
# Print the best model
print(automl.model.estimator)
```
* A basic regression example.

```python
from flaml import AutoML
from sklearn.datasets import fetch_california_housing

# Initialize an AutoML instance
automl = AutoML()
# Specify automl goal and constraint
automl_settings = {
    "time_budget": 10,  # in seconds
    "metric": 'r2',
    "task": 'regression',
    "log_file_name": "california.log",
}
X_train, y_train = fetch_california_housing(return_X_y=True)
# Train with labeled input data
automl.fit(X_train=X_train, y_train=y_train,
           **automl_settings)
# Predict
print(automl.predict(X_train))
# Print the best model
print(automl.model.estimator)
```
* A basic time series forecasting example.

```python
# pip install flaml[ts_forecast]
import numpy as np
from flaml import AutoML

X_train = np.arange('2014-01', '2021-01', dtype='datetime64[M]')
y_train = np.random.random(size=72)
automl = AutoML()
automl.fit(X_train=X_train[:72],  # a single column of timestamps
           y_train=y_train,  # value for each timestamp
           period=12,  # time horizon to forecast, e.g., 12 months
           task='ts_forecast', time_budget=15,  # time budget in seconds
           log_file_name="ts_forecast.log",
           )
print(automl.predict(X_train[72:]))
```
* Learning to rank.

```python
from sklearn.datasets import fetch_openml
from flaml import AutoML

X_train, y_train = fetch_openml(name="credit-g", return_X_y=True, as_frame=True)
y_train = y_train.cat.codes  # .cat requires a pandas categorical Series
# not a real learning-to-rank dataset
groups = [200] * 4 + [100] * 2  # group counts
automl = AutoML()
automl.fit(
    X_train, y_train, groups=groups,
    task='rank', time_budget=10,  # in seconds
)
```
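The `groups` argument lists the number of consecutive rows in each query group, so the counts must sum to the number of training rows. A quick sanity check, assuming that convention and the 1000-row credit-g dataset used above:

```python
# Group sizes for the toy ranking setup above: 4 groups of 200 rows and
# 2 groups of 100 rows, covering 1000 rows in total.
groups = [200] * 4 + [100] * 2
assert sum(groups) == 1000  # must equal the number of training rows
print(len(groups), sum(groups))  # 6 1000
```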
More examples can be found in [notebooks](https://github.com/microsoft/FLAML/tree/main/notebook/).

## Documentation
Please find the API documentation [here](https://microsoft.github.io/FLAML/).

Please find demo and tutorials of FLAML [here](https://www.youtube.com/channel/UCfU0zfFXHXdAd5x-WvFBk5A).

For more technical details, please check our papers.

* [FLAML: A Fast and Lightweight AutoML Library](https://www.microsoft.com/en-us/research/publication/flaml-a-fast-and-lightweight-automl-library/). Chi Wang, Qingyun Wu, Markus Weimer, Erkang Zhu. MLSys 2021.

```bibtex
@inproceedings{wang2021flaml,
    title={FLAML: A Fast and Lightweight AutoML Library},
    author={Chi Wang and Qingyun Wu and Markus Weimer and Erkang Zhu},
    year={2021},
    booktitle={MLSys},
}
```

* [Frugal Optimization for Cost-related Hyperparameters](https://arxiv.org/abs/2005.01571). Qingyun Wu, Chi Wang, Silu Huang. AAAI 2021.
* [Economical Hyperparameter Optimization With Blended Search Strategy](https://www.microsoft.com/en-us/research/publication/economical-hyperparameter-optimization-with-blended-search-strategy/). Chi Wang, Qingyun Wu, Silu Huang, Amin Saied. ICLR 2021.
* [ChaCha for Online AutoML](https://www.microsoft.com/en-us/research/publication/chacha-for-online-automl/). Qingyun Wu, Chi Wang, John Langford, Paul Mineiro and Marco Rossi. ICML 2021.
2020-12-04 09:40:27 -08:00
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit < https: / / cla . opensource . microsoft . com > .
2021-03-16 22:13:35 -07:00
If you are new to GitHub [here ](https://help.github.com/categories/collaborating-with-issues-and-pull-requests/ ) is a detailed help source on getting involved with development on GitHub.
2020-12-04 09:40:27 -08:00
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct ](https://opensource.microsoft.com/codeofconduct/ ).
For more information see the [Code of Conduct FAQ ](https://opensource.microsoft.com/codeofconduct/faq/ ) or
contact [opencode@microsoft.com ](mailto:opencode@microsoft.com ) with any additional questions or comments.
## Developing

### Setup

```bash
git clone https://github.com/microsoft/FLAML.git
cd FLAML
pip install -e .[test,notebook]
```
### Docker

We provide a simple [Dockerfile](https://github.com/microsoft/FLAML/blob/main/Dockerfile).

```bash
docker build git://github.com/microsoft/FLAML -t flaml-dev
docker run -it flaml-dev
```

### Develop in Remote Container

If you use vscode, you can open the FLAML folder in a [Container](https://code.visualstudio.com/docs/remote/containers).
We have provided the configuration in [.devcontainer](https://github.com/microsoft/FLAML/blob/main/.devcontainer).
### Pre-commit

Run `pre-commit install` to install pre-commit into your git hooks. Before you commit, run
`pre-commit run` to check that you meet the pre-commit requirements. If you use Windows (without WSL) and can't commit after installing pre-commit, you can run `pre-commit uninstall` to uninstall the hooks; in WSL or Linux the hooks are expected to work.
### Coverage

Any code you commit should not decrease coverage. To run all unit tests:

```bash
coverage run -m pytest test
```

Then you can see the coverage report by running
`coverage report -m` or `coverage html`.

If all tests pass, please also run notebook/flaml_automl to make sure your commit does not break the notebook example.
## Authors
* Chi Wang
* Qingyun Wu
Contributors (alphabetical order): Amir Aghaei, Vijay Aski, Sebastien Bubeck, Surajit Chaudhuri, Nadiia Chepurko, Ofer Dekel, Alex Deng, Anshuman Dutt, Nicolo Fusi, Jianfeng Gao, Johannes Gehrke, Niklas Gustafsson, Silu Huang, Dongwoo Kim, Christian Konig, John Langford, Menghao Li, Mingqin Li, Zhe Liu, Naveen Gaur, Paul Mineiro, Vivek Narasayya, Jake Radzikowski, Marco Rossi, Amin Saied, Neil Tenenholtz, Olga Vrousgou, Markus Weimer, Yue Wang, Qingyun Wu, Qiufeng Yin, Haozhe Zhang, Minjia Zhang, XiaoYun Zhang, Eric Zhu, and open-source contributors.
## License
[MIT License ](LICENSE )