Economical Hyperparameter Optimization

flaml.tune is a module for economical hyperparameter tuning. It frees users from manually tuning many hyperparameters for software such as machine learning training procedures. It can be used standalone, or together with ray tune or nni. Please find detailed guidelines and use cases for this module on our documentation website.

Below are some quick examples.

  • Example for sequential tuning (recommended when compute resources are limited and each trial can consume all of the resources):
# require: pip install flaml[blendsearch]
from flaml import tune
import time

def evaluate_config(config):
    '''evaluate a hyperparameter configuration'''
    # we use a toy example with 2 hyperparameters
    metric = (round(config['x'])-85000)**2 - config['x']/config['y']
    # usually the evaluation incurs a non-negligible cost
    # and the cost could be related to certain hyperparameters
    # in this example, we assume it's proportional to x
    time.sleep(config['x']/100000)
    # use tune.report to report the metric to optimize
    tune.report(metric=metric)

analysis = tune.run(
    evaluate_config,    # the function to evaluate a config
    config={
        'x': tune.lograndint(lower=1, upper=100000),
        'y': tune.randint(lower=1, upper=100000)
    }, # the search space
    low_cost_partial_config={'x': 1},   # an initial (partial) config with low cost
    metric='metric',    # the name of the metric used for optimization
    mode='min',         # the optimization mode, 'min' or 'max'
    num_samples=-1,    # the maximal number of configs to try, -1 means infinite
    time_budget_s=60,   # the time budget in seconds
    local_dir='logs/',  # the local directory to store logs
    # verbose=0,          # verbosity  
    # use_ray=True, # uncomment when performing parallel tuning using ray
    )

print(analysis.best_trial.last_result)  # the best trial's result
print(analysis.best_config) # the best config
  • Example for using ray tune's API:
# require: pip install flaml[blendsearch,ray]
from ray import tune as raytune
from flaml import CFO, BlendSearch
import time

def evaluate_config(config):
    '''evaluate a hyperparameter configuration'''
    # we use a toy example with 2 hyperparameters
    metric = (round(config['x'])-85000)**2 - config['x']/config['y']
    # usually the evaluation incurs a non-negligible cost
    # and the cost could be related to certain hyperparameters
    # in this example, we assume it's proportional to x
    time.sleep(config['x']/100000)
    # use raytune.report to report the metric to optimize
    raytune.report(metric=metric)

# provide a time budget (in seconds) for the tuning process
time_budget_s = 60
# provide the search space
config_search_space = {
    'x': raytune.lograndint(lower=1, upper=100000),
    'y': raytune.randint(lower=1, upper=100000)
}
# provide the low cost partial config
low_cost_partial_config = {'x': 1}

# set up CFO
cfo = CFO(low_cost_partial_config=low_cost_partial_config)

# set up BlendSearch
blendsearch = BlendSearch(
    metric="metric", mode="min",
    space=config_search_space,
    low_cost_partial_config=low_cost_partial_config,
    time_budget_s=time_budget_s
)
# NOTE: when using BlendSearch as a search_alg in ray tune, you need to
# configure the 'time_budget_s' for BlendSearch accordingly such that
# BlendSearch is aware of the time budget. This step is not needed when
# BlendSearch is used as the search_alg in flaml.tune as it is done
# automatically in flaml.

analysis = raytune.run(
    evaluate_config,    # the function to evaluate a config
    config=config_search_space,
    metric='metric',    # the name of the metric used for optimization
    mode='min',         # the optimization mode, 'min' or 'max'
    num_samples=-1,     # the maximal number of configs to try, -1 means infinite
    time_budget_s=time_budget_s,   # the time budget in seconds
    local_dir='logs/',  # the local directory to store logs
    search_alg=blendsearch  # or cfo
)

print(analysis.best_trial.last_result)  # the best trial's result
print(analysis.best_config)  # the best config
  • Example for using NNI: An example of using BlendSearch with NNI can be found in test. CFO can be used in a similar manner. To run the example, first make sure you have NNI installed, then run:
$ nnictl create --config ./config.yml
  • For more examples, please check out notebooks.

flaml offers two HPO methods: CFO and BlendSearch. flaml.tune uses BlendSearch by default.

CFO: Frugal Optimization for Cost-related Hyperparameters

CFO uses the randomized direct search method FLOW2 with adaptive stepsize and random restart. It requires a low-cost initial point as input if such a point exists. The search begins with the low-cost initial point and gradually moves to the high-cost region if needed. The local search method has a provable convergence rate and bounded cost.

About FLOW2: FLOW2 is a simple yet effective randomized direct search method. It is an iterative optimization method that can optimize black-box functions. FLOW2 only requires pairwise comparisons between function values to perform its iterative updates. Compared to existing HPO methods, FLOW2 has the following appealing properties (a conceptual sketch follows the list):

  1. It is applicable to general black-box functions with a good convergence rate in terms of loss.
  2. It provides theoretical guarantees on the total evaluation cost incurred.
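
To make the pairwise-comparison idea concrete, below is a conceptual sketch of a FLOW2-style update step over a numeric vector search space. It illustrates the idea only and is not flaml's actual FLOW2 implementation; the function and parameter names are hypothetical.

import numpy as np

def flow2_step_sketch(f, x, fx, step, rng):
    '''One FLOW2-style iteration: probe a random direction and its
    opposite, keep whichever point improves, otherwise shrink the
    stepsize. A hypothetical sketch, not flaml's actual FLOW2 code.'''
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)                  # random unit direction
    for cand in (x + step * u, x - step * u):
        fc = f(cand)
        if fc < fx:                         # pairwise comparison only
            return cand, fc, step           # accept the better point
    return x, fx, step * 0.5                # no improvement: adapt stepsize

# usage sketch: minimize a toy quadratic from a low-cost start point
rng = np.random.default_rng(0)
x = np.array([1.0, 1.0]); fx = float(np.sum(x**2)); step = 1.0
for _ in range(100):
    x, fx, step = flow2_step_sketch(lambda z: float(np.sum(z**2)), x, fx, step, rng)

Only comparisons of function values drive the update, which is why the method applies to general black-box functions; the real FLOW2 also adapts the stepsize upward and uses random restarts, as described in the paper cited at the end.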

The GIFs attached below demonstrate an example search trajectory of FLOW2 in the loss and evaluation cost (i.e., the training time) space respectively. From the demonstration, we can see that (1) FLOW2 can quickly move toward the low-loss region, showing a good convergence property, and (2) FLOW2 tends to avoid exploring the high-cost region until necessary.


Figure 1. FLOW2 in tuning the # of leaves and the # of trees for XGBoost. The two background heatmaps show the loss and cost distribution of all configurations. The black dots are the points evaluated in FLOW2. Black dots connected by lines are points that yield better loss performance when evaluated.

Example:

from flaml import CFO
tune.run(...
    search_alg=CFO(low_cost_partial_config=low_cost_partial_config),
)

Recommended scenario: there exist cost-related hyperparameters and a low-cost initial point is known before optimization. If the search space is complex and CFO gets trapped in local optima, consider using BlendSearch.

BlendSearch: Economical Hyperparameter Optimization With Blended Search Strategy


BlendSearch combines local search with global search. It leverages the frugality of CFO and the space exploration ability of global search methods such as Bayesian optimization. Like CFO, BlendSearch requires a low-cost initial point as input if such a point exists, and starts the search from there. Unlike CFO, BlendSearch does not wait for the local search to fully converge before trying new start points. The new start points are suggested by the global search method and filtered based on their distance to the existing points in the cost-related dimensions. BlendSearch still gradually increases the trial cost. It prioritizes among the global search thread and multiple local search threads based on optimism in the face of uncertainty.
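
To make the start-point filtering concrete, here is a hypothetical sketch of how a candidate start point suggested by the global search might be admitted only if it keeps some distance, in the cost-related dimensions, from points already being searched. The function name, threshold, and distance measure are illustrative assumptions, not flaml's internals.

import numpy as np

def admit_start_point(candidate, existing_points, cost_dims, min_dist):
    '''Hypothetical filter for new local-search start points:
    reject candidates too close to existing points in the
    cost-related dimensions (e.g., cost_dims=['x']).'''
    c = np.array([candidate[d] for d in cost_dims], dtype=float)
    for point in existing_points:
        p = np.array([point[d] for d in cost_dims], dtype=float)
        if np.linalg.norm(c - p) < min_dist:
            return False    # too close to an ongoing search thread
    return True             # far enough: start a new local search here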

Example:

# require: pip install flaml[blendsearch]
from flaml import BlendSearch
tune.run(...
    search_alg=BlendSearch(low_cost_partial_config=low_cost_partial_config),
)
  • Recommended scenario: cost-related hyperparameters exist, a low-cost initial point is known, and the search space is complex such that local search is prone to getting stuck in local optima.

  • Suggestion about using a larger search space in BlendSearch: In hyperparameter optimization, a larger search space is desirable because it is more likely to include the optimal configuration (or one of the optimal configurations) in hindsight. However, the performance (especially the anytime performance) of most existing HPO methods degrades when the cost of the configurations in the search space varies widely. Thus hand-crafted small search spaces (with relatively homogeneous cost) are often used in practice for these methods, which is subject to idiosyncrasy. BlendSearch combines the benefits of local search and global search, which enables a smart (economical) way of deciding where to explore in the search space even when it is larger than necessary. This allows users to specify a larger search space in BlendSearch, which is often easier and a better practice than narrowing down the search space by hand, as sketched below.
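
As a sketch of that practice, the toy example from the quick examples above can be given deliberately wide bounds and left to BlendSearch (flaml.tune's default searcher) to explore economically; the bounds below are arbitrary and chosen only for illustration.

from flaml import tune

analysis = tune.run(
    evaluate_config,    # the toy evaluation function defined earlier
    config={
        # deliberately wide, arbitrary bounds for illustration
        'x': tune.lograndint(lower=1, upper=10**8),
        'y': tune.randint(lower=1, upper=10**8),
    },
    low_cost_partial_config={'x': 1},  # still anchors the search at low cost
    metric='metric',
    mode='min',
    time_budget_s=60,
)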

For more technical details, please refer to our papers below.

@inproceedings{wu2021cfo,
    title={Frugal Optimization for Cost-related Hyperparameters},
    author={Qingyun Wu and Chi Wang and Silu Huang},
    year={2021},
    booktitle={AAAI'21},
}
@inproceedings{wang2021blendsearch,
    title={Economical Hyperparameter Optimization With Blended Search Strategy},
    author={Chi Wang and Qingyun Wu and Silu Huang and Amin Saied},
    year={2021},
    booktitle={ICLR'21},
}