# autogen/test/nlp/test_autohf_custom_metric.py

import os
import shutil
import sys

import pytest
from utils import get_toy_data_seqclassification, get_automl_settings


def custom_metric(
    X_test,
    y_test,
    estimator,
    labels,
    X_train,
    y_train,
    weight_test=None,
    weight_train=None,
    config=None,
    groups_test=None,
    groups_train=None,
):
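    from datasets import Dataset

    # Reuse the estimator's trainer when it has one; otherwise build a
    # temporary trainer for prediction only.
    if estimator._trainer is None:
        trainer = estimator._init_model_for_predict()
        # Do not keep a reference to the temporary trainer on the estimator.
        estimator._trainer = None
    else:
        trainer = estimator._trainer
    # Tokenize the raw text; y_test is replaced by the value returned here.
    X_test, y_test = estimator._tokenize_text(X_test)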

    if y_test is not None:
        eval_dataset = Dataset.from_pandas(X_test.join(y_test))
    else:
        eval_dataset = Dataset.from_pandas(X_test)

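    # Temporarily evaluate with "rmse", then restore the estimator's own metric.
    estimator_metric_backup = estimator._metric
    estimator._metric = "rmse"
    metrics = trainer.evaluate(eval_dataset)
    estimator._metric = estimator_metric_backup

    # Return the value to minimize and the remaining metrics to log.
    return metrics.pop("eval_automl_metric"), metrics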


@pytest.mark.skipif(sys.platform == "darwin", reason="do not run on mac os")
def test_custom_metric():
    from flaml import AutoML
    import requests

    X_train, y_train, X_val, y_val, X_test = get_toy_data_seqclassification()
    automl = AutoML()

    try:
        import ray

        if not ray.is_initialized():
            ray.init()
    except ImportError:
        return

    automl_settings = get_automl_settings()
    automl_settings["metric"] = custom_metric
    automl_settings["use_ray"] = {"local_dir": "data/output/"}

    try:
        automl.fit(
            X_train=X_train,
            y_train=y_train,
            X_val=X_val,
            y_val=y_val,
            **automl_settings
        )
    except requests.exceptions.HTTPError:
        return

    # Test that the custom metric is called in
    # TransformersEstimator._compute_metrics_by_dataset_name.

    automl_settings["max_iter"] = 3
    automl.fit(
        X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings
    )
    automl.score(X_val, y_val, metric=custom_metric)
    automl.pickle("automl.pkl")

    del automl

    if os.path.exists("test/data/output/"):
        shutil.rmtree("test/data/output/")


if __name__ == "__main__":
    test_custom_metric()