autogen/test/test_training_log.py

import os
import unittest
from tempfile import TemporaryDirectory

from sklearn.datasets import fetch_california_housing

from flaml import AutoML
from flaml.training_log import training_log_reader


class TestTrainingLog(unittest.TestCase):
    def test_training_log(self, path="test_training_log.log"):
        with TemporaryDirectory() as d:
            filename = os.path.join(d, path)

            # Run a simple job.
            automl = AutoML()
            automl_settings = {
                "time_budget": 1,
                "metric": "mse",
                "task": "regression",
                "log_file_name": filename,
                "log_training_metric": True,
                "mem_thres": 1024 * 1024,
                "n_jobs": 1,
                "model_history": True,
                "train_time_limit": 0.01,
                "verbose": 3,
                "ensemble": True,
                "keep_search_state": True,
            }
            X_train, y_train = fetch_california_housing(return_X_y=True)
            automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
            automl._state._train_with_config(automl.best_estimator, automl.best_config)

            # Check if the training log file is populated.
            self.assertTrue(os.path.exists(filename))
            with training_log_reader(filename) as reader:
                count = 0
                for record in reader.records():
                    print(record)
                    count += 1
                self.assertGreater(count, 0)

            # Fit again with no log file name set.
            automl_settings["log_file_name"] = None
            automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
            automl._selected.update(None, 0)
            # Fit a fresh instance with max_iter=0.
            automl = AutoML()
            automl.fit(X_train=X_train, y_train=y_train, max_iter=0, task="regression")

    def test_illfilename(self):
        # "/" is a directory, not a valid log file path; the exception
        # raised for it depends on the platform.
        try:
            self.test_training_log("/")
        except IsADirectoryError:
            print("IsADirectoryError happens as expected on Linux.")
        except PermissionError:
            print("PermissionError happens as expected on Windows.")