53 Commits

Author SHA1 Message Date
Kevin Chen
d4273669e6
Time series forecasting with sklearn regressors (#362)
* add sklearn regressors as learners for ts_forecast task

* add direct forecasting strategy
warnings and errors for duplicate rows and missing values

- add preprocess for sklearn time series forecast
 update automl.py
 update test/test_forecast.py

* update model.py and test_forecast.py for cv eval_method

* add "hcrystalball" dependency in setup.py

* update automl.py
- add _validate_ts_data function for abstraction
- include xgb_limitdepth as a learner

* update model.py
- update search space for sklearn ts regressors

* update automl.py and test_forecast.py for numpy array inputs

* add documentation to model.py

* add documentation for removing catboost regressor

* update automl.py
- _validate_ts_data() function

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>
2022-01-06 23:12:38 -08:00
Chi Wang
612668e8ed
serialize TransformerEstimator (#381)
* serialize TransformerEstimator

* check has_attr

* custom metric needs trainer

* skip test on mac
2022-01-06 10:28:19 -08:00
Chi Wang
cd9740f022
Fix several issues for nlp tasks (#380)
* num cpu issue #378;
* temp fix for ray issue #379;
* transformers version.
2022-01-05 13:49:12 -08:00
Xueqing Liu
207b6935d9
adding token classification (#376)
* adding ner
2022-01-03 13:44:10 -05:00
Chi Wang
8602def1c4
logging (#371)
* query logged runs

* mlflow log when using ray

* key check for newer version of ray #363

* catch importerror

* log and load AutoML model

* retrain if necessary when ensemble fails
2022-01-02 21:37:19 -08:00
oberonbot
9c00e4272a
Finish the Multiple Choice Classification (#367)
* adding multiple choice

* update test cases (hard coded)

* merged common code in predict_proba and predict in TransformersEstimator
2022-01-02 20:12:34 -05:00
Xueqing Liu
b2900f4b22
fixing custom metric (#357)
* fixing the error for custom metric
2021-12-24 16:23:09 -05:00
Xueqing Liu
dcfd218108
Fixing the bug in custom metric (#356)
* fixing the bug for custom metric
2021-12-23 18:44:53 -05:00
Xueqing Liu
ee3162e232
Adding the NLP task summarization (#346)
* Add test_autohf_summarization.py

* adding seq2seq

* Update flaml/nlp/huggingface/trainer.py

* rouge metrics

Co-authored-by: XinZofStevens <xzhao4346@gmail.com>
Co-authored-by: JinzhuoWu <wujinzhuo0105@gmail.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2021-12-20 14:19:32 -08:00
Chi Wang
efd85b4c86
Deploy a new doc website (#338)
A new documentation website. And:

* add actions for doc

* update docstr

* installation instructions for doc dev

* unify README and Getting Started

* rename notebook

* doc about best_model_for_estimator #340

* docstr for keep_search_state #340

* DNN

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: Z.sk <shaokunzhang@psu.edu>
2021-12-16 17:11:33 -08:00
Chi Wang
434586e2e2
train at least one iter when not trained (#336)
* train at least one iter when not trained

* bump version to 0.9.1
2021-12-12 20:05:18 -08:00
Xueqing Liu
1a3e01c352
adding HF metrics (#335)
* adding nlp metrics

* fix ndcg
2021-12-10 12:32:49 -05:00
Chi Wang
54d303a95a
bug fix in config2params (#323)
* bug fix in config2params

* set the task property before config2params
2021-12-03 19:37:49 -08:00
Xueqing Liu
fb59bb9928
adding TODOs for NLP module, so students can implement other tasks easier (#321)
* fixing ray pickle bug, skipping macosx bug, completing code for seqregression

* catching connectionerror

* adding TODOs for NLP module
2021-12-03 12:45:16 -05:00
Chi Wang
c57954fbbd
include default value in rf search space (#317)
* include default value in rf search space

* init _mem_per_iter with -1

* bump version to 0.8.2

* docstr for search space's arguments
2021-12-03 09:15:21 -08:00
liususan091219
63f402b29e
fixing config2params for transformersestimator
2021-11-26 21:28:38 -08:00
Xueqing Liu
fd136b02d1
bug fix for TransformerEstimator (#293)
* fix checkpoint naming + trial id for non-ray mode, fix the bug in running test mode, delete all the checkpoints in non-ray mode

* finished testing for checkpoint naming, delete checkpoint, ray, max iter = 1

* adding predict_proba, address PR 293's comments

close #293 #291
2021-11-23 11:26:39 -08:00
Chi Wang
ea6d28d7bd
add max_depth to xgboost search space (#282)
* add max_depth to xgboost search space

* notebook update

* two learners for xgboost (max_depth or max_leaves)
2021-11-22 21:17:48 -08:00
Chi Wang
72caa2172d
model_history, ITER_HP, settings in AutoML(), checkpoint bug fix (#283)
if save_best_model_per_estimator is False and retrain_final is True, unfit the model after evaluation in HPO.
retrain if using ray.
update ITER_HP in config after a trial is finished.
change prophet logging level.
example and notebook update.
allow settings to be passed to AutoML constructor: #192 (Are you planning to add multi-output-regression capability to FLAML) and #277 (Is multi-tasking allowed?) can pass the automl setting to the constructor instead of requiring a derived class.
remove model_history.
checkpoint bug fix.

* model_history meaning save_best_model_per_estimator

* ITER_HP

* example update

* prophet logging level

* comment update in forecast notebook

* print format improvement

* allow settings to be passed to AutoML constructor

* checkpoint bug fix

* time limit for autohf regression test

* skip slow test on macos

* cleanup before del
2021-11-18 09:39:45 -08:00
Xueqing Liu
42de3075e9
Make NLP tasks available from AutoML.fit() (#210)
Sequence classification and regression: "seq-classification" and "seq-regression"

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2021-11-16 11:06:20 -08:00
Chi Wang
0d9439212f
update docstr
2021-11-06 09:37:33 -07:00
Chi Wang
549a0dfb53
limit time and memory consumption (#264)
* limit time and memory

* separate tests

* lrl1 can't be limited by limit_resource

* free memory when possible

* passthrough=False when ensemble fails;
retrain when trained_estimator is None

* use callback for resource limit

* handle lower version of xgb with no callback

* free mem ratio

* reduce verbosity

* retrain_final when max_iter==1

* remove trained_estimator from result

* model_history

* wheel

* retrain time as best_config_train_time

* ci: libomp version for xgboost on macos

* limit_resource not working in windows

* test pickle load

* mute forecaster

* notebook update

* check hard

* preventive callback

* add use_ray
2021-11-03 19:08:23 -07:00
Kevin Chen
519bfc2a18
Integrate multivariate time series forecasting (#254)
* Integrate multivariate time series forecasting, now supports
continuous and categorical variables

- update data.py to transform time series data
- update search space
- update documentation to reflect changes
- update test_forecast.py
- rename 'forecast' task to 'ts_forecast' task

* update automl.py and test_forecast.py

* update forecast notebook

* update README.md and setup.py

* update ml.py and test_forecast.py

- make "ds" and "y" constant variables

* replace constants with constant variables

* bump version to 0.7.0

* update setup.py
- support 'forecast' and 'ts_forecast'

* update automl.py and data.py
- support 'forecast' and 'ts_forecast' tasks
2021-10-30 09:48:57 -07:00
Chi Wang
7d6e860102
n_estimators for catboost
2021-10-18 21:56:21 -07:00
Chi Wang
b2d8b097d7
check n_iter == 1
2021-10-18 21:56:21 -07:00
Chi Wang
b03a87e737
no search when max_iter < 2
2021-10-18 21:56:21 -07:00
Chi Wang
524f22bcc5
fix bug in hierarchical search space (#248); optional dependency on lgbm and xgb (#250)
* close #249

* admissible region

* best_config can be None

* optional dependency on lgbm and xgb
resolve #252
2021-10-15 21:36:42 -07:00
Chi Wang
f48ca2618f
warning -> info for low cost partial config (#231)
* warning -> info for low cost partial config
#195, #110

* when n_estimators < 0, use trained_estimator's

* log debug info

* test random seed

* remove "objective"; avoid ZeroDivisionError

* hp config to estimator params

* check type of searcher

* default n_jobs

* try import

* Update searchalgo_auto.py

* CLASSIFICATION

* auto_augment flag

* min_sample_size

* make catboost optional
2021-10-08 16:09:43 -07:00
Chi Wang
a99e939404
update config if n_estimators is modified (#225)
* update config if n_estimators is modified

* prediction as int

* handle the case n_estimators <= 0

* if trained and no budget to train more, return the trained model

* split_type=group for classification & regression
2021-09-27 21:30:49 -07:00
Chi Wang
f4529dfe89
package name in setup (#198)
* package name

* learning to rank example: close #200

* try import prophet #201
2021-09-11 21:19:18 -07:00
Chi Wang
e46573a01d
warmstart blendsearch (#186)
* increase test coverage

* use define by run only when needed

* warmstart bs

* classification -> binary, multi

* warm start with evaluated rewards

* data transformer; resource attr for gs

* BlendSearchTuner bug fix and unittest

* bug fix

* docstr and import

* task type
2021-09-04 01:42:21 -07:00
Chi Wang
6ab0730793
remove catboost training dir; ensemble api; blendsearch for hierarchical space; ranking task; forecast improvement (#178)
* remove catboost training dir

* close #48

* bs for hierarchical space. close #85

* retrain for hierarchical space

* clean ml (#180)

Co-authored-by: Qingyun Wu <qxw5138@psu.edu>

* support ranking task

* examples

* cv shuffle

* forecast api and implementation cleaner

* period constraints

* delete groups after fit
2021-09-01 16:25:04 -07:00
Qingyun Wu
a229a6112a
Support parallel and add random search (#167)
* non hashable value out of signature

* parallel trials

* add random in _search_parallel

* fix bug in retraining

* check memory constraint before training

* retrain_full

* log custom metric

* retraining budget check

* sample size check before retrain

* remove 'time2eval' from result

* report 'total_search_time' in result

* rename total_search_time to wall_clock_time

* rename train_loss boolean to log_training_metric

* set default train_loss to None

* exclude oom result

* log retrained model

* no subsample

* doc str

* notebook

* predicted value is NaN for sarimax

* version

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Qingyun Wu <qxw5138@psu.edu>
2021-08-23 16:36:51 -07:00
Kevin Chen
3d0a3d26a2
Forecast (#162)
* added 'forecast' task with estimators ['fbprophet', 'arima', 'sarimax']

* update setup.py

* add TimeSeriesSplit to 'regression' and 'classification' task

* add 'time' split_type for 'classification' and 'regression' task

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* feature importance

* variable name

* Update test/test_split.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update test/test_forecast.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* prophet installation fail in windows

* upload flaml_forecast.ipynb

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>
2021-08-23 13:26:46 -07:00
Qingyun Wu
10082b9262
v0.5.12 (#150)
* remove extra comma

* exclusive bound

* log file name

* add cost to space

* dataset_format

* add load_openml_dataset test

* docstr

* revise test format

* simplify restore

* order categories

* openml server exception in test

* process space

* add warning

* log format

* reduce n_cpu

* nested space

* hierarchical search space for CFO

* non hierarchical for bs

* unflatten hierarchical config

* connection error

* random sample

* config signature

* check ray version

* preprocess numpy array

* catboost preprocess

* time budget

* seed, verbose, hpo_method

* test cfocat

* shallow copy in flatten_dict
prevent lgbm model duplication

* match estimator name

* quantize and log

* test qloguniform and qrandint

* test qlograndint

* thread.running

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Qingyun Wu <qingyunwu@Qingyuns-MacBook-Pro-2.local>
2021-08-11 23:02:22 -07:00
Chi Wang
15fd8adac4
max_leaves (#138)
* max_leaf_nodes in rf and extra_tree

* preprocess numpy str

* free up mem after training
2021-07-27 18:02:49 -07:00
Chi Wang
b3bb00966d
coverage (#135)
* coverage

* readme

* timeout
2021-07-20 17:00:44 -07:00
Chi Wang
072e9e4588
constraint (#132)
* constraint

* ensemble
2021-07-10 09:02:17 -07:00
Qingyun Wu
a291abfab9
Cha cha (#127)
* unordered categorical

* allow cost attribute to be None

* tensorboardX version

* quote

* cfo cat

* trunc

* Update version.py

* incumbent is normalized

* python 3.9

* remove ConcurrencyLimiter

* seed

* estimator

* update autovw notebook

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Qingyun Wu <qiw@microsoft.com>
2021-07-05 18:17:26 -07:00
Chi Wang
c26720c299
api doc for chacha (#105)
* api doc for chacha

* update params

* link to paper

* update dataset id

Co-authored-by: Chi Wang (MSR) <chiw@microsoft.com>
Co-authored-by: Qingyun Wu <qiw@microsoft.com>
2021-06-11 10:25:45 -07:00
Chi Wang
f7cf2ea45a
Multiclass (#99)
* utility functions

* stepsize lower bound
2021-06-04 10:31:33 -07:00
Chi Wang
b206363c9a
metric constraint (#90)
* penalty change

* metric modification

* catboost init
2021-05-22 08:51:38 -07:00
Chi Wang
0b23c3a028
stepsize (#86)
* decrease step size in suggest

* initialization of the counters

* increase step size

* init phase

* check converge in suggest
2021-05-06 21:29:38 -07:00
Qingyun Wu
f4f3f4f17b
update image url (#71)
* update image url

* ArffException

* OpenMLError is ValueError

* CatBoostError

* reduce build on push

Co-authored-by: Chi Wang (MSR) <wang.chi@microsoft.com>
2021-04-21 01:36:06 -07:00
Qingyun Wu
06045703bf
Lgbm w customized obj (#64)
* add customized lgbm learner

* add comments

* fix format issue

* format

* OpenMLError

* add test

* add notebook

Co-authored-by: Chi Wang (MSR) <chiw@microsoft.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2021-04-10 21:14:28 -04:00
Chi Wang
97a7c114ee
Issue58 (#59)
* iter per learner

* code cleanup
2021-04-08 09:29:55 -07:00
Chi Wang
b7a91e0385
V0.3.0 (#55)
* flaml v0.3

* low cost partial config
2021-04-06 11:37:52 -07:00
Chi Wang
37d7518a4c
sample weight in xgboost (#54)
2021-03-31 22:11:56 -07:00
Chi Wang
f28d093522
v0.2.10 (#51)
* increase search space

* None check
2021-03-28 17:54:25 -07:00
Chi Wang
ae5f8e5426
data validation (#45)
* pickle the AutoML object

* get best model per estimator

* test deberta

* stateless API

* prevent divide by zero

* test roberta

* BlendSearchTuner

* delta time

* reindex columns when dropping int-indexed columns

* test drop columns and small training data

* param set for ensemble builder

* fillna on copy

Co-authored-by: Chi Wang (MSR) <chiw@microsoft.com>
2021-03-19 09:50:47 -07:00