62 Commits

Author SHA1 Message Date
Mark Harley
44ddf9e104
Refactor into automl subpackage (#809)
* Refactor into automl subpackage

Moved some of the packages into an automl subpackage to tidy before the
task-based refactor. This is in response to discussions with the group
and a comment on the first task-based PR.

Only changes here are moving subpackages and modules into the new
automl, fixing imports to work with this structure and fixing some
dependencies in setup.py.

* Fix doc building post automl subpackage refactor

* Fix broken links in website post automl subpackage refactor

* Fix broken links in website post automl subpackage refactor

* Remove vw from test deps as this is breaking the build

* Move default back to the top-level

I'd moved this to automl as that's where it's used internally, but had
missed that this is actually part of the public interface so makes sense
to live where it was.

* Re-add top level modules with deprecation warnings

flaml.data, flaml.ml and flaml.model are re-added to the top level,
being re-exported from flaml.automl for backwards compatability. Adding
a deprecation warning so that we can have a planned removal later.

* Fix model.py line-endings

* Pin pytorch-lightning to less than 1.8.0

We're seeing strange lightning related bugs from pytorch-forecasting
since the release of lightning 1.8.0. Going to try constraining this to
see if we have a fix.

* Fix the lightning version pin

Was optimistic with setting it in the 1.7.x range, but that isn't
compatible with python 3.6

* Remove lightning version pin

* Revert dependency version changes

* Minor change to retrigger the build

* Fix line endings in ml.py and model.py

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: EgorKraevTransferwise <egor.kraev@transferwise.com>
2022-12-06 15:46:08 -05:00
Chi Wang
92b79221b6
make performance test reproducible (#837)
* make performance test reproducible

* fix test error

* Doc update and disable logging

* document random_state and version

* remove hardcoded budget

* fix test error and dependency; close #777

* iloc
2022-12-06 10:13:39 -08:00
Shreyas
3b3b0bfa8e
roc_auc_weighted metric addition (#827)
* Pending changes exported from your codespace

* Update flaml/automl.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update flaml/automl.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update flaml/ml.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update flaml/ml.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update website/docs/Examples/Integrate - Scikit-learn Pipeline.md

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* added documentation for new metric

* Update flaml/ml.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* minor notebook changes

* Update Integrate - Scikit-learn Pipeline.md

* Update notebook/automl_classification.ipynb

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update integrate_azureml.ipynb

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2022-12-02 19:27:32 -08:00
skzhang1
96f0688595 fix 2022-08-24 02:58:18 +00:00
skzhang1
50fb20ebbc update 2022-08-21 17:44:26 +00:00
skzhang1
462c27f8ae fix 2022-08-21 12:54:58 +00:00
skzhang1
e3aa7ea9d1 update 2022-08-21 12:52:27 +00:00
skzhang1
34085b8c25 update 2022-08-15 14:41:30 +00:00
skzhang1
a55ed0ed61 update 2022-08-13 18:56:46 +00:00
skzhang1
fc633ef15e update 2022-08-13 18:51:33 +00:00
jmrichardson
e43485607a
Disable shuffle for custom CV (#659)
* Disable shuffle for custom CV

* Add custom fold shuffle test

* Update test_split.py

* Update test_split.py
2022-08-12 17:05:32 -07:00
Kevin Chen
f718d18b5e
time series forecasting with panel datasets (#541)
* time series forecasting with panel datasets
- integrate Temporal Fusion Transformer as a learner based on pytorchforecasting

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update setup.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update test_forecast.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update setup.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update setup.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update model.py and test_forecast.py
- remove blank lines

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update model.py to prevent errors

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update automl.py and data.py
- change forecast task name
- update documentation for fit() method

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update test_forecast.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update test_forecast.py
- add performance test
- use 'fit_kwargs_by_estimator'

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* add time index function

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update test_forecast.py performance test

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update data.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update automl.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update data.py to prevent type error

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update setup.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update for pytorch forecasting tft on panel datasets

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update automl.py documentations

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* - rename estimator
- add 'gpu_per_trial' for tft estimator

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update test_forecast.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* include ts panel forecasting as an example

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update model.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update documentations

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update automl_time_series_forecast.ipynb

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update documentations

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* "weights_summary" argument deprecated and removed for pl.Trainer()

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update model.py tft estimator prediction method

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update model.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update `fit_kwargs` documentation

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* update automl.py

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2022-08-12 08:39:22 -07:00
skzhang1
e3c9da50da update 2022-08-10 00:42:47 +00:00
skzhang1
e5a422c41e update 2022-08-07 18:11:04 +00:00
Xueqing Liu
21fa6c10ec
Fixing the issue that FLAML trial number is significantly smaller than Transformers.hyperparameter_search (#657)
* fix 636

* adding low cost config

* update padding; update tokenization output y type (series -> DF); update low cost init config

* updating todf; updating metric_loss_score
2022-08-03 00:11:29 -04:00
Xueqing Liu
6108493e0b
fix ner bug; refactor post processing of TransformersEstimator prediction (#615)
* fix ner bug; refactor post processing

* fix too many values to unpack

* supporting id/token label for NER
2022-07-05 13:38:21 -04:00
Chi Wang
c45741a67b
support latest xgboost version (#599)
* support latest xgboost version

* Update test_classification.py

* Update 

Exists problems when installing xgb1.6.1 in py3.6

* cleanup

* xgboost version

* remove time_budget_s in test

* remove redundancy

* stop support of python 3.6

Co-authored-by: zsk <shaokunzhang529@gmail.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
2022-06-21 18:59:07 -07:00
Xueqing Liu
7740cd3466
trying to fix the indexerror for ner (#596)
* trying to fix the indexerror for ner
2022-06-16 14:58:23 -04:00
Xueqing Liu
e0e317bfb1
fixing trainable and update function, completing NOTE (#566)
* fix checkpoint naming + trial id for non-ray mode, fix the bug in running test mode, delete all the checkpoints in non-ray mode

* finished testing for checkpoint naming, delete checkpoint, ray, max iter = 1
2022-06-03 15:19:22 -04:00
Xueqing Liu
2a8decdc50
fix the post-processing bug in NER (#534)
* fix conll bug

* update DataCollatorForAuto

* adding label_list comments
2022-05-10 17:22:57 -04:00
Xueqing Liu
ca35fa969f
refactoring TransformersEstimator to support default and custom_hp (#511)
* refactoring TransformersEstimator to support default and custom_hp

* handling starting_points not in search space

* addressing starting point more than max_iter

* fixing upper < lower bug
2022-04-28 14:06:29 -04:00
Xueqing Liu
5f97532986
adding evaluation (#495)
* adding automl.score

* fixing the metric name in train_with_config

* adding pickle after score

* fixing a bug in automl.pickle
2022-03-25 17:00:08 -04:00
Xueqing Liu
af423463c3
fixing bug for ner (#463)
* fixing bug for ner

* removing global var

* adding class for trial counter

* adding notebook

* adding use_ray dict

* updating documentation for nlp
2022-03-20 22:03:02 -04:00
Kevin Chen
81f54026c9
Support time series forecasting for discrete target variable (#416)
* support 'ts_forecast_classification' task to forecast discrete values

* update test_forecast.py
- add test for forecasting discrete values

* update test_model.py

* pre-commit changes
2022-01-24 18:39:36 -08:00
Xueqing Liu
f41f1c2198
Logging multiple checkpoints (#394) 2022-01-12 19:50:39 -08:00
Kevin Chen
d4273669e6
Time series forecasting with sklearn regressors (#362)
* add sklearn regressors as learners for ts_forecast task

* add direct forecasting strategy
warnings and errors for duplicate rows and missing values

- add preprocess for sklearn time series forecast
 update automl.py
 update test/test_forecast.py

* update model.py and test_forecast.py for cv eval_method

* add "hcrystalball" dependency in setup.py

* update automl.py
- add _validate_ts_data function for abstraction
- include xgb_limitdepth as a learner

* update model.py
- update search space for sklearn ts regressors

* update automl.py and test_forecast.py for numpy array inputs

* add documentations to model.py

* add documentation for removing catboost regressor

* update automl.py
- _validate_ts_data() function

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>
2022-01-06 23:12:38 -08:00
Xueqing Liu
207b6935d9
adding token classification (#376)
* adding ner
2022-01-03 13:44:10 -05:00
Xueqing Liu
ee3162e232
Adding the NLP task summarization (#346)
* Add test_autohf_summarization.py

* adding seq2seq

* Update flaml/nlp/huggingface/trainer.py

* rouge metrics

Co-authored-by: XinZofStevens <xzhao4346@gmail.com>
Co-authored-by: JinzhuoWu <wujinzhuo0105@gmail.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2021-12-20 14:19:32 -08:00
Chia-Chi Hsu
671ccbbe3f
support for customized splitters (#333)
* add support for customized splitters

* use the param split_type for feeding generators

* use single API for customized splitter and add test

* when task==TS_FORCAST, always set shuffle=False

* update docstr

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2021-12-16 16:13:04 -08:00
Xueqing Liu
1a3e01c352
adding HF metrics (#335)
* adding nlp metrics

* fix ndcg
2021-12-10 12:32:49 -05:00
Chi Wang
18230ed22f
pred_time_limit clarification and logging (#319)
* pred_time_limit clarification

* log prediction time

* handle ChunkedEncodingError in test
2021-12-03 16:02:00 -08:00
Chi Wang
85e21864ce
test -> val; docstr (#300)
* rename test -> val in custom metric function
* add an example in docstr
resolve #299
2021-11-22 22:17:29 -08:00
Chi Wang
ea6d28d7bd
add max_depth to xgboost search space (#282)
* add max_depth to xgboost search space

* notebook update

* two learners for xgboost (max_depth or max_leaves)
2021-11-22 21:17:48 -08:00
Xueqing Liu
42de3075e9
Make NLP tasks available from AutoML.fit() (#210)
Sequence classification and regression: "seq-classification" and "seq-regression"

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2021-11-16 11:06:20 -08:00
Chi Wang
5b0932e442
Unify regression and classification for XGBoost (#276)
* scikit-learn API for XGBoostRegressor
2021-11-09 21:23:54 -08:00
Chi Wang
c4d5986ee8 no retraining when max_iter=0 and not retrain_full 2021-11-06 11:37:57 -07:00
Chi Wang
0d9439212f update docstr 2021-11-06 09:37:33 -07:00
Kevin Chen
519bfc2a18
Integrate multivariate time series forecasting (#254)
* Integrate multivariate time series forecasting, now supports
continuous and categorical variables

- update data.py to transform time series data
- update search space
- update documentations to reflect changes
- update test_forecast.py
- rename 'forecast' task to 'ts_forecast' task

* update automl.py and test_forecast.py

* update forecast notebook

* update README.md and setup.py

* update ml.py and test_forecast.py

- make "ds" and "y" constant variables

* replace constants with constant variables

* bump version to 0.7.0

* update setup.py
- support 'forecast' and 'ts_forecast'

* update automl.py and data.py
- support 'forecast' and 'ts_forecast' tasks
2021-10-30 09:48:57 -07:00
Chi Wang
7d6e860102 n_estimators for catboost 2021-10-18 21:56:21 -07:00
Chi Wang
f48ca2618f
warning -> info for low cost partial config (#231)
* warning -> info for low cost partial config
#195, #110

* when n_estimators < 0, use trained_estimator's

* log debug info

* test random seed

* remove "objective"; avoid ZeroDivisionError

* hp config to estimator params

* check type of searcher

* default n_jobs

* try import

* Update searchalgo_auto.py

* CLASSIFICATION

* auto_augment flag

* min_sample_size

* make catboost optional
2021-10-08 16:09:43 -07:00
Chi Wang
a99e939404
update config if n_estimators is modified (#225)
* update config if n_estimators is modified

* prediction as int

* handle the case n_estimators <= 0

* if trained and no budget to train more, return the trained model

* split_type=group for classification & regression
2021-09-27 21:30:49 -07:00
Chi Wang
16a97bec76
set converge flag when no trial can be sampled (#217)
* set converge flag when no trial can be sampled

* require custom_metric to return dict for logging
close #218

* estimate time budget needed

* log info per iteration
2021-09-23 10:49:02 -07:00
Chi Wang
f4529dfe89
package name in setup (#198)
* package name

* learning to rank example: close #200

* try import prophet #201
2021-09-11 21:19:18 -07:00
Chi Wang
e46573a01d
warmstart blendsearch (#186)
* increase test coverage

* use define by run only when needed

* warmstart bs

* classification -> binary, multi

* warm start with evaluated rewards

* data transformer; resource attr for gs

* BlendSearchTuner bug fix and unittest

* bug fix

* docstr and import

* task type
2021-09-04 01:42:21 -07:00
Qingyun Wu
5fdfa2559b
Cleanml (#185)
* reorg ml

* return y_pred in eval_estimator

* add train loss into metric_for_logging dict
2021-09-02 13:07:30 -07:00
Chi Wang
6ab0730793
remove catboost training dir; ensemble api; blendsearch for hierarchical space; ranking task; forecast improvement (#178)
* remove catboost training dir

* close #48

* bs for hierarchical space. close #85

* retrain for hierarchical space

* clean ml (#180)

Co-authored-by: Qingyun Wu <qxw5138@psu.edu>

* support ranking task

* examples

* cv shuffle

* forecast api and implementation cleaner

* period constraints

* delete groups after fit
2021-09-01 16:25:04 -07:00
Qingyun Wu
a229a6112a
Support parallel and add random search (#167)
* non hashable value out of signature

* parallel trials

* add random in _search_parallel

* fix bug in retraining

* check memory constraint before training

* retrain_full

* log custom metric

* retraining budget check

* sample size check before retrain

* remove 'time2eval' from result

* report 'total_search_time' in result

* rename total_search_time to wall_clock_time

* rename train_loss boolean to log_training_metric

* set default train_loss to None

* exclude oom result

* log retrained model

* no subsample

* doc str

* notebook

* predicted value is NaN for sarimax

* version

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Qingyun Wu <qxw5138@psu.edu>
2021-08-23 16:36:51 -07:00
Kevin Chen
3d0a3d26a2
Forecast (#162)
* added 'forecast' task with estimators ['fbprophet', 'arima', 'sarimax']

* update setup.py

* add TimeSeriesSplit to 'regression' and 'classification' task

* add 'time' split_type for 'classification' and 'regression' task

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>

* feature importance

* variable name

* Update test/test_split.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update test/test_forecast.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* prophet installation fail in windows

* upload flaml_forecast.ipynb

Signed-off-by: Kevin Chen <chenkevin.8787@gmail.com>
2021-08-23 13:26:46 -07:00
すずまる
6270353458
support ROC and AUC for multi-class classification (#170)
* support ROC and AUC for multi-class classification

* add a test case to cover ROC and AUC for multi-class classification
2021-08-22 15:16:10 -07:00
Qingyun Wu
10082b9262
v0.5.12 (#150)
* remove extra comma

* exclusive bound

* log file name

* add cost to space

* dataset_format

* add load_openml_dataset test

* docstr

* revise test format

* simplify restore

* order categories

* openml server exception in test

* process space

* add warning

* log format

* reduce n_cpu

* nested space

* hierarchical search space for CFO

* non hierarchical for bs

* unflatten hierarchical config

* connection error

* random sample

* config signature

* check ray version

* preprocess numpy array

* catboost preprocess

* time budget

* seed, verbose, hpo_method

* test cfocat

* shallow copy in flatten_dict
prevent lgbm model duplication

* match estimator name

* quantize and log

* test qloguniform and qrandint

* test qlograndint

* thread.running

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Qingyun Wu <qingyunwu@Qingyuns-MacBook-Pro-2.local>
2021-08-11 23:02:22 -07:00