autogen

mirror of https://github.com/microsoft/autogen.git synced 2025-09-08 15:56:13 +00:00

Author	SHA1	Message	Date
Jane Illarionova	b235fe0098	Expose feature and label transformer in automl.py (#993) * expose label and feature transformer * linter apply * avoid undefined attribute in flaml/automl/automl.py Co-authored-by: Chi Wang <wang.chi@microsoft.com> * avoid undefined attribute in flaml/automl/automl.py Co-authored-by: Chi Wang <wang.chi@microsoft.com> * retrigger checks * retrigger checks --------- Co-authored-by: Chi Wang <wang.chi@microsoft.com>	2023-04-15 19:06:47 +00:00
Jirka Borovec	a701cd82f8	set black with 120 line length (#975) * set black with 120 line length * apply pre-commit * apply black	2023-04-10 19:50:40 +00:00
Susan Xueqing Liu	ef5a17cd83	handling nlp divide by zero (#926) * handling nlp divide by zero * catching zerodivisionerror * catching zerodivisionerror * catching zerodivisionerror * addressing comments * addressing comments * updating test case * update * add blank to last line * update nlp notebook * rerun * rerun * sync with main * add model selection for nlg * addressing keyerror * add raise exception * update * fix bug * revert * updating automl_nlp * Update flaml/automl/model.py Co-authored-by: Zvi Baratz <z.baratz@gmail.com> * address comments * address comments --------- Co-authored-by: Li Jiang <lijiang1@microsoft.com> Co-authored-by: Zvi Baratz <z.baratz@gmail.com>	2023-04-09 16:53:30 +00:00
Li Jiang	50334f2c52	Support spark dataframe as input dataset and spark models as estimators (#934) * add basic support to Spark dataframe add support to SynapseML LightGBM model update to pyspark>=3.2.0 to leverage pandas_on_Spark API * clean code, add TODOs * add sample_train_data for pyspark.pandas dataframe, fix bugs * improve some functions, fix bugs * fix dict change size during iteration * update model predict * update LightGBM model, update test * update SynapseML LightGBM params * update synapseML and tests * update TODOs * Added support to roc_auc for spark models * Added support to score of spark estimator * Added test for automl score of spark estimator * Added cv support to pyspark.pandas dataframe * Update test, fix bugs * Added tests * Updated docs, tests, added a notebook * Fix bugs in non-spark env * Fix bugs and improve tests * Fix uninstall pyspark * Fix tests error * Fix java.lang.OutOfMemoryError: Java heap space * Fix test_performance * Update test_sparkml to test_0sparkml to use the expected spark conf * Remove unnecessary widgets in notebook * Fix iloc java.lang.StackOverflowError * fix pre-commit * Added params check for spark dataframes * Refactor code for train_test_split to a function * Update train_test_split_pyspark * Refactor if-else, remove unnecessary code * Remove y from predict, remove mem control from n_iter compute * Update workflow * Improve _split_pyspark * Fix test failure of too short training time * Fix typos, improve docstrings * Fix index errors of pandas_on_spark, add spark loss metric * Fix typo of ndcgAtK * Update NDCG metrics and tests * Remove unuseful logger * Use cache and count to ensure consistent indexes * refactor for merge maain * fix errors of refactor * Updated SparkLightGBMEstimator and cache * Updated config2params * Remove unused import * Fix unknown parameters * Update default_estimator_list * Add unit tests for spark metrics	2023-03-25 19:59:46 +00:00
Mark Harley	27b2712016	Extract task class from automl (#857) * Refactor into automl subpackage Moved some of the packages into an automl subpackage to tidy before the task-based refactor. This is in response to discussions with the group and a comment on the first task-based PR. Only changes here are moving subpackages and modules into the new automl, fixing imports to work with this structure and fixing some dependencies in setup.py. * Fix doc building post automl subpackage refactor * Fix broken links in website post automl subpackage refactor * Fix broken links in website post automl subpackage refactor * Remove vw from test deps as this is breaking the build * Move default back to the top-level I'd moved this to automl as that's where it's used internally, but had missed that this is actually part of the public interface so makes sense to live where it was. * Re-add top level modules with deprecation warnings flaml.data, flaml.ml and flaml.model are re-added to the top level, being re-exported from flaml.automl for backwards compatability. Adding a deprecation warning so that we can have a planned removal later. * Fix model.py line-endings * WIP * WIP - Notes below Got to the point where the methods from AutoML are pulled to GenericTask. Started removing private markers and removing the passing of automl to these methods. Done with decide_split_type, started on prepare_data. Need to do the others after * Re-add generic_task * Fix tests: add Task.__str__ * Fix tests: test for ray.ObjectRef * Hotwire TS_Sklearn wrapper to fix test fail * Remove unused data size field from Task * Fix import for CLASSIFICATION in notebook * Update flaml/automl/data.py Co-authored-by: Chi Wang <wang.chi@microsoft.com> * Fix review comments * Fix task -> str in custom learner constructor * Remove unused CLASSIFICATION imports * Hotwire TS_Sklearn wrapper to fix test fail by setting optimizer_for_horizon == False * Revert changes to the automl_classification and pin FLAML version * Fix imports in reverted notebook * Fix FLAML version in automl notebooks * Fix ml.py line endings * Fix CLASSIFICATION task import in automl_classification notebook * Uncomment pip install in notebook and revert import Not convinced this will work because of installing an older version of the package into the environment in which we're running the tests, but let's see. * Revert c6a5dd1a0 * Revert "Revert c6a5dd1a0" This reverts commit e55e35adea03993de87b23f092b14c6af623d487. * Black format model.py * Bump version to 1.1.2 in automl_xgboost * Add docstrings to the Task ABC * Fix import in custom_learner * fix 'optimize_for_horizon' for ts_sklearn * remove debugging print statements * Check for is_forecast() before is_classification() in decide_split_type * Attempt to fix formatting fail * Another attempt to fix formatting fail * And another attempt to fix formatting fail * Add type annotations for task arg in signatures and docstrings * Fix formatting * Fix linting --------- Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu> Co-authored-by: EgorKraevTransferwise <egor.kraev@transferwise.com> Co-authored-by: Chi Wang <wang.chi@microsoft.com> Co-authored-by: Kevin Chen <chenkevin.8787@gmail.com>	2023-03-11 02:39:08 +00:00
Jirka Borovec	2ff1035733	precommit: end-of-file-fixer (#929) * precommit: end-of-file-fixer * exclude .gitignore * apply --------- Co-authored-by: Shaokun <shaokunzhang529@gmail.com>	2023-02-28 16:27:14 +00:00
levscaut	c6a2440348	add PySparkOvertimeMonitor to avoid exceeding time budget (#923) * merging * clean commit * Delete mylearner.py This file is not needed. * fix py4j import error * more tolerant cancelling time * fix problems following suggestions * Update flaml/tune/spark/utils.py Co-authored-by: Li Jiang <bnujli@gmail.com> * remove redundant model * Update test/spark/custom_mylearner.py Co-authored-by: Chi Wang <wang.chi@microsoft.com> * add docstr * reverse change in gitignore * Update test/spark/custom_mylearner.py Co-authored-by: Chi Wang <wang.chi@microsoft.com> --------- Co-authored-by: Li Jiang <bnujli@gmail.com> Co-authored-by: Chi Wang <wang.chi@microsoft.com>	2023-02-24 08:07:00 +00:00
Andrea Ruggerini	8e447562c7	Improve annotations in automl and ml modules (#919) * begin annotation in automl.py and ml.py * EstimatorSubclass + annotate metric * review: fixes + setting fit_kwargs as proper Optional * import from flaml.automl.model (import from flaml.model is deprecated) * comment n_jobs in train_estimator as well * better annotation in _compute_with_config_base Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu> --------- Co-authored-by: Andrea W <a.ruggerini@ammagamma.com> Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>	2023-02-22 02:49:56 +00:00
Jirka Borovec	6aa1d16ebc	pre-commit: update config (#925) * update config * apply precommit	2023-02-22 00:49:38 +00:00
Chi Wang	fbea1d06dd	stratified group kfold splitter (#899) * stratified group kfold splitter * exclude catboost --------- Co-authored-by: Shaokun <shaokunzhang529@gmail.com> Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>	2023-02-05 18:26:14 -05:00
Li Jiang	9fde27e536	fix #871: call check_spark only when necessary (#872) Co-authored-by: Li Jiang <lijiang1@microsoft.com>	2023-01-07 07:41:35 -08:00
Antoni Baum	5f67c0ab8a	Do not persist entire AutoMLState in Searcher (#870) * Do not persist entire AutoMLState in Searcher Signed-off-by: Antoni Baum <antoni.baum@protonmail.com> * Fix tests Signed-off-by: Antoni Baum <antoni.baum@protonmail.com> Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>	2023-01-05 18:00:05 -08:00
Li Jiang	da2cd7ca89	Add supporting using Spark as the backend of parallel training (#846) * Added spark support for parallel training. * Added tests and fixed a bug * Added more tests and updated docs * Updated setup.py and docs * Added customize_learner and tests * Update spark tests and setup.py * Update docs and verbose * Update logging, fix issue in cloud notebook * Update github workflow for spark tests * Update github workflow * Remove hack of handling _choice_ * Allow for failures * Fix tests, update docs * Update setup.py * Update Dockerfile for Spark * Update tests, remove some warnings * Add test for notebooks, update utils * Add performance test for Spark * Fix lru_cache maxsize * Fix test failures on some platforms * Fix coverage report failure * resovle PR comments * resovle PR comments 2nd round * resovle PR comments 3rd round * fix lint and rename test class * resovle PR comments 4th round * refactor customize_learner to broadcast_code	2022-12-23 08:18:49 -08:00
Shaokun	4140fc9022	Format errors on the web. (#855) * fix_doc * update * fix lint * fix lint * reformat Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>	2022-12-22 22:36:34 -05:00
Jing Dong	b2d51b648c	Added an info reminding user that if no time_budget and no max_iter is specified, then effectively zero-shot AutoML is used (#850) * Added an info reminding user that if no time_budget and no max_iter is specified, then effectively zero-shot AutoML is used * moved message to line 2818 * Update flaml/automl/automl.py Co-authored-by: Chi Wang <wang.chi@microsoft.com> * Update flaml/automl/automl.py Co-authored-by: Chi Wang <wang.chi@microsoft.com> Co-authored-by: Chi Wang <wang.chi@microsoft.com> Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>	2022-12-18 12:49:00 -05:00
Chi Wang	232c356a4b	fix bug related to _choice_ (#848) * fix bug related to _choice_ * remove py 3.6 * sanitize config * optimize test	2022-12-13 15:48:32 -05:00
Chi Wang	dbc2e2d796	Use get to avoid KeyError (#824) Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu> Co-authored-by: Kevin Chen <74878789+int-chaos@users.noreply.github.com>	2022-12-07 01:23:45 -05:00
Mark Harley	44ddf9e104	Refactor into automl subpackage (#809) * Refactor into automl subpackage Moved some of the packages into an automl subpackage to tidy before the task-based refactor. This is in response to discussions with the group and a comment on the first task-based PR. Only changes here are moving subpackages and modules into the new automl, fixing imports to work with this structure and fixing some dependencies in setup.py. * Fix doc building post automl subpackage refactor * Fix broken links in website post automl subpackage refactor * Fix broken links in website post automl subpackage refactor * Remove vw from test deps as this is breaking the build * Move default back to the top-level I'd moved this to automl as that's where it's used internally, but had missed that this is actually part of the public interface so makes sense to live where it was. * Re-add top level modules with deprecation warnings flaml.data, flaml.ml and flaml.model are re-added to the top level, being re-exported from flaml.automl for backwards compatability. Adding a deprecation warning so that we can have a planned removal later. * Fix model.py line-endings * Pin pytorch-lightning to less than 1.8.0 We're seeing strange lightning related bugs from pytorch-forecasting since the release of lightning 1.8.0. Going to try constraining this to see if we have a fix. * Fix the lightning version pin Was optimistic with setting it in the 1.7.x range, but that isn't compatible with python 3.6 * Remove lightning version pin * Revert dependency version changes * Minor change to retrigger the build * Fix line endings in ml.py and model.py Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu> Co-authored-by: EgorKraevTransferwise <egor.kraev@transferwise.com>	2022-12-06 15:46:08 -05:00

18 Commits