5 Commits

Author SHA1 Message Date
levscaut
5eece5c748
Enhance Integration with Spark (#1097)
* add doc for spark

* labelCol equals to label by default

* change title and reformat

* reference about default index type

* fix doc build

* Update website/docs/Examples/Integrate - Spark.md

* update doc

* Added more references

* remove exception case when `y_train.name` is None

* fix broken link

---------

Co-authored-by: Wendong Li <v-wendongli@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-07-10 04:44:01 +00:00
Li Jiang
8b2411b219
update spark session in spark tests (#1006)
* add mlflow and spark integration tests

* remove unused params

* remove mlflow tests
2023-05-03 09:59:29 +00:00
Li Jiang
c9fc622af1
fix tests failure caused by version incompatibility (#995) 2023-04-15 14:52:40 +00:00
Jirka Borovec
a701cd82f8
set black with 120 line length (#975)
* set black with 120 line length

* apply pre-commit

* apply black
2023-04-10 19:50:40 +00:00
Li Jiang
50334f2c52
Support spark dataframe as input dataset and spark models as estimators (#934)
* add basic support to Spark dataframe

add support to SynapseML LightGBM model

update to pyspark>=3.2.0 to leverage pandas_on_Spark API

* clean code, add TODOs

* add sample_train_data for pyspark.pandas dataframe, fix bugs

* improve some functions, fix bugs

* fix dict change size during iteration

* update model predict

* update LightGBM model, update test

* update SynapseML LightGBM params

* update synapseML and tests

* update TODOs

* Added support to roc_auc for spark models

* Added support to score of spark estimator

* Added test for automl score of spark estimator

* Added cv support to pyspark.pandas dataframe

* Update test, fix bugs

* Added tests

* Updated docs, tests, added a notebook

* Fix bugs in non-spark env

* Fix bugs and improve tests

* Fix uninstall pyspark

* Fix tests error

* Fix java.lang.OutOfMemoryError: Java heap space

* Fix test_performance

* Update test_sparkml to test_0sparkml to use the expected spark conf

* Remove unnecessary widgets in notebook

* Fix iloc java.lang.StackOverflowError

* fix pre-commit

* Added params check for spark dataframes

* Refactor code for train_test_split to a function

* Update train_test_split_pyspark

* Refactor if-else, remove unnecessary code

* Remove y from predict, remove mem control from n_iter compute

* Update workflow

* Improve _split_pyspark

* Fix test failure of too short training time

* Fix typos, improve docstrings

* Fix index errors of pandas_on_spark, add spark loss metric

* Fix typo of ndcgAtK

* Update NDCG metrics and tests

* Remove unuseful logger

* Use cache and count to ensure consistent indexes

* refactor for merge maain

* fix errors of refactor

* Updated SparkLightGBMEstimator and cache

* Updated config2params

* Remove unused import

* Fix unknown parameters

* Update default_estimator_list

* Add unit tests for spark metrics
2023-03-25 19:59:46 +00:00