Mirror of https://github.com/microsoft/autogen.git, synced 2025-11-05 04:09:51 +00:00
* add basic support for Spark dataframes; add support for the SynapseML LightGBM model; update to pyspark>=3.2.0 to leverage the pandas_on_Spark API
* clean code, add TODOs
* add sample_train_data for pyspark.pandas dataframe, fix bugs
* improve some functions, fix bugs
* fix dict changing size during iteration
* update model predict
* update LightGBM model, update test
* update SynapseML LightGBM params
* update SynapseML and tests
* update TODOs
* Added support for roc_auc for spark models
* Added support for score of spark estimators
* Added test for automl score of spark estimators
* Added cv support for pyspark.pandas dataframes
* Updated test, fixed bugs
* Added tests
* Updated docs and tests, added a notebook
* Fix bugs in non-spark env
* Fix bugs and improve tests
* Fix uninstall of pyspark
* Fix test errors
* Fix java.lang.OutOfMemoryError: Java heap space
* Fix test_performance
* Update test_sparkml to test_0sparkml to use the expected spark conf
* Remove unnecessary widgets in notebook
* Fix iloc java.lang.StackOverflowError
* Fix pre-commit
* Added params check for spark dataframes
* Refactor train_test_split code into a function
* Update train_test_split_pyspark
* Refactor if-else, remove unnecessary code
* Remove y from predict, remove mem control from n_iter compute
* Update workflow
* Improve _split_pyspark
* Fix test failure from too-short training time
* Fix typos, improve docstrings
* Fix index errors of pandas_on_spark, add spark loss metric
* Fix typo in ndcgAtK
* Update NDCG metrics and tests
* Remove unused logger
* Use cache and count to ensure consistent indexes
* Refactor for merging main
* Fix errors from refactor
* Updated SparkLightGBMEstimator and cache
* Updated config2params
* Remove unused import
* Fix unknown parameters
* Update default_estimator_list
* Add unit tests for spark metrics
19 lines
703 B
Python
from typing import Tuple, Union

import numpy as np


def len_labels(y: np.ndarray, return_labels=False) -> Union[int, Tuple[int, np.ndarray]]:
    """Get the number of unique labels in y. The non-spark version of
    flaml.automl.spark.utils.len_labels."""
    labels = np.unique(y)
    if return_labels:
        return len(labels), labels
    return len(labels)


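A quick usage sketch of `len_labels`. The function is redefined here so the snippet is self-contained, and the toy label array is purely illustrative:

```python
import numpy as np


def len_labels(y, return_labels=False):
    # Mirrors len_labels above: count the distinct labels in y,
    # optionally returning the labels themselves as well.
    labels = np.unique(y)
    if return_labels:
        return len(labels), labels
    return len(labels)


y = np.array([0, 1, 1, 2, 0])
print(len_labels(y))  # 3
n, labels = len_labels(y, return_labels=True)
print(n, labels)  # 3 [0 1 2]
```

Because `np.unique` sorts its output, `labels` always comes back in ascending order regardless of the order labels appear in `y`.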
def unique_value_first_index(y: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Get the unique values and indices of a pandas series or numpy array.
    The non-spark version of flaml.automl.spark.utils.unique_value_first_index."""
    label_set, first_index = np.unique(y, return_index=True)
    return label_set, first_index
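A similar sketch for `unique_value_first_index`, again redefined so the snippet runs on its own. It shows that `np.unique(..., return_index=True)` returns the values in sorted order together with the index of each value's first occurrence in the input:

```python
import numpy as np


def unique_value_first_index(y):
    # Mirrors unique_value_first_index above: unique values (sorted)
    # plus the index of each value's first occurrence in y.
    return np.unique(y, return_index=True)


y = np.array(["b", "a", "b", "c", "a"])
values, first_index = unique_value_first_index(y)
print(values)       # ['a' 'b' 'c']
print(first_index)  # [1 0 3]
```

Note that `first_index` is aligned with the sorted `values`, not with the order of first appearance in `y`: `'a'` first appears at index 1, `'b'` at index 0, and `'c'` at index 3.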