
# Installation

## Python

FLAML requires Python version >= 3.7. It can be installed from pip:

```bash
pip install flaml
```

or conda:

```bash
conda install flaml -c conda-forge
```
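To verify the installation, you can run a minimal AutoML job. The snippet below is only a smoke-test sketch: the Iris dataset and the 10-second `time_budget` are arbitrary placeholders, not recommended settings.

```python
from sklearn.datasets import load_iris

from flaml import AutoML

# Placeholder data; substitute your own training set.
X, y = load_iris(return_X_y=True)

automl = AutoML()
# A tiny time budget (in seconds), just enough to confirm the package works end to end.
automl.fit(X_train=X, y_train=y, task="classification", time_budget=10)
print(automl.best_estimator)
```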

## Optional Dependencies

### Notebook

To run the notebook examples, install flaml with the [notebook] option:

```bash
pip install flaml[notebook]
```

### Extra learners

- catboost (once installed, it can be selected via `estimator_list`; see the sketch after this list)

```bash
pip install flaml[catboost]
```

- vowpal wabbit

```bash
pip install flaml[vw]
```

- time series forecaster: prophet, statsmodels

```bash
pip install flaml[forecast]
```

- natural language processing: transformers

```bash
pip install flaml[nlp]
```
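As an example of how an extra learner is used after installation, the sketch below restricts the search to CatBoost through the `estimator_list` argument of `AutoML.fit`. The dataset and 30-second budget are placeholders.

```python
from sklearn.datasets import load_breast_cancer

from flaml import AutoML

X, y = load_breast_cancer(return_X_y=True)  # placeholder data

automl = AutoML()
# Search only the CatBoost learner installed via flaml[catboost];
# the 30-second budget is an arbitrary placeholder.
automl.fit(
    X_train=X,
    y_train=y,
    task="classification",
    estimator_list=["catboost"],
    time_budget=30,
)
```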

### Distributed tuning

- ray

```bash
pip install flaml[ray]
```

- spark

Spark support was added in v1.1.0.

```bash
pip install "flaml[spark]>=1.1.0"
```

Cloud platforms such as Azure Synapse provide Spark clusters out of the box, but you may need to install Spark manually when setting up your own environment. On a recent Ubuntu system, you can install a standalone Spark 3.3.0 with the script below; for more details on installing Spark, refer to the Spark documentation. A sketch of parallel tuning on a Spark cluster follows this list.

```bash
sudo apt-get update && sudo apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
    ca-certificates-java ca-certificates openjdk-17-jdk-headless \
    && sudo apt-get clean && sudo rm -rf /var/lib/apt/lists/*
wget --progress=dot:giga "https://www.apache.org/dyn/closer.lua/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz?action=download" \
    -O - | tar -xzC /tmp; archive=$(basename "spark-3.3.0/spark-3.3.0-bin-hadoop2.tgz") \
    bash -c "sudo mv -v /tmp/\${archive/%.tgz/} /spark"
export SPARK_HOME=/spark
export PYTHONPATH=/spark/python/lib/py4j-0.10.9.5-src.zip:/spark/python
export PATH=$PATH:$SPARK_HOME/bin
```

- nni

```bash
pip install flaml[nni]
```

- blendsearch

```bash
pip install flaml[blendsearch]
```
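With pyspark available, FLAML can run trials in parallel on a Spark cluster. The following is a minimal sketch assuming the `use_spark` and `n_concurrent_trials` arguments of `AutoML.fit` from FLAML's Spark-backed parallel training (v1.1.0+); the data and time budget are placeholders.

```python
from sklearn.datasets import load_iris

from flaml import AutoML

X, y = load_iris(return_X_y=True)  # placeholder data

automl = AutoML()
# use_spark dispatches trials to Spark workers; n_concurrent_trials sets how
# many trials run at once. Both arguments are assumptions based on FLAML's
# documented Spark support (added in v1.1.0); the 60-second budget is a placeholder.
automl.fit(
    X_train=X,
    y_train=y,
    task="classification",
    time_budget=60,
    use_spark=True,
    n_concurrent_trials=2,
)
```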

### Test and Benchmark

- test

```bash
pip install flaml[test]
```

- benchmark

```bash
pip install flaml[benchmark]
```

## .NET

FLAML has a .NET implementation in ML.NET, an open-source, cross-platform machine learning framework for .NET.

You can use FLAML in .NET in the following ways:

### Low-code

- Model Builder - A Visual Studio extension for training ML models using FLAML. For more information on how to install Model Builder, see the install Model Builder guide.
- ML.NET CLI - A dotnet CLI tool for training machine learning models using FLAML on Windows, macOS, and Linux. For more information on how to install the ML.NET CLI, see the install the ML.NET CLI guide.

### Code-first

- Microsoft.ML.AutoML - A NuGet package that provides direct access to the FLAML AutoML APIs that power low-code solutions like Model Builder and the ML.NET CLI. For more information on installing NuGet packages, see the install and use a NuGet package in Visual Studio or dotnet CLI guides.

To get started with the ML.NET API and AutoML, see the csharp-notebooks.