autogen/notebook/agentchat_groupchat_RAG.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Group Chat with Retrieval Augmented Generation\n",
    "\n",
    "AutoGen supports conversable agents powered by LLMs, tools, or humans, performing tasks collectively via automated chat. This framework allows tool use and human participation through multi-agent conversation.\n",
    "Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat).\n",
    "\n",
    "````{=mdx}\n",
    ":::info Requirements\n",
    "Some extra dependencies are needed for this notebook, which can be installed via pip:\n",
    "\n",
    "```bash\n",
    "pip install pyautogen[retrievechat]\n",
    "```\n",
    "\n",
    "For more information, please refer to the [installation guide](/docs/installation/).\n",
    ":::\n",
    "````"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set your API Endpoint\n",
    "\n",
    "The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a json file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "LLM models:  ['gpt4-1106-preview', 'gpt-35-turbo', 'gpt-35-turbo-0613']\n"
     ]
    }
   ],
   "source": [
    "import chromadb\n",
    "from typing_extensions import Annotated\n",
    "\n",
    "import autogen\n",
    "from autogen import AssistantAgent\n",
    "from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent\n",
    "\n",
    "config_list = autogen.config_list_from_json(\"OAI_CONFIG_LIST\")\n",
    "\n",
    "print(\"LLM models: \", [config_list[i][\"model\"] for i in range(len(config_list))])"
   ]
  },
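  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of the data that `config_list_from_json` hands back, the snippet below builds a small list of per-model dictionaries and round-trips it through the standard `json` module. The model names and the placeholder `api_key` values are assumptions for illustration only; substitute the entries from your own `OAI_CONFIG_LIST`.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "# Hypothetical entries: the model names and keys below are placeholders.\n",
    "sample_configs = [\n",
    "    {\"model\": \"gpt-4\", \"api_key\": \"<your-openai-key>\"},\n",
    "    {\"model\": \"gpt-35-turbo\", \"api_key\": \"<your-azure-key>\"},\n",
    "]\n",
    "\n",
    "# Text that could be stored in the OAI_CONFIG_LIST file or environment variable.\n",
    "serialized = json.dumps(sample_configs, indent=4)\n",
    "\n",
    "# Parse it back to mimic loading the configuration list.\n",
    "config_list = json.loads(serialized)\n",
    "models = [config[\"model\"] for config in config_list]\n",
    "print(\"LLM models: \", models)\n",
    "```\n",
    "\n",
    "Each dictionary describes one endpoint, and the list as a whole is what the agents later receive through `llm_config[\"config_list\"]`."
   ]
  },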
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "````{=mdx}\n",
    ":::tip\n",
    "Learn more about configuring LLMs for agents [here](/docs/topics/llm_configuration).\n",
    ":::\n",
    "````\n",
    "\n",
    "## Construct Agents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
   ],
   "source": [
    "def termination_msg(x):\n",
    "    return isinstance(x, dict) and \"TERMINATE\" == str(x.get(\"content\", \"\"))[-9:].upper()\n",
    "\n",
    "\n",
    "llm_config = {\"config_list\": config_list, \"timeout\": 60, \"temperature\": 0.8, \"seed\": 1234}\n",
    "\n",
    "boss = autogen.UserProxyAgent(\n",
    "    name=\"Boss\",\n",
    "    is_termination_msg=termination_msg,\n",
    "    human_input_mode=\"NEVER\",\n",
    "    code_execution_config=False,  # we don't want to execute code in this case.\n",
    "    default_auto_reply=\"Reply `TERMINATE` if the task is done.\",\n",
    "    description=\"The boss who ask questions and give tasks.\",\n",
    ")\n",
    "\n",
    "boss_aid = RetrieveUserProxyAgent(\n",
    "    name=\"Boss_Assistant\",\n",
    "    is_termination_msg=termination_msg,\n",
    "    human_input_mode=\"NEVER\",\n",
    "    default_auto_reply=\"Reply `TERMINATE` if the task is done.\",\n",
    "    max_consecutive_auto_reply=3,\n",
    "    retrieve_config={\n",
    "        \"task\": \"code\",\n",
    "        \"docs_path\": \"https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Examples/Integrate%20-%20Spark.md\",\n",
    "        \"chunk_token_size\": 1000,\n",
    "        \"model\": config_list[0][\"model\"],\n",
    "        \"collection_name\": \"groupchat\",\n",
    "        \"get_or_create\": True,\n",
    "    },\n",
    "    code_execution_config=False,  # we don't want to execute code in this case.\n",
    "    description=\"Assistant who has extra content retrieval power for solving difficult problems.\",\n",
    ")\n",
    "\n",
    "coder = AssistantAgent(\n",
    "    name=\"Senior_Python_Engineer\",\n",
    "    is_termination_msg=termination_msg,\n",
    "    system_message=\"You are a senior python engineer, you provide python code to answer questions. Reply `TERMINATE` in the end when everything is done.\",\n",
    "    llm_config=llm_config,\n",
    "    description=\"Senior Python Engineer who can write code to solve problems and answer questions.\",\n",
    ")\n",
    "\n",
    "pm = autogen.AssistantAgent(\n",
    "    name=\"Product_Manager\",\n",
    "    is_termination_msg=termination_msg,\n",
    "    system_message=\"You are a product manager. Reply `TERMINATE` in the end when everything is done.\",\n",
    "    llm_config=llm_config,\n",
    "    description=\"Product Manager who can design and plan the project.\",\n",
    ")\n",
    "\n",
    "reviewer = autogen.AssistantAgent(\n",
    "    name=\"Code_Reviewer\",\n",
    "    is_termination_msg=termination_msg,\n",
    "    system_message=\"You are a code reviewer. Reply `TERMINATE` in the end when everything is done.\",\n",
    "    llm_config=llm_config,\n",
    "    description=\"Code Reviewer who can review the code.\",\n",
    ")\n",
    "\n",
    "PROBLEM = \"How to use spark for parallel training in FLAML? Give me sample code.\"\n",
    "\n",
    "\n",
    "def _reset_agents():\n",
    "    boss.reset()\n",
    "    boss_aid.reset()\n",
    "    coder.reset()\n",
    "    pm.reset()\n",
    "    reviewer.reset()\n",
    "\n",
    "\n",
    "def rag_chat():\n",
    "    _reset_agents()\n",
    "    groupchat = autogen.GroupChat(\n",
    "        agents=[boss_aid, pm, coder, reviewer], messages=[], max_round=12, speaker_selection_method=\"round_robin\"\n",
    "    )\n",
    "    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n",
    "\n",
    "    # Start chatting with boss_aid as this is the user proxy agent.\n",
    "    boss_aid.initiate_chat(\n",
    "        manager,\n",
    "        message=boss_aid.message_generator,\n",
    "        problem=PROBLEM,\n",
    "        n_results=3,\n",
    "    )\n",
    "\n",
    "\n",
    "def norag_chat():\n",
    "    _reset_agents()\n",
    "    groupchat = autogen.GroupChat(\n",
    "        agents=[boss, pm, coder, reviewer],\n",
    "        messages=[],\n",
    "        max_round=12,\n",
    "        speaker_selection_method=\"auto\",\n",
    "        allow_repeat_speaker=False,\n",
    "    )\n",
    "    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n",
    "\n",
    "    # Start chatting with the boss as this is the user proxy agent.\n",
    "    boss.initiate_chat(\n",
    "        manager,\n",
    "        message=PROBLEM,\n",
    "    )\n",
    "\n",
    "\n",
    "def call_rag_chat():\n",
    "    _reset_agents()\n",
    "\n",
    "    # In this case, we will have multiple user proxy agents and we don't initiate the chat\n",
    "    # with RAG user proxy agent.\n",
    "    # In order to use RAG user proxy agent, we need to wrap RAG agents in a function and call\n",
    "    # it from other agents.\n",
    "    def retrieve_content(\n",
    "        message: Annotated[\n",
    "            str,\n",
    "            \"Refined message which keeps the original meaning and can be used to retrieve content for code generation and question answering.\",\n",
    "        ],\n",
    "        n_results: Annotated[int, \"number of results\"] = 3,\n",
    "    ) -> str:\n",
    "        boss_aid.n_results = n_results  # Set the number of results to be retrieved.\n",
    "        # Check if we need to update the context.\n",
    "        update_context_case1, update_context_case2 = boss_aid._check_update_context(message)\n",
    "        if (update_context_case1 or update_context_case2) and boss_aid.update_context:\n",
    "            boss_aid.problem = message if not hasattr(boss_aid, \"problem\") else boss_aid.problem\n",
    "            _, ret_msg = boss_aid._generate_retrieve_user_reply(message)\n",
    "        else:\n",
    "            _context = {\"problem\": message, \"n_results\": n_results}\n",
    "            ret_msg = boss_aid.message_generator(boss_aid, None, _context)\n",
    "        return ret_msg if ret_msg else message\n",
    "\n",
    "    boss_aid.human_input_mode = \"NEVER\"  # Disable human input for boss_aid since it only retrieves content.\n",
    "\n",
    "    for caller in [pm, coder, reviewer]:\n",
    "        d_retrieve_content = caller.register_for_llm(\n",
    "            description=\"retrieve content for code generation and question answering.\", api_style=\"function\"\n",
    "        )(retrieve_content)\n",
    "\n",
    "    for executor in [boss, pm]:\n",
    "        executor.register_for_execution()(d_retrieve_content)\n",
    "\n",
    "    groupchat = autogen.GroupChat(\n",
    "        agents=[boss, pm, coder, reviewer],\n",
    "        messages=[],\n",
    "        max_round=12,\n",
    "        speaker_selection_method=\"round_robin\",\n",
    "        allow_repeat_speaker=False,\n",
    "    )\n",
    "\n",
    "    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n",
    "\n",
    "    # Start chatting with the boss as this is the user proxy agent.\n",
    "    boss.initiate_chat(\n",
    "        manager,\n",
    "        message=PROBLEM,\n",
    "    )"
   ]
  },
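  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `termination_msg` predicate defined above inspects only the last nine characters of a message's `content`, so a reply ends the chat only when it finishes with `TERMINATE` (in any letter case) and nothing follows it. A minimal sketch of that behavior is shown below; the sample messages are made up for illustration.\n",
    "\n",
    "```python\n",
    "def termination_msg(x):\n",
    "    # Same predicate as in the agent setup: the message must be a dict whose\n",
    "    # content ends with \"TERMINATE\" (case-insensitive, no trailing characters).\n",
    "    return isinstance(x, dict) and \"TERMINATE\" == str(x.get(\"content\", \"\"))[-9:].upper()\n",
    "\n",
    "\n",
    "# Illustrative messages only; none of these come from a real chat.\n",
    "checks = {\n",
    "    \"plain\": termination_msg({\"content\": \"TERMINATE\"}),\n",
    "    \"lowercase suffix\": termination_msg({\"content\": \"All done. terminate\"}),\n",
    "    \"trailing period\": termination_msg({\"content\": \"TERMINATE.\"}),\n",
    "    \"missing content\": termination_msg({\"content\": None}),\n",
    "    \"not a dict\": termination_msg(\"TERMINATE\"),\n",
    "}\n",
    "for label, result in checks.items():\n",
    "    print(f\"{label}: {result}\")\n",
    "```\n",
    "\n",
    "Only the first two messages count as termination. Trailing punctuation or whitespace after `TERMINATE` keeps the chat going until `max_round` or `max_consecutive_auto_reply` is reached."
   ]
  },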
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Start Chat\n",
    "\n",
    "### UserProxyAgent doesn't get the correct code\n",
    "[FLAML](https://github.com/microsoft/FLAML) was open sourced in 2020, so ChatGPT is familiar with it. However, Spark-related APIs were added in 2022, so they were not in ChatGPT's training data. As a result, we end up with invalid code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mBoss\u001b[0m (to chat_manager):\n",
      "\n",
      "How to use spark for parallel training in FLAML? Give me sample code.\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mSenior_Python_Engineer\u001b[0m (to chat_manager):\n",
      "\n",
      "To use Spark for parallel training in FLAML (Fast and Lightweight AutoML), you would need to set up a Spark cluster and utilize the `spark` backend for joblib, which FLAML uses internally for parallel training. Here’s an example of how you might set up and use Spark with FLAML for AutoML tasks:\n",
      "\n",
      "Firstly, ensure that you have the Spark cluster set up and the `pyspark` and `joblib-spark` packages installed in your environment. You can install the required packages using pip if they are not already installed:\n",
      "\n",
      "```python\n",
      "!pip install flaml pyspark joblib-spark\n",
      "```\n",
      "\n",
      "Here's a sample code snippet that demonstrates how to use FLAML with Spark for parallel training:\n",
      "\n",
      "```python\n",
      "from flaml import AutoML\n",
      "from pyspark.sql import SparkSession\n",
      "from sklearn.datasets import load_digits\n",
      "from joblibspark import register_spark\n",
      "\n",
      "# Initialize a Spark session\n",
      "spark = SparkSession.builder \\\n",
      "    .master(\"local[*]\") \\\n",
      "    .appName(\"FLAML_Spark_Example\") \\\n",
      "    .getOrCreate()\n",
      "\n",
      "# Register the joblib spark backend\n",
      "register_spark()  # This registers the backend for parallel processing\n",
      "\n",
      "# Load sample data\n",
      "X, y = load_digits(return_X_y=True)\n",
      "\n",
      "# Initialize an AutoML instance\n",
      "automl = AutoML()\n",
      "\n",
      "# Define the settings for the AutoML run\n",
      "settings = {\n",
      "    \"time_budget\": 60,  # Total running time in seconds\n",
      "    \"metric\": 'accuracy',  # Primary metric for evaluation\n",
      "    \"task\": 'classification',  # Task type\n",
      "    \"n_jobs\": -1,  # Number of jobs to run in parallel (use -1 for all)\n",
      "    \"estimator_list\": ['lgbm', 'rf', 'xgboost'],  # List of estimators to consider\n",
      "    \"log_file_name\": \"flaml_log.txt\",  # Log file name\n",
      "}\n",
      "\n",
      "# Run the AutoML search with Spark backend\n",
      "automl.fit(X_train=X, y_train=y, **settings)\n",
      "\n",
      "# Output the best model and its performance\n",
      "print(f\"Best ML model: {automl.model}\")\n",
      "print(f\"Best ML model's accuracy: {automl.best_loss}\")\n",
      "\n",
      "# Stop the Spark session\n",
      "spark.stop()\n",
      "```\n",
      "\n",
      "The `register_spark()` function from `joblib-spark` is used to register the Spark backend with joblib, which is utilized for parallel training within FLAML. The `n_jobs=-1` parameter tells FLAML to use all available Spark executors for parallel training.\n",
      "\n",
      "Please note that the actual process of setting up a Spark cluster can be complex and might involve additional steps such as configuring Spark workers, allocating resources, and more, which are beyond the scope of this code snippet.\n",
      "\n",
      "If you encounter any issues or need to adjust configurations for your specific Spark setup, please refer to the Spark and FLAML documentation for more details.\n",
      "\n",
      "When you run the code, ensure that your Spark cluster is properly configured and accessible from your Python environment. Adjust the `.master(\"local[*]\")` to point to your Spark master's URL if you are running a cluster that is not local.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "To use Spark for parallel training in FLAML (Fast and Lightweight AutoML), you would need to set up a Spark cluster and utilize the `spark` backend for joblib, which FLAML uses internally for parallel training. Here’s an example of how you might set up and use Spark with FLAML for AutoML tasks:\n",
      "\n",
      "Firstly, ensure that you have the Spark cluster set up and the `pyspark` and `joblib-spark` packages installed in your environment. You can install the required packages using pip if they are not already installed:\n",
      "\n",
      "```python\n",
      "!pip install flaml pyspark joblib-spark\n",
      "```\n",
      "\n",
      "Here's a sample code snippet that demonstrates how to use FLAML with Spark for parallel training:\n",
      "\n",
      "```python\n",
      "from flaml import AutoML\n",
      "from pyspark.sql import SparkSession\n",
      "from sklearn.datasets import load_digits\n",
      "from joblibspark import register_spark\n",
      "\n",
      "# Initialize a Spark session\n",
      "spark = SparkSession.builder \\\n",
      "    .master(\"local[*]\") \\\n",
      "    .appName(\"FLAML_Spark_Example\") \\\n",
      "    .getOrCreate()\n",
      "\n",
      "# Register the joblib spark backend\n",
      "register_spark()  # This registers the backend for parallel processing\n",
      "\n",
      "# Load sample data\n",
      "X, y = load_digits(return_X_y=True)\n",
      "\n",
      "# Initialize an AutoML instance\n",
      "automl = AutoML()\n",
      "\n",
      "# Define the settings for the AutoML run\n",
      "settings = {\n",
      "    \"time_budget\": 60,  # Total running time in seconds\n",
      "    \"metric\": 'accuracy',  # Primary metric for evaluation\n",
      "    \"task\": 'classification',  # Task type\n",
      "    \"n_jobs\": -1,  # Number of jobs to run in parallel (use -1 for all)\n",
      "    \"estimator_list\": ['lgbm', 'rf', 'xgboost'],  # List of estimators to consider\n",
      "    \"log_file_name\": \"flaml_log.txt\",  # Log file name\n",
      "}\n",
      "\n",
      "# Run the AutoML search with Spark backend\n",
      "automl.fit(X_train=X, y_train=y, **settings)\n",
      "\n",
      "# Output the best model and its performance\n",
      "print(f\"Best ML model: {automl.model}\")\n",
      "print(f\"Best ML model's accuracy: {automl.best_loss}\")\n",
      "\n",
      "# Stop the Spark session\n",
      "spark.stop()\n",
      "```\n",
      "\n",
      "The `register_spark()` function from `joblib-spark` is used to register the Spark backend with joblib, which is utilized for parallel training within FLAML. The `n_jobs=-1` parameter tells FLAML to use all available Spark executors for parallel training.\n",
      "\n",
      "Please note that the actual process of setting up a Spark cluster can be complex and might involve additional steps such as configuring Spark workers, allocating resources, and more, which are beyond the scope of this code snippet.\n",
      "\n",
      "If you encounter any issues or need to adjust configurations for your specific Spark setup, please refer to the Spark and FLAML documentation for more details.\n",
      "\n",
      "When you run the code, ensure that your Spark cluster is properly configured and accessible from your Python environment. Adjust the `.master(\"local[*]\")` to point to your Spark master's URL if you are running a cluster that is not local.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mCode_Reviewer\u001b[0m (to chat_manager):\n",
      "\n",
      "TERMINATE\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "norag_chat()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### RetrieveUserProxyAgent get the correct code\n",
    "Since RetrieveUserProxyAgent can perform retrieval-augmented generation based on the given documentation file, ChatGPT can generate the correct code for us!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-04-07 18:26:04,562 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - \u001b[32mUse the existing collection `groupchat`.\u001b[0m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Trying to create collection.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-04-07 18:26:05,485 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 1 chunks.\u001b[0m\n",
      "Number of requested results 3 is greater than number of elements in index 1, updating n_results = 1\n",
      "Model gpt4-1106-preview not found. Using cl100k_base encoding.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "VectorDB returns doc_ids:  [['bdfbc921']]\n",
      "\u001b[32mAdding content of doc bdfbc921 to context.\u001b[0m\n",
      "\u001b[33mBoss_Assistant\u001b[0m (to chat_manager):\n",
      "\n",
      "You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the\n",
      "context provided by the user.\n",
      "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n",
      "For code generation, you must obey the following rules:\n",
      "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n",
      "Rule 2. You must follow the formats below to write your code:\n",
      "```language\n",
      "# your code\n",
      "```\n",
      "\n",
      "User's question is: How to use spark for parallel training in FLAML? Give me sample code.\n",
      "\n",
      "Context is: # Integrate - Spark\n",
      "\n",
      "FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:\n",
      "\n",
      "- Use Spark ML estimators for AutoML.\n",
      "- Use Spark to run training in parallel spark jobs.\n",
      "\n",
      "## Spark ML Estimators\n",
      "\n",
      "FLAML integrates estimators based on Spark ML models. These models are trained in parallel using Spark, so we called them Spark estimators. To use these models, you first need to organize your data in the required format.\n",
      "\n",
      "### Data\n",
      "\n",
      "For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenient function `to_pandas_on_spark` in the `flaml.automl.spark.utils` module to convert your data into a pandas-on-spark (`pyspark.pandas`) dataframe/series, which Spark estimators require.\n",
      "\n",
      "This utility function takes data in the form of a `pandas.Dataframe` or `pyspark.sql.Dataframe` and converts it into a pandas-on-spark dataframe. It also takes `pandas.Series` or `pyspark.sql.Dataframe` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.Dataframe`, it will not make any changes.\n",
      "\n",
      "This function also accepts optional arguments `index_col` and `default_index_type`.\n",
      "\n",
      "- `index_col` is the column name to use as the index, default is None.\n",
      "- `default_index_type` is the default index type, default is \"distributed-sequence\". More info about default index type could be found on Spark official [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type)\n",
      "\n",
      "Here is an example code snippet for Spark Data:\n",
      "\n",
      "```python\n",
      "import pandas as pd\n",
      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
      "\n",
      "# Creating a dictionary\n",
      "data = {\n",
      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
      "}\n",
      "\n",
      "# Creating a pandas DataFrame\n",
      "dataframe = pd.DataFrame(data)\n",
      "label = \"Price\"\n",
      "\n",
      "# Convert to pandas-on-spark dataframe\n",
      "psdf = to_pandas_on_spark(dataframe)\n",
      "```\n",
      "\n",
      "To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.\n",
      "\n",
      "Here is an example of how to use it:\n",
      "\n",
      "```python\n",
      "from pyspark.ml.feature import VectorAssembler\n",
      "\n",
      "columns = psdf.columns\n",
      "feature_cols = [col for col in columns if col != label]\n",
      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
      "```\n",
      "\n",
      "Later in conducting the experiment, use your pandas-on-spark data like non-spark data and pass them using `X_train, y_train` or `dataframe, label`.\n",
      "\n",
      "### Estimators\n",
      "\n",
      "#### Model List\n",
      "\n",
      "- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.\n",
      "\n",
      "#### Usage\n",
      "\n",
      "First, prepare your data in the required format as described in the previous section.\n",
      "\n",
      "By including the models you intend to try in the `estimators_list` argument to `flaml.automl`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` postfix by default, even if you haven't specified them.\n",
      "\n",
      "Here is an example code snippet using SparkML models in AutoML:\n",
      "\n",
      "```python\n",
      "import flaml\n",
      "\n",
      "# prepare your data in pandas-on-spark format as we previously mentioned\n",
      "\n",
      "automl = flaml.AutoML()\n",
      "settings = {\n",
      "    \"time_budget\": 30,\n",
      "    \"metric\": \"r2\",\n",
      "    \"estimator_list\": [\"lgbm_spark\"],  # this setting is optional\n",
      "    \"task\": \"regression\",\n",
      "}\n",
      "\n",
      "automl.fit(\n",
      "    dataframe=psdf,\n",
      "    label=label,\n",
      "    **settings,\n",
      ")\n",
      "```\n",
      "\n",
      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)\n",
      "\n",
      "## Parallel Spark Jobs\n",
      "\n",
      "You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning), by setting the `use_spark` to `true`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).\n",
      "\n",
      "Please note that you should not set `use_spark` to `true` when applying AutoML and Tuning for Spark Data. This is because only SparkML models will be used for Spark Data in AutoML and Tuning. As SparkML models run in parallel, there is no need to distribute them with `use_spark` again.\n",
      "\n",
      "All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:\n",
      "\n",
      "- `use_spark`: boolean, default=False | Whether to use spark to run the training in parallel spark jobs. This can be used to accelerate training on large models and large datasets, but will incur more overhead in time and thus slow down training in some cases. GPU training is not supported yet when use_spark is True. For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`.\n",
      "- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When n_concurrent_trials > 1, FLAML performes parallel tuning.\n",
      "- `force_cancel`: boolean, default=False | Whether to forcely cancel Spark jobs if the search time exceeded the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.\n",
      "\n",
      "An example code snippet for using parallel Spark jobs:\n",
      "\n",
      "```python\n",
      "import flaml\n",
      "\n",
      "automl_experiment = flaml.AutoML()\n",
      "automl_settings = {\n",
      "    \"time_budget\": 30,\n",
      "    \"metric\": \"r2\",\n",
      "    \"task\": \"regression\",\n",
      "    \"n_concurrent_trials\": 2,\n",
      "    \"use_spark\": True,\n",
      "    \"force_cancel\": True,  # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.\n",
      "}\n",
      "\n",
      "automl.fit(\n",
      "    dataframe=dataframe,\n",
      "    label=label,\n",
      "    **automl_settings,\n",
      ")\n",
      "```\n",
      "\n",
      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)\n",
      "\n",
      "\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mBoss_Assistant\u001b[0m (to chat_manager):\n",
      "\n",
      "You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the\n",
      "context provided by the user.\n",
      "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n",
      "For code generation, you must obey the following rules:\n",
      "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n",
      "Rule 2. You must follow the formats below to write your code:\n",
      "```language\n",
      "# your code\n",
      "```\n",
      "\n",
      "User's question is: How to use spark for parallel training in FLAML? Give me sample code.\n",
      "\n",
      "Context is: # Integrate - Spark\n",
      "\n",
      "FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:\n",
      "\n",
      "- Use Spark ML estimators for AutoML.\n",
      "- Use Spark to run training in parallel spark jobs.\n",
      "\n",
      "## Spark ML Estimators\n",
      "\n",
      "FLAML integrates estimators based on Spark ML models. These models are trained in parallel using Spark, so we called them Spark estimators. To use these models, you first need to organize your data in the required format.\n",
      "\n",
      "### Data\n",
      "\n",
      "For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenient function `to_pandas_on_spark` in the `flaml.automl.spark.utils` module to convert your data into a pandas-on-spark (`pyspark.pandas`) dataframe/series, which Spark estimators require.\n",
      "\n",
      "This utility function takes data in the form of a `pandas.Dataframe` or `pyspark.sql.Dataframe` and converts it into a pandas-on-spark dataframe. It also takes `pandas.Series` or `pyspark.sql.Dataframe` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.Dataframe`, it will not make any changes.\n",
      "\n",
      "This function also accepts optional arguments `index_col` and `default_index_type`.\n",
      "\n",
      "- `index_col` is the column name to use as the index, default is None.\n",
      "- `default_index_type` is the default index type, default is \"distributed-sequence\". More info about default index type could be found on Spark official [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type)\n",
      "\n",
      "Here is an example code snippet for Spark Data:\n",
      "\n",
      "```python\n",
      "import pandas as pd\n",
      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
      "\n",
      "# Creating a dictionary\n",
      "data = {\n",
      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
      "}\n",
      "\n",
      "# Creating a pandas DataFrame\n",
      "dataframe = pd.DataFrame(data)\n",
      "label = \"Price\"\n",
      "\n",
      "# Convert to pandas-on-spark dataframe\n",
      "psdf = to_pandas_on_spark(dataframe)\n",
      "```\n",
      "\n",
      "To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.\n",
      "\n",
      "Here is an example of how to use it:\n",
      "\n",
      "```python\n",
      "from pyspark.ml.feature import VectorAssembler\n",
      "\n",
      "columns = psdf.columns\n",
      "feature_cols = [col for col in columns if col != label]\n",
      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
      "```\n",
      "\n",
      "Later in conducting the experiment, use your pandas-on-spark data like non-spark data and pass them using `X_train, y_train` or `dataframe, label`.\n",
      "\n",
      "### Estimators\n",
      "\n",
      "#### Model List\n",
      "\n",
      "- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.\n",
      "\n",
      "#### Usage\n",
      "\n",
      "First, prepare your data in the required format as described in the previous section.\n",
      "\n",
      "By including the models you intend to try in the `estimators_list` argument to `flaml.automl`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` postfix by default, even if you haven't specified them.\n",
      "\n",
      "Here is an example code snippet using SparkML models in AutoML:\n",
      "\n",
      "```python\n",
      "import flaml\n",
      "\n",
      "# prepare your data in pandas-on-spark format as we previously mentioned\n",
      "\n",
      "automl = flaml.AutoML()\n",
      "settings = {\n",
      "    \"time_budget\": 30,\n",
      "    \"metric\": \"r2\",\n",
      "    \"estimator_list\": [\"lgbm_spark\"],  # this setting is optional\n",
      "    \"task\": \"regression\",\n",
      "}\n",
      "\n",
      "automl.fit(\n",
      "    dataframe=psdf,\n",
      "    label=label,\n",
      "    **settings,\n",
      ")\n",
      "```\n",
      "\n",
      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)\n",
      "\n",
      "## Parallel Spark Jobs\n",
      "\n",
      "You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning), by setting the `use_spark` to `true`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).\n",
      "\n",
      "Please note that you should not set `use_spark` to `true` when applying AutoML and Tuning for Spark Data. This is because only SparkML models will be used for Spark Data in AutoML and Tuning. As SparkML models run in parallel, there is no need to distribute them with `use_spark` again.\n",
      "\n",
      "All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:\n",
      "\n",
      "- `use_spark`: boolean, default=False | Whether to use spark to run the training in parallel spark jobs. This can be used to accelerate training on large models and large datasets, but will incur more overhead in time and thus slow down training in some cases. GPU training is not supported yet when use_spark is True. For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`.\n",
      "- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When n_concurrent_trials > 1, FLAML performes parallel tuning.\n",
      "- `force_cancel`: boolean, default=False | Whether to forcely cancel Spark jobs if the search time exceeded the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.\n",
      "\n",
      "An example code snippet for using parallel Spark jobs:\n",
      "\n",
      "```python\n",
      "import flaml\n",
      "\n",
      "automl_experiment = flaml.AutoML()\n",
      "automl_settings = {\n",
      "    \"time_budget\": 30,\n",
      "    \"metric\": \"r2\",\n",
      "    \"task\": \"regression\",\n",
      "    \"n_concurrent_trials\": 2,\n",
      "    \"use_spark\": True,\n",
      "    \"force_cancel\": True,  # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.\n",
      "}\n",
      "\n",
      "automl.fit(\n",
      "    dataframe=dataframe,\n",
      "    label=label,\n",
      "    **automl_settings,\n",
      ")\n",
      "```\n",
      "\n",
      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)\n",
      "\n",
      "\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mProduct_Manager\u001b[0m (to chat_manager):\n",
      "\n",
      "```python\n",
      "from flaml.automl import AutoML\n",
      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
      "from pyspark.ml.feature import VectorAssembler\n",
      "import pandas as pd\n",
      "\n",
      "# Sample data in a dictionary\n",
      "data = {\n",
      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
      "}\n",
      "\n",
      "# Convert dictionary to pandas DataFrame\n",
      "dataframe = pd.DataFrame(data)\n",
      "label = \"Price\"\n",
      "\n",
      "# Convert pandas DataFrame to pandas-on-spark DataFrame\n",
      "psdf = to_pandas_on_spark(dataframe)\n",
      "\n",
      "# Use VectorAssembler to merge feature columns into a single vector column\n",
      "feature_cols = [col for col in psdf.columns if col != label]\n",
      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\", label]\n",
      "\n",
      "# Initialize AutoML instance\n",
      "automl = AutoML()\n",
      "\n",
      "# AutoML settings\n",
      "automl_settings = {\n",
      "    \"time_budget\": 30,  # Total running time in seconds\n",
      "    \"metric\": \"r2\",     # Evaluation metric\n",
      "    \"task\": \"regression\",\n",
      "    \"n_concurrent_trials\": 2,   # Number of concurrent Spark jobs\n",
      "    \"use_spark\": True,          # Enable Spark for parallel training\n",
      "    \"force_cancel\": True,       # Force cancel Spark jobs if they exceed the time budget\n",
      "    \"estimator_list\": [\"lgbm_spark\"]  # Optional: Specific estimator to use\n",
      "}\n",
      "\n",
      "# Run AutoML fit with pandas-on-spark dataframe\n",
      "automl.fit(\n",
      "    dataframe=psdf,\n",
      "    label=label,\n",
      "    **automl_settings,\n",
      ")\n",
      "```\n",
      "TERMINATE\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "rag_chat()\n",
    "# type exit to terminate the chat"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Call RetrieveUserProxyAgent while init chat with another user proxy agent\n",
    "Sometimes, there might be a need to use RetrieveUserProxyAgent in group chat without initializing the chat with it. In such scenarios, it becomes essential to create a function that wraps the RAG agents and allows them to be called from other agents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mBoss\u001b[0m (to chat_manager):\n",
      "\n",
      "How to use spark for parallel training in FLAML? Give me sample code.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mProduct_Manager\u001b[0m (to chat_manager):\n",
      "\n",
      "\u001b[32m***** Suggested function call: retrieve_content *****\u001b[0m\n",
      "Arguments: \n",
      "{\"message\":\"using Apache Spark for parallel training in FLAML with sample code\"}\n",
      "\u001b[32m*****************************************************\u001b[0m\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[35m\n",
      ">>>>>>>> EXECUTING FUNCTION retrieve_content...\u001b[0m\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Number of requested results 3 is greater than number of elements in index 1, updating n_results = 1\n",
      "Model gpt4-1106-preview not found. Using cl100k_base encoding.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "VectorDB returns doc_ids:  [['bdfbc921']]\n",
      "\u001b[32mAdding content of doc bdfbc921 to context.\u001b[0m\n",
      "\u001b[33mBoss\u001b[0m (to chat_manager):\n",
      "\n",
      "\u001b[32m***** Response from calling function (retrieve_content) *****\u001b[0m\n",
      "You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the\n",
      "context provided by the user.\n",
      "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n",
      "For code generation, you must obey the following rules:\n",
      "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n",
      "Rule 2. You must follow the formats below to write your code:\n",
      "```language\n",
      "# your code\n",
      "```\n",
      "\n",
      "User's question is: using Apache Spark for parallel training in FLAML with sample code\n",
      "\n",
      "Context is: # Integrate - Spark\n",
      "\n",
      "FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:\n",
      "\n",
      "- Use Spark ML estimators for AutoML.\n",
      "- Use Spark to run training in parallel spark jobs.\n",
      "\n",
      "## Spark ML Estimators\n",
      "\n",
      "FLAML integrates estimators based on Spark ML models. These models are trained in parallel using Spark, so we called them Spark estimators. To use these models, you first need to organize your data in the required format.\n",
      "\n",
      "### Data\n",
      "\n",
      "For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenient function `to_pandas_on_spark` in the `flaml.automl.spark.utils` module to convert your data into a pandas-on-spark (`pyspark.pandas`) dataframe/series, which Spark estimators require.\n",
      "\n",
      "This utility function takes data in the form of a `pandas.Dataframe` or `pyspark.sql.Dataframe` and converts it into a pandas-on-spark dataframe. It also takes `pandas.Series` or `pyspark.sql.Dataframe` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.Dataframe`, it will not make any changes.\n",
      "\n",
      "This function also accepts optional arguments `index_col` and `default_index_type`.\n",
      "\n",
      "- `index_col` is the column name to use as the index, default is None.\n",
      "- `default_index_type` is the default index type, default is \"distributed-sequence\". More info about default index type could be found on Spark official [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type)\n",
      "\n",
      "Here is an example code snippet for Spark Data:\n",
      "\n",
      "```python\n",
      "import pandas as pd\n",
      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
      "\n",
      "# Creating a dictionary\n",
      "data = {\n",
      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
      "}\n",
      "\n",
      "# Creating a pandas DataFrame\n",
      "dataframe = pd.DataFrame(data)\n",
      "label = \"Price\"\n",
      "\n",
      "# Convert to pandas-on-spark dataframe\n",
      "psdf = to_pandas_on_spark(dataframe)\n",
      "```\n",
      "\n",
      "To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.\n",
      "\n",
      "Here is an example of how to use it:\n",
      "\n",
      "```python\n",
      "from pyspark.ml.feature import VectorAssembler\n",
      "\n",
      "columns = psdf.columns\n",
      "feature_cols = [col for col in columns if col != label]\n",
      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
      "```\n",
      "\n",
      "Later in conducting the experiment, use your pandas-on-spark data like non-spark data and pass them using `X_train, y_train` or `dataframe, label`.\n",
      "\n",
      "### Estimators\n",
      "\n",
      "#### Model List\n",
      "\n",
      "- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.\n",
      "\n",
      "#### Usage\n",
      "\n",
      "First, prepare your data in the required format as described in the previous section.\n",
      "\n",
      "By including the models you intend to try in the `estimators_list` argument to `flaml.automl`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` postfix by default, even if you haven't specified them.\n",
      "\n",
      "Here is an example code snippet using SparkML models in AutoML:\n",
      "\n",
      "```python\n",
      "import flaml\n",
      "\n",
      "# prepare your data in pandas-on-spark format as we previously mentioned\n",
      "\n",
      "automl = flaml.AutoML()\n",
      "settings = {\n",
      "    \"time_budget\": 30,\n",
      "    \"metric\": \"r2\",\n",
      "    \"estimator_list\": [\"lgbm_spark\"],  # this setting is optional\n",
      "    \"task\": \"regression\",\n",
      "}\n",
      "\n",
      "automl.fit(\n",
      "    dataframe=psdf,\n",
      "    label=label,\n",
      "    **settings,\n",
      ")\n",
      "```\n",
      "\n",
      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)\n",
      "\n",
      "## Parallel Spark Jobs\n",
      "\n",
      "You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning), by setting the `use_spark` to `true`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).\n",
      "\n",
      "Please note that you should not set `use_spark` to `true` when applying AutoML and Tuning for Spark Data. This is because only SparkML models will be used for Spark Data in AutoML and Tuning. As SparkML models run in parallel, there is no need to distribute them with `use_spark` again.\n",
      "\n",
      "All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:\n",
      "\n",
      "- `use_spark`: boolean, default=False | Whether to use spark to run the training in parallel spark jobs. This can be used to accelerate training on large models and large datasets, but will incur more overhead in time and thus slow down training in some cases. GPU training is not supported yet when use_spark is True. For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`.\n",
      "- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When n_concurrent_trials > 1, FLAML performes parallel tuning.\n",
      "- `force_cancel`: boolean, default=False | Whether to forcely cancel Spark jobs if the search time exceeded the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.\n",
      "\n",
      "An example code snippet for using parallel Spark jobs:\n",
      "\n",
      "```python\n",
      "import flaml\n",
      "\n",
      "automl_experiment = flaml.AutoML()\n",
      "automl_settings = {\n",
      "    \"time_budget\": 30,\n",
      "    \"metric\": \"r2\",\n",
      "    \"task\": \"regression\",\n",
      "    \"n_concurrent_trials\": 2,\n",
      "    \"use_spark\": True,\n",
      "    \"force_cancel\": True,  # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.\n",
      "}\n",
      "\n",
      "automl.fit(\n",
      "    dataframe=dataframe,\n",
      "    label=label,\n",
      "    **automl_settings,\n",
      ")\n",
      "```\n",
      "\n",
      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)\n",
      "\n",
      "\n",
      "\u001b[32m*************************************************************\u001b[0m\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mBoss\u001b[0m (to chat_manager):\n",
      "\n",
      "\u001b[32m***** Response from calling function (retrieve_content) *****\u001b[0m\n",
      "You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the\n",
      "context provided by the user.\n",
      "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n",
      "For code generation, you must obey the following rules:\n",
      "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n",
      "Rule 2. You must follow the formats below to write your code:\n",
      "```language\n",
      "# your code\n",
      "```\n",
      "\n",
      "User's question is: using Apache Spark for parallel training in FLAML with sample code\n",
      "\n",
      "Context is: # Integrate - Spark\n",
      "\n",
      "FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:\n",
      "\n",
      "- Use Spark ML estimators for AutoML.\n",
      "- Use Spark to run training in parallel spark jobs.\n",
      "\n",
      "## Spark ML Estimators\n",
      "\n",
      "FLAML integrates estimators based on Spark ML models. These models are trained in parallel using Spark, so we called them Spark estimators. To use these models, you first need to organize your data in the required format.\n",
      "\n",
      "### Data\n",
      "\n",
      "For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenient function `to_pandas_on_spark` in the `flaml.automl.spark.utils` module to convert your data into a pandas-on-spark (`pyspark.pandas`) dataframe/series, which Spark estimators require.\n",
      "\n",
      "This utility function takes data in the form of a `pandas.Dataframe` or `pyspark.sql.Dataframe` and converts it into a pandas-on-spark dataframe. It also takes `pandas.Series` or `pyspark.sql.Dataframe` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.Dataframe`, it will not make any changes.\n",
      "\n",
      "This function also accepts optional arguments `index_col` and `default_index_type`.\n",
      "\n",
      "- `index_col` is the column name to use as the index, default is None.\n",
      "- `default_index_type` is the default index type, default is \"distributed-sequence\". More info about default index type could be found on Spark official [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type)\n",
      "\n",
      "Here is an example code snippet for Spark Data:\n",
      "\n",
      "```python\n",
      "import pandas as pd\n",
      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
      "\n",
      "# Creating a dictionary\n",
      "data = {\n",
      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
      "}\n",
      "\n",
      "# Creating a pandas DataFrame\n",
      "dataframe = pd.DataFrame(data)\n",
      "label = \"Price\"\n",
      "\n",
      "# Convert to pandas-on-spark dataframe\n",
      "psdf = to_pandas_on_spark(dataframe)\n",
      "```\n",
      "\n",
      "To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.\n",
      "\n",
      "Here is an example of how to use it:\n",
      "\n",
      "```python\n",
      "from pyspark.ml.feature import VectorAssembler\n",
      "\n",
      "columns = psdf.columns\n",
      "feature_cols = [col for col in columns if col != label]\n",
      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
      "```\n",
      "\n",
      "Later in conducting the experiment, use your pandas-on-spark data like non-spark data and pass them using `X_train, y_train` or `dataframe, label`.\n",
      "\n",
      "### Estimators\n",
      "\n",
      "#### Model List\n",
      "\n",
      "- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.\n",
      "\n",
      "#### Usage\n",
      "\n",
      "First, prepare your data in the required format as described in the previous section.\n",
      "\n",
      "By including the models you intend to try in the `estimators_list` argument to `flaml.automl`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` postfix by default, even if you haven't specified them.\n",
      "\n",
      "Here is an example code snippet using SparkML models in AutoML:\n",
      "\n",
      "```python\n",
      "import flaml\n",
      "\n",
      "# prepare your data in pandas-on-spark format as we previously mentioned\n",
      "\n",
      "automl = flaml.AutoML()\n",
      "settings = {\n",
      "    \"time_budget\": 30,\n",
      "    \"metric\": \"r2\",\n",
      "    \"estimator_list\": [\"lgbm_spark\"],  # this setting is optional\n",
      "    \"task\": \"regression\",\n",
      "}\n",
      "\n",
      "automl.fit(\n",
      "    dataframe=psdf,\n",
      "    label=label,\n",
      "    **settings,\n",
      ")\n",
      "```\n",
      "\n",
      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)\n",
      "\n",
      "## Parallel Spark Jobs\n",
      "\n",
      "You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning), by setting the `use_spark` to `true`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).\n",
      "\n",
      "Please note that you should not set `use_spark` to `true` when applying AutoML and Tuning for Spark Data. This is because only SparkML models will be used for Spark Data in AutoML and Tuning. As SparkML models run in parallel, there is no need to distribute them with `use_spark` again.\n",
      "\n",
      "All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:\n",
      "\n",
      "- `use_spark`: boolean, default=False | Whether to use spark to run the training in parallel spark jobs. This can be used to accelerate training on large models and large datasets, but will incur more overhead in time and thus slow down training in some cases. GPU training is not supported yet when use_spark is True. For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`.\n",
      "- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When n_concurrent_trials > 1, FLAML performes parallel tuning.\n",
      "- `force_cancel`: boolean, default=False | Whether to forcely cancel Spark jobs if the search time exceeded the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.\n",
      "\n",
      "An example code snippet for using parallel Spark jobs:\n",
      "\n",
      "```python\n",
      "import flaml\n",
      "\n",
      "automl_experiment = flaml.AutoML()\n",
      "automl_settings = {\n",
      "    \"time_budget\": 30,\n",
      "    \"metric\": \"r2\",\n",
      "    \"task\": \"regression\",\n",
      "    \"n_concurrent_trials\": 2,\n",
      "    \"use_spark\": True,\n",
      "    \"force_cancel\": True,  # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.\n",
      "}\n",
      "\n",
      "automl.fit(\n",
      "    dataframe=dataframe,\n",
      "    label=label,\n",
      "    **automl_settings,\n",
      ")\n",
      "```\n",
      "\n",
      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)\n",
      "\n",
      "\n",
      "\u001b[32m*************************************************************\u001b[0m\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mProduct_Manager\u001b[0m (to chat_manager):\n",
      "\n",
      "To use Apache Spark for parallel training in FLAML, you can follow these steps:\n",
      "\n",
      "1. Ensure your data is in the required pandas-on-spark format.\n",
      "2. Use Spark ML estimators by including them in the `estimator_list`.\n",
      "3. Set `use_spark` to `True` for parallel tuning.\n",
      "\n",
      "Here's a sample code demonstrating how to use Spark for parallel training in FLAML:\n",
      "\n",
      "```python\n",
      "import flaml\n",
      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
      "import pandas as pd\n",
      "from pyspark.ml.feature import VectorAssembler\n",
      "\n",
      "# Sample data in a pandas DataFrame\n",
      "data = {\n",
      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
      "}\n",
      "label = \"Price\"\n",
      "\n",
      "# Creating a pandas DataFrame\n",
      "dataframe = pd.DataFrame(data)\n",
      "\n",
      "# Convert to pandas-on-spark dataframe\n",
      "psdf = to_pandas_on_spark(dataframe)\n",
      "\n",
      "# Prepare features using VectorAssembler\n",
      "columns = psdf.columns\n",
      "feature_cols = [col for col in columns if col != label]\n",
      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
      "\n",
      "# Initialize AutoML\n",
      "automl = flaml.AutoML()\n",
      "\n",
      "# Configure settings for AutoML\n",
      "settings = {\n",
      "    \"time_budget\": 30,  # time budget in seconds\n",
      "    \"metric\": \"r2\",\n",
      "    \"estimator_list\": [\"lgbm_spark\"],  # using Spark ML estimators\n",
      "    \"task\": \"regression\",\n",
      "    \"n_concurrent_trials\": 2,  # number of parallel trials\n",
      "    \"use_spark\": True,  # enable parallel training using Spark\n",
      "    \"force_cancel\": True,  # force cancel Spark jobs if time_budget is exceeded\n",
      "}\n",
      "\n",
      "# Start the training\n",
      "automl.fit(dataframe=psdf, label=label, **settings)\n",
      "```\n",
      "\n",
      "In this code snippet:\n",
      "- The `to_pandas_on_spark` function is used to convert the pandas DataFrame to a pandas-on-spark DataFrame.\n",
      "- `VectorAssembler` is used to transform feature columns into a single vector column.\n",
      "- The `AutoML` object is created, and settings are configured for the AutoML run, including setting `use_spark` to `True` for parallel training.\n",
      "- The `fit` method is called to start the automated machine learning process.\n",
      "\n",
      "By using these settings, FLAML will train the models in parallel using Spark, which can accelerate the training process on large models and datasets.\n",
      "\n",
      "TERMINATE\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "call_rag_chat()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "front_matter": {
   "description": "Implement and manage a multi-agent chat system using AutoGen, where AI assistants retrieve information, generate code, and interact collaboratively to solve complex tasks, especially in areas not covered by their training data.",
   "tags": [
    "group chat",
    "orchestration",
    "RAG"
   ]
  },
  "kernelspec": {
   "display_name": "flaml",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
-												Add group chat and retrieve agent example (#227)

* Add group chat and retrieve agent example

* Fix link and models

* Support call rag in a group chat and not init with rag

* Fix n_results logic

* Update notebook

* Fix format

* Improve wording

* Update variable name

* Revert to main

* Update function call

* Update keys

* Update contents

* Update contents
											
										
										
											2023-10-18 04:31:27 +08:00
+								{
 								 "cells": [
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
-												Update notebook contrib guidance, update a few notebooks for site (#1651)

* update some notebooks

* Update contributing.md

* remove os

---------

Co-authored-by: Eric Zhu <ekzhu@users.noreply.github.com>
											
										
										
											2024-02-14 12:00:55 -05:00
+								    "# Group Chat with Retrieval Augmented Generation\n",
+								    "\n",
-												Minor grammar and wording issues (#854)

* wording

* grammar and wording

* readability suggested by shruti222patel

* period
											
										
										
											2023-12-03 21:57:06 -05:00
+								    "AutoGen supports conversable agents powered by LLMs, tools, or humans, performing tasks collectively via automated chat. This framework allows tool use and human participation through multi-agent conversation.\n",
+								    "Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat).\n",
 								    "\n",
+								    "````{=mdx}\n",
 								    ":::info Requirements\n",
 								    "Some extra dependencies are needed for this notebook, which can be installed via pip:\n",
+								    "\n",
 								    "```bash\n",
+								    "pip install pyautogen[retrievechat]\n",
 								    "```\n",
 								    "\n",
 								    "For more information, please refer to the [installation guide](/docs/installation/).\n",
 								    ":::\n",
 								    "````"
+								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "## Set your API Endpoint\n",
 								    "\n",
 								    "The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a json file."
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
-												Supporting callable message (#1852)

* add message field

* send

* message func doc str

* test dict message

* retiring soon

* generate_init_message docstr

* remove todo

* update notebook

* CompressibleAgent

* update notebook

* add test

* retrieve agent

* update test

* summary_method args

* summary

* carryover

* dict message

* update nested doc

* generate_init_message

* fix typo

* update docs for mathchat

* Fix missing message

* Add docstrings

* model

* notebook

* default naming

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: kevin666aa <yrwu000627@gmail.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
											
										
										
											2024-03-09 15:27:46 -05:00
+								   "execution_count": 1,
+								   "metadata": {},
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
-												Support setting vector_db as a param (#2313)

* Added vectordb base and chromadb

* Remove timer and unused functions

* Added filter by distance

* Added test utils

* Fix format

* Fix type hint of dict

* Rename test

* Add test chromadb

* Fix test no chromadb

* Add coverage

* Don't skip test vectordb utils

* Add types

* Fix tests

* Fix docs build error

* Add types to base

* Update base

* Update utils

* Update chromadb

* Add get_docs_by_ids

* Improve docstring

* Update init params

* Update init vector db

* Add get all docs

* Move chroma_results_to_query_results to utils

* Add init vectordb

* Convert format of results for old version

* Improve type hints

* Update get_context for new query results format

* Fix typo

* Improve init db

* Update default folder

* Update logger

* Update init, add embedding func

* Update distance_threshold

* Fix logger name

* Update qdrant

* Fix init db

* Update notebooks

* Use kwargs to improve readability

* Improve docstring of vectordb, add two attributes

* Add db_config

* Update gitignore

* Update comments

* Add source

* Fix file downloaded from urls have the same name

* Remove files added by mistake

* Improve docstring

* Update docstring

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update docstring

* Update docstring

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
											
										
										
											2024-04-17 16:30:05 +08:00
+								      "LLM models:  ['gpt4-1106-preview', 'gpt-35-turbo', 'gpt-35-turbo-0613']\n"
+								     ]
 								    }
 								   ],
 								   "source": [
-												nbqa adedd to pre-commit, added black and ruff for notebooks (#1171)

* nbqa adedd to pre-commit, added black and ruff for notebooks

* polishing

* polishing

* polishing
											
										
										
											2024-01-08 04:47:01 +01:00
+								    "import chromadb\n",
-												Add isort (#2265)

* Add isort

* Apply isort on py files

* Fix circular import

* Fix format for notebooks

* Fix format

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
											
										
										
											2024-04-05 10:26:06 +08:00
+								    "from typing_extensions import Annotated\n",
+								    "\n",
+								    "import autogen\n",
+								    "from autogen import AssistantAgent\n",
 								    "from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent\n",
+								    "\n",
+								    "config_list = autogen.config_list_from_json(\"OAI_CONFIG_LIST\")\n",
+								    "\n",
 								    "print(\"LLM models: \", [config_list[i][\"model\"] for i in range(len(config_list))])"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
+								    "````{=mdx}\n",
 								    ":::tip\n",
-												Create topics dir and move llm config (#1853)

* create topics dir and move llm config

* fix redirect

* fix link
											
										
										
											2024-03-04 13:02:26 -05:00
+								    "Learn more about configuring LLMs for agents [here](/docs/topics/llm_configuration).\n",
+								    ":::\n",
 								    "````\n",
+								    "\n",
 								    "## Construct Agents"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
+								   "execution_count": 2,
+								   "metadata": {},
+								   "outputs": [
 								    {
 								     "name": "stderr",
 								     "output_type": "stream",
 								     "text": [
 								      "/home/lijiang1/anaconda3/envs/autogen/lib/python3.10/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.\n",
 								      "  torch.utils._pytree._register_pytree_node(\n"
 								     ]
 								    }
 								   ],
+								   "source": [
+								    "def termination_msg(x):\n",
 								    "    return isinstance(x, dict) and \"TERMINATE\" == str(x.get(\"content\", \"\"))[-9:].upper()\n",
 								    "\n",
+								    "\n",
+								    "llm_config = {\"config_list\": config_list, \"timeout\": 60, \"temperature\": 0.8, \"seed\": 1234}\n",
 								    "\n",
+								    "boss = autogen.UserProxyAgent(\n",
 								    "    name=\"Boss\",\n",
 								    "    is_termination_msg=termination_msg,\n",
-												Update speaker selector in GroupChat and update some notebooks (#688)

* Add speaker selection methods

* Update groupchat RAG

* Update seed to cache_seed

* Update RetrieveChat notebook

* Update parameter name

* Add test

* Add more tests

* Add mock to test

* Add mock to test

* Fix typo speaking

* Add gracefully exit manual input

* Update round_robin docstring

* Add method checking

* Remove participant roles

* Fix versions in notebooks

* Minimize installation overhead

* Fix missing lower()

* Add comments for try_count 3

* Update warning for n_agents < 3

* Update warning for n_agents < 3

* Add test_n_agents_less_than_3

* Add a function for manual select

* Update version in notebooks

* Fixed bugs that allow speakers to go twice in a row even when allow_repeat_speaker = False

---------

Co-authored-by: Adam Fourney <adamfo@microsoft.com>
											
										
										
											2023-11-17 21:56:11 +08:00
+								    "    human_input_mode=\"NEVER\",\n",
+								    "    code_execution_config=False,  # we don't want to execute code in this case.\n",
+								    "    default_auto_reply=\"Reply `TERMINATE` if the task is done.\",\n",
-												Fix issue 1440 by applying new function registration decorator (#1661)

* Reproduce #1440

* Updated code with latest APIs

* Reran notebook

* Fix usage of cache
											
										
										
											2024-02-18 23:47:19 +08:00
+								    "    description=\"The boss who asks questions and gives tasks.\",\n",
+								    ")\n",
 								    "\n",
 								    "boss_aid = RetrieveUserProxyAgent(\n",
 								    "    name=\"Boss_Assistant\",\n",
 								    "    is_termination_msg=termination_msg,\n",
+								    "    human_input_mode=\"NEVER\",\n",
+								    "    default_auto_reply=\"Reply `TERMINATE` if the task is done.\",\n",
+								    "    max_consecutive_auto_reply=3,\n",
 								    "    retrieve_config={\n",
 								    "        \"task\": \"code\",\n",
 								    "        \"docs_path\": \"https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Examples/Integrate%20-%20Spark.md\",\n",
 								    "        \"chunk_token_size\": 1000,\n",
 								    "        \"model\": config_list[0][\"model\"],\n",
 								    "        \"collection_name\": \"groupchat\",\n",
 								    "        \"get_or_create\": True,\n",
 								    "    },\n",
 								    "    code_execution_config=False,  # we don't want to execute code in this case.\n",
+								    "    description=\"Assistant who has extra content retrieval power for solving difficult problems.\",\n",
+								    ")\n",
 								    "\n",
 								    "coder = AssistantAgent(\n",
 								    "    name=\"Senior_Python_Engineer\",\n",
 								    "    is_termination_msg=termination_msg,\n",
+								    "    system_message=\"You are a senior python engineer, you provide python code to answer questions. Reply `TERMINATE` in the end when everything is done.\",\n",
+								    "    llm_config=llm_config,\n",
+								    "    description=\"Senior Python Engineer who can write code to solve problems and answer questions.\",\n",
+								    ")\n",
 								    "\n",
 								    "pm = autogen.AssistantAgent(\n",
 								    "    name=\"Product_Manager\",\n",
 								    "    is_termination_msg=termination_msg,\n",
 								    "    system_message=\"You are a product manager. Reply `TERMINATE` in the end when everything is done.\",\n",
+								    "    llm_config=llm_config,\n",
+								    "    description=\"Product Manager who can design and plan the project.\",\n",
+								    ")\n",
 								    "\n",
 								    "reviewer = autogen.AssistantAgent(\n",
 								    "    name=\"Code_Reviewer\",\n",
 								    "    is_termination_msg=termination_msg,\n",
 								    "    system_message=\"You are a code reviewer. Reply `TERMINATE` in the end when everything is done.\",\n",
+								    "    llm_config=llm_config,\n",
+								    "    description=\"Code Reviewer who can review the code.\",\n",
+								    ")\n",
 								    "\n",
 								    "PROBLEM = \"How to use spark for parallel training in FLAML? Give me sample code.\"\n",
 								    "\n",
+								    "\n",
+								    "def _reset_agents():\n",
 								    "    boss.reset()\n",
 								    "    boss_aid.reset()\n",
 								    "    coder.reset()\n",
 								    "    pm.reset()\n",
 								    "    reviewer.reset()\n",
 								    "\n",
+								    "\n",
+								    "def rag_chat():\n",
 								    "    _reset_agents()\n",
 								    "    groupchat = autogen.GroupChat(\n",
+								    "        agents=[boss_aid, pm, coder, reviewer], messages=[], max_round=12, speaker_selection_method=\"round_robin\"\n",
+								    "    )\n",
+								    "    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n",
+								    "\n",
 								    "    # Start chatting with boss_aid as this is the user proxy agent.\n",
 								    "    boss_aid.initiate_chat(\n",
 								    "        manager,\n",
+								    "        message=boss_aid.message_generator,\n",
+								    "        problem=PROBLEM,\n",
 								    "        n_results=3,\n",
 								    "    )\n",
 								    "\n",
+								    "\n",
+								    "def norag_chat():\n",
 								    "    _reset_agents()\n",
 								    "    groupchat = autogen.GroupChat(\n",
+								    "        agents=[boss, pm, coder, reviewer],\n",
+								    "        messages=[],\n",
 								    "        max_round=12,\n",
 								    "        speaker_selection_method=\"auto\",\n",
 								    "        allow_repeat_speaker=False,\n",
+								    "    )\n",
+								    "    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n",
+								    "\n",
+								    "    # Start chatting with the boss as this is the user proxy agent.\n",
+								    "    boss.initiate_chat(\n",
 								    "        manager,\n",
 								    "        message=PROBLEM,\n",
 								    "    )\n",
 								    "\n",
+								    "\n",
+								    "def call_rag_chat():\n",
 								    "    _reset_agents()\n",
+								    "\n",
+								    "    # In this case, we have multiple user proxy agents and we don't initiate the chat\n",
 								    "    # with the RAG user proxy agent.\n",
 								    "    # To use the RAG user proxy agent, we need to wrap it in a function and call\n",
 								    "    # it from other agents.\n",
+								    "    def retrieve_content(\n",
 								    "        message: Annotated[\n",
 								    "            str,\n",
 								    "            \"Refined message which keeps the original meaning and can be used to retrieve content for code generation and question answering.\",\n",
 								    "        ],\n",
 								    "        n_results: Annotated[int, \"number of results\"] = 3,\n",
 								    "    ) -> str:\n",
+								    "        boss_aid.n_results = n_results  # Set the number of results to be retrieved.\n",
 								    "        # Check if we need to update the context.\n",
 								    "        update_context_case1, update_context_case2 = boss_aid._check_update_context(message)\n",
 								    "        if (update_context_case1 or update_context_case2) and boss_aid.update_context:\n",
 								    "            boss_aid.problem = message if not hasattr(boss_aid, \"problem\") else boss_aid.problem\n",
 								    "            _, ret_msg = boss_aid._generate_retrieve_user_reply(message)\n",
 								    "        else:\n",
+								    "            _context = {\"problem\": message, \"n_results\": n_results}\n",
 								    "            ret_msg = boss_aid.message_generator(boss_aid, None, _context)\n",
+								    "        return ret_msg if ret_msg else message\n",
+								    "\n",
 								    "    boss_aid.human_input_mode = \"NEVER\"  # Disable human input for boss_aid since it only retrieves content.\n",
 								    "\n",
+								    "    for caller in [pm, coder, reviewer]:\n",
 								    "        d_retrieve_content = caller.register_for_llm(\n",
 								    "            description=\"retrieve content for code generation and question answering.\", api_style=\"function\"\n",
 								    "        )(retrieve_content)\n",
 								    "\n",
 								    "    for executor in [boss, pm]:\n",
 								    "        executor.register_for_execution()(d_retrieve_content)\n",
+								    "\n",
 								    "    groupchat = autogen.GroupChat(\n",
+								    "        agents=[boss, pm, coder, reviewer],\n",
+								    "        messages=[],\n",
 								    "        max_round=12,\n",
+								    "        speaker_selection_method=\"round_robin\",\n",
+								    "        allow_repeat_speaker=False,\n",
+								    "    )\n",
+								    "\n",
+								    "    manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)\n",
+								    "\n",
+								    "    # Start chatting with the boss as this is the user proxy agent.\n",
+								    "    boss.initiate_chat(\n",
 								    "        manager,\n",
 								    "        message=PROBLEM,\n",
 								    "    )"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "## Start Chat\n",
 								    "\n",
 								    "### UserProxyAgent doesn't get the correct code\n",
 								    "[FLAML](https://github.com/microsoft/FLAML) was open sourced in 2020, so ChatGPT is familiar with it. However, Spark-related APIs were added in 2022, so they were not in ChatGPT's training data. As a result, we end up with invalid code."
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
+								   "execution_count": 3,
+								   "metadata": {},
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "\u001b[33mBoss\u001b[0m (to chat_manager):\n",
 								      "\n",
 								      "How to use spark for parallel training in FLAML? Give me sample code.\n",
 								      "\n",
+								      "--------------------------------------------------------------------------------\n"
 								     ]
 								    },
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
+								      "\u001b[33mSenior_Python_Engineer\u001b[0m (to chat_manager):\n",
 								      "\n",
+								      "To use Spark for parallel training in FLAML (Fast and Lightweight AutoML), you would need to set up a Spark cluster and utilize the `spark` backend for joblib, which FLAML uses internally for parallel training. Here’s an example of how you might set up and use Spark with FLAML for AutoML tasks:\n",
 								      "\n",
 								      "Firstly, ensure that you have the Spark cluster set up and the `pyspark` and `joblib-spark` packages installed in your environment. You can install the required packages using pip if they are not already installed:\n",
+								      "\n",
 								      "```python\n",
+								      "!pip install flaml pyspark joblib-spark\n",
 								      "```\n",
+								      "\n",
+								      "Here's a sample code snippet that demonstrates how to use FLAML with Spark for parallel training:\n",
+								      "\n",
+								      "```python\n",
 								      "from flaml import AutoML\n",
 								      "from pyspark.sql import SparkSession\n",
 								      "from sklearn.datasets import load_digits\n",
 								      "from joblibspark import register_spark\n",
 								      "\n",
 								      "# Initialize a Spark session\n",
 								      "spark = SparkSession.builder \\\n",
 								      "    .master(\"local[*]\") \\\n",
 								      "    .appName(\"FLAML_Spark_Example\") \\\n",
 								      "    .getOrCreate()\n",
 								      "\n",
 								      "# Register the joblib spark backend\n",
 								      "register_spark()  # This registers the backend for parallel processing\n",
 								      "\n",
 								      "# Load sample data\n",
 								      "X, y = load_digits(return_X_y=True)\n",
 								      "\n",
 								      "# Initialize an AutoML instance\n",
 								      "automl = AutoML()\n",
 								      "\n",
 								      "# Define the settings for the AutoML run\n",
 								      "settings = {\n",
 								      "    \"time_budget\": 60,  # Total running time in seconds\n",
 								      "    \"metric\": 'accuracy',  # Primary metric for evaluation\n",
 								      "    \"task\": 'classification',  # Task type\n",
 								      "    \"n_jobs\": -1,  # Number of jobs to run in parallel (use -1 for all)\n",
 								      "    \"estimator_list\": ['lgbm', 'rf', 'xgboost'],  # List of estimators to consider\n",
 								      "    \"log_file_name\": \"flaml_log.txt\",  # Log file name\n",
+								      "}\n",
 								      "\n",
+								      "# Run the AutoML search with Spark backend\n",
 								      "automl.fit(X_train=X, y_train=y, **settings)\n",
+								      "\n",
+								      "# Output the best model and its performance\n",
 								      "print(f\"Best ML model: {automl.model}\")\n",
 								      "print(f\"Best ML model's accuracy: {automl.best_loss}\")\n",
+								      "\n",
+								      "# Stop the Spark session\n",
 								      "spark.stop()\n",
 								      "```\n",
 								      "\n",
 								      "The `register_spark()` function from `joblib-spark` is used to register the Spark backend with joblib, which is utilized for parallel training within FLAML. The `n_jobs=-1` parameter tells FLAML to use all available Spark executors for parallel training.\n",
 								      "\n",
 								      "Please note that the actual process of setting up a Spark cluster can be complex and might involve additional steps such as configuring Spark workers, allocating resources, and more, which are beyond the scope of this code snippet.\n",
 								      "\n",
 								      "If you encounter any issues or need to adjust configurations for your specific Spark setup, please refer to the Spark and FLAML documentation for more details.\n",
 								      "\n",
 								      "When you run the code, ensure that your Spark cluster is properly configured and accessible from your Python environment. Adjust the `.master(\"local[*]\")` to point to your Spark master's URL if you are running a cluster that is not local.\n",
+								      "\n",
+								      "--------------------------------------------------------------------------------\n",
 								      "To use Spark for parallel training in FLAML (Fast and Lightweight AutoML), you would need to set up a Spark cluster and utilize the `spark` backend for joblib, which FLAML uses internally for parallel training. Here’s an example of how you might set up and use Spark with FLAML for AutoML tasks:\n",
 								      "\n",
 								      "Firstly, ensure that you have the Spark cluster set up and the `pyspark` and `joblib-spark` packages installed in your environment. You can install the required packages using pip if they are not already installed:\n",
+								      "\n",
+								      "```python\n",
 								      "!pip install flaml pyspark joblib-spark\n",
+								      "```\n",
 								      "\n",
+								      "Here's a sample code snippet that demonstrates how to use FLAML with Spark for parallel training:\n",
 								      "\n",
 								      "```python\n",
 								      "from flaml import AutoML\n",
 								      "from pyspark.sql import SparkSession\n",
 								      "from sklearn.datasets import load_digits\n",
 								      "from joblibspark import register_spark\n",
 								      "\n",
 								      "# Initialize a Spark session\n",
 								      "spark = SparkSession.builder \\\n",
 								      "    .master(\"local[*]\") \\\n",
 								      "    .appName(\"FLAML_Spark_Example\") \\\n",
 								      "    .getOrCreate()\n",
 								      "\n",
 								      "# Register the joblib spark backend\n",
 								      "register_spark()  # This registers the backend for parallel processing\n",
+								      "\n",
+								      "# Load sample data\n",
 								      "X, y = load_digits(return_X_y=True)\n",
+								      "\n",
+								      "# Initialize an AutoML instance\n",
 								      "automl = AutoML()\n",
 								      "\n",
 								      "# Define the settings for the AutoML run\n",
 								      "settings = {\n",
 								      "    \"time_budget\": 60,  # Total running time in seconds\n",
 								      "    \"metric\": 'accuracy',  # Primary metric for evaluation\n",
 								      "    \"task\": 'classification',  # Task type\n",
 								      "    \"n_jobs\": -1,  # Number of jobs to run in parallel (use -1 for all)\n",
 								      "    \"estimator_list\": ['lgbm', 'rf', 'xgboost'],  # List of estimators to consider\n",
 								      "    \"log_file_name\": \"flaml_log.txt\",  # Log file name\n",
 								      "}\n",
 								      "\n",
 								      "# Run the AutoML search with Spark backend\n",
 								      "automl.fit(X_train=X, y_train=y, **settings)\n",
 								      "\n",
 								      "# Output the best model and its performance\n",
 								      "print(f\"Best ML model: {automl.model}\")\n",
 								      "print(f\"Best ML model's accuracy: {automl.best_loss}\")\n",
 								      "\n",
 								      "# Stop the Spark session\n",
 								      "spark.stop()\n",
 								      "```\n",
 								      "\n",
 								      "The `register_spark()` function from `joblib-spark` is used to register the Spark backend with joblib, which is utilized for parallel training within FLAML. The `n_jobs=-1` parameter tells FLAML to use all available Spark executors for parallel training.\n",
 								      "\n",
 								      "Please note that the actual process of setting up a Spark cluster can be complex and might involve additional steps such as configuring Spark workers, allocating resources, and more, which are beyond the scope of this code snippet.\n",
 								      "\n",
 								      "If you encounter any issues or need to adjust configurations for your specific Spark setup, please refer to the Spark and FLAML documentation for more details.\n",
 								      "\n",
 								      "When you run the code, ensure that your Spark cluster is properly configured and accessible from your Python environment. Adjust the `.master(\"local[*]\")` to point to your Spark master's URL if you are running a cluster that is not local.\n",
 								      "\n",
 								      "--------------------------------------------------------------------------------\n",
 								      "\u001b[33mCode_Reviewer\u001b[0m (to chat_manager):\n",
 								      "\n",
 								      "TERMINATE\n",
 								      "\n",
 								      "--------------------------------------------------------------------------------\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "norag_chat()"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "### RetrieveUserProxyAgent gets the correct code\n",
 								    "Since RetrieveUserProxyAgent can perform retrieval-augmented generation based on the given documentation file, ChatGPT can generate the correct code for us!"
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": 4,
 								   "metadata": {},
 								   "outputs": [
 								    {
 								     "name": "stderr",
 								     "output_type": "stream",
 								     "text": [
 								      "2024-04-07 18:26:04,562 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - \u001b[32mUse the existing collection `groupchat`.\u001b[0m\n"
 								     ]
 								    },
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "Trying to create collection.\n"
 								     ]
 								    },
 								    {
 								     "name": "stderr",
 								     "output_type": "stream",
 								     "text": [
 								      "2024-04-07 18:26:05,485 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 1 chunks.\u001b[0m\n",
 								      "Number of requested results 3 is greater than number of elements in index 1, updating n_results = 1\n",
 								      "Model gpt4-1106-preview not found. Using cl100k_base encoding.\n"
 								     ]
 								    },
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "VectorDB returns doc_ids:  [['bdfbc921']]\n",
 								      "\u001b[32mAdding content of doc bdfbc921 to context.\u001b[0m\n",
 								      "\u001b[33mBoss_Assistant\u001b[0m (to chat_manager):\n",
 								      "\n",
 								      "You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the\n",
 								      "context provided by the user.\n",
 								      "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n",
 								      "For code generation, you must obey the following rules:\n",
 								      "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n",
 								      "Rule 2. You must follow the formats below to write your code:\n",
 								      "```language\n",
 								      "# your code\n",
 								      "```\n",
 								      "\n",
 								      "User's question is: How to use spark for parallel training in FLAML? Give me sample code.\n",
 								      "\n",
 								      "Context is: # Integrate - Spark\n",
 								      "\n",
 								      "FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:\n",
 								      "\n",
 								      "- Use Spark ML estimators for AutoML.\n",
 								      "- Use Spark to run training in parallel spark jobs.\n",
 								      "\n",
 								      "## Spark ML Estimators\n",
 								      "\n",
 								      "FLAML integrates estimators based on Spark ML models. These models are trained in parallel using Spark, so we called them Spark estimators. To use these models, you first need to organize your data in the required format.\n",
 								      "\n",
 								      "### Data\n",
 								      "\n",
 								      "For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenient function `to_pandas_on_spark` in the `flaml.automl.spark.utils` module to convert your data into a pandas-on-spark (`pyspark.pandas`) dataframe/series, which Spark estimators require.\n",
 								      "\n",
 								      "This utility function takes data in the form of a `pandas.Dataframe` or `pyspark.sql.Dataframe` and converts it into a pandas-on-spark dataframe. It also takes `pandas.Series` or `pyspark.sql.Dataframe` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.Dataframe`, it will not make any changes.\n",
 								      "\n",
 								      "This function also accepts optional arguments `index_col` and `default_index_type`.\n",
 								      "\n",
 								      "- `index_col` is the column name to use as the index, default is None.\n",
 								      "- `default_index_type` is the default index type, default is \"distributed-sequence\". More info about default index type could be found on Spark official [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type)\n",
 								      "\n",
 								      "Here is an example code snippet for Spark Data:\n",
 								      "\n",
 								      "```python\n",
 								      "import pandas as pd\n",
 								      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
 								      "\n",
 								      "# Creating a dictionary\n",
 								      "data = {\n",
 								      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
 								      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
 								      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
 								      "}\n",
 								      "\n",
 								      "# Creating a pandas DataFrame\n",
 								      "dataframe = pd.DataFrame(data)\n",
 								      "label = \"Price\"\n",
 								      "\n",
 								      "# Convert to pandas-on-spark dataframe\n",
 								      "psdf = to_pandas_on_spark(dataframe)\n",
 								      "```\n",
 								      "\n",
 								      "To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.\n",
 								      "\n",
 								      "Here is an example of how to use it:\n",
 								      "\n",
 								      "```python\n",
 								      "from pyspark.ml.feature import VectorAssembler\n",
 								      "\n",
 								      "columns = psdf.columns\n",
 								      "feature_cols = [col for col in columns if col != label]\n",
 								      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
 								      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
 								      "```\n",
 								      "\n",
 								      "Later in conducting the experiment, use your pandas-on-spark data like non-spark data and pass them using `X_train, y_train` or `dataframe, label`.\n",
 								      "\n",
 								      "### Estimators\n",
+								      "\n",
+								      "#### Model List\n",
+								      "\n",
+								      "- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.\n",
 								      "\n",
 								      "#### Usage\n",
+								      "\n",
+								      "First, prepare your data in the required format as described in the previous section.\n",
 								      "\n",
 								      "By including the models you intend to try in the `estimator_list` argument to `flaml.AutoML`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` postfix by default, even if you haven't specified them.\n",
 								      "\n",
 								      "Here is an example code snippet using SparkML models in AutoML:\n",
 								      "\n",
 								      "```python\n",
 								      "import flaml\n",
+								      "\n",
+								      "# prepare your data in pandas-on-spark format as we previously mentioned\n",
 								      "\n",
 								      "automl = flaml.AutoML()\n",
 								      "settings = {\n",
 								      "    \"time_budget\": 30,\n",
 								      "    \"metric\": \"r2\",\n",
 								      "    \"estimator_list\": [\"lgbm_spark\"],  # this setting is optional\n",
 								      "    \"task\": \"regression\",\n",
 								      "}\n",
 								      "\n",
 								      "automl.fit(\n",
 								      "    dataframe=psdf,\n",
 								      "    label=label,\n",
 								      "    **settings,\n",
 								      ")\n",
 								      "```\n",
 								      "\n",
 								      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)\n",
 								      "\n",
 								      "## Parallel Spark Jobs\n",
+								      "\n",
+								      "You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning) by setting `use_spark` to `True`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).\n",
 								      "\n",
 								      "Please note that you should not set `use_spark` to `True` when applying AutoML and Tuning for Spark Data. This is because only SparkML models will be used for Spark Data in AutoML and Tuning. As SparkML models run in parallel, there is no need to distribute them with `use_spark` again.\n",
 								      "\n",
 								      "All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:\n",
 								      "\n",
 								      "- `use_spark`: boolean, default=False | Whether to use spark to run the training in parallel spark jobs. This can be used to accelerate training on large models and large datasets, but will incur more overhead in time and thus slow down training in some cases. GPU training is not supported yet when use_spark is True. For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`.\n",
 								      "- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When n_concurrent_trials > 1, FLAML performs parallel tuning.\n",
 								      "- `force_cancel`: boolean, default=False | Whether to forcibly cancel Spark jobs if the search time exceeds the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.\n",
 								      "\n",
 								      "An example code snippet for using parallel Spark jobs:\n",
+								      "\n",
+								      "```python\n",
 								      "import flaml\n",
+								      "\n",
+								      "automl_experiment = flaml.AutoML()\n",
 								      "automl_settings = {\n",
 								      "    \"time_budget\": 30,\n",
 								      "    \"metric\": \"r2\",\n",
 								      "    \"task\": \"regression\",\n",
 								      "    \"n_concurrent_trials\": 2,\n",
 								      "    \"use_spark\": True,\n",
+								      "    \"force_cancel\": True,  # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.\n",
+								      "}\n",
 								      "\n",
 								      "automl_experiment.fit(\n",
 								      "    dataframe=dataframe,\n",
 								      "    label=label,\n",
 								      "    **automl_settings,\n",
 								      ")\n",
 								      "```\n",
 								      "\n",
 								      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)\n",
 								      "\n",
+								      "\n",
 								      "\n",
+								      "--------------------------------------------------------------------------------\n",
 								      "\u001b[33mBoss_Assistant\u001b[0m (to chat_manager):\n",
+								      "\n",
+								      "You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the\n",
 								      "context provided by the user.\n",
 								      "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n",
 								      "For code generation, you must obey the following rules:\n",
 								      "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n",
 								      "Rule 2. You must follow the formats below to write your code:\n",
 								      "```language\n",
 								      "# your code\n",
+								      "```\n",
 								      "\n",
+								      "User's question is: How to use spark for parallel training in FLAML? Give me sample code.\n",
+								      "\n",
+								      "Context is: # Integrate - Spark\n",
+								      "\n",
+								      "FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:\n",
+								      "\n",
+								      "- Use Spark ML estimators for AutoML.\n",
 								      "- Use Spark to run training in parallel spark jobs.\n",
+								      "\n",
+								      "## Spark ML Estimators\n",
+								      "\n",
+								      "FLAML integrates estimators based on Spark ML models. These models are trained in parallel using Spark, so we call them Spark estimators. To use these models, you first need to organize your data in the required format.\n",
+								      "\n",
+								      "### Data\n",
+								      "\n",
+								      "For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenient function `to_pandas_on_spark` in the `flaml.automl.spark.utils` module to convert your data into a pandas-on-spark (`pyspark.pandas`) dataframe/series, which Spark estimators require.\n",
+								      "\n",
+								      "This utility function takes data in the form of a `pandas.DataFrame` or `pyspark.sql.DataFrame` and converts it into a pandas-on-spark dataframe. It also takes a `pandas.Series` or `pyspark.sql.DataFrame` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.DataFrame`, it is returned unchanged.\n",
+								      "\n",
+								      "This function also accepts optional arguments `index_col` and `default_index_type`.\n",
+								      "\n",
+								      "- `index_col` is the column name to use as the index; the default is None.\n",
 								      "- `default_index_type` is the default index type; the default is \"distributed-sequence\". For more information about default index types, see the official Spark [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type).\n",
+								      "\n",
+								      "Here is an example code snippet for Spark Data:\n",
+								      "\n",
 								      "```python\n",
 								      "import pandas as pd\n",
 								      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
 								      "\n",
 								      "# Creating a dictionary\n",
 								      "data = {\n",
 								      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
 								      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
+								      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
+								      "}\n",
 								      "\n",
 								      "# Creating a pandas DataFrame\n",
 								      "dataframe = pd.DataFrame(data)\n",
 								      "label = \"Price\"\n",
 								      "\n",
 								      "# Convert to pandas-on-spark dataframe\n",
 								      "psdf = to_pandas_on_spark(dataframe)\n",
 								      "```\n",
 								      "\n",
+								      "To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.\n",
 								      "\n",
 								      "Here is an example of how to use it:\n",
+								      "\n",
 								      "```python\n",
 								      "from pyspark.ml.feature import VectorAssembler\n",
 								      "\n",
 								      "columns = psdf.columns\n",
 								      "feature_cols = [col for col in columns if col != label]\n",
 								      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
 								      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
 								      "```\n",
 								      "\n",
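As a rough, Spark-free illustration of what "merging all feature columns into a single vector column" means, the plain-Python sketch below mimics the idea on two rows of the toy housing data. `assemble_features` is a hypothetical helper introduced only for this illustration; it is not part of Spark or FLAML. The real `VectorAssembler` operates on Spark DataFrames and produces Spark ML vectors rather than Python lists.

```python
# Plain-Python illustration only: no Spark is involved here.
rows = [
    {"Square_Feet": 800, "Age_Years": 20, "Price": 100000},
    {"Square_Feet": 1200, "Age_Years": 15, "Price": 200000},
]
label = "Price"


def assemble_features(rows, feature_cols, output_col="features"):
    """Return copies of the rows with feature_cols merged into one list-valued column.

    Hypothetical helper that mirrors the idea behind VectorAssembler.
    """
    assembled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[output_col] = [float(row[col]) for col in feature_cols]
        assembled.append(new_row)
    return assembled


# Every column except the label is treated as a feature, as in the snippet above.
feature_cols = [col for col in rows[0] if col != label]
assembled = assemble_features(rows, feature_cols)
print(assembled[0]["features"])  # [800.0, 20.0]
```

Each row keeps its original columns and gains one `features` entry holding all feature values in a fixed order, which is the shape Spark ML estimators expect from the vector column.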
+								      "Later, when running the experiment, use your pandas-on-spark data just like non-Spark data and pass it in as `X_train, y_train` or `dataframe, label`.\n",
 								      "\n",
 								      "### Estimators\n",
 								      "\n",
 								      "#### Model List\n",
 								      "\n",
 								      "- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.\n",
 								      "\n",
 								      "#### Usage\n",
 								      "\n",
 								      "First, prepare your data in the required format as described in the previous section.\n",
 								      "\n",
 								      "By including the models you intend to try in the `estimator_list` argument to `flaml.AutoML.fit`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` suffix by default, even if you haven't specified them.\n",
 								      "\n",
 								      "Here is an example code snippet using SparkML models in AutoML:\n",
+								      "\n",
 								      "```python\n",
 								      "import flaml\n",
 								      "\n",
+								      "# prepare your data in pandas-on-spark format as we previously mentioned\n",
 								      "\n",
+								      "automl = flaml.AutoML()\n",
 								      "settings = {\n",
 								      "    \"time_budget\": 30,\n",
 								      "    \"metric\": \"r2\",\n",
+								      "    \"estimator_list\": [\"lgbm_spark\"],  # this setting is optional\n",
 								      "    \"task\": \"regression\",\n",
+								      "}\n",
 								      "\n",
 								      "automl.fit(\n",
 								      "    dataframe=psdf,\n",
 								      "    label=label,\n",
+								      "    **settings,\n",
+								      ")\n",
 								      "```\n",
 								      "\n",
+								      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)\n",
 								      "\n",
 								      "## Parallel Spark Jobs\n",
 								      "\n",
 								      "You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning) by setting `use_spark` to `True`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).\n",
 								      "\n",
 								      "Please note that you should not set `use_spark` to `True` when applying AutoML and Tuning to Spark data. This is because only SparkML models are used for Spark data in AutoML and Tuning. As SparkML models already run in parallel, there is no need to distribute them with `use_spark` again.\n",
 								      "\n",
 								      "All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:\n",
 								      "\n",
 								      "- `use_spark`: boolean, default=False | Whether to use Spark to run the training in parallel Spark jobs. This can accelerate training on large models and large datasets, but it incurs extra time overhead and can slow down training in some cases. GPU training is not yet supported when `use_spark` is True. For Spark clusters, FLAML launches one trial per executor by default. However, sometimes we want to launch more trials than the number of executors (e.g., in local mode). In this case, set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials is the minimum of `n_concurrent_trials` and `num_executors`.\n",
 								      "- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When `n_concurrent_trials` > 1, FLAML performs parallel tuning.\n",
 								      "- `force_cancel`: boolean, default=False | Whether to forcibly cancel Spark jobs if the search time exceeds the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.\n",
 								      "\n",
 								      "An example code snippet for using parallel Spark jobs:\n",
-												Fix issue 1440 by applying new function registration decorator (#1661)

* Reproduce #1440

* Updated code with latest APIs

* Reran notebook

* Fix usage of cache
											
										
										
											2024-02-18 23:47:19 +08:00
+								      "\n",
 								      "```python\n",
 								      "import flaml\n",
 								      "\n",
 								      "automl_experiment = flaml.AutoML()\n",
 								      "automl_settings = {\n",
 								      "    \"time_budget\": 30,\n",
 								      "    \"metric\": \"r2\",\n",
 								      "    \"task\": \"regression\",\n",
 								      "    \"n_concurrent_trials\": 2,\n",
 								      "    \"use_spark\": True,\n",
-												Support setting vector_db as a param (#2313)

* Added vectordb base and chromadb

* Remove timer and unused functions

* Added filter by distance

* Added test utils

* Fix format

* Fix type hint of dict

* Rename test

* Add test chromadb

* Fix test no chromadb

* Add coverage

* Don't skip test vectordb utils

* Add types

* Fix tests

* Fix docs build error

* Add types to base

* Update base

* Update utils

* Update chromadb

* Add get_docs_by_ids

* Improve docstring

* Update init params

* Update init vector db

* Add get all docs

* Move chroma_results_to_query_results to utils

* Add init vectordb

* Convert format of results for old version

* Improve type hints

* Update get_context for new query results format

* Fix typo

* Improve init db

* Update default folder

* Update logger

* Update init, add embedding func

* Update distance_threshold

* Fix logger name

* Update qdrant

* Fix init db

* Update notebooks

* Use kwargs to improve readability

* Improve docstring of vectordb, add two attributes

* Add db_config

* Update gitignore

* Update comments

* Add source

* Fix file downloaded from urls have the same name

* Remove files added by mistake

* Improve docstring

* Update docstring

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update docstring

* Update docstring

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
											
										
										
											2024-04-17 16:30:05 +08:00
+								      "    \"force_cancel\": True,  # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.\n",
+								      "}\n",
 								      "\n",
 								      "automl.fit(\n",
 								      "    dataframe=dataframe,\n",
 								      "    label=label,\n",
+								      "    **automl_settings,\n",
+								      ")\n",
 								      "```\n",
 								      "\n",
+								      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)\n",
 								      "\n",
+								      "\n",
 								      "\n",
-												Update speaker selector in GroupChat and update some notebooks (#688)

* Add speaker selection methods

* Update groupchat RAG

* Update seed to cache_seed

* Update RetrieveChat notebook

* Update parameter name

* Add test

* Add more tests

* Add mock to test

* Add mock to test

* Fix typo speaking

* Add gracefully exit manual input

* Update round_robin docstring

* Add method checking

* Remove participant roles

* Fix versions in notebooks

* Minimize installation overhead

* Fix missing lower()

* Add comments for try_count 3

* Update warning for n_agents < 3

* Update warning for n_agents < 3

* Add test_n_agents_less_than_3

* Add a function for manual select

* Update version in notebooks

* Fixed bugs that allow speakers to go twice in a row even when allow_repeat_speaker = False

---------

Co-authored-by: Adam Fourney <adamfo@microsoft.com>
											
										
										
											2023-11-17 21:56:11 +08:00
+								      "--------------------------------------------------------------------------------\n",
+								      "\u001b[33mProduct_Manager\u001b[0m (to chat_manager):\n",
+								      "\n",
-												Add group chat and retrieve agent example (#227)

* Add group chat and retrieve agent example

* Fix link and models

* Support call rag in a group chat and not init with rag

* Fix n_results logic

* Update notebook

* Fix format

* Improve wording

* Update variable name

* Revert to main

* Update function call

* Update keys

* Update contents

* Update contents
											
										
										
											2023-10-18 04:31:27 +08:00
+								      "```python\n",
+								      "from flaml.automl import AutoML\n",
+								      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
 								      "from pyspark.ml.feature import VectorAssembler\n",
+								      "import pandas as pd\n",
+								      "\n",
+								      "# Sample data in a dictionary\n",
+								      "data = {\n",
 								      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
 								      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
+								      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
+								      "}\n",
 								      "\n",
+								      "# Convert dictionary to pandas DataFrame\n",
+								      "dataframe = pd.DataFrame(data)\n",
 								      "label = \"Price\"\n",
 								      "\n",
+								      "# Convert pandas DataFrame to pandas-on-spark DataFrame\n",
+								      "psdf = to_pandas_on_spark(dataframe)\n",
 								      "\n",
+								      "# Use VectorAssembler to merge feature columns into a single vector column\n",
 								      "feature_cols = [col for col in psdf.columns if col != label]\n",
+								      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
+								      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\", label]\n",
+								      "\n",
+								      "# Initialize AutoML instance\n",
 								      "automl = AutoML()\n",
+								      "\n",
+								      "# AutoML settings\n",
+								      "automl_settings = {\n",
+								      "    \"time_budget\": 30,  # Total running time in seconds\n",
 								      "    \"metric\": \"r2\",     # Evaluation metric\n",
+								      "    \"task\": \"regression\",\n",
+								      "    \"n_concurrent_trials\": 2,   # Number of concurrent Spark jobs\n",
 								      "    \"use_spark\": True,          # Enable Spark for parallel training\n",
 								      "    \"force_cancel\": True,       # Force cancel Spark jobs if they exceed the time budget\n",
 								      "    \"estimator_list\": [\"lgbm_spark\"]  # Optional: Specific estimator to use\n",
+								      "}\n",
 								      "\n",
+								      "# Run AutoML fit with pandas-on-spark dataframe\n",
-												Update speaker selector in GroupChat and update some notebooks (#688)

* Add speaker selection methods

* Update groupchat RAG

* Update seed to cache_seed

* Update RetrieveChat notebook

* Update parameter name

* Add test

* Add more tests

* Add mock to test

* Add mock to test

* Fix typo speaking

* Add gracefully exit manual input

* Update round_robin docstring

* Add method checking

* Remove participant roles

* Fix versions in notebooks

* Minimize installation overhead

* Fix missing lower()

* Add comments for try_count 3

* Update warning for n_agents < 3

* Update warning for n_agents < 3

* Add test_n_agents_less_than_3

* Add a function for manual select

* Update version in notebooks

* Fixed bugs that allow speakers to go twice in a row even when allow_repeat_speaker = False

---------

Co-authored-by: Adam Fourney <adamfo@microsoft.com>
											
										
										
											2023-11-17 21:56:11 +08:00
+								      "automl.fit(\n",
+								      "    dataframe=psdf,\n",
+								      "    label=label,\n",
+								      "    **automl_settings,\n",
+								      ")\n",
 								      "```\n",
 								      "TERMINATE\n",
+								      "\n",
+								      "--------------------------------------------------------------------------------\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "rag_chat()\n",
 								    "# type exit to terminate the chat"
 								   ]
 								  },
 								  {
 								   "attachments": {},
 								   "cell_type": "markdown",
 								   "metadata": {},
 								   "source": [
 								    "### Call RetrieveUserProxyAgent when the chat is initiated by another user proxy agent\n",
 								    "Sometimes you need RetrieveUserProxyAgent in a group chat that it does not initiate. In that case, wrap the RAG agent in a function that the other agents can call."
 								   ]
 								  },
 								  {
 								   "cell_type": "code",
+								   "execution_count": 5,
+								   "metadata": {},
 								   "outputs": [
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "\u001b[33mBoss\u001b[0m (to chat_manager):\n",
+								      "\n",
+								      "How to use spark for parallel training in FLAML? Give me sample code.\n",
+								      "\n",
-												Fix issue 1440 by applying new function registration decorator (#1661)

* Reproduce #1440

* Updated code with latest APIs

* Reran notebook

* Fix usage of cache
											
										
										
											2024-02-18 23:47:19 +08:00
+								      "--------------------------------------------------------------------------------\n",
+								      "\u001b[33mProduct_Manager\u001b[0m (to chat_manager):\n",
+								      "\n",
+								      "\u001b[32m***** Suggested function call: retrieve_content *****\u001b[0m\n",
+								      "Arguments: \n",
+								      "{\"message\":\"using Apache Spark for parallel training in FLAML with sample code\"}\n",
+								      "\u001b[32m*****************************************************\u001b[0m\n",
+								      "\n",
+								      "--------------------------------------------------------------------------------\n",
 								      "\u001b[35m\n",
+								      ">>>>>>>> EXECUTING FUNCTION retrieve_content...\u001b[0m\n"
 								     ]
 								    },
 								    {
 								     "name": "stderr",
 								     "output_type": "stream",
 								     "text": [
 								      "Number of requested results 3 is greater than number of elements in index 1, updating n_results = 1\n",
 								      "Model gpt4-1106-preview not found. Using cl100k_base encoding.\n"
 								     ]
 								    },
 								    {
 								     "name": "stdout",
 								     "output_type": "stream",
 								     "text": [
 								      "VectorDB returns doc_ids:  [['bdfbc921']]\n",
 								      "\u001b[32mAdding content of doc bdfbc921 to context.\u001b[0m\n",
+								      "\u001b[33mBoss\u001b[0m (to chat_manager):\n",
+								      "\n",
+								      "\u001b[32m***** Response from calling function (retrieve_content) *****\u001b[0m\n",
+								      "You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the\n",
 								      "context provided by the user.\n",
 								      "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n",
 								      "For code generation, you must obey the following rules:\n",
 								      "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n",
 								      "Rule 2. You must follow the formats below to write your code:\n",
 								      "```language\n",
 								      "# your code\n",
 								      "```\n",
 								      "\n",
+								      "User's question is: using Apache Spark for parallel training in FLAML with sample code\n",
+								      "\n",
 								      "Context is: # Integrate - Spark\n",
 								      "\n",
 								      "FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:\n",
+								      "\n",
 								      "- Use Spark ML estimators for AutoML.\n",
 								      "- Use Spark to run training in parallel Spark jobs.\n",
 								      "\n",
 								      "## Spark ML Estimators\n",
 								      "\n",
 								      "FLAML integrates estimators based on Spark ML models. These models are trained in parallel using Spark, so we call them Spark estimators. To use these models, you first need to organize your data in the required format.\n",
 								      "\n",
 								      "### Data\n",
 								      "\n",
 								      "For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenient function `to_pandas_on_spark` in the `flaml.automl.spark.utils` module to convert your data into a pandas-on-spark (`pyspark.pandas`) dataframe/series, which Spark estimators require.\n",
 								      "\n",
 								      "This utility function takes data in the form of a `pandas.DataFrame` or `pyspark.sql.DataFrame` and converts it into a pandas-on-spark dataframe. It also takes `pandas.Series` or `pyspark.sql.DataFrame` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.DataFrame`, it will not make any changes.\n",
 								      "\n",
 								      "This function also accepts optional arguments `index_col` and `default_index_type`.\n",
 								      "\n",
 								      "- `index_col` is the column name to use as the index, default is None.\n",
 								      "- `default_index_type` is the default index type, default is \"distributed-sequence\". More information about the default index type can be found in the official Spark [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type).\n",
 								      "\n",
 								      "Here is an example code snippet for Spark Data:\n",
 								      "\n",
 								      "```python\n",
 								      "import pandas as pd\n",
 								      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
 								      "\n",
 								      "# Creating a dictionary\n",
 								      "data = {\n",
 								      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
 								      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
 								      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
 								      "}\n",
 								      "\n",
 								      "# Creating a pandas DataFrame\n",
 								      "dataframe = pd.DataFrame(data)\n",
 								      "label = \"Price\"\n",
 								      "\n",
 								      "# Convert to pandas-on-spark dataframe\n",
 								      "psdf = to_pandas_on_spark(dataframe)\n",
 								      "```\n",
 								      "\n",
 								      "To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.\n",
 								      "\n",
 								      "Here is an example of how to use it:\n",
 								      "\n",
 								      "```python\n",
 								      "from pyspark.ml.feature import VectorAssembler\n",
 								      "\n",
 								      "columns = psdf.columns\n",
 								      "feature_cols = [col for col in columns if col != label]\n",
 								      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
 								      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
 								      "```\n",
 								      "\n",
 								      "Later, when running the experiment, use your pandas-on-spark data as you would non-Spark data and pass it using `X_train, y_train` or `dataframe, label`.\n",
 								      "\n",
 								      "### Estimators\n",
 								      "\n",
 								      "#### Model List\n",
 								      "\n",
 								      "- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.\n",
 								      "\n",
 								      "#### Usage\n",
 								      "\n",
 								      "First, prepare your data in the required format as described in the previous section.\n",
 								      "\n",
 								      "By including the models you intend to try in the `estimator_list` argument to `flaml.AutoML`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` postfix by default, even if you haven't specified them.\n",
 								      "\n",
 								      "Here is an example code snippet using SparkML models in AutoML:\n",
 								      "\n",
 								      "```python\n",
 								      "import flaml\n",
 								      "\n",
 								      "# prepare your data in pandas-on-spark format as we previously mentioned\n",
 								      "\n",
 								      "automl = flaml.AutoML()\n",
 								      "settings = {\n",
 								      "    \"time_budget\": 30,\n",
 								      "    \"metric\": \"r2\",\n",
 								      "    \"estimator_list\": [\"lgbm_spark\"],  # this setting is optional\n",
 								      "    \"task\": \"regression\",\n",
 								      "}\n",
 								      "\n",
 								      "automl.fit(\n",
 								      "    dataframe=psdf,\n",
 								      "    label=label,\n",
 								      "    **settings,\n",
 								      ")\n",
 								      "```\n",
 								      "\n",
 								      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)\n",
 								      "\n",
 								      "## Parallel Spark Jobs\n",
 								      "\n",
 								      "You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning) by setting `use_spark` to `True`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).\n",
+								      "\n",
+								      "Please note that you should not set `use_spark` to `true` when applying AutoML and Tuning for Spark Data. This is because only SparkML models will be used for Spark Data in AutoML and Tuning. As SparkML models run in parallel, there is no need to distribute them with `use_spark` again.\n",
 								      "\n",
 								      "All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:\n",
 								      "\n",
 								      "- `use_spark`: boolean, default=False | Whether to use Spark to run the training in parallel Spark jobs. This can be used to accelerate training on large models and large datasets, but will incur more overhead in time and thus slow down training in some cases. GPU training is not supported yet when use_spark is True. For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`.\n",
 								      "- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When n_concurrent_trials > 1, FLAML performs parallel tuning.\n",
 								      "- `force_cancel`: boolean, default=False | Whether to forcibly cancel Spark jobs if the search time exceeds the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.\n",
 								      "\n",
 								      "An example code snippet for using parallel Spark jobs:\n",
+								      "\n",
+								      "```python\n",
 								      "import flaml\n",
+								      "\n",
+								      "automl = flaml.AutoML()\n",
 								      "automl_settings = {\n",
 								      "    \"time_budget\": 30,\n",
 								      "    \"metric\": \"r2\",\n",
 								      "    \"task\": \"regression\",\n",
 								      "    \"n_concurrent_trials\": 2,\n",
 								      "    \"use_spark\": True,\n",
+								      "    \"force_cancel\": True,  # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.\n",
+								      "}\n",
 								      "\n",
 								      "automl.fit(\n",
 								      "    dataframe=dataframe,\n",
 								      "    label=label,\n",
 								      "    **automl_settings,\n",
 								      ")\n",
+								      "```\n",
 								      "\n",
+								      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)\n",
 								      "\n",
 								      "\n",
 								      "\u001b[32m*************************************************************\u001b[0m\n",
+								      "\n",
+								      "--------------------------------------------------------------------------------\n",
+								      "\u001b[33mBoss\u001b[0m (to chat_manager):\n",
+								      "\n",
+								      "\u001b[32m***** Response from calling function (retrieve_content) *****\u001b[0m\n",
 								      "You're a retrieve augmented coding assistant. You answer user's questions based on your own knowledge and the\n",
 								      "context provided by the user.\n",
 								      "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n",
 								      "For code generation, you must obey the following rules:\n",
 								      "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n",
 								      "Rule 2. You must follow the formats below to write your code:\n",
 								      "```language\n",
 								      "# your code\n",
 								      "```\n",
 								      "\n",
 								      "User's question is: using Apache Spark for parallel training in FLAML with sample code\n",
 								      "\n",
 								      "Context is: # Integrate - Spark\n",
 								      "\n",
 								      "FLAML has integrated Spark for distributed training. There are two main aspects of integration with Spark:\n",
 								      "\n",
 								      "- Use Spark ML estimators for AutoML.\n",
 								      "- Use Spark to run training in parallel spark jobs.\n",
 								      "\n",
 								      "## Spark ML Estimators\n",
+								      "\n",
+								      "FLAML integrates estimators based on Spark ML models. These models are trained in parallel using Spark, so we call them Spark estimators. To use these models, you first need to organize your data in the required format.\n",
 								      "\n",
 								      "### Data\n",
 								      "\n",
 								      "For Spark estimators, AutoML only consumes Spark data. FLAML provides a convenient function `to_pandas_on_spark` in the `flaml.automl.spark.utils` module to convert your data into a pandas-on-spark (`pyspark.pandas`) dataframe/series, which Spark estimators require.\n",
 								      "\n",
 								      "This utility function takes data in the form of a `pandas.Dataframe` or `pyspark.sql.Dataframe` and converts it into a pandas-on-spark dataframe. It also takes `pandas.Series` or `pyspark.sql.Dataframe` and converts it into a [pandas-on-spark](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/index.html) series. If you pass in a `pyspark.pandas.Dataframe`, it will not make any changes.\n",
 								      "\n",
 								      "This function also accepts optional arguments `index_col` and `default_index_type`.\n",
 								      "\n",
 								      "- `index_col` is the column name to use as the index, default is None.\n",
 								      "- `default_index_type` is the default index type, default is \"distributed-sequence\". More information about the default index type can be found in the official Spark [documentation](https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/options.html#default-index-type).\n",
 								      "\n",
 								      "Here is an example code snippet for Spark Data:\n",
+								      "\n",
 								      "```python\n",
+								      "import pandas as pd\n",
 								      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
+								      "\n",
+								      "# Creating a dictionary\n",
 								      "data = {\n",
 								      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
 								      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
+								      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
+								      "}\n",
+								      "\n",
+								      "# Creating a pandas DataFrame\n",
 								      "dataframe = pd.DataFrame(data)\n",
 								      "label = \"Price\"\n",
+								      "\n",
+								      "# Convert to pandas-on-spark dataframe\n",
 								      "psdf = to_pandas_on_spark(dataframe)\n",
 								      "```\n",
+								      "\n",
+								      "To use Spark ML models you need to format your data appropriately. Specifically, use [`VectorAssembler`](https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.feature.VectorAssembler.html) to merge all feature columns into a single vector column.\n",
 								      "\n",
 								      "Here is an example of how to use it:\n",
 								      "\n",
 								      "```python\n",
 								      "from pyspark.ml.feature import VectorAssembler\n",
 								      "\n",
 								      "columns = psdf.columns\n",
 								      "feature_cols = [col for col in columns if col != label]\n",
 								      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
 								      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
 								      "```\n",
 								      "\n",
 								      "Later, when conducting the experiment, use your pandas-on-spark data like non-Spark data and pass it using `X_train, y_train` or `dataframe, label`.\n",
 								      "\n",
 								      "### Estimators\n",
 								      "\n",
 								      "#### Model List\n",
 								      "\n",
 								      "- `lgbm_spark`: The class for fine-tuning Spark version LightGBM models, using [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/) API.\n",
 								      "\n",
 								      "#### Usage\n",
 								      "\n",
 								      "First, prepare your data in the required format as described in the previous section.\n",
 								      "\n",
 								      "By including the models you intend to try in the `estimator_list` argument to `flaml.automl`, FLAML will start trying configurations for these models. If your input is Spark data, FLAML will also use estimators with the `_spark` postfix by default, even if you haven't specified them.\n",
 								      "\n",
 								      "Here is an example code snippet using SparkML models in AutoML:\n",
+								      "\n",
 								      "```python\n",
 								      "import flaml\n",
 								      "\n",
+								      "# prepare your data in pandas-on-spark format as we previously mentioned\n",
 								      "\n",
+								      "automl = flaml.AutoML()\n",
 								      "settings = {\n",
 								      "    \"time_budget\": 30,\n",
 								      "    \"metric\": \"r2\",\n",
+								      "    \"estimator_list\": [\"lgbm_spark\"],  # this setting is optional\n",
+								      "    \"task\": \"regression\",\n",
 								      "}\n",
+								      "\n",
+								      "automl.fit(\n",
+								      "    dataframe=psdf,\n",
 								      "    label=label,\n",
 								      "    **settings,\n",
+								      ")\n",
+								      "```\n",
+								      "\n",
+								      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/automl_bankrupt_synapseml.ipynb)\n",
 								      "\n",
 								      "## Parallel Spark Jobs\n",
 								      "\n",
 								      "You can activate Spark as the parallel backend during parallel tuning in both [AutoML](/docs/Use-Cases/Task-Oriented-AutoML#parallel-tuning) and [Hyperparameter Tuning](/docs/Use-Cases/Tune-User-Defined-Function#parallel-tuning) by setting `use_spark` to `True`. FLAML will dispatch your job to the distributed Spark backend using [`joblib-spark`](https://github.com/joblib/joblib-spark).\n",
 								      "\n",
 								      "Please note that you should not set `use_spark` to `True` when applying AutoML and Tuning for Spark Data. This is because only SparkML models will be used for Spark Data in AutoML and Tuning. As SparkML models run in parallel, there is no need to distribute them with `use_spark` again.\n",
 								      "\n",
 								      "All the Spark-related arguments are stated below. These arguments are available in both Hyperparameter Tuning and AutoML:\n",
 								      "\n",
 								      "- `use_spark`: boolean, default=False | Whether to use Spark to run the training in parallel Spark jobs. This can be used to accelerate training on large models and large datasets, but will incur more overhead in time and thus slow down training in some cases. GPU training is not supported yet when use_spark is True. For Spark clusters, by default, we will launch one trial per executor. However, sometimes we want to launch more trials than the number of executors (e.g., local mode). In this case, we can set the environment variable `FLAML_MAX_CONCURRENT` to override the detected `num_executors`. The final number of concurrent trials will be the minimum of `n_concurrent_trials` and `num_executors`.\n",
 								      "- `n_concurrent_trials`: int, default=1 | The number of concurrent trials. When n_concurrent_trials > 1, FLAML performs parallel tuning.\n",
 								      "- `force_cancel`: boolean, default=False | Whether to forcibly cancel Spark jobs if the search time exceeds the time budget. Spark jobs include parallel tuning jobs and Spark-based model training jobs.\n",
 								      "\n",
 								      "An example code snippet for using parallel Spark jobs:\n",
+								      "\n",
+								      "```python\n",
+								      "import flaml\n",
 								      "\n",
 								      "automl_experiment = flaml.AutoML()\n",
 								      "automl_settings = {\n",
 								      "    \"time_budget\": 30,\n",
 								      "    \"metric\": \"r2\",\n",
 								      "    \"task\": \"regression\",\n",
 								      "    \"n_concurrent_trials\": 2,\n",
 								      "    \"use_spark\": True,\n",
 								      "    \"force_cancel\": True,  # Activating the force_cancel option can immediately halt Spark jobs once they exceed the allocated time_budget.\n",
 								      "}\n",
 								      "\n",
+								      "automl_experiment.fit(\n",
+								      "    dataframe=dataframe,\n",
+								      "    label=label,\n",
+								      "    **automl_settings,\n",
+								      ")\n",
+								      "```\n",
+								      "\n",
+								      "[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/integrate_spark.ipynb)\n",
 								      "\n",
+								      "\n",
+								      "\u001b[32m*************************************************************\u001b[0m\n",
+								      "\n",
 								      "--------------------------------------------------------------------------------\n",
+								      "\u001b[33mProduct_Manager\u001b[0m (to chat_manager):\n",
 								      "\n",
 								      "To use Apache Spark for parallel training in FLAML, you can follow these steps:\n",
 								      "\n",
 								      "1. Ensure your data is in the required pandas-on-spark format.\n",
 								      "2. Use Spark ML estimators by including them in the `estimator_list`.\n",
 								      "3. Set `use_spark` to `True` for parallel tuning.\n",
 								      "\n",
 								      "Here's a sample code demonstrating how to use Spark for parallel training in FLAML:\n",
 								      "\n",
 								      "```python\n",
 								      "import flaml\n",
 								      "from flaml.automl.spark.utils import to_pandas_on_spark\n",
 								      "import pandas as pd\n",
 								      "from pyspark.ml.feature import VectorAssembler\n",
 								      "\n",
 								      "# Sample data in a pandas DataFrame\n",
 								      "data = {\n",
 								      "    \"Square_Feet\": [800, 1200, 1800, 1500, 850],\n",
 								      "    \"Age_Years\": [20, 15, 10, 7, 25],\n",
 								      "    \"Price\": [100000, 200000, 300000, 240000, 120000],\n",
 								      "}\n",
 								      "label = \"Price\"\n",
 								      "\n",
 								      "# Creating a pandas DataFrame\n",
 								      "dataframe = pd.DataFrame(data)\n",
 								      "\n",
 								      "# Convert to pandas-on-spark dataframe\n",
 								      "psdf = to_pandas_on_spark(dataframe)\n",
 								      "\n",
 								      "# Prepare features using VectorAssembler\n",
 								      "columns = psdf.columns\n",
 								      "feature_cols = [col for col in columns if col != label]\n",
 								      "featurizer = VectorAssembler(inputCols=feature_cols, outputCol=\"features\")\n",
 								      "psdf = featurizer.transform(psdf.to_spark(index_col=\"index\"))[\"index\", \"features\"]\n",
 								      "\n",
 								      "# Initialize AutoML\n",
 								      "automl = flaml.AutoML()\n",
 								      "\n",
 								      "# Configure settings for AutoML\n",
 								      "settings = {\n",
 								      "    \"time_budget\": 30,  # time budget in seconds\n",
 								      "    \"metric\": \"r2\",\n",
 								      "    \"estimator_list\": [\"lgbm_spark\"],  # using Spark ML estimators\n",
 								      "    \"task\": \"regression\",\n",
 								      "    \"n_concurrent_trials\": 2,  # number of parallel trials\n",
 								      "    \"use_spark\": True,  # enable parallel training using Spark\n",
 								      "    \"force_cancel\": True,  # force cancel Spark jobs if time_budget is exceeded\n",
 								      "}\n",
 								      "\n",
 								      "# Start the training\n",
 								      "automl.fit(dataframe=psdf, label=label, **settings)\n",
 								      "```\n",
 								      "\n",
 								      "In this code snippet:\n",
 								      "- The `to_pandas_on_spark` function is used to convert the pandas DataFrame to a pandas-on-spark DataFrame.\n",
 								      "- `VectorAssembler` is used to transform feature columns into a single vector column.\n",
 								      "- The `AutoML` object is created, and settings are configured for the AutoML run, including setting `use_spark` to `True` for parallel training.\n",
 								      "- The `fit` method is called to start the automated machine learning process.\n",
 								      "\n",
 								      "By using these settings, FLAML will train the models in parallel using Spark, which can accelerate the training process on large models and datasets.\n",
+								      "\n",
+								      "TERMINATE\n",
+								      "\n",
 								      "--------------------------------------------------------------------------------\n"
 								     ]
 								    }
 								   ],
 								   "source": [
 								    "call_rag_chat()"
 								   ]
+								  },
 								  {
 								   "cell_type": "code",
 								   "execution_count": null,
 								   "metadata": {},
 								   "outputs": [],
 								   "source": []
+								  }
 								 ],
 								 "metadata": {
-												Supporting callable message (#1852)

* add message field

* send

* message func doc str

* test dict message

* retiring soon

* generate_init_message docstr

* remove todo

* update notebook

* CompressibleAgent

* update notebook

* add test

* retrieve agent

* update test

* summary_method args

* summary

* carryover

* dict message

* update nested doc

* generate_init_message

* fix typo

* update docs for mathchat

* Fix missing message

* Add docstrings

* model

* notebook

* default naming

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: kevin666aa <yrwu000627@gmail.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
											
										
										
											2024-03-09 15:27:46 -05:00
+								  "front_matter": {
 								   "description": "Implement and manage a multi-agent chat system using AutoGen, where AI assistants retrieve information, generate code, and interact collaboratively to solve complex tasks, especially in areas not covered by their training data.",
 								   "tags": [
 								    "group chat",
 								    "orchestration",
 								    "RAG"
 								   ]
-												Upgrade Quarto and use notebook metadata for frontmatter  (#1836)

* Update process_notebook to use metadata instead of a yaml comment

* upgrade quarto and version check in tool

* formatting

* address comments
											
										
										
											2024-03-02 09:27:11 -05:00
+								  },
+								  "kernelspec": {
 								   "display_name": "flaml",
 								   "language": "python",
 								   "name": "python3"
 								  },
 								  "language_info": {
 								   "codemirror_mode": {
 								    "name": "ipython",
 								    "version": 3
 								   },
 								   "file_extension": ".py",
 								   "mimetype": "text/x-python",
 								   "name": "python",
 								   "nbconvert_exporter": "python",
 								   "pygments_lexer": "ipython3",
+								   "version": "3.10.13"
+								  }
 								 },
 								 "nbformat": 4,
 								 "nbformat_minor": 2
 								}