{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Use AutoGen in Microsoft Fabric\n", "\n", "[AutoGen](https://github.com/microsoft/autogen) offers conversable LLM agents, which can be used to solve various tasks with human or automatic feedback, including tasks that require using tools via code.\n", "Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat).\n", "\n", "[Microsoft Fabric](https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview) is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place. Its pre-built AI models include GPT-x models such as `gpt-4o`, `gpt-4-turbo`, `gpt-4`, `gpt-4-8k`, `gpt-4-32k`, `gpt-35-turbo`, `gpt-35-turbo-16k` and `gpt-35-turbo-instruct`, etc. It's important to note that the Azure Open AI service is not supported on trial SKUs and only paid SKUs (F64 or higher, or P1 or higher) are supported.\n", "\n", "In this notebook, we demonstrate several examples:\n", "- 1. How to use `AssistantAgent` and `UserProxyAgent` to write code and execute the code.\n", "- 2. How to use `AssistantAgent` and `RetrieveUserProxyAgent` to do Retrieval Augmented Generation (RAG) for QA and Code Generation.\n", "- 3. How to use `MultimodalConversableAgent` to chat with images.\n", "\n", "### Requirements\n", "\n", "AutoGen requires `Python>=3.8`. To run this notebook example, please install:\n", "```bash\n", "pip install \"pyautogen[retrievechat,lmm]>=0.2.28\"\n", "```\n", "\n", "Also, this notebook depends on Microsoft Fabric pre-built LLM endpoints. Running it elsewhere may encounter errors." 
] }, { "cell_type": "markdown", "id": "1", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### Install AutoGen" ] }, { "cell_type": "code", "execution_count": null, "id": "2", "metadata": {}, "outputs": [], "source": [ "%pip install \"pyautogen[retrievechat,lmm]>=0.2.28\" -q" ] }, { "cell_type": "markdown", "id": "3", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### Set up config_list and llm_config" ] }, { "cell_type": "code", "execution_count": null, "id": "4", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:24:20.5752101Z", "execution_start_time": "2024-06-07T15:24:03.7868628Z", "livy_statement_state": "available", "parent_msg_id": "bf8925aa-a2a2-4686-9388-3ec1eb12c5d7", "queued_time": "2024-06-07T15:23:08.5880731Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 9, "statement_ids": [ 9 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 9, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from synapse.ml.mlflow import get_mlflow_env_config\n", "\n", "\n", "def get_config_list():\n", " mlflow_env_configs = get_mlflow_env_config()\n", " access_token = mlflow_env_configs.driver_aad_token\n", " prebuilt_AI_base_url = mlflow_env_configs.workload_endpoint + \"cognitive/openai/\"\n", "\n", " config_list = [\n", " {\n", " \"model\": \"gpt-4o\",\n", " \"api_key\": access_token,\n", " \"base_url\": prebuilt_AI_base_url,\n", " \"api_type\": \"azure\",\n", " \"api_version\": \"2024-02-01\",\n", " },\n", " ]\n", "\n", " # Set temperature, timeout and other LLM configurations\n", " llm_config = {\n", " \"config_list\": config_list,\n", " \"temperature\": 0,\n", " \"timeout\": 600,\n", " }\n", " return config_list, llm_config\n", "\n", "\n", "config_list, llm_config = get_config_list()\n", "\n", "assert len(config_list) > 0\n", "print(\"models to use: \", [config_list[i][\"model\"] for i in range(len(config_list))])" ] }, { "cell_type": "markdown", "id": "5", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### Example 1\n", "How to use `AssistantAgent` and `UserProxyAgent` to write code and execute the code." 
] }, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:25:04.5390713Z", "execution_start_time": "2024-06-07T15:24:21.6208975Z", "livy_statement_state": "available", "parent_msg_id": "93157ebd-4f6e-4ad6-b089-5b40edea3787", "queued_time": "2024-06-07T15:23:08.5886561Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 10, "statement_ids": [ 10 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 10, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33muser_proxy\u001b[0m (to assistant):\n", "\n", "\n", "Who should read this paper: https://arxiv.org/abs/2308.08155\n", "\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[33massistant\u001b[0m (to user_proxy):\n", "\n", "To determine who should read the paper titled \"https://arxiv.org/abs/2308.08155\", we need to extract and analyze the abstract and other relevant information from the paper. This will help us understand the content and target audience of the paper.\n", "\n", "Let's write a Python script to fetch and print the abstract and other relevant details from the arXiv page.\n", "\n", "```python\n", "# filename: fetch_arxiv_paper_info.py\n", "\n", "import requests\n", "from bs4 import BeautifulSoup\n", "\n", "def fetch_arxiv_paper_info(url):\n", " response = requests.get(url)\n", " if response.status_code == 200:\n", " soup = BeautifulSoup(response.content, 'html.parser')\n", " \n", " # Extract the title\n", " title = soup.find('h1', class_='title').text.replace('Title:', '').strip()\n", " \n", " # Extract the authors\n", " authors = soup.find('div', class_='authors').text.replace('Authors:', '').strip()\n", " \n", " # Extract the abstract\n", " abstract = soup.find('blockquote', class_='abstract').text.replace('Abstract:', '').strip()\n", " \n", " # Extract the subjects\n", " subjects = soup.find('span', class_='primary-subject').text.strip()\n", " \n", " print(f\"Title: {title}\\n\")\n", " print(f\"Authors: {authors}\\n\")\n", " print(f\"Abstract: {abstract}\\n\")\n", " print(f\"Subjects: {subjects}\\n\")\n", " else:\n", " print(\"Failed to fetch the paper information.\")\n", "\n", "# URL of the arXiv paper\n", "url = \"https://arxiv.org/abs/2308.08155\"\n", "fetch_arxiv_paper_info(url)\n", "```\n", "\n", "Please save the code in a file named `fetch_arxiv_paper_info.py` and execute it. 
This script will fetch and print the title, authors, abstract, and subjects of the paper, which will help us determine the target audience.\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[31m\n", ">>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...\u001b[0m\n", "\u001b[33muser_proxy\u001b[0m (to assistant):\n", "\n", "exitcode: 0 (execution succeeded)\n", "Code output: \n", "Title: AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation\n", "\n", "Authors: Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang\n", "\n", "Abstract: AutoGen is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic infrastructure to build diverse applications of various complexities and LLM capacities. Empirical studies demonstrate the effectiveness of the framework in many example applications, with domains ranging from mathematics, coding, question answering, operations research, online decision-making, entertainment, etc.\n", "\n", "Subjects: Artificial Intelligence (cs.AI)\n", "\n", "\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[33massistant\u001b[0m (to user_proxy):\n", "\n", "Based on the extracted information, here is a summary of who should read the paper titled \"AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation\":\n", "\n", "### Title:\n", "**AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation**\n", "\n", "### Authors:\n", "Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang\n", "\n", "### Abstract:\n", "AutoGen is an open-source framework that allows developers to build LLM (Large Language Model) applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes that employ combinations of LLMs, human inputs, and tools. Using AutoGen, developers can also flexibly define agent interaction behaviors. Both natural language and computer code can be used to program flexible conversation patterns for different applications. AutoGen serves as a generic infrastructure to build diverse applications of various complexities and LLM capacities. Empirical studies demonstrate the effectiveness of the framework in many example applications, with domains ranging from mathematics, coding, question answering, operations research, online decision-making, entertainment, etc.\n", "\n", "### Subjects:\n", "**Artificial Intelligence (cs.AI)**\n", "\n", "### Target Audience:\n", "1. **AI Researchers and Practitioners**: Those who are working in the field of artificial intelligence, especially those focusing on large language models (LLMs) and multi-agent systems.\n", "2. 
**Developers and Engineers**: Software developers and engineers interested in building applications using LLMs and multi-agent frameworks.\n", "3. **Academics and Students**: Academics and students studying AI, machine learning, and related fields who are interested in the latest frameworks and methodologies for building LLM applications.\n", "4. **Industry Professionals**: Professionals in industries such as technology, operations research, and entertainment who are looking to leverage AI and LLMs for various applications.\n", "5. **Open-Source Community**: Contributors and users of open-source AI frameworks who are interested in new tools and frameworks for developing AI applications.\n", "\n", "This paper is particularly relevant for those interested in the practical applications and infrastructure for building complex AI systems using conversational agents.\n", "\n", "TERMINATE\n", "\n", "--------------------------------------------------------------------------------\n" ] } ], "source": [ "import autogen\n", "\n", "# create an AssistantAgent instance named \"assistant\"\n", "assistant = autogen.AssistantAgent(\n", " name=\"assistant\",\n", " llm_config=llm_config,\n", ")\n", "\n", "# create a UserProxyAgent instance named \"user_proxy\"\n", "user_proxy = autogen.UserProxyAgent(\n", " name=\"user_proxy\",\n", " human_input_mode=\"NEVER\", # input() doesn't work, so needs to be \"NEVER\" here\n", " max_consecutive_auto_reply=10,\n", " is_termination_msg=lambda x: x.get(\"content\", \"\").rstrip().endswith(\"TERMINATE\"),\n", " code_execution_config={\n", " \"work_dir\": \"coding\",\n", " \"use_docker\": False, # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.\n", " },\n", " llm_config=llm_config,\n", " system_message=\"\"\"Reply TERMINATE if the task has been solved at full satisfaction.\n", "Otherwise, reply CONTINUE, or the reason why the task is not solved yet.\"\"\",\n", ")\n", "\n", "# the assistant receives a message from the user, which contains the task description\n", "chat_result = user_proxy.initiate_chat(\n", " assistant,\n", " message=\"\"\"\n", "Who should read this paper: https://arxiv.org/abs/2308.08155\n", "\"\"\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:26:14.0364536Z", "execution_start_time": "2024-06-07T15:26:13.6931272Z", "livy_statement_state": "available", "parent_msg_id": "50747d08-5234-4212-9d18-ea3133cfb35e", "queued_time": "2024-06-07T15:26:12.4397897Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 13, "statement_ids": [ 13 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 13, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Cost for the chat:\n", "{'usage_including_cached_inference': {'total_cost': 0.02107, 'gpt-4o-2024-05-13': {'cost': 0.02107, 'prompt_tokens': 1616, 'completion_tokens': 866, 'total_tokens': 2482}}, 'usage_excluding_cached_inference': {'total_cost': 0.02107, 'gpt-4o-2024-05-13': {'cost': 0.02107, 'prompt_tokens': 1616, 'completion_tokens': 866, 'total_tokens': 2482}}}\n" ] } ], 
"source": [ "print(f\"Cost for the chat:\\n{chat_result.cost}\")" ] }, { "cell_type": "markdown", "id": "8", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### Example 2\n", "How to use `AssistantAgent` and `RetrieveUserProxyAgent` to do Retrieval Augmented Generation (RAG) for QA and Code Generation.\n", "\n", "Check out this [blog](https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat) for more details." ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:26:26.4217205Z", "execution_start_time": "2024-06-07T15:26:26.0872609Z", "livy_statement_state": "available", "parent_msg_id": "2d2b3ee3-300e-4959-b68c-c95843c42eb7", "queued_time": "2024-06-07T15:26:25.1160753Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 14, "statement_ids": [ 14 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 14, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import tempfile\n", "\n", "from autogen.coding import LocalCommandLineCodeExecutor\n", "\n", "# Create a temporary directory to store the code files.\n", "temp_dir = tempfile.TemporaryDirectory()\n", "\n", "# Create a local command line code executor.\n", "code_executor = LocalCommandLineCodeExecutor(\n", " timeout=40, # Timeout for each code execution in seconds.\n", " work_dir=temp_dir.name, # Use the temporary directory to store the code files.\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "10", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "from autogen import AssistantAgent\n", "from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent\n", "\n", "# 1. create an AssistantAgent instance named \"assistant\"\n", "assistant = AssistantAgent(\n", " name=\"assistant\",\n", " system_message=\"You are a helpful assistant.\",\n", " llm_config=llm_config,\n", ")\n", "\n", "# 2. 
create the RetrieveUserProxyAgent instance named \"ragproxyagent\"\n", "ragproxyagent = RetrieveUserProxyAgent(\n", " name=\"ragproxyagent\",\n", " human_input_mode=\"NEVER\",\n", " max_consecutive_auto_reply=5,\n", " retrieve_config={\n", " \"docs_path\": [\n", " \"https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview\",\n", " \"https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations\",\n", " ],\n", " \"chunk_token_size\": 2000,\n", " \"model\": config_list[0][\"model\"],\n", " \"vector_db\": \"chroma\", # the vector database to use; set to None only if you rely on the deprecated `client` parameter\n", " \"overwrite\": True, # set to True if you want to overwrite an existing collection\n", " },\n", " code_execution_config={\"executor\": code_executor}, # Use the local command line code executor.\n", ")" ] }, { "cell_type": "markdown", "id": "11", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "#### 2.1 Let's ask a question \"List all the Components of Microsoft Fabric\".\n", "\n", "The answer from **ChatGPT with gpt-4o** on June 7th, 2024 is as below:\n", "```\n", "Microsoft Fabric is a comprehensive data platform that integrates various services and tools for data management, analytics, and collaboration. As of the latest information available, Microsoft Fabric includes the following components:\n", "\n", "Data Integration:\n", "\n", "Azure Data Factory: For creating, scheduling, and orchestrating data workflows.\n", "Power Query: A data transformation and data preparation tool.\n", "Data Engineering:\n", "\n", "Azure Synapse Analytics: For big data and data warehousing solutions, including Synapse SQL, Spark, and Data Explorer.\n", "Data Science:\n", "\n", "Azure Machine Learning: For building, training, and deploying machine learning models.\n", "Azure Databricks: For collaborative big data and AI solutions.\n", "Data Warehousing:\n", "\n", "...\n", "```\n", "\n", "The answer from the AutoGen RAG agent with gpt-4o is as below:\n", "```\n", "The components of Microsoft Fabric are:\n", "\n", "1. Power BI\n", "2. Data Factory\n", "3. Data Activator\n", "4. Industry Solutions\n", "5. Real-Time Intelligence\n", "6. Synapse Data Engineering\n", "7. Synapse Data Science\n", "8. Synapse Data Warehouse\n", "\n", "Sources: [Microsoft Fabric Overview](https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview)\n", "```\n", "\n", "The AutoGen RAG agent's answer exactly matches the official documentation, while ChatGPT made a few mistakes; it even listed Azure Databricks, which is not a Fabric component.\n"
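, "\n", "The code cell below sends this question to the RAG agent with its default retrieval settings. As a hedged sketch, the same call can also cap how many document chunks are retrieved; the `n_results` keyword follows AutoGen's RetrieveChat examples and is an assumption here, not something this notebook ran:\n", "```python\n", "# Sketch: ask the same question while limiting retrieval to the 2 most relevant chunks\n", "assistant.reset()\n", "problem = \"List all the Components of Microsoft Fabric\"\n", "chat_result = ragproxyagent.initiate_chat(\n", "    assistant,\n", "    message=ragproxyagent.message_generator,  # builds the RAG prompt from retrieved chunks\n", "    problem=problem,\n", "    n_results=2,  # cap the number of retrieved chunks\n", ")\n", "```"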
] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:27:29.0170714Z", "execution_start_time": "2024-06-07T15:27:14.1923093Z", "livy_statement_state": "available", "parent_msg_id": "47d2a7c5-affb-44c5-9fef-a01d3026c638", "queued_time": "2024-06-07T15:26:25.4548817Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 16, "statement_ids": [ 16 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 16, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Trying to create collection.\n", "Number of requested results 20 is greater than number of elements in index 2, updating n_results = 2\n", "VectorDB returns doc_ids: [['f7c9052b', '621d4a0b']]\n", "\u001b[32mAdding content of doc f7c9052b to context.\u001b[0m\n", "\u001b[33mragproxyagent\u001b[0m (to assistant):\n", "\n", "You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the\n", "context provided by the user. You should follow the following steps to answer a question:\n", "Step 1, you estimate the user's intent based on the question and context. The intent can be a code generation task or\n", "a question answering task.\n", "Step 2, you reply based on the intent.\n", "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n", "If user's intent is code generation, you must obey the following rules:\n", "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n", "Rule 2. 
You must follow the formats below to write your code:\n", "```language\n", "# your code\n", "```\n", "\n", "If user's intent is question answering, you must give as short an answer as possible.\n", "\n", "User's question is: List all the Components of Microsoft Fabric\n", "\n", "Context is: # What is Microsoft Fabric - Microsoft Fabric | Microsoft Learn\n", "\n", "What is Microsoft Fabric - Microsoft Fabric | Microsoft Learn\n", "\n", "[Skip to main content](#main)\n", "\n", "This browser is no longer supported.\n", "\n", "Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.\n", "\n", "[Download Microsoft Edge](https://go.microsoft.com/fwlink/p/?LinkID=2092881 ) \n", "[More info about Internet Explorer and Microsoft Edge](https://learn.microsoft.com/en-us/lifecycle/faq/internet-explorer-microsoft-edge) \n", "\n", "Table of contents \n", "\n", "Exit focus mode\n", "\n", "Read in English\n", "\n", "Save\n", "\n", "Table of contents\n", "\n", "Read in English\n", "\n", "Save\n", "\n", "Add to Plan\n", "\n", "[Edit](https://github.com/MicrosoftDocs/fabric-docs/blob/main/docs/get-started/microsoft-fabric-overview.md \"Edit This Document\")\n", "\n", "---\n", "\n", "#### Share via\n", "\n", "Facebook\n", "x.com\n", "LinkedIn\n", "Email\n", "\n", "---\n", "\n", "Print\n", "\n", "Table of contents\n", "\n", "What is Microsoft Fabric?\n", "=========================\n", "\n", "* Article\n", "* 05/21/2024\n", "* 15 contributors\n", "\n", "Feedback\n", "\n", "In this article\n", "---------------\n", "\n", "Microsoft Fabric is an end-to-end analytics and data platform designed for enterprises that require a unified solution. It encompasses data movement, processing, ingestion, transformation, real-time event routing, and report building. It offers a comprehensive suite of services including Data Engineering, Data Factory, Data Science, Real-Time Analytics, Data Warehouse, and Databases.\n", "\n", "With Fabric, you don't need to assemble different services from multiple vendors. Instead, it offers a seamlessly integrated, user-friendly platform that simplifies your analytics requirements. Operating on a Software as a Service (SaaS) model, Fabric brings simplicity and integration to your solutions.\n", "\n", "Microsoft Fabric integrates separate components into a cohesive stack. Instead of relying on different databases or data warehouses, you can centralize data storage with OneLake. AI capabilities are seamlessly embedded within Fabric, eliminating the need for manual integration. With Fabric, you can easily transition your raw data into actionable insights for business users.\n", "\n", "Unification with SaaS foundation\n", "--------------------------------\n", "\n", "Microsoft Fabric is built on a foundation of Software as a Service (SaaS). It combines both new and existing components from Power BI, Azure Synapse Analytics, Azure Data Factory, and more services into a unified environment. These components are then tailored into customized user experiences.\n", "\n", "[![Diagram of the software as a service foundation beneath the different experiences of Fabric.](media/microsoft-fabric-overview/fabric-architecture.png)](media/microsoft-fabric-overview/fabric-architecture.png#lightbox)\n", "\n", "Fabric integrates workloads such as Data Engineering, Data Factory, Data Science, Data Warehouse, Real-Time Intelligence, Industry solutions, and Power BI into a shared SaaS foundation. 
Each of these experiences is tailored for distinct user roles like data engineers, scientists, or warehousing professionals, and they serve a specific task. The entire Fabric stack has AI integration and it accelerates the data journey. These workloads work together seemlessly and provide the following advantages:\n", "\n", "* Access to an extensive range of deeply integrated analytics in the industry.\n", "* Shared experiences across experiences that are familiar and easy to learn.\n", "* Easy access to, and readily reuse all assets.\n", "* Unified data lake storage that preserves data in its original location while using your preferred analytics tools.\n", "* Centralized administration and governance across all experiences.\n", "\n", "Fabric seamlessly integrates data and services, enabling unified management, governance, and discovery. It ensures security for items, data, and row-level access. You can centrally configure core enterprise capabilities. Permissions are automatically applied across all the underlying services. Additionally, data sensitivity labels inherit automatically across the items in the suite. Governance is powered by Purview which is built into Fabric.\n", "\n", "Fabric allows creators to concentrate on producing their best work, freeing them from the need to integrate, manage, or even understand the underlying infrastructure.\n", "\n", "Components of Microsoft Fabric\n", "------------------------------\n", "\n", "Fabric offers a comprehensive set of analytics experiences designed to work together seamlessly. The platform tailors each of these experiences to a specific persona and a specific task:\n", "\n", "![Screenshot of the Fabric menu of experiences.](media/microsoft-fabric-overview/workload-menu.png)\n", "\n", "* **Power BI** - Power BI lets you easily connect to your data sources, visualize and discover what's important, and share that with anyone or everyone you want. This integrated experience allows business owners to access all data in Fabric quickly and intuitively and to make better decisions with data. For more information, see [What is Power BI?](/en-us/power-bi/fundamentals/power-bi-overview)\n", "* **Data Factory** - Data Factory provides a modern data integration experience to ingest, prepare, and transform data from a rich set of data sources. It incorporates the simplicity of Power Query, and you can use more than 200 native connectors to connect to data sources on-premises and in the cloud. For more information, see [What is Data Factory in Microsoft Fabric?](../data-factory/data-factory-overview)\n", "* **Data Activator** - Data Activator is a no-code experience in Fabric that allows you to specify actions, such as email notifications and Power Automate workflows, to launch when Data Activator detects specific patterns or conditions in your changing data. It monitors data in Power BI reports and eventstreams; when the data hits certain thresholds or matches other patterns, it automatically takes the appropriate action. For more information, see [What is Data Activator?](../data-activator/data-activator-introduction)\n", "* **Industry Solutions** - Fabric provides industry-specific data solutions that address unique industry needs and challenges, and include data management, analytics, and decision-making. 
For more information, see [Industry Solutions in Microsoft Fabric](/en-us/industry/industry-data-solutions-fabric).\n", "* **Real-Time Intelligence** - Real-time Intelligence is an end-to-end solution for event-driven scenarios, streaming data, and data logs. It enables the extraction of insights, visualization, and action on data in motion by handling data ingestion, transformation, storage, analytics, visualization, tracking, AI, and real-time actions. The [Real-Time hub](#real-time-hub---the-unification-of-data-streams) in Real-Time Intelligence provides a wide variety of no-code connectors, converging into a catalog of organizational data that is protected, governed, and integrated across Fabric. For more information, see [What is Real-Time Intelligence in Fabric?](../real-time-intelligence/overview).\n", "* **Synapse Data Engineering** - Synapse Data Engineering provides a Spark platform with great authoring experiences. It enables you to create, manage, and optimize infrastructures for collecting, storing, processing, and analyzing vast data volumes. Fabric Spark's integration with Data Factory allows you to schedule and orchestrate notebooks and Spark jobs. For more information, see [What is Data engineering in Microsoft Fabric?](../data-engineering/data-engineering-overview)\n", "* **Synapse Data Science** - Synapse Data Science enables you to build, deploy, and operationalize machine learning models from Fabric. It integrates with Azure Machine Learning to provide built-in experiment tracking and model registry. Data scientists can enrich organizational data with predictions and business analysts can integrate those predictions into their BI reports, allowing a shift from descriptive to predictive insights. For more information, see [What is Data science in Microsoft Fabric?](../data-science/data-science-overview)\n", "* **Synapse Data Warehouse** - Synapse Data Warehouse provides industry leading SQL performance and scale. It separates compute from storage, enabling independent scaling of both components. Additionally, it natively stores data in the open Delta Lake format. For more information, see [What is data warehousing in Microsoft Fabric?](../data-warehouse/data-warehousing)\n", "\n", "Microsoft Fabric enables organizations and individuals to turn large and complex data repositories into actionable workloads and analytics, and is an implementation of data mesh architecture. For more information, see [What is a data mesh?](/en-us/azure/cloud-adoption-framework/scenarios/cloud-scale-analytics/architectures/what-is-data-mesh)\n", "\n", "OneLake: The unification of lakehouses\n", "--------------------------------------\n", "\n", "The Microsoft Fabric platform unifies the OneLake and lakehouse architecture across an enterprise.\n", "\n", "### OneLake\n", "\n", "A data lake is the foundation on which all the Fabric workloads are built. Microsoft Fabric Lake is also known as [OneLake](../onelake/onelake-overview). OneLake is built into the Fabric platform and provides a unified location to store all organizational data where the workloads operate.\n", "\n", "OneLake is built on ADLS (Azure Data Lake Storage) Gen2. It provides a single SaaS experience and a tenant-wide store for data that serves both professional and citizen developers. OneLake simplifies Fabric experiences by eliminating the need for you to understand infrastructure concepts such as resource groups, RBAC (Role-Based Access Control), Azure Resource Manager, redundancy, or regions. 
You don't need an Azure account to use Fabric.\n", "\n", "OneLake eliminates data silos, which individual developers often create when they provision and configure their own isolated storage accounts. Instead, OneLake provides a single, unified storage system for all developers. It ensures easy data discovery, sharing, and uniform enforcement of policy and security settings. For more information, see [What is OneLake?](../onelake/onelake-overview)\n", "\n", "### OneLake and lakehouse data hierarchy\n", "\n", "OneLake is hierarchical in nature to simplify management across your organization. Microsoft Fabric includes OneLake and there's no requirement for any up-front provisioning. There's only one OneLake per tenant and it provides a single-pane-of-glass file-system namespace that spans across users, regions, and clouds. OneLake organizes data into manageable containers for easy handling.\n", "\n", "The tenant maps to the root of OneLake and is at the top level of the hierarchy. You can create any number of workspaces, which you can think of as folders, within a tenant.\n", "\n", "The following image shows how Fabric stores data in various items within OneLake. As shown, you can create multiple workspaces within a tenant, and create multiple lakehouses within each workspace. A lakehouse is a collection of files, folders, and tables that represents a database over a data lake. To learn more, see [What is a lakehouse?](../data-engineering/lakehouse-overview).\n", "\n", "![Diagram of the hierarchy of items like lakehouses and semantic models within a workspace within a tenant.](media/microsoft-fabric-overview/hierarchy-within-tenant.png)\n", "\n", "Every developer and business unit in the tenant can easily create their own workspaces in OneLake. They can ingest data into their own lakehouses, then start processing, analyzing, and collaborating on the data, just like OneDrive in Microsoft Office.\n", "\n", "All the Microsoft Fabric compute experiences are prewired to OneLake, just like the Office applications are prewired to use the organizational OneDrive. The experiences such as Data Engineering, Data Warehouse, Data Factory, Power BI, and Real-Time Intelligence use OneLake as their native store. They don't need any extra configuration.\n", "\n", "[![Diagram of different Fabric experiences all accessing the same OneLake data storage.](media/microsoft-fabric-overview/onelake-architecture.png)](media/microsoft-fabric-overview/onelake-architecture.png#lightbox)\n", "\n", "OneLake allows instant mounting of your existing Platform as a Service (PaaS) storage accounts into OneLake with the [Shortcut](../onelake/onelake-shortcuts) feature. You don't need to migrate or move any of your existing data. Using shortcuts, you can access the data stored in your Azure Data Lake Storage.\n", "\n", "Shortcuts also allow you to easily share data between users and applications without moving or duplicating information. You can create shortcuts to other storage systems, allowing you to compose and analyze data across clouds with transparent, intelligent caching that reduces egress costs and brings data closer to compute.\n", "\n", "Real-Time hub - the unification of data streams\n", "-----------------------------------------------\n", "\n", "The Real-Time hub is a foundational location for data in motion.\n", "\n", "The Real-Time hub provides a unified SaaS experience and tenant-wide logical place for all data-in-motion. 
The Real-Time hub lists all data in motion from all sources that customers can discover, ingest, manage, and consume and react upon, and contains both [streams](../real-time-intelligence/event-streams/overview) and [KQL database](../real-time-intelligence/create-database) tables. Streams includes [**Data streams**](../real-time-intelligence/event-streams/create-manage-an-eventstream), **Microsoft sources** (for example, [Azure Event Hubs](../real-time-hub/add-source-azure-event-hubs), [Azure IoT Hub](../real-time-hub/add-source-azure-iot-hub), [Azure SQL DB Change Data Capture (CDC)](../real-time-hub/add-source-azure-sql-database-cdc), [Azure Cosmos DB CDC](../real-time-hub/add-source-azure-cosmos-db-cdc), and [PostgreSQL DB CDC](../real-time-hub/add-source-postgresql-database-cdc)), and [**Fabric events**](../real-time-intelligence/event-streams/add-source-fabric-workspace) (Fabric system events and external system events brought in from Azure, Microsoft 365, or other clouds).\n", "\n", "The Real-Time hub enables users to easily discover, ingest, manage, and consume data-in-motion from a wide variety of source so that they can collaborate and develop streaming applications within one place. For more information, see [What is the Real-Time hub?](../real-time-hub/real-time-hub-overview)\n", "\n", "Fabric solutions for ISVs\n", "-------------------------\n", "\n", "If you're an Independent Software Vendors (ISVs) looking to integrate your solutions with Microsoft Fabric, you can use one of the following paths based on your desired level of integration:\n", "\n", "* **Interop** - Integrate your solution with the OneLake Foundation and establish basic connections and interoperability with Fabric.\n", "* **Develop on Fabric** - Build your solution on top of the Fabric platform or seamlessly embed Fabric's functionalities into your existing applications. You can easily use Fabric capabilities with this option.\n", "* **Build a Fabric workload** - Create customized workloads and experiences in Fabric tailoring your offerings to maximize their impact within the Fabric ecosystem.\n", "\n", "For more information, see the [Fabric ISV partner ecosystem](../cicd/partners/partner-integration).\n", "\n", "Related content\n", "---------------\n", "\n", "* [Microsoft Fabric terminology](fabric-terminology)\n", "* [Create a workspace](create-workspaces)\n", "* [Navigate to your items from Microsoft Fabric Home page](fabric-home)\n", "* [End-to-end tutorials in Microsoft Fabric](end-to-end-tutorials)\n", "\n", "---\n", "\n", "Feedback\n", "--------\n", "\n", "Was this page helpful?\n", "\n", "Yes\n", "\n", "No\n", "\n", "[Provide product feedback](https://ideas.fabric.microsoft.com/)\n", "|\n", "\n", "[Ask the community](https://community.fabric.microsoft.com/powerbi)\n", "\n", "Feedback\n", "--------\n", "\n", "Coming soon: Throughout 2024 we will be phasing out GitHub Issues as the feedback mechanism for content and replacing it with a new feedback system. For more information see: . 
\n", "\n", "Submit and view feedback for\n", "\n", "[This product](https://ideas.fabric.microsoft.com/)\n", "This page\n", "\n", "[View all page feedback](https://github.com//issues)\n", "\n", "---\n", "\n", "Additional resources\n", "--------------------\n", "\n", "[California Consumer Privacy Act (CCPA) Opt-Out Icon\n", "\n", "Your Privacy Choices](https://aka.ms/yourcaliforniaprivacychoices)\n", "\n", "Theme\n", "\n", "* Light\n", "* Dark\n", "* High contrast\n", "\n", "* \n", "* [Previous Versions](/en-us/previous-versions/)\n", "* [Blog](https://techcommunity.microsoft.com/t5/microsoft-learn-blog/bg-p/MicrosoftLearnBlog)\n", "* [Contribute](/en-us/contribute/)\n", "* [Privacy](https://go.microsoft.com/fwlink/?LinkId=521839)\n", "* [Terms of Use](/en-us/legal/termsofuse)\n", "* [Trademarks](https://www.microsoft.com/legal/intellectualproperty/Trademarks/)\n", "* © Microsoft 2024\n", "\n", "Additional resources\n", "--------------------\n", "\n", "### In this article\n", "\n", "[California Consumer Privacy Act (CCPA) Opt-Out Icon\n", "\n", "Your Privacy Choices](https://aka.ms/yourcaliforniaprivacychoices)\n", "\n", "Theme\n", "\n", "* Light\n", "* Dark\n", "* High contrast\n", "\n", "* \n", "* [Previous Versions](/en-us/previous-versions/)\n", "* [Blog](https://techcommunity.microsoft.com/t5/microsoft-learn-blog/bg-p/MicrosoftLearnBlog)\n", "* [Contribute](/en-us/contribute/)\n", "* [Privacy](https://go.microsoft.com/fwlink/?LinkId=521839)\n", "* [Terms of Use](/en-us/legal/termsofuse)\n", "* [Trademarks](https://www.microsoft.com/legal/intellectualproperty/Trademarks/)\n", "* © Microsoft 2024\n", "\n", "\n", "The source of the context is: ['https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview']\n", "\n", "If you can answer the question, in the end of your answer, add the source of the context in the format of `Sources: source1, source2, ...`.\n", "\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[33massistant\u001b[0m (to ragproxyagent):\n", "\n", "The components of Microsoft Fabric are:\n", "\n", "1. Power BI\n", "2. Data Factory\n", "3. Data Activator\n", "4. Industry Solutions\n", "5. Real-Time Intelligence\n", "6. Synapse Data Engineering\n", "7. Synapse Data Science\n", "8. Synapse Data Warehouse\n", "\n", "Sources: https://learn.microsoft.com/en-us/fabric/get-started/microsoft-fabric-overview\n", "\n", "--------------------------------------------------------------------------------\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2024-06-07 15:27:15,139 - autogen.agentchat.contrib.retrieve_user_proxy_agent - INFO - Found 2 chunks.\u001b[0m\n", "2024-06-07 15:27:15,142 - autogen.agentchat.contrib.vectordb.chromadb - INFO - No content embedding is provided. 
Will use the VectorDB's embedding function to generate the content embedding.\u001b[0m\n" ] } ], "source": [ "assistant.reset()\n", "problem = \"List all the Components of Microsoft Fabric\"\n", "chat_result = ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem=problem)" ] }, { "cell_type": "code", "execution_count": null, "id": "13", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:27:30.3621271Z", "execution_start_time": "2024-06-07T15:27:30.0131748Z", "livy_statement_state": "available", "parent_msg_id": "d9d3c442-0b5b-4eee-a34d-187119f9b420", "queued_time": "2024-06-07T15:26:25.6902567Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 17, "statement_ids": [ 17 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 17, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Cost for the chat:\n", "{'usage_including_cached_inference': {'total_cost': 0.019565000000000003, 'gpt-4o-2024-05-13': {'cost': 0.019565000000000003, 'prompt_tokens': 3688, 'completion_tokens': 75, 'total_tokens': 3763}}, 'usage_excluding_cached_inference': {'total_cost': 0.019565000000000003, 'gpt-4o-2024-05-13': {'cost': 0.019565000000000003, 'prompt_tokens': 3688, 'completion_tokens': 75, 'total_tokens': 3763}}}\n" ] } ], "source": [ "print(f\"Cost for the chat:\\n{chat_result.cost}\")" ] }, { "cell_type": "markdown", "id": "14", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "#### 2.2 Let's ask it to generate AutoML code for us\n", "\n", "The question is: \"Train a regression model, set time budget to 12s, plot the time line plot after training.\"\n", "\n", "ChatGPT's answer is as below:\n", "\n", "[It showed a figure]\n", "\n", "The timeline plot above shows the elapsed time during the training of a linear regression model. The red dashed line indicates the 12-second time budget. 
The model was trained iteratively, and the plot demonstrates that the training process was monitored to ensure it stayed within the specified time budget.\n", "```\n", "import time\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.datasets import make_regression\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LinearRegression\n", "\n", "# Create a synthetic regression dataset\n", "X, y = make_regression(n_samples=1000, n_features=20, noise=0.1)\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", "\n", "# Initialize the model\n", "model = LinearRegression()\n", "\n", "# Record the start time\n", "start_time = time.time()\n", "\n", "# Train the model and record intermediate times\n", "times = []\n", "time_budget = 12 # in seconds\n", "\n", "for _ in range(100):\n", " model.fit(X_train, y_train)\n", " current_time = time.time()\n", " elapsed_time = current_time - start_time\n", " times.append(elapsed_time)\n", " if elapsed_time > time_budget:\n", " break\n", "\n", "# Plot the timeline\n", "plt.figure(figsize=(10, 5))\n", "plt.plot(times, label='Training time')\n", "plt.axhline(y=time_budget, color='r', linestyle='--', label='Time Budget (12s)')\n", "plt.xlabel('Iteration')\n", "plt.ylabel('Elapsed Time (s)')\n", "plt.title('Training Time Line Plot')\n", "plt.legend()\n", "plt.grid(True)\n", "plt.show()\n", "```\n", "\n", "It's not what I need, as ChatGPT has no context of the [AutoML](https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations) solution in Fabric Data Science.\n", "\n", "The AutoGen RAG agent's answer is much better and ready for deployment. It retrieved the document related to the question, generated code based on that document, automatically ran the code, fixed errors based on the execution output, and finally arrived at the correct code." ] }, { "cell_type": "code", "execution_count": null, "id": "15", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:28:21.4439921Z", "execution_start_time": "2024-06-07T15:27:31.3321982Z", "livy_statement_state": "available", "parent_msg_id": "19420cb8-2f86-495b-8f20-5349cb41d940", "queued_time": "2024-06-07T15:26:25.8861394Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 18, "statement_ids": [ 18 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 18, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Number of requested results 20 is greater than number of elements in index 2, updating n_results = 2\n", "VectorDB returns doc_ids: [['621d4a0b', 'f7c9052b']]\n", "\u001b[32mAdding content of doc 621d4a0b to context.\u001b[0m\n", "\u001b[33mragproxyagent\u001b[0m (to assistant):\n", "\n", "You're a retrieve augmented chatbot. You answer user's questions based on your own knowledge and the\n", "context provided by the user. You should follow the following steps to answer a question:\n", "Step 1, you estimate the user's intent based on the question and context. 
The intent can be a code generation task or\n", "a question answering task.\n", "Step 2, you reply based on the intent.\n", "If you can't answer the question with or without the current context, you should reply exactly `UPDATE CONTEXT`.\n", "If user's intent is code generation, you must obey the following rules:\n", "Rule 1. You MUST NOT install any packages because all the packages needed are already installed.\n", "Rule 2. You must follow the formats below to write your code:\n", "```language\n", "# your code\n", "```\n", "\n", "If user's intent is question answering, you must give as short an answer as possible.\n", "\n", "User's question is: Train a regression model, set time budget to 12s, plot the time line plot after training.\n", "\n", "Context is: # Visualize tuning and AutoML trials - Microsoft Fabric | Microsoft Learn\n", "\n", "Visualize tuning and AutoML trials - Microsoft Fabric | Microsoft Learn\n", "\n", "[Skip to main content](#main)\n", "\n", "This browser is no longer supported.\n", "\n", "Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.\n", "\n", "[Download Microsoft Edge](https://go.microsoft.com/fwlink/p/?LinkID=2092881 ) \n", "[More info about Internet Explorer and Microsoft Edge](https://learn.microsoft.com/en-us/lifecycle/faq/internet-explorer-microsoft-edge) \n", "\n", "Table of contents \n", "\n", "Exit focus mode\n", "\n", "Read in English\n", "\n", "Save\n", "\n", "Table of contents\n", "\n", "Read in English\n", "\n", "Save\n", "\n", "Add to Plan\n", "\n", "[Edit](https://github.com/MicrosoftDocs/fabric-docs/blob/main/docs/data-science/tuning-automated-machine-learning-visualizations.md \"Edit This Document\")\n", "\n", "---\n", "\n", "#### Share via\n", "\n", "Facebook\n", "x.com\n", "LinkedIn\n", "Email\n", "\n", "---\n", "\n", "Print\n", "\n", "Table of contents\n", "\n", "Training visualizations (preview)\n", "=================================\n", "\n", "* Article\n", "* 03/26/2024\n", "* 4 contributors\n", "\n", "Feedback\n", "\n", "In this article\n", "---------------\n", "\n", "A hyperparameter trial or AutoML trial searches for the optimal parameters for a machine learning model. Each trial consists of multiple runs, where each run evaluates a specific parameter combination. Users can monitor these runs using ML experiment items in Fabric.\n", "\n", "The `flaml.visualization` module offers functions to plot and compare the runs in FLAML. Users can use Plotly to interact with their AutoML experiment plots. To use these functions, users need to input their optimized `flaml.AutoML` or `flaml.tune.tune.ExperimentAnalysis` object.\n", "\n", "This article teaches you how to use the `flaml.visualization` module to analyze and explore your AutoML trial results. You can follow the same steps for your hyperparameter trial as well.\n", "\n", "Important\n", "\n", "This feature is in [preview](../get-started/preview).\n", "\n", "Create an AutoML trial\n", "----------------------\n", "\n", "AutoML offers a suite of automated processes that can identify the best machine learning pipeline for your dataset, making the entire modeling process more straightforward and often more accurate. In essence, it saves you the trouble of hand-tuning different models and hyperparameters.\n", "\n", "In the code cell below, we will:\n", "\n", "1. Load the Iris dataset.\n", "2. Split the data into training and test sets.\n", "3. Initiate an AutoML trial to fit our training data.\n", "4. 
Explore the results of our AutoML trial with the visualizations from `flaml.visualization`.\n", "\n", "```\n", "from sklearn.datasets import load_iris\n", "from sklearn.model_selection import train_test_split\n", "from flaml import AutoML\n", "\n", "# Load the Iris data and split it into train and test sets\n", "x, y = load_iris(return_X_y=True, as_frame=True)\n", "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7654321)\n", "\n", "# Create an AutoML instance and set the parameters\n", "automl = AutoML()\n", "automl_settings = {\n", " \"time_budget\": 10, # Time limit in seconds\n", " \"task\": \"classification\", # Type of machine learning task\n", " \"log_file_name\": \"aml_iris.log\", # Name of the log file\n", " \"metric\": \"accuracy\", # Evaluation metric\n", " \"log_type\": \"all\", # Level of logging\n", "}\n", "# Fit the AutoML instance on the training data\n", "automl.fit(X_train=x_train, y_train=y_train, **automl_settings)\n", "\n", "```\n", "\n", "Visualize the experiment results\n", "--------------------------------\n", "\n", "Once you run an AutoML trial, you need to visualize the outcomes to analyze how well the models performed and how they behaved. In this part of our documentation, we show you how to use the built-in utilities in the FLAML library for this purpose.\n", "\n", "### Import visualization module\n", "\n", "To access these visualization utilities, we run the following import command:\n", "\n", "```\n", "import flaml.visualization as fviz\n", "\n", "```\n", "\n", "### Optimization history\n", "\n", "An optimization history plot typically has the number of trials/iterations on the x-axis and a performance metric (like accuracy, RMSE, etc.) on the y-axis. As the number of trials increases, you would see a line or scatter plot indicating the performance of each trial.\n", "\n", "```\n", "fig = fviz.plot_optimization_history(automl)\n", "# or\n", "fig = fviz.plot(automl, \"optimization_history\")\n", "fig.show()\n", "\n", "```\n", "\n", "Here is the resulting plot:\n", "\n", "[![Graph of optimization history plot.](media/model-training/optimization-history.png)](media/model-training/optimization-history.png#lightbox)\n", "\n", "### Feature importance\n", "\n", "A feature importance plot is a powerful visualization tool that allows you to understand the significance of different input features in determining the predictions of a model.\n", "\n", "```\n", "fig = fviz.plot_feature_importance(automl)\n", "# or\n", "fig = fviz.plot(automl, \"feature_importance\")\n", "fig.show()\n", "\n", "```\n", "\n", "Here is the resulting plot:\n", "\n", "[![Graph of feature importance plot.](media/model-training/feature-importance.png)](media/model-training/feature-importance.png#lightbox)\n", "\n", "### Parallel coordinate plot\n", "\n", "A parallel coordinate plot is a visualization tool that represents multi-dimensional data by drawing multiple vertical lines (axes) corresponding to variables or hyperparameters, with data points plotted as connected lines across these axes. In the context of an AutoML or tuning experiment, it's instrumental in visualizing and analyzing the performance of different hyperparameter combinations. By tracing the paths of high-performing configurations, one can discern patterns or trends in hyperparameter choices and their interactions. 
This plot aids in understanding which combinations lead to optimal performance, pinpointing potential areas for further exploration, and identifying any trade-offs between different hyperparameters.\n", "\n", "This utility takes the following other arguments:\n", "\n", "* `learner`: Specify the learner you intend to study in the experiment. This parameter is only applicable for AutoML experiment results. By leaving this blank, the system chooses the best learner in the whole experiment.\n", "* `params`: A list to specify which hyperparameter to display. By leaving this blank, the system displays all the available hyperparameters.\n", "\n", "```\n", "fig = fviz.plot_parallel_coordinate(automl, learner=\"lgbm\", params=[\"n_estimators\", \"num_leaves\", \"learning_rate\"])\n", "# or\n", "fig = fviz.plot(automl, \"parallel_coordinate\", learner=\"lgbm\", params=[\"n_estimators\", \"num_leaves\", \"learning_rate\"])\n", "fig.show()\n", "\n", "```\n", "\n", "Here is the resulting plot:\n", "\n", "[![Graph of parallel coordinate plot.](media/model-training/parallel-coordinate-plot.png)](media/model-training/parallel-coordinate-plot.png#lightbox)\n", "\n", "### Contour plot\n", "\n", "A contour plot visualizes three-dimensional data in two dimensions, where the x and y axes represent two hyperparameters, and the contour lines or filled contours depict levels of a performance metric (for example, accuracy or loss). In the context of an AutoML or tuning experiment, a contour plot is beneficial for understanding the relationship between two hyperparameters and their combined effect on model performance.\n", "\n", "By examining the density and positioning of the contour lines, one can identify regions of hyperparameter space where performance is optimized, ascertain potential trade-offs between hyperparameters, and gain insights into their interactions. This visualization helps refine the search space and tuning process.\n", "\n", "This utility also takes the following arguments:\n", "\n", "* `learner`: Specify the learner you intend to study in the experiment. This parameter is only applicable for AutoML experiment results. By leaving this blank, the system chooses the best learner in the whole experiment.\n", "* `params`: A list to specify which hyperparameter to display. By leaving this blank, the system displays all the available hyperparameters.\n", "\n", "```\n", "fig = fviz.plot_contour(automl, learner=\"lgbm\", params=[\"n_estimators\", \"num_leaves\", \"learning_rate\"])\n", "# or\n", "fig = fviz.plot(automl, \"contour\", learner=\"lgbm\", params=[\"n_estimators\", \"num_leaves\", \"learning_rate\"])\n", "fig.show()\n", "\n", "```\n", "\n", "Here is the resulting plot:\n", "\n", "[![Graph of contour plot.](media/model-training/contour-plot.png)](media/model-training/contour-plot.png#lightbox)\n", "\n", "### Empirical distribution function\n", "\n", "An empirical distribution function (EDF) plot, often visualized as a step function, represents the cumulative probability of data points being less than or equal to a particular value. Within an AutoML or tuning experiment, an EDF plot can be employed to visualize the distribution of model performances across different hyperparameter configurations.\n", "\n", "By observing the steepness or flatness of the curve at various points, one can understand the concentration of good or poor model performances, respectively. 
This visualization offers insights into the overall efficacy of the tuning process, highlighting whether most of the attempted configurations are yielding satisfactory results or if only a few configurations stand out.\n", "\n", "Note\n", "\n", "For AutoML experiments, multiple models will be applied during training. The trials of each learner are represented as an optimization series.\n", "For hyperparameter tuning experiments, there will be only a single learner that is evaluated. However, you can provide additional tuning experiments to see the trends across each learner.\n", "\n", "```\n", "fig = fviz.plot_edf(automl)\n", "# or\n", "fig = fviz.plot(automl, \"edf\")\n", "fig.show()\n", "\n", "```\n", "\n", "Here is the resulting plot:\n", "\n", "[![Graph of the empirical distribution function plot.](media/model-training/empirical-distribution-function-plot.png)](media/model-training/empirical-distribution-function-plot.png#lightbox)\n", "\n", "### Timeline plot\n", "\n", "A timeline plot, often represented as a Gantt chart or a sequence of bars, visualizes the start, duration, and completion of tasks over time. In the context of an AutoML or tuning experiment, a timeline plot can showcase the progression of various model evaluations and their respective durations, plotted against time. By observing this plot, users can grasp the efficiency of the search process, identify any potential bottlenecks or idle periods, and understand the temporal dynamics of different hyperparameter evaluations.\n", "\n", "```\n", "fig = fviz.plot_timeline(automl)\n", "# or\n", "fig = fviz.plot(automl, \"timeline\")\n", "fig.show()\n", "\n", "```\n", "\n", "Here is the resulting plot:\n", "\n", "[![Graph of timeline plot.](media/model-training/timeline-plot.png)](media/model-training/timeline-plot.png#lightbox)\n", "\n", "### Slice plot\n", "\n", "Plot the parameter relationship as slice plot in a study.\n", "\n", "This utility also takes the following arguments:\n", "\n", "* `learner`: Specify the learner you intend to study in the experiment. This parameter is only applicable for AutoML experiment results. By leaving this blank, the system chooses the best learner in the whole experiment.\n", "* `params`: A list to specify which hyperparameter to display. By leaving this blank, the system displays all the available hyperparameters.\n", "\n", "```\n", "fig = fviz.plot_slice(automl, learner=\"sgd\")\n", "# or\n", "fig = fviz.plot(automl, \"slice\", learner=\"sgd\")\n", "fig.show()\n", "\n", "```\n", "\n", "Here is the resulting plot:\n", "\n", "[![Graph of slice plot.](media/model-training/slice-plot.png)](media/model-training/slice-plot.png#lightbox)\n", "\n", "### Hyperparameter importance\n", "\n", "A hyperparameter importance plot visually ranks hyperparameters based on their influence on model performance in an AutoML or tuning experiment. Displayed typically as a bar chart, it quantifies the impact of each hyperparameter on the target metric. By examining this plot, practitioners can discern which hyperparameters are pivotal in determining model outcomes and which ones have minimal effect.\n", "\n", "This utility also takes the following arguments:\n", "\n", "* `learner`: Specify the learner you intend to study in the experiment. This parameter is only applicable for AutoML experiment results. By leaving this blank, the system chooses the best learner in the whole experiment.\n", "* `params`: A list to specify which hyperparameter to display. 
By leaving this blank, the system displays all the available hyperparameters.\n", "\n", "```\n", "fig = fviz.plot_param_importance(automl, learner=\"sgd\")\n", "# or\n", "fig = fviz.plot(automl, \"param_importance\", learner=\"sgd\")\n", "fig.show()\n", "\n", "```\n", "\n", "Here is the resulting plot:\n", "\n", "[![Graph of hyperparameter importance plot.](media/model-training/hyperparameter-importance-plot.png)](media/model-training/hyperparameter-importance-plot.png#lightbox)\n", "\n", "Related content\n", "---------------\n", "\n", "* [Tune a SynapseML Spark LightGBM model](how-to-tune-lightgbm-flaml)\n", "\n", "\n", "The source of the context is: ['https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations']\n", "\n", "If you can answer the question, in the end of your answer, add the source of the context in the format of `Sources: source1, source2, ...`.\n", "\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[33massistant\u001b[0m (to ragproxyagent):\n", "\n", "Step 1: The user's intent is a code generation task to train a regression model with a time budget of 12 seconds and plot the timeline plot after training.\n", "\n", "Step 2: 
Here is the code to achieve this:\n", "\n", "```python\n", "from sklearn.datasets import load_boston\n", "from sklearn.model_selection import train_test_split\n", "from flaml import AutoML\n", "import flaml.visualization as fviz\n", "\n", "# Load the Boston housing data and split it into train and test sets\n", "x, y = load_boston(return_X_y=True)\n", "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7654321)\n", "\n", "# Create an AutoML instance and set the parameters\n", "automl = AutoML()\n", "automl_settings = {\n", " \"time_budget\": 12, # Time limit in seconds\n", " \"task\": \"regression\", # Type of machine learning task\n", " \"log_file_name\": \"aml_boston.log\", # Name of the log file\n", " \"metric\": \"rmse\", # Evaluation metric\n", " \"log_type\": \"all\", # Level of logging\n", "}\n", "\n", "# Fit the AutoML instance on the training data\n", "automl.fit(X_train=x_train, y_train=y_train, **automl_settings)\n", "\n", "# Plot the timeline plot\n", "fig = fviz.plot_timeline(automl)\n", "fig.show()\n", "```\n", "\n", "Sources: [Visualize tuning and AutoML trials - Microsoft Fabric | Microsoft Learn](https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations)\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[31m\n", ">>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...\u001b[0m\n", "\u001b[33mragproxyagent\u001b[0m (to assistant):\n", "\n", "exitcode: 1 (execution failed)\n", "Code output: Traceback (most recent call last):\n", " File \"/tmp/tmp41070gi5/tmp_code_4463932bbc95a1921034eb428e7ded0c.py\", line 1, in <module>\n", " from sklearn.datasets import load_boston\n", " File \"/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/sklearn/datasets/__init__.py\", line 157, in __getattr__\n", " raise ImportError(msg)\n", "ImportError: \n", "`load_boston` has been removed from scikit-learn since version 1.2.\n", "\n", "The Boston housing prices dataset has an ethical problem: as\n", "investigated in [1], the authors of this dataset engineered a\n", "non-invertible variable \"B\" assuming that racial self-segregation had a\n", "positive impact on house prices [2]. Furthermore the goal of the\n", "research that led to the creation of this dataset was to study the\n", "impact of air quality but it did not give adequate demonstration of the\n", "validity of this assumption.\n", "\n", "The scikit-learn maintainers therefore strongly discourage the use of\n", "this dataset unless the purpose of the code is to study and educate\n", "about ethical issues in data science and machine learning.\n", "\n", "In this special case, you can fetch the dataset from the original\n", "source::\n", "\n", " import pandas as pd\n", " import numpy as np\n", "\n", " data_url = \"http://lib.stat.cmu.edu/datasets/boston\"\n", " raw_df = pd.read_csv(data_url, sep=\"\\s+\", skiprows=22, header=None)\n", " data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])\n", " target = raw_df.values[1::2, 2]\n", "\n", "Alternative datasets include the California housing dataset and the\n", "Ames housing dataset. 
You can load the datasets as follows::\n", "\n", " from sklearn.datasets import fetch_california_housing\n", " housing = fetch_california_housing()\n", "\n", "for the California housing dataset and::\n", "\n", " from sklearn.datasets import fetch_openml\n", " housing = fetch_openml(name=\"house_prices\", as_frame=True)\n", "\n", "for the Ames housing dataset.\n", "\n", "[1] M Carlisle.\n", "\"Racist data destruction?\"\n", "\n", "\n", "[2] Harrison Jr, David, and Daniel L. Rubinfeld.\n", "\"Hedonic housing prices and the demand for clean air.\"\n", "Journal of environmental economics and management 5.1 (1978): 81-102.\n", "\n", "\n", "\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[33massistant\u001b[0m (to ragproxyagent):\n", "\n", "The `load_boston` dataset has been removed from scikit-learn due to ethical concerns. Let's use the California housing dataset instead.\n", "\n", "Here is the updated code:\n", "\n", "```python\n", "from sklearn.datasets import fetch_california_housing\n", "from sklearn.model_selection import train_test_split\n", "from flaml import AutoML\n", "import flaml.visualization as fviz\n", "\n", "# Load the California housing data and split it into train and test sets\n", "housing = fetch_california_housing()\n", "x, y = housing.data, housing.target\n", "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7654321)\n", "\n", "# Create an AutoML instance and set the parameters\n", "automl = AutoML()\n", "automl_settings = {\n", " \"time_budget\": 12, # Time limit in seconds\n", " \"task\": \"regression\", # Type of machine learning task\n", " \"log_file_name\": \"aml_california.log\", # Name of the log file\n", " \"metric\": \"rmse\", # Evaluation metric\n", " \"log_type\": \"all\", # Level of logging\n", "}\n", "\n", "# Fit the AutoML instance on the training data\n", "automl.fit(X_train=x_train, y_train=y_train, **automl_settings)\n", "\n", "# Plot the timeline plot\n", "fig = fviz.plot_timeline(automl)\n", "fig.show()\n", "```\n", "\n", "Sources: [Visualize tuning and AutoML trials - Microsoft Fabric | Microsoft Learn](https://learn.microsoft.com/en-us/fabric/data-science/tuning-automated-machine-learning-visualizations)\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[31m\n", ">>>>>>>> EXECUTING CODE BLOCK (inferred language is python)...\u001b[0m\n", "\u001b[33mragproxyagent\u001b[0m (to assistant):\n", "\n", "exitcode: 0 (execution succeeded)\n", "Code output: Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.\n", "Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.\n", "Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.\n", "Please make sure you passed environment EnvConfigs to workers by calling `set_mlflow_env_config` in order to trigger mlflow on workers correctly.\n", "[flaml.automl.logger: 06-07 15:28:07] {1767} INFO - task = regression\n", "[flaml.automl.logger: 06-07 15:28:07] {1778} INFO - Evaluation method: holdout\n", "[flaml.automl.logger: 06-07 15:28:07] {1881} INFO - Minimizing error metric: rmse\n", "[flaml.automl.logger: 06-07 15:28:09] {1999} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 
'xgb_limitdepth', 'sgd', 'catboost']\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 0, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2444} INFO - Estimated sufficient time budget=3982s. Estimated necessary time budget=34s.\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 4.9s,\testimator lgbm's best error=0.9511,\tbest estimator lgbm's best error=0.9511\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 1, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 4.9s,\testimator lgbm's best error=0.9511,\tbest estimator lgbm's best error=0.9511\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 2, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 4.9s,\testimator lgbm's best error=0.8172,\tbest estimator lgbm's best error=0.8172\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 3, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 4.9s,\testimator lgbm's best error=0.6288,\tbest estimator lgbm's best error=0.6288\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 4, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.6288,\tbest estimator lgbm's best error=0.6288\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 5, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.6104,\tbest estimator lgbm's best error=0.6104\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 6, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.6104,\tbest estimator lgbm's best error=0.6104\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 7, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.6104,\tbest estimator lgbm's best error=0.6104\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 8, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.5627,\tbest estimator lgbm's best error=0.5627\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 9, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.5627,\tbest estimator lgbm's best error=0.5627\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 10, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:09] {2493} INFO - at 5.1s,\testimator lgbm's best error=0.5001,\tbest estimator lgbm's best error=0.5001\n", "[flaml.automl.logger: 06-07 15:28:09] {2309} INFO - iteration 11, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.3s,\testimator lgbm's best error=0.5001,\tbest estimator lgbm's best error=0.5001\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 12, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.3s,\testimator lgbm's best error=0.5001,\tbest estimator lgbm's best error=0.5001\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 13, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.4s,\testimator lgbm's best error=0.5001,\tbest estimator lgbm's best error=0.5001\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 14, current learner 
lgbm\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.6s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 15, current learner sgd\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 5.6s,\testimator sgd's best error=1.1240,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 16, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.0s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 17, current learner sgd\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.0s,\testimator sgd's best error=1.1240,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 18, current learner sgd\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.1s,\testimator sgd's best error=1.1240,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 19, current learner sgd\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.1s,\testimator sgd's best error=1.1067,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 20, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:10] {2493} INFO - at 6.2s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:10] {2309} INFO - iteration 21, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.5s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 22, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.6s,\testimator xgboost's best error=1.3843,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 23, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=1.3843,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 24, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.9469,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 25, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.6871,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 26, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.6871,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 27, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.6871,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 28, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.7s,\testimator xgboost's best error=0.6203,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 
29, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.8s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 30, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.9s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 31, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.9s,\testimator xgboost's best error=0.6053,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 32, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:11] {2493} INFO - at 6.9s,\testimator xgboost's best error=0.5953,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:11] {2309} INFO - iteration 33, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.4s,\testimator lgbm's best error=0.4888,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 34, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.4s,\testimator xgboost's best error=0.5550,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 35, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.4s,\testimator xgboost's best error=0.5550,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 36, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.4s,\testimator xgboost's best error=0.5550,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 37, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.5s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 38, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.5s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 39, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.6s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 40, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.6s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4888\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 41, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.7s,\testimator lgbm's best error=0.4824,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 42, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 7.8s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 43, current learner extra_tree\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 8.0s,\testimator extra_tree's best error=0.8723,\tbest estimator lgbm's best error=0.4824\n", 
"[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 44, current learner sgd\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 8.0s,\testimator sgd's best error=1.1055,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 45, current learner extra_tree\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 8.0s,\testimator extra_tree's best error=0.7612,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 46, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:12] {2493} INFO - at 8.1s,\testimator xgboost's best error=0.5285,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:12] {2309} INFO - iteration 47, current learner extra_tree\n", "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.3s,\testimator extra_tree's best error=0.7612,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 48, current learner rf\n", "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.4s,\testimator rf's best error=0.8142,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 49, current learner rf\n", "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.5s,\testimator rf's best error=0.6937,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 50, current learner rf\n", "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.6s,\testimator rf's best error=0.6937,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 51, current learner extra_tree\n", "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.6s,\testimator extra_tree's best error=0.7209,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 52, current learner rf\n", "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 8.8s,\testimator rf's best error=0.6425,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 53, current learner rf\n", "[flaml.automl.logger: 06-07 15:28:13] {2493} INFO - at 9.0s,\testimator rf's best error=0.6055,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:13] {2309} INFO - iteration 54, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:14] {2493} INFO - at 9.2s,\testimator lgbm's best error=0.4824,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:14] {2309} INFO - iteration 55, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:14] {2493} INFO - at 9.4s,\testimator lgbm's best error=0.4824,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:14] {2309} INFO - iteration 56, current learner xgboost\n", "[flaml.automl.logger: 06-07 15:28:14] {2493} INFO - at 9.5s,\testimator xgboost's best error=0.5187,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:14] {2309} INFO - iteration 57, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:14] {2493} INFO - at 9.8s,\testimator lgbm's best error=0.4824,\tbest estimator lgbm's best error=0.4824\n", "[flaml.automl.logger: 06-07 15:28:14] {2309} INFO - iteration 58, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:15] {2493} INFO - at 10.2s,\testimator lgbm's best error=0.4794,\tbest estimator lgbm's best 
error=0.4794\n", "[flaml.automl.logger: 06-07 15:28:15] {2309} INFO - iteration 59, current learner rf\n", "[flaml.automl.logger: 06-07 15:28:15] {2493} INFO - at 10.5s,\testimator rf's best error=0.6055,\tbest estimator lgbm's best error=0.4794\n", "[flaml.automl.logger: 06-07 15:28:15] {2309} INFO - iteration 60, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:15] {2493} INFO - at 10.7s,\testimator lgbm's best error=0.4794,\tbest estimator lgbm's best error=0.4794\n", "[flaml.automl.logger: 06-07 15:28:15] {2309} INFO - iteration 61, current learner rf\n", "[flaml.automl.logger: 06-07 15:28:15] {2493} INFO - at 11.0s,\testimator rf's best error=0.5968,\tbest estimator lgbm's best error=0.4794\n", "[flaml.automl.logger: 06-07 15:28:15] {2309} INFO - iteration 62, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:16] {2493} INFO - at 12.1s,\testimator lgbm's best error=0.4794,\tbest estimator lgbm's best error=0.4794\n", "[flaml.automl.logger: 06-07 15:28:17] {2736} INFO - retrain lgbm for 0.5s\n", "[flaml.automl.logger: 06-07 15:28:17] {2739} INFO - retrained model: LGBMRegressor(colsample_bytree=0.591579264701285,\n", " learning_rate=0.0715412842452619, max_bin=511,\n", " min_child_samples=2, n_estimators=1, n_jobs=-1, num_leaves=168,\n", " reg_alpha=0.01435520144866301, reg_lambda=0.006874802748054268,\n", " verbose=-1)\n", "[flaml.automl.logger: 06-07 15:28:17] {2740} INFO - Auto Feature Engineering pipeline: None\n", "[flaml.automl.logger: 06-07 15:28:17] {2035} INFO - fit succeeded\n", "[flaml.automl.logger: 06-07 15:28:17] {2036} INFO - Time taken to find the best model: 10.24332308769226\n", "\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[33massistant\u001b[0m (to ragproxyagent):\n", "\n", "TERMINATE\n", "\n", "--------------------------------------------------------------------------------\n" ] } ], "source": [ "assistant.reset()\n", "problem = \"Train a regression model, set time budget to 12s, plot the time line plot after training.\"\n", "\n", "chat_result = ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem=problem)" ] }, { "cell_type": "code", "execution_count": null, "id": "16", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:28:22.7924281Z", "execution_start_time": "2024-06-07T15:28:22.4431692Z", "livy_statement_state": "available", "parent_msg_id": "8c89a821-45eb-47f0-8608-11ac711f02e9", "queued_time": "2024-06-07T15:26:26.0620587Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 19, "statement_ids": [ 19 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 19, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Cost for the chat:\n", "{'usage_including_cached_inference': {'total_cost': 0.04863, 'gpt-4o-2024-05-13': {'cost': 0.04863, 'prompt_tokens': 7737, 'completion_tokens': 663, 'total_tokens': 8400}}, 'usage_excluding_cached_inference': {'total_cost': 0.04863, 'gpt-4o-2024-05-13': {'cost': 0.04863, 'prompt_tokens': 7737, 'completion_tokens': 663, 'total_tokens': 8400}}}\n" ] } ], "source": [ "print(f\"Cost for the chat:\\n{chat_result.cost}\")" ] }, { "cell_type": 
"markdown", "id": "17", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "Below is the code generated by AutoGen RAG agent. It's not a copy of the code in the related document as we asked for different task and training time, but AutoGen RAG agent adapted it very well." ] }, { "cell_type": "code", "execution_count": null, "id": "18", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:28:56.954585Z", "execution_start_time": "2024-06-07T15:28:23.7618029Z", "livy_statement_state": "available", "parent_msg_id": "ced1bbe3-3ab3-421a-a8a9-6eb151a3a7d3", "queued_time": "2024-06-07T15:26:26.2444398Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 20, "statement_ids": [ 20 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 20, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "[flaml.automl.logger: 06-07 15:28:28] {1767} INFO - task = regression\n", "[flaml.automl.logger: 06-07 15:28:28] {1778} INFO - Evaluation method: holdout\n", "[flaml.automl.logger: 06-07 15:28:28] {1881} INFO - Minimizing error metric: rmse\n", "[flaml.automl.logger: 06-07 15:28:28] {1999} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'sgd', 'catboost']\n", "[flaml.automl.logger: 06-07 15:28:28] {2309} INFO - iteration 0, current learner lgbm\n", "[flaml.automl.logger: 06-07 15:28:28] {2444} INFO - Estimated sufficient time budget=145s. 
Estimated necessary time budget=1s.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/trusted-service-user/cluster-env/trident_env/lib/python3.11/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils.\n", " warnings.warn(\"Setuptools is replacing distutils.\")\n", "2024/06/07 15:28:47 WARNING mlflow.utils.requirements_utils: The following packages were not found in the public PyPI package index as of 2024-02-29; if these packages are not present in the public PyPI index, you must install them manually before loading your model: {'synapseml-internal', 'synapseml-mlflow'}\n" ] }, { "data": { "application/vnd.mlflow.run-widget+json": { "data": { "metrics": { "best_validation_loss": 0.9510965242768078, "iter_counter": 0, "rmse": 0.9510965242768078, "trial_time": 0.012721061706542969, "validation_loss": 0.9510965242768078, "wall_clock_time": 4.973712205886841 }, "params": { "colsample_bytree": "1.0", "learner": "lgbm", "learning_rate": "0.09999999999999995", "log_max_bin": "8", "min_child_samples": "20", "n_estimators": "4", "num_leaves": "4", "reg_alpha": "0.0009765625", "reg_lambda": "1.0", "sample_size": "14860" }, "tags": { "flaml.best_run": "False", "flaml.estimator_class": "LGBMEstimator", "flaml.estimator_name": "lgbm", "flaml.iteration_number": "0", "flaml.learner": "lgbm", "flaml.log_type": "r_autolog", "flaml.meric": "rmse", "flaml.run_source": "flaml-automl", "flaml.sample_size": "14860", "flaml.version": "2.1.2.post1", "mlflow.rootRunId": "da4aff39-ef24-4953-ab30-f9adc0c843bd", "mlflow.runName": "careful_stomach_bzw71tb4", "mlflow.user": "0e0e6551-b66b-41f3-bc82-bd86e0d203dc", "synapseml.experiment.artifactId": "2ba08dad-7edc-4af2-b41b-5802fb6180c2", "synapseml.experimentName": "autogen", "synapseml.livy.id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "synapseml.notebook.artifactId": "72c91c1d-9cbf-4ca5-8180-2e318bb7d1d5", "synapseml.user.id": "8abb9091-0a62-4ecd-bf6a-e49dbbf94431", "synapseml.user.name": "Li Jiang" } }, "info": { "artifact_uri": "sds://onelakedxt.pbidedicated.windows.net/a9c17701-dbed-452d-91ee-ffeef4d6674f/2ba08dad-7edc-4af2-b41b-5802fb6180c2/da4aff39-ef24-4953-ab30-f9adc0c843bd/artifacts", "end_time": 1717774129, "experiment_id": "9d1ec9c8-d313-40a4-9ed8-b9bf496195ae", "lifecycle_stage": "active", "run_id": "da4aff39-ef24-4953-ab30-f9adc0c843bd", "run_name": "", "run_uuid": "da4aff39-ef24-4953-ab30-f9adc0c843bd", "start_time": 1717774109, "status": "FINISHED", "user_id": "9ec1a2ed-32f8-4061-910f-25871321251b" }, "inputs": { "dataset_inputs": [] } } }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "[flaml.automl.logger: 06-07 15:28:53] {2493} INFO - at 5.0s,\testimator lgbm's best error=0.9511,\tbest estimator lgbm's best error=0.9511\n", "[flaml.automl.logger: 06-07 15:28:54] {2736} INFO - retrain lgbm for 0.0s\n", "[flaml.automl.logger: 06-07 15:28:54] {2739} INFO - retrained model: LGBMRegressor(learning_rate=0.09999999999999995, max_bin=255, n_estimators=1,\n", " n_jobs=-1, num_leaves=4, reg_alpha=0.0009765625, reg_lambda=1.0,\n", " verbose=-1)\n", "[flaml.automl.logger: 06-07 15:28:54] {2740} INFO - Auto Feature Engineering pipeline: None\n", "[flaml.automl.logger: 06-07 15:28:54] {2742} INFO - Best MLflow run name: \n", "[flaml.automl.logger: 06-07 15:28:54] {2743} INFO - Best MLflow run id: da4aff39-ef24-4953-ab30-f9adc0c843bd\n", "[flaml.automl.logger: 06-07 15:28:54] {2035} INFO - fit succeeded\n", "[flaml.automl.logger: 06-07 15:28:54] {2036} 
INFO - Time taken to find the best model: 4.973712205886841\n" ] }, { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "base": [ 4.960991144180298 ], "name": "lgbm", "orientation": "h", "type": "bar", "x": [ 0.012721061706542969 ], "y": [ 0 ] } ], "layout": { "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "heatmapgl": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmapgl" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" 
] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", 
"ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "title": { "text": "Timeline Plot" }, "xaxis": { "title": { "text": "Time (s)" } }, "yaxis": { "title": { "text": "Trial" } } } }, "text/html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import flaml.visualization as fviz\n", "from flaml import AutoML\n", "from sklearn.datasets import fetch_california_housing\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Load the California housing data and split it into train and test sets\n", "housing = fetch_california_housing()\n", "x, y = housing.data, housing.target\n", "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=7654321)\n", "\n", "# Create an AutoML instance and set the parameters\n", "automl = AutoML()\n", "automl_settings = {\n", " \"time_budget\": 12, # Time limit in seconds\n", " \"task\": \"regression\", # Type of machine learning task\n", " \"log_file_name\": \"aml_california.log\", # Name of the log file\n", " \"metric\": \"rmse\", # Evaluation metric\n", " \"log_type\": \"all\", # Level of logging\n", "}\n", "\n", "# Fit the AutoML instance on the training data\n", "automl.fit(X_train=x_train, y_train=y_train, **automl_settings)\n", "\n", "# Plot the timeline plot\n", "fig = fviz.plot_timeline(automl)\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "19", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### Example 3\n", "How to use `MultimodalConversableAgent` to chat with images.\n", "\n", "Check out this [blog](https://microsoft.github.io/autogen/blog/2023/11/06/LMM-Agent) for more details." ] }, { "cell_type": "markdown", "id": "20", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "We'll ask a question about below image:![image-alt-text](https://th.bing.com/th/id/R.422068ce8af4e15b0634fe2540adea7a?rik=y4OcXBE%2fqutDOw&pid=ImgRaw&r=0)" ] }, { "cell_type": "code", "execution_count": null, "id": "21", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:29:04.6027047Z", "execution_start_time": "2024-06-07T15:28:57.9532564Z", "livy_statement_state": "available", "parent_msg_id": "71bfdcee-445d-4564-b423-61d9a6378939", "queued_time": "2024-06-07T15:26:26.4400435Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 21, "statement_ids": [ 21 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 21, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[33mUser_proxy\u001b[0m (to image-explainer):\n", "\n", "What's the breed of this dog?\n", ".\n", "\n", "--------------------------------------------------------------------------------\n", "\u001b[31m\n", ">>>>>>>> USING AUTO REPLY...\u001b[0m\n", "\u001b[33mimage-explainer\u001b[0m (to User_proxy):\n", "\n", "The dog in the image appears to be a Poodle or a Poodle mix, such as a Labradoodle or a Goldendoodle, based on its curly coat and overall appearance.\n", "\n", "--------------------------------------------------------------------------------\n" ] } ], "source": [ "from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent\n", "\n", "image_agent = MultimodalConversableAgent(\n", " name=\"image-explainer\",\n", " max_consecutive_auto_reply=10,\n", " llm_config={\"config_list\": config_list, \"temperature\": 0.5, \"max_tokens\": 300},\n", ")\n", "\n", 
"user_proxy = autogen.UserProxyAgent(\n", " name=\"User_proxy\",\n", " system_message=\"A human admin.\",\n", " human_input_mode=\"NEVER\", # Try between ALWAYS or NEVER\n", " max_consecutive_auto_reply=0,\n", " code_execution_config={\n", " \"use_docker\": False\n", " }, # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.\n", ")\n", "\n", "# Ask the question with an image\n", "chat_result = user_proxy.initiate_chat(\n", " image_agent,\n", " message=\"\"\"What's the breed of this dog?\n", ".\"\"\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "22", "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [ { "data": { "application/vnd.livy.statement-meta+json": { "execution_finish_time": "2024-06-07T15:29:05.9669658Z", "execution_start_time": "2024-06-07T15:29:05.613333Z", "livy_statement_state": "available", "parent_msg_id": "af81a0c7-9ee8-4da4-aa6e-dcd735209961", "queued_time": "2024-06-07T15:26:26.7741139Z", "session_id": "1d5e9aec-2019-408c-a19a-5db9fb175ae2", "session_start_time": null, "spark_pool": null, "state": "finished", "statement_id": 22, "statement_ids": [ 22 ] }, "text/plain": [ "StatementMeta(, 1d5e9aec-2019-408c-a19a-5db9fb175ae2, 22, Finished, Available)" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Cost for the chat:\n", "{'usage_including_cached_inference': {'total_cost': 0.0053950000000000005, 'gpt-4o-2024-05-13': {'cost': 0.0053950000000000005, 'prompt_tokens': 965, 'completion_tokens': 38, 'total_tokens': 1003}}, 'usage_excluding_cached_inference': {'total_cost': 0.0053950000000000005, 'gpt-4o-2024-05-13': {'cost': 0.0053950000000000005, 'prompt_tokens': 965, 'completion_tokens': 38, 'total_tokens': 1003}}}\n" ] } ], "source": [ "print(f\"Cost for the chat:\\n{chat_result.cost}\")" ] } ], "metadata": { "kernel_info": { "name": "synapse_pyspark" }, "kernelspec": { "display_name": "synapse_pyspark", "name": "synapse_pyspark" }, "language_info": { "name": "python" }, "nteract": { "version": "nteract-front-end@1.0.0" }, "spark_compute": { "compute_id": "/trident/default" } }, "nbformat": 4, "nbformat_minor": 5 }