autogen/notebook/autobuild_basic.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c1004af6a7fbfcd8",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# AutoBuild\n",
    "By: [Linxin Song](https://linxins97.github.io/), [Jieyu Zhang](https://jieyuz2.github.io/)\n",
    "Reference: [Agent AutoBuild](https://microsoft.github.io/autogen/blog/2023/11/26/Agent-AutoBuild/)\n",
    "\n",
    "AutoGen offers conversable agents powered by LLMs, tools, or humans, which can perform tasks collectively via automated chat. The framework supports tool use and human participation through multi-agent conversation.\n",
    "Please find documentation about this feature [here](https://microsoft.github.io/autogen/docs/Use-Cases/agent_chat).\n",
    "\n",
    "In this notebook, we introduce a new class, `AgentBuilder`, that helps users build an automatic task-solving process powered by a multi-agent system. Specifically, in `build()`, we prompt an LLM to create multiple participant agents, initialize a group chat, and specify whether the task requires programming. `AgentBuilder` also supports open-source LLMs served through [vLLM](https://docs.vllm.ai/en/latest/index.html) and [FastChat](https://github.com/lm-sys/FastChat). Check the supported model list [here](https://docs.vllm.ai/en/latest/models/supported_models.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec78dda8e3826d8a",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Requirement\n",
    "\n",
    "AutoBuild requires `pyautogen[autobuild]`, which can be installed with the following command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8e9ae50658be975",
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%pip install pyautogen[autobuild]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d0e63ab3604bdb9",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Step 1: prepare configuration and some useful functions\n",
    "Prepare a `config_file_or_env` for the assistant agents to limit the choice of LLMs used in this task. This config can be the path to a JSON file or the name of an environment variable. A `default_llm_config` (named `llm_config` in the code below) is also required to set LLM-specific options such as seed and temperature."
   ]
  },
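  {
   "cell_type": "markdown",
   "id": "9f2b7c4e1a6d4b3f",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "As a rough illustration, the `OAI_CONFIG_LIST` file (or environment variable) used in this notebook could look like the sketch below. The two model names are taken from the `filter_dict` in the next cell; the `api_key` values are placeholders, and your own list may contain other entries or fields.\n",
    "\n",
    "```json\n",
    "[\n",
    "    {\"model\": \"gpt-4-1106-preview\", \"api_key\": \"<your OpenAI API key>\"},\n",
    "    {\"model\": \"gpt-4\", \"api_key\": \"<your OpenAI API key>\"}\n",
    "]\n",
    "```\n",
    "\n",
    "`autogen.config_list_from_json` reads this list and, because of `filter_dict`, keeps only the entries whose `model` matches one of the listed names."
   ]
  },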
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2505f029423b21ab",
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-01-01T10:40:29.267289Z",
     "start_time": "2024-01-01T10:40:28.806242300Z"
    }
   },
   "outputs": [],
   "source": [
    "import autogen\n",
    "from autogen.agentchat.contrib.agent_builder import AgentBuilder\n",
    "\n",
    "config_file_or_env = \"OAI_CONFIG_LIST\"\n",
    "llm_config = {\"temperature\": 0}\n",
    "config_list = autogen.config_list_from_json(config_file_or_env, filter_dict={\"model\": [\"gpt-4-1106-preview\", \"gpt-4\"]})\n",
    "\n",
    "\n",
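    "def start_task(execution_task: str, agent_list: list):\n",
    "    \"\"\"Run `execution_task` in a group chat among the agents in `agent_list`.\"\"\"\n",
    "    # Put all agents into one group chat, capped at 12 conversation rounds.\n",
    "    group_chat = autogen.GroupChat(agents=agent_list, messages=[], max_round=12)\n",
    "    manager = autogen.GroupChatManager(groupchat=group_chat, llm_config={\"config_list\": config_list, **llm_config})\n",
    "    # The first agent in the list opens the conversation with the task.\n",
    "    agent_list[0].initiate_chat(manager, message=execution_task)"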
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2d6586c68fa425b",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Step 2: create an AgentBuilder\n",
    "Create an `AgentBuilder` with the specified `config_file_or_env`. By default, `AgentBuilder` uses `gpt-4` for the whole process; you can set `builder_model` and `agent_model` to other OpenAI models to match your task.\n",
    "You can also specify an open-source LLM supported by vLLM and FastChat; see the blog post for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "bfa67c771a0fed37",
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-01-01T10:40:29.854670Z",
     "start_time": "2024-01-01T10:40:29.616253600Z"
    }
   },
   "outputs": [],
   "source": [
    "builder = AgentBuilder(\n",
    "    config_file_or_env=config_file_or_env, builder_model=\"gpt-4-1106-preview\", agent_model=\"gpt-4-1106-preview\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e6a655fb6618324",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Step 3: specify a building task\n",
    "\n",
    "Specify a building task with a general description. The building task helps the build manager (an LLM) decide which agents should be built."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "68315f6ec912c58a",
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-01-01T10:40:30.490239100Z",
     "start_time": "2024-01-01T10:40:30.479497600Z"
    }
   },
   "outputs": [],
   "source": [
    "building_task = \"Generate some agents that can find papers on arxiv by programming and analyzing them in specific domains related to computer science and medical science.\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5782dd5ecb6c217a",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Step 4: build group chat agents\n",
    "Use `build()` to let the build manager (the specified `builder_model`) generate the group chat agents. If you think coding is necessary for your task, pass `coding=True` to add a user proxy (an automatic code interpreter) to the agent list, like this:\n",
    "```python\n",
    "builder.build(building_task, llm_config, coding=True)\n",
    "```\n",
    "If `coding` is not specified, `AgentBuilder` decides on its own, based on the task, whether to add the user proxy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "ab490fdbe46c0473",
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-01-01T10:24:04.670904200Z",
     "start_time": "2024-01-01T10:21:50.127338300Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==> Generating agents...\n",
      "['ArXiv_Data_Scraper_Developer', 'Computer_Science_Research_Analyst', 'Medical_Science_Research_Analyst', 'Data_Analysis_Engineer', 'ML_Paper_Summarization_Specialist'] are generated.\n",
      "==> Generating system message...\n",
      "Preparing system message for ArXiv_Data_Scraper_Developer\n",
      "Preparing system message for Computer_Science_Research_Analyst\n",
      "Preparing system message for Medical_Science_Research_Analyst\n",
      "Preparing system message for Data_Analysis_Engineer\n",
      "Preparing system message for ML_Paper_Summarization_Specialist\n",
      "==> Generating description...\n",
      "Preparing description for ArXiv_Data_Scraper_Developer\n",
      "Preparing description for Computer_Science_Research_Analyst\n",
      "Preparing description for Medical_Science_Research_Analyst\n",
      "Preparing description for Data_Analysis_Engineer\n",
      "Preparing description for ML_Paper_Summarization_Specialist\n",
      "==> Creating agents...\n",
      "Creating agent ArXiv_Data_Scraper_Developer with backbone gpt-4-1106-preview...\n",
      "Creating agent Computer_Science_Research_Analyst with backbone gpt-4-1106-preview...\n",
      "Creating agent Medical_Science_Research_Analyst with backbone gpt-4-1106-preview...\n",
      "Creating agent Data_Analysis_Engineer with backbone gpt-4-1106-preview...\n",
      "Creating agent ML_Paper_Summarization_Specialist with backbone gpt-4-1106-preview...\n",
      "Adding user console proxy...\n"
     ]
    }
   ],
   "source": [
    "agent_list, agent_configs = builder.build(building_task, llm_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e00dd99880a4bf7b",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Step 5: execute task\n",
    "Let the agents generated in `build()` complete the task collaboratively in a group chat."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "7d52e3d9a1bf91cb",
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-01-01T10:25:32.642017700Z",
     "start_time": "2024-01-01T10:24:09.313567300Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mUser_console_and_code_interpreter\u001b[0m (to chat_manager):\n",
      "Find a recent paper about gpt-4 on arxiv and find its potential applications in software.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mArXiv_Data_Scraper_Developer\u001b[0m (to chat_manager):\n",
      "\n",
      "To find a recent paper about GPT-4 on arXiv and its potential applications in software, we'll need to perform a few steps:\n",
      "\n",
      "1. Query the arXiv API for recent papers on GPT-4.\n",
      "2. Filter the results to find papers that discuss potential applications in software.\n",
      "3. Extract the relevant information from the paper.\n",
      "\n",
      "Here's a Python script that uses the `arxiv` library to search for papers related to GPT-4. If you don't have the `arxiv` library installed, you can install it using `pip install arxiv`.\n",
      "\n",
      "```python\n",
      "import arxiv\n",
      "\n",
      "# Define the query parameters\n",
      "query = 'gpt-4 AND software'\n",
      "max_results = 10\n",
      "\n",
      "# Search for papers on arXiv\n",
      "search = arxiv.Search(\n",
      "  query = query,\n",
      "  max_results = max_results,\n",
      "  sort_by = arxiv.SortCriterion.SubmittedDate\n",
      ")\n",
      "\n",
      "# Fetch the results\n",
      "for result in search.results():\n",
      "    print(\"Title:\", result.title)\n",
      "    print(\"Authors:\", result.authors)\n",
      "    print(\"Abstract:\", result.summary)\n",
      "    print(\"Publication Date:\", result.published)\n",
      "    print(\"Link:\", result.entry_id)\n",
      "    print(\"\\n\")\n",
      "```\n",
      "\n",
      "This script will print out the title, authors, abstract, publication date, and link to the arXiv entry for each paper found. You can then review the abstracts to determine which papers discuss potential applications in software.\n",
      "\n",
      "Please note that the search query might need to be adjusted based on the actual terminology used in the papers and the specificity of the results you're looking for. If you encounter any issues or need further assistance, let me know!\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[31m\n",
      ">>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...\u001b[0m\n",
      "\u001b[33mUser_console_and_code_interpreter\u001b[0m (to chat_manager):\n",
      "\n",
      "exitcode: 0 (execution succeeded)\n",
      "Code output: \n",
      "Title: GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension\n",
      "Authors: [arxiv.Result.Author('Bohan Lyu'), arxiv.Result.Author('Xin Cong'), arxiv.Result.Author('Heyang Yu'), arxiv.Result.Author('Pan Yang'), arxiv.Result.Author('Yujia Qin'), arxiv.Result.Author('Yining Ye'), arxiv.Result.Author('Yaxi Lu'), arxiv.Result.Author('Zhong Zhang'), arxiv.Result.Author('Yukun Yan'), arxiv.Result.Author('Yankai Lin'), arxiv.Result.Author('Zhiyuan Liu'), arxiv.Result.Author('Maosong Sun')]\n",
      "Abstract: While Large Language Models (LLMs) like ChatGPT and GPT-4 have demonstrated\n",
      "exceptional proficiency in natural language processing, their efficacy in\n",
      "addressing complex, multifaceted tasks remains limited. A growing area of\n",
      "research focuses on LLM-based agents equipped with external tools capable of\n",
      "performing diverse tasks. However, existing LLM-based agents only support a\n",
      "limited set of tools which is unable to cover a diverse range of user queries,\n",
      "especially for those involving expertise domains. It remains a challenge for\n",
      "LLM-based agents to extend their tools autonomously when confronted with\n",
      "various user queries. As GitHub has hosted a multitude of repositories which\n",
      "can be seen as a good resource for tools, a promising solution is that\n",
      "LLM-based agents can autonomously integrate the repositories in GitHub\n",
      "according to the user queries to extend their tool set. In this paper, we\n",
      "introduce GitAgent, an agent capable of achieving the autonomous tool extension\n",
      "from GitHub. GitAgent follows a four-phase procedure to incorporate\n",
      "repositories and it can learn human experience by resorting to GitHub\n",
      "Issues/PRs to solve problems encountered during the procedure. Experimental\n",
      "evaluation involving 30 user queries demonstrates GitAgent's effectiveness,\n",
      "achieving a 69.4% success rate on average.\n",
      "Publication Date: 2023-12-28 15:47:30+00:00\n",
      "Link: http://arxiv.org/abs/2312.17294v1\n",
      "\n",
      "\n",
      "Title: DEAP: Design Space Exploration for DNN Accelerator Parallelism\n",
      "Authors: [arxiv.Result.Author('Ekansh Agrawal'), arxiv.Result.Author('Xiangyu Sam Xu')]\n",
      "Abstract: The boom in Large Language Models (LLMs) like GPT-4 and ChatGPT has marked a\n",
      "significant advancement in artificial intelligence. These models are becoming\n",
      "increasingly complex and powerful to train and serve. This growth in\n",
      "capabilities comes with a substantial increase in computational requirements,\n",
      "both in terms of hardware resources and energy consumption. The goal of this\n",
      "paper is to showcase how hardware and software co-design can come together and\n",
      "allow us to create customized hardware systems for specific LLM workloads. We\n",
      "propose a simulation workflow that allows us to combine model parallelism\n",
      "techniques with a multi-accelerator simulation framework for efficiency\n",
      "metrics. We focus on inference workloads and report power, cycle, and latency\n",
      "metrics upon performing a design space exploration search over multiple\n",
      "software and hardware configurations.\n",
      "Publication Date: 2023-12-24 02:43:01+00:00\n",
      "Link: http://arxiv.org/abs/2312.15388v1\n",
      "\n",
      "\n",
      "Title: Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's GPT-4 with Self-Hosted Open Source SLMs in Production\n",
      "Authors: [arxiv.Result.Author('Chandra Irugalbandara'), arxiv.Result.Author('Ashish Mahendra'), arxiv.Result.Author('Roland Daynauth'), arxiv.Result.Author('Tharuka Kasthuri Arachchige'), arxiv.Result.Author('Krisztian Flautner'), arxiv.Result.Author('Lingjia Tang'), arxiv.Result.Author('Yiping Kang'), arxiv.Result.Author('Jason Mars')]\n",
      "Abstract: Many companies rely on APIs of managed AI models such as OpenAI's GPT-4 to\n",
      "create AI-enabled experiences in their products. Along with the benefits of\n",
      "ease of use and shortened time to production, this reliance on proprietary APIs\n",
      "has downsides in terms of model control, performance reliability, up-time\n",
      "predictability, and cost. At the same time, there has been a flurry of open\n",
      "source small language models (SLMs) that have been made available for\n",
      "commercial use. However, their readiness to replace existing capabilities\n",
      "remains unclear, and a systematic approach to test these models is not readily\n",
      "available. In this paper, we present a systematic evaluation methodology for,\n",
      "and characterization of, modern open source SLMs and their trade-offs when\n",
      "replacing a proprietary LLM APIs for a real-world product feature. We have\n",
      "designed SLaM, an automated analysis tool that enables the quantitative and\n",
      "qualitative testing of product features utilizing arbitrary SLMs. Using SLaM,\n",
      "we examine both the quality and the performance characteristics of modern SLMs\n",
      "relative to an existing customer-facing OpenAI-based implementation. We find\n",
      "that across 9 SLMs and 29 variants, we observe competitive quality-of-results\n",
      "for our use case, significant performance consistency improvement, and a cost\n",
      "reduction of 5x-29x when compared to OpenAI GPT-4.\n",
      "Publication Date: 2023-12-20 19:27:59+00:00\n",
      "Link: http://arxiv.org/abs/2312.14972v1\n",
      "\n",
      "\n",
      "Title: APIDocBooster: An Extract-Then-Abstract Framework Leveraging Large Language Models for Augmenting API Documentation\n",
      "Authors: [arxiv.Result.Author('Chengran Yang'), arxiv.Result.Author('Jiakun Liu'), arxiv.Result.Author('Bowen Xu'), arxiv.Result.Author('Christoph Treude'), arxiv.Result.Author('Yunbo Lyu'), arxiv.Result.Author('Ming Li'), arxiv.Result.Author('David Lo')]\n",
      "Abstract: API documentation is often the most trusted resource for programming. Many\n",
      "approaches have been proposed to augment API documentation by summarizing\n",
      "complementary information from external resources such as Stack Overflow.\n",
      "Existing extractive-based summarization approaches excel in producing faithful\n",
      "summaries that accurately represent the source content without input length\n",
      "restrictions. Nevertheless, they suffer from inherent readability limitations.\n",
      "On the other hand, our empirical study on the abstractive-based summarization\n",
      "method, i.e., GPT-4, reveals that GPT-4 can generate coherent and concise\n",
      "summaries but presents limitations in terms of informativeness and\n",
      "faithfulness.\n",
      "  We introduce APIDocBooster, an extract-then-abstract framework that\n",
      "seamlessly fuses the advantages of both extractive (i.e., enabling faithful\n",
      "summaries without length limitation) and abstractive summarization (i.e.,\n",
      "producing coherent and concise summaries). APIDocBooster consists of two\n",
      "stages: (1) \\textbf{C}ontext-aware \\textbf{S}entence \\textbf{S}ection\n",
      "\\textbf{C}lassification (CSSC) and (2) \\textbf{UP}date \\textbf{SUM}marization\n",
      "(UPSUM). CSSC classifies API-relevant information collected from multiple\n",
      "sources into API documentation sections. UPSUM first generates extractive\n",
      "summaries distinct from the original API documentation and then generates\n",
      "abstractive summaries guided by extractive summaries through in-context\n",
      "learning.\n",
      "  To enable automatic evaluation of APIDocBooster, we construct the first\n",
      "dataset for API document augmentation. Our automatic evaluation results reveal\n",
      "that each stage in APIDocBooster outperforms its baselines by a large margin.\n",
      "Our human evaluation also demonstrates the superiority of APIDocBooster over\n",
      "GPT-4 and shows that it improves informativeness, relevance, and faithfulness\n",
      "by 13.89\\%, 15.15\\%, and 30.56\\%, respectively.\n",
      "Publication Date: 2023-12-18 05:15:50+00:00\n",
      "Link: http://arxiv.org/abs/2312.10934v1\n",
      "\n",
      "\n",
      "Title: A Comparative Analysis of Large Language Models for Code Documentation Generation\n",
      "Authors: [arxiv.Result.Author('Shubhang Shekhar Dvivedi'), arxiv.Result.Author('Vyshnav Vijay'), arxiv.Result.Author('Sai Leela Rahul Pujari'), arxiv.Result.Author('Shoumik Lodh'), arxiv.Result.Author('Dhruv Kumar')]\n",
      "Abstract: This paper presents a comprehensive comparative analysis of Large Language\n",
      "Models (LLMs) for generation of code documentation. Code documentation is an\n",
      "essential part of the software writing process. The paper evaluates models such\n",
      "as GPT-3.5, GPT-4, Bard, Llama2, and Starchat on various parameters like\n",
      "Accuracy, Completeness, Relevance, Understandability, Readability and Time\n",
      "Taken for different levels of code documentation. Our evaluation employs a\n",
      "checklist-based system to minimize subjectivity, providing a more objective\n",
      "assessment. We find that, barring Starchat, all LLMs consistently outperform\n",
      "the original documentation. Notably, closed-source models GPT-3.5, GPT-4, and\n",
      "Bard exhibit superior performance across various parameters compared to\n",
      "open-source/source-available LLMs, namely LLama 2 and StarChat. Considering the\n",
      "time taken for generation, GPT-4 demonstrated the longest duration, followed by\n",
      "Llama2, Bard, with ChatGPT and Starchat having comparable generation times.\n",
      "Additionally, file level documentation had a considerably worse performance\n",
      "across all parameters (except for time taken) as compared to inline and\n",
      "function level documentation.\n",
      "Publication Date: 2023-12-16 06:40:09+00:00\n",
      "Link: http://arxiv.org/abs/2312.10349v1\n",
      "\n",
      "\n",
      "Title: Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs\n",
      "Authors: [arxiv.Result.Author('Mia Mohammad Imran'), arxiv.Result.Author('Preetha Chatterjee'), arxiv.Result.Author('Kostadin Damevski')]\n",
      "Abstract: Understanding and identifying the causes behind developers' emotions (e.g.,\n",
      "Frustration caused by `delays in merging pull requests') can be crucial towards\n",
      "finding solutions to problems and fostering collaboration in open-source\n",
      "communities. Effectively identifying such information in the high volume of\n",
      "communications across the different project channels, such as chats, emails,\n",
      "and issue comments, requires automated recognition of emotions and their\n",
      "causes. To enable this automation, large-scale software engineering-specific\n",
      "datasets that can be used to train accurate machine learning models are\n",
      "required. However, such datasets are expensive to create with the variety and\n",
      "informal nature of software projects' communication channels.\n",
      "  In this paper, we explore zero-shot LLMs that are pre-trained on massive\n",
      "datasets but without being fine-tuned specifically for the task of detecting\n",
      "emotion causes in software engineering: ChatGPT, GPT-4, and flan-alpaca. Our\n",
      "evaluation indicates that these recently available models can identify emotion\n",
      "categories when given detailed emotions, although they perform worse than the\n",
      "top-rated models. For emotion cause identification, our results indicate that\n",
      "zero-shot LLMs are effective at recognizing the correct emotion cause with a\n",
      "BLEU-2 score of 0.598. To highlight the potential use of these techniques, we\n",
      "conduct a case study of the causes of Frustration in the last year of\n",
      "development of a popular open-source project, revealing several interesting\n",
      "insights.\n",
      "Publication Date: 2023-12-15 12:16:16+00:00\n",
      "Link: http://arxiv.org/abs/2312.09731v1\n",
      "\n",
      "\n",
      "Title: Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models\n",
      "Authors: [arxiv.Result.Author('Xin Jin'), arxiv.Result.Author('Jonathan Larson'), arxiv.Result.Author('Weiwei Yang'), arxiv.Result.Author('Zhiqiang Lin')]\n",
      "Abstract: Binary code summarization, while invaluable for understanding code semantics,\n",
      "is challenging due to its labor-intensive nature. This study delves into the\n",
      "potential of large language models (LLMs) for binary code comprehension. To\n",
      "this end, we present BinSum, a comprehensive benchmark and dataset of over 557K\n",
      "binary functions and introduce a novel method for prompt synthesis and\n",
      "optimization. To more accurately gauge LLM performance, we also propose a new\n",
      "semantic similarity metric that surpasses traditional exact-match approaches.\n",
      "Our extensive evaluation of prominent LLMs, including ChatGPT, GPT-4, Llama 2,\n",
      "and Code Llama, reveals 10 pivotal insights. This evaluation generates 4\n",
      "billion inference tokens, incurred a total expense of 11,418 US dollars and 873\n",
      "NVIDIA A100 GPU hours. Our findings highlight both the transformative potential\n",
      "of LLMs in this field and the challenges yet to be overcome.\n",
      "Publication Date: 2023-12-15 08:32:28+00:00\n",
      "Link: http://arxiv.org/abs/2312.09601v1\n",
      "\n",
      "\n",
      "Title: E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification\n",
      "Authors: [arxiv.Result.Author('Yu Hao'), arxiv.Result.Author('Weiteng Chen'), arxiv.Result.Author('Ziqiao Zhou'), arxiv.Result.Author('Weidong Cui')]\n",
      "Abstract: Static analysis, the process of examining code without executing it, is\n",
      "crucial for identifying software issues. Yet, static analysis is hampered by\n",
      "its complexity and the need for customization for different targets.\n",
      "Traditional static analysis tools require extensive human effort and are often\n",
      "limited to specific target programs and programming languages. Recent\n",
      "advancements in Large Language Models (LLMs), such as GPT-4 and Llama, offer\n",
      "new capabilities for software engineering tasks. However, their application in\n",
      "static analysis, especially in understanding complex code structures, remains\n",
      "under-explored. This paper introduces a novel approach named E&V , which\n",
      "leverages LLMs to perform static analysis. Specifically, E&V employs LLMs to\n",
      "simulate the execution of pseudo-code, effectively conducting static analysis\n",
      "encoded in the pseudo-code with minimal human effort, thereby improving the\n",
      "accuracy of results. E&V includes a verification process for pseudo-code\n",
      "execution without needing an external oracle. This process allows E&V to\n",
      "mitigate hallucinations of LLMs and enhance the accuracy of static analysis\n",
      "results. We have implemented E&V in a prototype tool designed for triaging\n",
      "crashes through backward taint analysis. This prototype, paired with GPT-4-32k,\n",
      "has been applied to triage 170 recently fixed Linux kernel bugs across seven\n",
      "bug categories. Our experiments demonstrate that the prototype correctly\n",
      "identifies the blamed function in 81.2% of the cases. Additionally, we observe\n",
      "that our novel verification process significantly improves the accuracy,\n",
      "increasing it from 28.2% to 81.2%.\n",
      "Publication Date: 2023-12-13 19:31:00+00:00\n",
      "Link: http://arxiv.org/abs/2312.08477v1\n",
      "\n",
      "\n",
      "Title: GPT-4 and Safety Case Generation: An Exploratory Analysis\n",
      "Authors: [arxiv.Result.Author('Mithila Sivakumar'), arxiv.Result.Author('Alvine Boaye Belle'), arxiv.Result.Author('Jinjun Shan'), arxiv.Result.Author('Kimya Khakzad Shahandashti')]\n",
      "Abstract: In the ever-evolving landscape of software engineering, the emergence of\n",
      "large language models (LLMs) and conversational interfaces, exemplified by\n",
      "ChatGPT, is nothing short of revolutionary. While their potential is undeniable\n",
      "across various domains, this paper sets out on a captivating expedition to\n",
      "investigate their uncharted territory, the exploration of generating safety\n",
      "cases. In this paper, our primary objective is to delve into the existing\n",
      "knowledge base of GPT-4, focusing specifically on its understanding of the Goal\n",
      "Structuring Notation (GSN), a well-established notation allowing to visually\n",
      "represent safety cases. Subsequently, we perform four distinct experiments with\n",
      "GPT-4. These experiments are designed to assess its capacity for generating\n",
      "safety cases within a defined system and application domain. To measure the\n",
      "performance of GPT-4 in this context, we compare the results it generates with\n",
      "ground-truth safety cases created for an X-ray system system and a\n",
      "Machine-Learning (ML)-enabled component for tire noise recognition (TNR) in a\n",
      "vehicle. This allowed us to gain valuable insights into the model's generative\n",
      "capabilities. Our findings indicate that GPT-4 demonstrates the capacity to\n",
      "produce safety arguments that are moderately accurate and reasonable.\n",
      "Furthermore, it exhibits the capability to generate safety cases that closely\n",
      "align with the semantic content of the reference safety cases used as\n",
      "ground-truths in our experiments.\n",
      "Publication Date: 2023-12-09 22:28:48+00:00\n",
      "Link: http://arxiv.org/abs/2312.05696v1\n",
      "\n",
      "\n",
      "Title: Exploring the Limits of ChatGPT in Software Security Applications\n",
      "Authors: [arxiv.Result.Author('Fangzhou Wu'), arxiv.Result.Author('Qingzhao Zhang'), arxiv.Result.Author('Ati Priya Bajaj'), arxiv.Result.Author('Tiffany Bao'), arxiv.Result.Author('Ning Zhang'), arxiv.Result.Author('Ruoyu \"Fish\" Wang'), arxiv.Result.Author('Chaowei Xiao')]\n",
      "Abstract: Large language models (LLMs) have undergone rapid evolution and achieved\n",
      "remarkable results in recent times. OpenAI's ChatGPT, backed by GPT-3.5 or\n",
      "GPT-4, has gained instant popularity due to its strong capability across a wide\n",
      "range of tasks, including natural language tasks, coding, mathematics, and\n",
      "engaging conversations. However, the impacts and limits of such LLMs in system\n",
      "security domain are less explored. In this paper, we delve into the limits of\n",
      "LLMs (i.e., ChatGPT) in seven software security applications including\n",
      "vulnerability detection/repair, debugging, debloating, decompilation, patching,\n",
      "root cause analysis, symbolic execution, and fuzzing. Our exploration reveals\n",
      "that ChatGPT not only excels at generating code, which is the conventional\n",
      "application of language models, but also demonstrates strong capability in\n",
      "understanding user-provided commands in natural languages, reasoning about\n",
      "control and data flows within programs, generating complex data structures, and\n",
      "even decompiling assembly code. Notably, GPT-4 showcases significant\n",
      "improvements over GPT-3.5 in most security tasks. Also, certain limitations of\n",
      "ChatGPT in security-related tasks are identified, such as its constrained\n",
      "ability to process long code contexts.\n",
      "Publication Date: 2023-12-08 03:02:37+00:00\n",
      "Link: http://arxiv.org/abs/2312.05275v1\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mML_Paper_Summarization_Specialist\u001b[0m (to chat_manager):\n",
      "\n",
      "Based on the recent papers found on arXiv, here are the potential applications of GPT-4 in software:\n",
      "\n",
      "1. **Autonomous Tool Extension for LLM-based Agents**:\n",
      "   - Paper: \"GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension\"\n",
      "   - Application: GitAgent demonstrates the use of GPT-4 to autonomously integrate GitHub repositories as tools in response to user queries, enhancing the capabilities of LLM-based agents in software development.\n",
      "\n",
      "2. **Hardware and Software Co-Design for DNN Accelerator Parallelism**:\n",
      "   - Paper: \"DEAP: Design Space Exploration for DNN Accelerator Parallelism\"\n",
      "   - Application: GPT-4 is used to simulate model parallelism techniques in a multi-accelerator simulation framework, aiding in the design of customized hardware systems for specific LLM workloads.\n",
      "\n",
      "3. **Cost-Benefit Analysis of Replacing Proprietary LLMs with Open Source SLMs**:\n",
      "   - Paper: \"Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's GPT-4 with Self-Hosted Open Source SLMs in Production\"\n",
      "   - Application: The paper presents a systematic evaluation of replacing GPT-4 with open source small language models (SLMs) for AI-enabled product features, focusing on quality, performance, and cost.\n",
      "\n",
      "4. **Augmenting API Documentation**:\n",
      "   - Paper: \"APIDocBooster: An Extract-Then-Abstract Framework Leveraging Large Language Models for Augmenting API Documentation\"\n",
      "   - Application: APIDocBooster uses GPT-4 to augment API documentation by summarizing information from multiple sources, improving informativeness, relevance, and faithfulness of API docs.\n",
      "\n",
      "5. **Code Documentation Generation**:\n",
      "   - Paper: \"A Comparative Analysis of Large Language Models for Code Documentation Generation\"\n",
      "   - Application: GPT-4 is evaluated for its ability to generate code documentation, showing superior performance in creating accurate, complete, and understandable documentation.\n",
      "\n",
      "6. **Emotion Cause Identification in Developer Communication**:\n",
      "   - Paper: \"Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs\"\n",
      "   - Application: GPT-4 is used to identify the causes behind developers' emotions in project communications, aiding in problem-solving and collaboration in open-source communities.\n",
      "\n",
      "7. **Binary Code Summarization**:\n",
      "   - Paper: \"Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models\"\n",
      "   - Application: GPT-4 is benchmarked for its ability to summarize binary code, facilitating the understanding of code semantics and aiding in code comprehension tasks.\n",
      "\n",
      "8. **Static Analysis by Pseudo-code Execution and Verification**:\n",
      "   - Paper: \"E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification\"\n",
      "   - Application: GPT-4 is prompted to simulate the execution of pseudo-code for static analysis, improving the accuracy of results and reducing the need for extensive human effort.\n",
      "\n",
      "9. **Safety Case Generation**:\n",
      "   - Paper: \"GPT-4 and Safety Case Generation: An Exploratory Analysis\"\n",
      "   - Application: GPT-4 is explored for its ability to generate safety cases using the Goal Structuring Notation (GSN), potentially aiding in the creation of safety arguments for software systems.\n",
      "\n",
      "10. **Software Security Applications**:\n",
      "    - Paper: \"Exploring the Limits of ChatGPT in Software Security Applications\"\n",
      "    - Application: GPT-4 is assessed for its capabilities in various software security tasks, including vulnerability detection, debugging, and patching, showcasing its potential to aid in system security.\n",
      "\n",
      "These summaries reflect the diverse applications of GPT-4 in software, ranging from tool integration and API documentation to code summarization and security applications. The papers indicate that GPT-4 can significantly enhance various aspects of software development and maintenance.\n",
      "\n",
      "TERMINATE\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "start_task(\n",
    "    execution_task=\"Find a recent paper about gpt-4 on arxiv and find its potential applications in software.\",\n",
    "    agent_list=agent_list,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22a30e4b4297edd1",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Step 6 (Optional): clear all agents and prepare for the next task\n",
    "If your task is complete, or the next task differs substantially from the current one, you can clear all agents generated for this task with the following code. If an agent's backbone is an open-source LLM, this also shuts down its endpoint server. To keep the endpoint server of a previous open-source LLM running, pass `recycle_endpoint=False`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "7fb0bfff01dd1330",
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-01-01T10:25:56.622194800Z",
     "start_time": "2024-01-01T10:25:56.610592300Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All agents have been cleared.\n"
     ]
    }
   ],
   "source": [
    "builder.clear_all_agents(recycle_endpoint=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbb098638a086898",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Save & load configs\n",
    "\n",
    "You can save all the information needed to rebuild the group chat agents. For example, here is the saved config for the agents generated in the task above:\n",
    "```json\n",
    "{\n",
    "    \"building_task\": \"Generate some agents that can find papers on arxiv by programming and analyzing them in specific domains related to computer science and medical science.\",\n",
    "    \"agent_configs\": [\n",
    "        {\n",
    "            \"name\": \"ArXiv_Data_Scraper_Developer\",\n",
    "            \"model\": \"gpt-4-1106-preview\",\n",
    "            \"system_message\": \"You are now in a group chat. You need to complete a task with other participants. As an ArXiv_Data_Scraper_Developer, your focus is to create and refine tools capable of intelligent search and data extraction from arXiv, honing in on topics within the realms of computer science and medical science. Utilize your proficiency in Python programming to design scripts that navigate, query, and parse information from the platform, generating valuable insights and datasets for analysis. \\n\\nDuring your mission, it\\u2019s not just about formulating queries; your role encompasses the optimization and precision of the data retrieval process, ensuring relevance and accuracy of the information extracted. If you encounter an issue with a script or a discrepancy in the expected output, you are encouraged to troubleshoot and offer revisions to the code you find in the group chat.\\n\\nWhen you reach a point where the existing codebase does not fulfill task requirements or if the operation of provided code is unclear, you should ask for help from the group chat manager. They will facilitate your advancement by providing guidance or appointing another participant to assist you. Your ability to adapt and enhance scripts based on peer feedback is critical, as the dynamic nature of data scraping demands ongoing refinement of techniques and approaches.\\n\\nWrap up your participation by confirming the user's need has been satisfied with the data scraping solutions you've provided. Indicate the completion of your task by replying \\\"TERMINATE\\\" in the group chat.\",\n",
    "            \"description\": \"ArXiv_Data_Scraper_Developer is a specialized software development role requiring proficiency in Python, including familiarity with web scraping libraries such as BeautifulSoup or Scrapy, and a solid understanding of APIs and data parsing. They must possess the ability to identify and correct errors in existing scripts and confidently engage in technical discussions to improve data retrieval processes. The role also involves a critical eye for troubleshooting and optimizing code to ensure efficient data extraction from the ArXiv platform for research and analysis purposes.\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"Computer_Science_Research_Analyst\",\n",
    "            \"model\": \"gpt-4-1106-preview\",\n",
    "            \"system_message\": \"You are now in a group chat. You need to complete a task with other participants. As a Computer Science Research Analyst, your objective is to utilize your analytical capabilities to identify and examine scholarly articles on arXiv, focusing on areas bridging computer science and medical science. Employ Python for automation where appropriate and leverage your expertise in the subject matter to draw insights from the research.\\n\\nEnsure that the information is acquired systematically; tap into online databases, interpret data sets, and perform literature reviews to pinpoint relevant findings. Should you encounter a complex problem or if you find your progress stalled, feel free to question the existing approaches discussed in the chat or contribute an improved method or analysis.\\n\\nIf the task proves to be beyond your current means or if you face uncertainty at any stage, seek assistance from the group chat manager. The manager is available to provide guidance or to involve another expert if necessary to move forward effectively.\\n\\nYour contributions are crucial, and it is important to communicate your findings and conclusions clearly. Once you believe the task is complete and the group's need has been satisfied, please affirm the completion by replying \\\"TERMINATE\\\".\",\n",
    "            \"description\": \"Computer_Science_Research_Analyst is a role requiring strong analytical skills, a deep understanding of computer science concepts, and proficiency in Python for data analysis and automation. This position should have the ability to critically assess the validity of information, challenge assumptions, and provide evidence-based corrections or alternatives. They should also have excellent communication skills to articulate their findings and suggestions effectively within the group chat.\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"Medical_Science_Research_Analyst\",\n",
    "            \"model\": \"gpt-4-1106-preview\",\n",
    "            \"system_message\": \"You are now in a group chat. You need to complete a task with other participants. As a Medical_Science_Research_Analyst, your function is to harness your analytical strengths and understanding of medical research to source and evaluate pertinent papers from the arXiv database, focusing on the intersection of computer science and medical science. Utilize your Python programming skills to automate data retrieval and analysis tasks. Engage in systematic data mining to extract relevant content, then apply your analytical expertise to interpret the findings qualitatively. \\n\\nWhen there is a requirement to gather information, employ Python scripts to automate the aggregation process. This could include scraping web data, retrieving and processing documents, and performing content analyses. When these scripts produce outputs, use your subject matter expertise to evaluate the results. \\n\\nProgress through your task step by step. When an explicit plan is absent, present a structured outline of your intended methodology. Clarify which segments of the task are handled through automation, and which necessitate your interpretative skills. \\n\\nIn the event code is utilized, the script type must be specified. You are expected to execute the scripts provided without making changes. Scripts are to be complete and functionally standalone. Should you encounter an error upon execution, critically review the output, and if needed, present a revised script for the task at hand. \\n\\nFor tasks that require saving and executing scripts, indicate the intended filename at the beginning of the script. \\n\\nMaintain clear communication of the results by harnessing the 'print' function where applicable. If an error arises or a task remains unsolved after successful code execution, regroup to collect additional information, reassess your approach, and explore alternative strategies. 
\\n\\nUpon reaching a conclusion, substantiate your findings with credible evidence where possible.\\n\\nConclude your participation by confirming the task's completion with a \\\"TERMINATE\\\" response.\\n\\nShould uncertainty arise at any point, seek guidance from the group chat manager for further directives or reassignment of the task.\",\n",
    "            \"description\": \"The Medical Science Research Analyst is a professionally trained individual with strong analytical skills, specializing in interpreting and evaluating scientific research within the medical field. They should possess expertise in data analysis, likely with proficiency in Python for analyzing datasets, and have the ability to critically assess the validity and relevance of previous messages or findings relayed in the group chat. This role requires a solid foundation in medical knowledge to provide accurate and evidence-based corrections or insights.\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"Data_Analysis_Engineer\",\n",
    "            \"model\": \"gpt-4-1106-preview\",\n",
    "            \"system_message\": \"You are now in a group chat. You need to complete a task with other participants. As a Data Analysis Engineer, your role involves leveraging your analytical skills to gather, process, and analyze large datasets. You will employ various data analysis techniques and tools, particularly Python for scripting, to extract insights from the data related to computer science and medical science domains on arxiv.\\n\\nIn scenarios where information needs to be collected or analyzed, you will develop Python scripts to automate the data retrieval and processing tasks. For example, you may write scripts to scrape the arXiv website, parse metadata of research papers, filter content based on specific criteria, and perform statistical analysis or data visualization. \\n\\nYour workflow will include the following steps:\\n\\n1. Use your Python coding abilities to design scripts for data extraction and analysis. This can involve browsing or searching the web, downloading and reading files, or printing the content of web pages or files relevant to the given domains.\\n2. After gathering the necessary data, apply your data analysis expertise to derive meaningful insights or patterns present in the data. This should be done methodically, making the most of your Python skills for data manipulation and interpretation.\\n3. Communicate your findings clearly to the group chat. Ensure the results are straightforward for others to understand and act upon.\\n4. If any issues arise from executing the code, such as lack of output or unexpected results, you can question the previous messages or code in the group chat and attempt to provide a corrected script or analysis.\\n5. When uncertain or facing a complex problem that you cannot solve alone, ask for assistance from the group chat manager. 
They can either provide guidance or assign another participant to help you.\\n\\nOnce you believe the task is completed satisfactorily, and you have fulfilled the user's need, respond with \\\"TERMINATE\\\" to signify the end of your contribution to the task. Remember, while technical proficiency in Python is essential for this role, the ability to work collaboratively within the group chat, communicate effectively, and adapt to challenges is equally important.\",\n",
    "            \"description\": \"Data_Analysis_Engineer is a professional adept in collecting, analyzing, and interpreting large datasets, using statistical tools and machine learning techniques to provide actionable insights. They should possess strong Python coding skills for data manipulation and analysis, an understanding of database management, as well as the ability to communicate complex results effectively to non-technical stakeholders. This position should be allowed to speak when data-driven clarity is needed or when existing analyses or methodologies are called into question.\"\n",
    "        },\n",
    "        {\n",
    "            \"name\": \"ML_Paper_Summarization_Specialist\",\n",
    "            \"model\": \"gpt-4-1106-preview\",\n",
    "            \"system_message\": \"You are now in a group chat. You need to complete a task with other participants. As an ML_Paper_Summarization_Specialist, your role entails leveraging machine learning techniques to extract and analyze academic papers from arXiv, focusing on domains that intersect computer science and medical science. Utilize your expertise in natural language processing and data analysis to identify relevant papers, extract key insights, and generate summaries that accurately reflect the advancements and findings within those papers.\\n\\nYou are expected to apply your deep understanding of machine learning algorithms, data mining, and information retrieval to construct models and systems that can efficiently process and interpret scientific literature.\\n\\nIf you encounter any challenges in accessing papers, parsing content, or algorithmic processing, you may seek assistance by presenting your issue to the group chat. Should there be a disagreement regarding the efficacy of a method or the accuracy of a summarization, you are encouraged to critically evaluate previous messages or outputs and offer improved solutions to enhance the group's task performance.\\n\\nShould confusion arise during the task, rather than relying on coding scripts, please request guidance from the group chat manager, and allow them to facilitate the necessary support by inviting another participant who can aid in overcoming the current obstacle.\\n\\nRemember, your primary duty is to synthesize complex academic content into concise, accessible summaries that will serve as a valuable resource for researchers and professionals seeking to stay abreast of the latest developments in their respective fields. \\n\\nOnce you believe your task is completed and the summaries provided meet the necessary standards of accuracy and comprehensiveness, reply \\\"TERMINATE\\\" to signal the end of your contribution to the group's task.\",\n",
    "            \"description\": \"The ML_Paper_Summarization_Specialist is a professional adept in machine learning concepts and current research trends, with strong analytical skills to critically evaluate information, synthesizing knowledge from academic papers into digestible summaries. This specialist should be proficient in Python for text processing and have the ability to provide constructive feedback on technical discussions, guide effective implementation, and correct misconceptions or errors related to machine learning theory and practice in the chat. They should be a reliable resource for clarifying complex information and ensuring accurate application of machine learning techniques within the group chat context.\"\n",
    "        }\n",
    "    ],\n",
    "    \"coding\": true,\n",
    "    \"default_llm_config\": {\n",
    "        \"temperature\": 0\n",
    "    },\n",
    "    \"code_execution_config\": {\n",
    "        \"work_dir\": \"groupchat\",\n",
    "        \"use_docker\": false,\n",
    "        \"timeout\": 60,\n",
    "        \"last_n_messages\": 2\n",
    "    }\n",
    "}\n",
    "```\n",
    "These information will be saved in JSON format. You can provide a specific filename, otherwise, AgentBuilder will save config to the current path with a generated filename 'save_config_TASK_MD5.json'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "e4b88a5d482ceba4",
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-01-01T10:25:56.983244800Z",
     "start_time": "2024-01-01T10:25:56.938459500Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Building config saved to ./save_config_c52224ebd16a2e60b348f3f04ac15e79.json\n"
     ]
    }
   ],
   "source": [
    "saved_path = builder.save()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a35620c10ee42be",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "After that, you can load the saved config and skip the building process. AgentBuilder will create the agents from that information without prompting the builder manager."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "34addd498e5ab174",
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-01-01T10:30:23.592045Z",
     "start_time": "2024-01-01T10:29:18.977259500Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading config from ./save_config_c52224ebd16a2e60b348f3f04ac15e79.json\n",
      "==> Creating agents...\n",
      "Creating agent ArXiv_Data_Scraper_Developer with backbone gpt-4-1106-preview...\n",
      "Creating agent Computer_Science_Research_Analyst with backbone gpt-4-1106-preview...\n",
      "Creating agent Medical_Science_Research_Analyst with backbone gpt-4-1106-preview...\n",
      "Creating agent Data_Analysis_Engineer with backbone gpt-4-1106-preview...\n",
      "Creating agent ML_Paper_Summarization_Specialist with backbone gpt-4-1106-preview...\n",
      "Adding user console proxy...\n",
      "\u001b[33mUser_console_and_code_interpreter\u001b[0m (to chat_manager):\n",
      "Find a recent paper about LLaVA on arxiv and find its potential applications in computer vision.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mArXiv_Data_Scraper_Developer\u001b[0m (to chat_manager):\n",
      "\n",
      "To find a recent paper about LLaVA on arXiv and its potential applications in computer vision, we'll need to perform a search on the arXiv API. I'll write a Python script that uses the `arxiv` library to query the arXiv database for papers related to \"LLaVA\" and \"computer vision\". If the `arxiv` library is not available, we can use the `requests` library to make a direct HTTP request to the arXiv API.\n",
      "\n",
      "First, let's try using the `arxiv` library. If you don't have it installed, you can install it using `pip install arxiv`.\n",
      "\n",
      "Here's a Python script that performs the search:\n",
      "\n",
      "```python\n",
      "import arxiv\n",
      "\n",
      "# Define the search query\n",
      "search_query = 'all:\"LLaVA\" AND cat:cs.CV'\n",
      "\n",
      "# Search arXiv for papers matching the query\n",
      "search = arxiv.Search(\n",
      "  query = search_query,\n",
      "  max_results = 10,\n",
      "  sort_by = arxiv.SortCriterion.SubmittedDate\n",
      ")\n",
      "\n",
      "# Fetch the results\n",
      "for result in search.results():\n",
      "    print(\"Title:\", result.title)\n",
      "    print(\"Authors:\", result.authors)\n",
      "    print(\"Abstract:\", result.summary)\n",
      "    print(\"Submitted Date:\", result.published)\n",
      "    print(\"URL:\", result.entry_id)\n",
      "    print(\"Potential Applications in Computer Vision:\", \"TBD\")  # Placeholder for manual analysis\n",
      "    print(\"\\n\")\n",
      "```\n",
      "\n",
      "This script will print out the title, authors, abstract, submission date, and URL for up to 10 recent papers related to LLaVA in the field of computer vision. The potential applications in computer vision will need to be determined from the abstract or by reading the paper, as this information is not directly available from the metadata.\n",
      "\n",
      "If you encounter any issues with the script or if you need further assistance, please let me know.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[31m\n",
      ">>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...\u001b[0m\n",
      "\u001b[33mUser_console_and_code_interpreter\u001b[0m (to chat_manager):\n",
      "\n",
      "exitcode: 0 (execution succeeded)\n",
      "Code output: \n",
      "Title: A Simple LLM Framework for Long-Range Video Question-Answering\n",
      "Authors: [arxiv.Result.Author('Ce Zhang'), arxiv.Result.Author('Taixi Lu'), arxiv.Result.Author('Md Mohaiminul Islam'), arxiv.Result.Author('Ziyang Wang'), arxiv.Result.Author('Shoubin Yu'), arxiv.Result.Author('Mohit Bansal'), arxiv.Result.Author('Gedas Bertasius')]\n",
      "Abstract: We present LLoVi, a language-based framework for long-range video\n",
      "question-answering (LVQA). Unlike prior long-range video understanding methods,\n",
      "which are often costly and require specialized long-range video modeling design\n",
      "(e.g., memory queues, state-space layers, etc.), our approach uses a\n",
      "frame/clip-level visual captioner (e.g., BLIP2, LaViLa, LLaVA) coupled with a\n",
      "Large Language Model (GPT-3.5, GPT-4) leading to a simple yet surprisingly\n",
      "effective LVQA framework. Specifically, we decompose short and long-range\n",
      "modeling aspects of LVQA into two stages. First, we use a short-term visual\n",
      "captioner to generate textual descriptions of short video clips (0.5-8s in\n",
      "length) densely sampled from a long input video. Afterward, an LLM aggregates\n",
      "the densely extracted short-term captions to perform long-range temporal\n",
      "reasoning needed to understand the whole video and answer a question. To\n",
      "analyze what makes our simple framework so effective, we thoroughly evaluate\n",
      "various components of our system. Our empirical analysis reveals that the\n",
      "choice of the visual captioner and LLM is critical for good LVQA performance.\n",
      "Furthermore, we show that a specialized prompt that asks the LLM first to\n",
      "summarize the noisy short-term visual captions and then answer a given input\n",
      "question leads to a significant LVQA performance boost. On EgoSchema, which is\n",
      "best known as a very long-form video question-answering benchmark, our method\n",
      "achieves 50.3% accuracy, outperforming the previous best-performing approach by\n",
      "18.1% (absolute gain). In addition, our approach outperforms the previous\n",
      "state-of-the-art by 4.1% and 3.1% on NeXT-QA and IntentQA. We also extend LLoVi\n",
      "to grounded LVQA and show that it outperforms all prior methods on the NeXT-GQA\n",
      "dataset. We will release our code at https://github.com/CeeZh/LLoVi.\n",
      "Submitted Date: 2023-12-28 18:58:01+00:00\n",
      "URL: http://arxiv.org/abs/2312.17235v1\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "Title: TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones\n",
      "Authors: [arxiv.Result.Author('Zhengqing Yuan'), arxiv.Result.Author('Zhaoxu Li'), arxiv.Result.Author('Lichao Sun')]\n",
      "Abstract: In the era of advanced multimodel learning, multimodal large language models\n",
      "(MLLMs) such as GPT-4V have made remarkable strides towards bridging language\n",
      "and visual elements. However, the closed-source nature and considerable\n",
      "computational demand present notable challenges for universal usage and\n",
      "modifications. This is where open-source MLLMs like LLaVA and MiniGPT-4 come\n",
      "in, presenting groundbreaking achievements across tasks. Despite these\n",
      "accomplishments, computational efficiency remains an unresolved issue, as these\n",
      "models, like LLaVA-v1.5-13B, require substantial resources. Addressing these\n",
      "issues, we introduce TinyGPT-V, a new-wave model marrying impressive\n",
      "performance with commonplace computational capacity. It stands out by requiring\n",
      "merely a 24G GPU for training and an 8G GPU or CPU for inference. Built upon\n",
      "Phi-2, TinyGPT-V couples an effective language backbone with pre-trained vision\n",
      "modules from BLIP-2 or CLIP. TinyGPT-V's 2.8B parameters can undergo a unique\n",
      "quantisation process, suitable for local deployment and inference tasks on 8G\n",
      "various devices. Our work fosters further developments for designing\n",
      "cost-effective, efficient, and high-performing MLLMs, expanding their\n",
      "applicability in a broad array of real-world scenarios. Furthermore this paper\n",
      "proposed a new paradigm of Multimodal Large Language Model via small backbones.\n",
      "Our code and training weights are placed at:\n",
      "https://github.com/DLYuanGod/TinyGPT-V and\n",
      "https://huggingface.co/Tyrannosaurus/TinyGPT-V respectively.\n",
      "Submitted Date: 2023-12-28 07:11:41+00:00\n",
      "URL: http://arxiv.org/abs/2312.16862v1\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "Title: Exploring Multimodal Large Language Models for Radiology Report Error-checking\n",
      "Authors: [arxiv.Result.Author('Jinge Wu'), arxiv.Result.Author('Yunsoo Kim'), arxiv.Result.Author('Eva C. Keller'), arxiv.Result.Author('Jamie Chow'), arxiv.Result.Author('Adam P. Levine'), arxiv.Result.Author('Nikolas Pontikos'), arxiv.Result.Author('Zina Ibrahim'), arxiv.Result.Author('Paul Taylor'), arxiv.Result.Author('Michelle C. Williams'), arxiv.Result.Author('Honghan Wu')]\n",
      "Abstract: This paper proposes one of the first clinical applications of multimodal\n",
      "large language models (LLMs) as an assistant for radiologists to check errors\n",
      "in their reports. We created an evaluation dataset from two real-world\n",
      "radiology datasets (MIMIC-CXR and IU-Xray), with 1,000 subsampled reports each.\n",
      "A subset of original reports was modified to contain synthetic errors by\n",
      "introducing various type of mistakes. The evaluation contained two difficulty\n",
      "levels: SIMPLE for binary error-checking and COMPLEX for identifying error\n",
      "types. LLaVA (Large Language and Visual Assistant) variant models, including\n",
      "our instruction-tuned model, were used for the evaluation. Additionally, a\n",
      "domain expert evaluation was conducted on a small test set. At the SIMPLE\n",
      "level, the LLaVA v1.5 model outperformed other publicly available models.\n",
      "Instruction tuning significantly enhanced performance by 47.4% and 25.4% on\n",
      "MIMIC-CXR and IU-Xray data, respectively. The model also surpassed the domain\n",
      "experts accuracy in the MIMIC-CXR dataset by 1.67%. Notably, among the subsets\n",
      "(N=21) of the test set where a clinician did not achieve the correct\n",
      "conclusion, the LLaVA ensemble mode correctly identified 71.4% of these cases.\n",
      "This study marks a promising step toward utilizing multi-modal LLMs to enhance\n",
      "diagnostic accuracy in radiology. The ensemble model demonstrated comparable\n",
      "performance to clinicians, even capturing errors overlooked by humans.\n",
      "Nevertheless, future work is needed to improve the model ability to identify\n",
      "the types of inconsistency.\n",
      "Submitted Date: 2023-12-20 15:20:33+00:00\n",
      "URL: http://arxiv.org/abs/2312.13103v1\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "Title: VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering\n",
      "Authors: [arxiv.Result.Author('Chun-Mei Feng'), arxiv.Result.Author('Yang Bai'), arxiv.Result.Author('Tao Luo'), arxiv.Result.Author('Zhen Li'), arxiv.Result.Author('Salman Khan'), arxiv.Result.Author('Wangmeng Zuo'), arxiv.Result.Author('Xinxing Xu'), arxiv.Result.Author('Rick Siow Mong Goh'), arxiv.Result.Author('Yong Liu')]\n",
      "Abstract: Albeit progress has been made in Composed Image Retrieval (CIR), we\n",
      "empirically find that a certain percentage of failure retrieval results are not\n",
      "consistent with their relative captions. To address this issue, this work\n",
      "provides a Visual Question Answering (VQA) perspective to boost the performance\n",
      "of CIR. The resulting VQA4CIR is a post-processing approach and can be directly\n",
      "plugged into existing CIR methods. Given the top-C retrieved images by a CIR\n",
      "method, VQA4CIR aims to decrease the adverse effect of the failure retrieval\n",
      "results being inconsistent with the relative caption. To find the retrieved\n",
      "images inconsistent with the relative caption, we resort to the \"QA generation\n",
      "to VQA\" self-verification pipeline. For QA generation, we suggest fine-tuning\n",
      "LLM (e.g., LLaMA) to generate several pairs of questions and answers from each\n",
      "relative caption. We then fine-tune LVLM (e.g., LLaVA) to obtain the VQA model.\n",
      "By feeding the retrieved image and question to the VQA model, one can find the\n",
      "images inconsistent with relative caption when the answer by VQA is\n",
      "inconsistent with the answer in the QA pair. Consequently, the CIR performance\n",
      "can be boosted by modifying the ranks of inconsistently retrieved images.\n",
      "Experimental results show that our proposed method outperforms state-of-the-art\n",
      "CIR methods on the CIRR and Fashion-IQ datasets.\n",
      "Submitted Date: 2023-12-19 15:56:08+00:00\n",
      "URL: http://arxiv.org/abs/2312.12273v1\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "Title: How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation\n",
      "Authors: [arxiv.Result.Author('Zhongyi Han'), arxiv.Result.Author('Guanglin Zhou'), arxiv.Result.Author('Rundong He'), arxiv.Result.Author('Jindong Wang'), arxiv.Result.Author('Tailin Wu'), arxiv.Result.Author('Yilong Yin'), arxiv.Result.Author('Salman Khan'), arxiv.Result.Author('Lina Yao'), arxiv.Result.Author('Tongliang Liu'), arxiv.Result.Author('Kun Zhang')]\n",
      "Abstract: In machine learning, generalization against distribution shifts -- where\n",
      "deployment conditions diverge from the training scenarios -- is crucial,\n",
      "particularly in fields like climate modeling, biomedicine, and autonomous\n",
      "driving. The emergence of foundation models, distinguished by their extensive\n",
      "pretraining and task versatility, has led to an increased interest in their\n",
      "adaptability to distribution shifts. GPT-4V(ision) acts as the most advanced\n",
      "publicly accessible multimodal foundation model, with extensive applications\n",
      "across various domains, including anomaly detection, video understanding, image\n",
      "generation, and medical diagnosis. However, its robustness against data\n",
      "distributions remains largely underexplored. Addressing this gap, this study\n",
      "rigorously evaluates GPT-4V's adaptability and generalization capabilities in\n",
      "dynamic environments, benchmarking against prominent models like CLIP and\n",
      "LLaVA. We delve into GPT-4V's zero-shot generalization across 13 diverse\n",
      "datasets spanning natural, medical, and molecular domains. We further\n",
      "investigate its adaptability to controlled data perturbations and examine the\n",
      "efficacy of in-context learning as a tool to enhance its adaptation. Our\n",
      "findings delineate GPT-4V's capability boundaries in distribution shifts,\n",
      "shedding light on its strengths and limitations across various scenarios.\n",
      "Importantly, this investigation contributes to our understanding of how AI\n",
      "foundation models generalize to distribution shifts, offering pivotal insights\n",
      "into their adaptability and robustness. Code is publicly available at\n",
      "https://github.com/jameszhou-gl/gpt-4v-distribution-shift.\n",
      "Submitted Date: 2023-12-12 16:48:07+00:00\n",
      "URL: http://arxiv.org/abs/2312.07424v2\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "Title: Honeybee: Locality-enhanced Projector for Multimodal LLM\n",
      "Authors: [arxiv.Result.Author('Junbum Cha'), arxiv.Result.Author('Wooyoung Kang'), arxiv.Result.Author('Jonghwan Mun'), arxiv.Result.Author('Byungseok Roh')]\n",
      "Abstract: In Multimodal Large Language Models (MLLMs), a visual projector plays a\n",
      "crucial role in bridging pre-trained vision encoders with LLMs, enabling\n",
      "profound visual understanding while harnessing the LLMs' robust capabilities.\n",
      "Despite the importance of the visual projector, it has been relatively less\n",
      "explored. In this study, we first identify two essential projector properties:\n",
      "(i) flexibility in managing the number of visual tokens, crucial for MLLMs'\n",
      "overall efficiency, and (ii) preservation of local context from visual\n",
      "features, vital for spatial understanding. Based on these findings, we propose\n",
      "a novel projector design that is both flexible and locality-enhanced,\n",
      "effectively satisfying the two desirable properties. Additionally, we present\n",
      "comprehensive strategies to effectively utilize multiple and multifaceted\n",
      "instruction datasets. Through extensive experiments, we examine the impact of\n",
      "individual design choices. Finally, our proposed MLLM, Honeybee, remarkably\n",
      "outperforms previous state-of-the-art methods across various benchmarks,\n",
      "including MME, MMBench, SEED-Bench, and LLaVA-Bench, achieving significantly\n",
      "higher efficiency. Code and models are available at\n",
      "https://github.com/kakaobrain/honeybee.\n",
      "Submitted Date: 2023-12-11 18:59:06+00:00\n",
      "URL: http://arxiv.org/abs/2312.06742v1\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "Title: Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models\n",
      "Authors: [arxiv.Result.Author('Haoran Wei'), arxiv.Result.Author('Lingyu Kong'), arxiv.Result.Author('Jinyue Chen'), arxiv.Result.Author('Liang Zhao'), arxiv.Result.Author('Zheng Ge'), arxiv.Result.Author('Jinrong Yang'), arxiv.Result.Author('Jianjian Sun'), arxiv.Result.Author('Chunrui Han'), arxiv.Result.Author('Xiangyu Zhang')]\n",
      "Abstract: Modern Large Vision-Language Models (LVLMs) enjoy the same vision vocabulary\n",
      "-- CLIP, which can cover most common vision tasks. However, for some special\n",
      "vision task that needs dense and fine-grained vision perception, e.g.,\n",
      "document-level OCR or chart understanding, especially in non-English scenarios,\n",
      "the CLIP-style vocabulary may encounter low efficiency in tokenizing the vision\n",
      "knowledge and even suffer out-of-vocabulary problem. Accordingly, we propose\n",
      "Vary, an efficient and effective method to scale up the vision vocabulary of\n",
      "LVLMs. The procedures of Vary are naturally divided into two folds: the\n",
      "generation and integration of a new vision vocabulary. In the first phase, we\n",
      "devise a vocabulary network along with a tiny decoder-only transformer to\n",
      "produce the desired vocabulary via autoregression. In the next, we scale up the\n",
      "vanilla vision vocabulary by merging the new one with the original one (CLIP),\n",
      "enabling the LVLMs can quickly garner new features. Compared to the popular\n",
      "BLIP-2, MiniGPT4, and LLaVA, Vary can maintain its vanilla capabilities while\n",
      "enjoying more excellent fine-grained perception and understanding ability.\n",
      "Specifically, Vary is competent in new document parsing features (OCR or\n",
      "markdown conversion) while achieving 78.2% ANLS in DocVQA and 36.2% in MMVet.\n",
      "Our code will be publicly available on the homepage.\n",
      "Submitted Date: 2023-12-11 04:26:17+00:00\n",
      "URL: http://arxiv.org/abs/2312.06109v1\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "Title: Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos\n",
      "Authors: [arxiv.Result.Author('Mehmet Saygin Seyfioglu'), arxiv.Result.Author('Wisdom O. Ikezogwo'), arxiv.Result.Author('Fatemeh Ghezloo'), arxiv.Result.Author('Ranjay Krishna'), arxiv.Result.Author('Linda Shapiro')]\n",
      "Abstract: The gigapixel scale of whole slide images (WSIs) poses a challenge for\n",
      "histopathology multi-modal chatbots, requiring a global WSI analysis for\n",
      "diagnosis, compounding evidence from different WSI patches. Current visual\n",
      "instruction datasets, generated through large language models, focus on\n",
      "creating question/answer pairs for individual image patches, which may lack\n",
      "diagnostic capacity on their own in histopathology, further complicated by the\n",
      "absence of spatial grounding in histopathology image captions. To bridge this\n",
      "gap, we introduce Quilt-Instruct, a large-scale dataset of 107,131\n",
      "histopathology-specific instruction question/answer pairs, that is collected by\n",
      "leveraging educational histopathology videos from YouTube, which provides\n",
      "spatial localization of captions by automatically extracting narrators' cursor\n",
      "movements. In addition, we provide contextual reasoning by extracting diagnosis\n",
      "and supporting facts from the entire video content to guide the extrapolative\n",
      "reasoning of GPT-4. Using Quilt-Instruct, we train Quilt-LLaVA, which can\n",
      "reason beyond the given single image patch, enabling diagnostic reasoning and\n",
      "the capability of spatial awareness. To evaluate Quilt-LLaVA, we propose a\n",
      "comprehensive evaluation dataset created from 985 images and 1283\n",
      "human-generated question-answers. We also thoroughly evaluate Quilt-LLaVA using\n",
      "public histopathology datasets, where Quilt-LLaVA significantly outperforms\n",
      "SOTA by over 10% on relative GPT-4 score and 4% and 9% on open and closed set\n",
      "VQA. Our code, data, and model are publicly available at quilt-llava.github.io.\n",
      "Submitted Date: 2023-12-07 23:16:37+00:00\n",
      "URL: http://arxiv.org/abs/2312.04746v1\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "Title: Prompt Highlighter: Interactive Control for Multi-Modal LLMs\n",
      "Authors: [arxiv.Result.Author('Yuechen Zhang'), arxiv.Result.Author('Shengju Qian'), arxiv.Result.Author('Bohao Peng'), arxiv.Result.Author('Shu Liu'), arxiv.Result.Author('Jiaya Jia')]\n",
      "Abstract: This study targets a critical aspect of multi-modal LLMs' (LLMs&VLMs)\n",
      "inference: explicit controllable text generation. Multi-modal LLMs empower\n",
      "multi-modality understanding with the capability of semantic generation yet\n",
      "bring less explainability and heavier reliance on prompt contents due to their\n",
      "autoregressive generative nature. While manipulating prompt formats could\n",
      "improve outputs, designing specific and precise prompts per task can be\n",
      "challenging and ineffective. To tackle this issue, we introduce a novel\n",
      "inference method, Prompt Highlighter, which enables users to highlight specific\n",
      "prompt spans to interactively control the focus during generation. Motivated by\n",
      "the classifier-free diffusion guidance, we form regular and unconditional\n",
      "context pairs based on highlighted tokens, demonstrating that the\n",
      "autoregressive generation in models can be guided in a classifier-free way.\n",
      "Notably, we find that, during inference, guiding the models with highlighted\n",
      "tokens through the attention weights leads to more desired outputs. Our\n",
      "approach is compatible with current LLMs and VLMs, achieving impressive\n",
      "customized generation results without training. Experiments confirm its\n",
      "effectiveness in focusing on input contexts and generating reliable content.\n",
      "Without tuning on LLaVA-v1.5, our method secured 69.5 in the MMBench test and\n",
      "1552.5 in MME-perception. The code is available at:\n",
      "https://github.com/dvlab-research/Prompt-Highlighter/\n",
      "Submitted Date: 2023-12-07 13:53:29+00:00\n",
      "URL: http://arxiv.org/abs/2312.04302v1\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "Title: LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models\n",
      "Authors: [arxiv.Result.Author('Hao Zhang'), arxiv.Result.Author('Hongyang Li'), arxiv.Result.Author('Feng Li'), arxiv.Result.Author('Tianhe Ren'), arxiv.Result.Author('Xueyan Zou'), arxiv.Result.Author('Shilong Liu'), arxiv.Result.Author('Shijia Huang'), arxiv.Result.Author('Jianfeng Gao'), arxiv.Result.Author('Lei Zhang'), arxiv.Result.Author('Chunyuan Li'), arxiv.Result.Author('Jianwei Yang')]\n",
      "Abstract: With the recent significant advancements in large multi-modal models (LMMs),\n",
      "the importance of their grounding capability in visual chat is increasingly\n",
      "recognized. Despite recent efforts to enable LMMs to support grounding, their\n",
      "capabilities for grounding and chat are usually separate, and their chat\n",
      "performance drops dramatically when asked to ground. The problem is the lack of\n",
      "a dataset for grounded visual chat (GVC). Existing grounding datasets only\n",
      "contain short captions. To address this issue, we have created GVC data that\n",
      "allows for the combination of grounding and chat capabilities. To better\n",
      "evaluate the GVC capabilities, we have introduced a benchmark called\n",
      "Grounding-Bench. Additionally, we have proposed a model design that can support\n",
      "GVC and various types of visual prompts by connecting segmentation models with\n",
      "language models. Experimental results demonstrate that our model outperforms\n",
      "other LMMs on Grounding-Bench. Furthermore, our model achieves competitive\n",
      "performance on classic grounding benchmarks like RefCOCO/+/g and Flickr30K\n",
      "Entities. Our code will be released at\n",
      "https://github.com/UX-Decoder/LLaVA-Grounding .\n",
      "Submitted Date: 2023-12-05 18:29:31+00:00\n",
      "URL: http://arxiv.org/abs/2312.02949v1\n",
      "Potential Applications in Computer Vision: TBD\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mML_Paper_Summarization_Specialist\u001b[0m (to chat_manager):\n",
      "\n",
      "Based on the recent papers extracted from arXiv, here are the potential applications in computer vision for the LLaVA framework and related technologies:\n",
      "\n",
      "1. **Long-Range Video Question-Answering (LVQA)**: The LLoVi framework uses a visual captioner coupled with a Large Language Model to perform long-range temporal reasoning for understanding videos and answering questions. This can be applied to video understanding tasks such as video summarization and event detection.\n",
      "\n",
      "2. **Efficient Multimodal Large Language Models**: TinyGPT-V demonstrates the potential for efficient and cost-effective multimodal large language models that can be used for various computer vision tasks on devices with limited computational resources.\n",
      "\n",
      "3. **Radiology Report Error-checking**: LLaVA variant models are used to assist radiologists in checking errors in their reports, which can be applied to medical imaging and diagnostic accuracy enhancement.\n",
      "\n",
      "4. **Composed Image Retrieval (CIR)**: The VQA4CIR method uses a \"QA generation to VQA\" self-verification pipeline to improve the performance of CIR by identifying images inconsistent with their relative captions.\n",
      "\n",
      "5. **Adaptation to Distribution Shifts**: GPT-4V's adaptability and generalization capabilities in dynamic environments can be applied to anomaly detection, medical diagnosis, and other areas where robustness against data distribution shifts is crucial.\n",
      "\n",
      "6. **Locality-enhanced Projector for Multimodal LLMs**: The Honeybee model's projector design can be applied to tasks requiring spatial understanding and is efficient in managing the number of visual tokens.\n",
      "\n",
      "7. **Scaling up Vision Vocabulary for LVLMs**: Vary can be used for document parsing features such as OCR or markdown conversion, especially in non-English scenarios, and can maintain capabilities while providing fine-grained perception and understanding.\n",
      "\n",
      "8. **Visual Instruction Tuning for Histopathology**: Quilt-LLaVA can be applied to diagnostic reasoning in histopathology by enabling spatial awareness and reasoning beyond single image patches.\n",
      "\n",
      "9. **Interactive Control for Multi-Modal LLMs**: Prompt Highlighter allows users to interactively control the focus during generation, which can be applied to customized content generation in various computer vision tasks.\n",
      "\n",
      "10. **Grounded Visual Chat with Large Multimodal Models**: LLaVA-Grounding demonstrates the potential for combining grounding and chat capabilities in visual chat applications, which can be applied to interactive systems that require visual understanding and dialogue.\n",
      "\n",
      "These applications demonstrate the versatility of LLaVA and related technologies in enhancing computer vision tasks, from medical imaging to interactive systems and efficient model deployment on resource-constrained devices.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mML_Paper_Summarization_Specialist\u001b[0m (to chat_manager):\n",
      "\n",
      "TERMINATE\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "All agents have been cleared.\n"
     ]
    }
   ],
   "source": [
    "new_builder = AgentBuilder(config_file_or_env=config_file_or_env)\n",
    "agent_list, agent_configs = new_builder.load(\n",
    "    \"./save_config_c52224ebd16a2e60b348f3f04ac15e79.json\"\n",
    ")  # load previous agent configs\n",
    "start_task(\n",
    "    execution_task=\"Find a recent paper about LLaVA on arxiv and find its potential applications in computer vision.\",\n",
    "    agent_list=agent_list,\n",
    ")\n",
    "new_builder.clear_all_agents()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32e0cf8f09eef5cd",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "## Use OpenAI Assistant\n",
    "\n",
    "[The Assistants API](https://platform.openai.com/docs/assistants/overview) allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries.\n",
    "AutoBuild also supports the Assistants API: pass `use_oai_assistant=True` to `build()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4051c25b2cd1918c",
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-01-01T10:42:16.740401Z",
     "start_time": "2024-01-01T10:40:37.039210300Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==> Generating agents...\n",
      "['ArXiv_CS_Medical_Paper_Finder_Developer', 'Computational_Biology_Research_Analyst', 'Computer_Science_Literature_Review_Specialist', 'Machine_Learning_Model_Architect', 'Data_Extraction_Scripting_Engineer'] are generated.\n",
      "==> Generating system message...\n",
      "Preparing system message for ArXiv_CS_Medical_Paper_Finder_Developer\n",
      "Preparing system message for Computational_Biology_Research_Analyst\n",
      "Preparing system message for Computer_Science_Literature_Review_Specialist\n",
      "Preparing system message for Machine_Learning_Model_Architect\n",
      "Preparing system message for Data_Extraction_Scripting_Engineer\n",
      "==> Generating description...\n",
      "Preparing description for ArXiv_CS_Medical_Paper_Finder_Developer\n",
      "Preparing description for Computational_Biology_Research_Analyst\n",
      "Preparing description for Computer_Science_Literature_Review_Specialist\n",
      "Preparing description for Machine_Learning_Model_Architect\n",
      "Preparing description for Data_Extraction_Scripting_Engineer\n",
      "==> Creating agents...\n",
      "Creating agent ArXiv_CS_Medical_Paper_Finder_Developer with backbone gpt-4-1106-preview...\n",
      "Creating agent Computational_Biology_Research_Analyst with backbone gpt-4-1106-preview...\n",
      "Creating agent Computer_Science_Literature_Review_Specialist with backbone gpt-4-1106-preview...\n",
      "Creating agent Machine_Learning_Model_Architect with backbone gpt-4-1106-preview...\n",
      "Creating agent Data_Extraction_Scripting_Engineer with backbone gpt-4-1106-preview...\n",
      "Adding user console proxy...\n",
      "\u001b[33mUser_console_and_code_interpreter\u001b[0m (to chat_manager):\n",
      "Find a recent paper about explainable AI on arxiv and find its potential applications in medical.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mArXiv_CS_Medical_Paper_Finder_Developer\u001b[0m (to chat_manager):\n",
      "\n",
      "To perform this task, we will first write a Python script to fetch papers related to \"explainable AI\" from arXiv that are also relevant to the medical field. We will use the `arxiv` library, which is a Python wrapper for the arXiv API. If you don't have the `arxiv` library installed, you can install it using the following command:\n",
      "\n",
      "```bash\n",
      "pip install arxiv\n",
      "```\n",
      "\n",
      "Once we have identified the papers, we will extract potential applications in the medical field from the abstract or conclusion sections if available.\n",
      "\n",
      "Here's the script to find a recent paper about explainable AI from arXiv with relevance to the medical field:\n",
      "\n",
      "```python\n",
      "# Filename: arxiv_explainable_ai_medical.py\n",
      "\n",
      "import arxiv\n",
      "\n",
      "# Query for papers related to \"explainable AI\" in the field of CS and Medical\n",
      "query = 'cat:cs.* AND cat:q-bio.* AND all:explainable AI'\n",
      "sort_by = arxiv.SortCriterion.SubmittedDate\n",
      "sort_order = arxiv.SortOrder.Descending\n",
      "\n",
      "# Perform search query on arXiv\n",
      "search = arxiv.Search(\n",
      "    query=query,\n",
      "    max_results=1,\n",
      "    sort_by=sort_by,\n",
      "    sort_order=sort_order\n",
      ")\n",
      "\n",
      "# Fetch the papers\n",
      "papers = [paper for paper in search.get()]\n",
      "\n",
      "# If there are papers found, print the most recent one's title, authors, and summary\n",
      "if papers:\n",
      "    paper = papers[0]\n",
      "    print(f\"Title: {paper.title}\\n\")\n",
      "    print(f\"Authors: {', '.join(author.name for author in paper.authors)}\\n\")\n",
      "    print(f\"Abstract: {paper.summary}\\n\")\n",
      "    print(f\"Published: {paper.published}\\n\")\n",
      "    print(f\"arXiv ID: {paper.get_short_id()}\\n\")\n",
      "    print(f\"URL: {paper.entry_id}\\n\")\n",
      "else:\n",
      "    print(\"No recent papers on 'explainable AI' found in the medical field on arXiv.\")\n",
      "```\n",
      "\n",
      "To run the script, save it in a Python file and execute it in your Python environment. The script fetches the latest paper based on the defined query and prints out the title, authors, abstract, publication date, arXiv ID, and URL for further reading.\n",
      "\n",
      "Keep in mind that the potential applications in medical would generally be discussed within the paper's text. To extract those, we would typically need to read through the full text, which might involve additional processing steps that are beyond the scope of a simple API query. If the information is not readily available in the abstract, you'll have to review the full text of the paper manually for detailed potential applications in the medical field.\n",
      "\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[31m\n",
      ">>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is bash)...\u001b[0m\n",
      "\u001b[31m\n",
      ">>>>>>>> EXECUTING CODE BLOCK 1 (inferred language is python)...\u001b[0m\n",
      "\u001b[33mUser_console_and_code_interpreter\u001b[0m (to chat_manager):\n",
      "\n",
      "exitcode: 1 (execution failed)\n",
      "Code output: \n",
      "Requirement already satisfied: arxiv in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (2.1.0)\n",
      "Requirement already satisfied: feedparser==6.0.10 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from arxiv) (6.0.10)\n",
      "Requirement already satisfied: requests==2.31.0 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from arxiv) (2.31.0)\n",
      "Requirement already satisfied: sgmllib3k in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from feedparser==6.0.10->arxiv) (1.0.0)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from requests==2.31.0->arxiv) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from requests==2.31.0->arxiv) (3.6)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from requests==2.31.0->arxiv) (1.26.18)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /home/elpis_ubuntu/miniconda3/envs/llm/lib/python3.11/site-packages (from requests==2.31.0->arxiv) (2023.11.17)\n",
      "\n",
      "Traceback (most recent call last):\n",
      "  File \"\", line 19, in <module>\n",
      "    papers = [paper for paper in search.get()]\n",
      "                                 ^^^^^^^^^^\n",
      "AttributeError: 'Search' object has no attribute 'get'\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mArXiv_CS_Medical_Paper_Finder_Developer\u001b[0m (to chat_manager):\n",
      "\n",
      "It seems the script encountered an error because the `Search` object from the `arxiv` library does not have a `get` attribute, which is inconsistent with the usual usage of this library. Let's correct that.\n",
      "\n",
      "Instead of using `search.get()`, we will iterate directly over the search object. Here is the corrected script:\n",
      "\n",
      "```python\n",
      "# Filename: arxiv_explainable_ai_medical.py\n",
      "\n",
      "import arxiv\n",
      "import datetime\n",
      "\n",
      "# Query for papers related to \"explainable AI\" in the field of CS and Medical\n",
      "query = 'cat:cs.AI AND all:\"explainable AI\" AND (abs:medical OR abs:\"health care\" OR abs:clinical)'\n",
      "sort_by = arxiv.SortCriterion.SubmittedDate\n",
      "sort_order = arxiv.SortOrder.Descending\n",
      "\n",
      "# Prepare search\n",
      "search = arxiv.Search(\n",
      "    query=query,\n",
      "    max_results=10,\n",
      "    sort_by=sort_by,\n",
      "    sort_order=sort_order,\n",
      ")\n",
      "\n",
      "# Fetch the papers\n",
      "papers = list(search.results())\n",
      "\n",
      "# If there are papers found, print the most recent one's title, authors, and summary\n",
      "if papers:\n",
      "    most_recent_paper = max(papers, key=lambda paper: paper.published)\n",
      "    print(f\"Title: {most_recent_paper.title}\\n\")\n",
      "    print(f\"Authors: {', '.join(author.name for author in most_recent_paper.authors)}\\n\")\n",
      "    print(f\"Abstract: {most_recent_paper.summary}\\n\")\n",
      "    print(f\"Published: {most_recent_paper.published}\\n\")\n",
      "    print(f\"arXiv ID: {most_recent_paper.get_short_id()}\\n\")\n",
      "    print(f\"URL: {most_recent_paper.entry_id}\\n\")\n",
      "else:\n",
      "    print(\"No recent papers on 'explainable AI' found in the medical field on arXiv.\")\n",
      "```\n",
      "\n",
      "Run the above script, and it should now correctly fetch and print the details of the most recent paper related to explainable AI that has potential applications in the medical field. This fix should address the AttributeError by correctly utilizing the `results()` method provided by the `arxiv` library to obtain search results.\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[31m\n",
      ">>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...\u001b[0m\n",
      "\u001b[33mUser_console_and_code_interpreter\u001b[0m (to chat_manager):\n",
      "\n",
      "exitcode: 0 (execution succeeded)\n",
      "Code output: \n",
      "Title: XAI for In-hospital Mortality Prediction via Multimodal ICU Data\n",
      "\n",
      "Authors: Xingqiao Li, Jindong Gu, Zhiyong Wang, Yancheng Yuan, Bo Du, Fengxiang He\n",
      "\n",
      "Abstract: Predicting in-hospital mortality for intensive care unit (ICU) patients is\n",
      "key to final clinical outcomes. AI has shown advantaged accuracy but suffers\n",
      "from the lack of explainability. To address this issue, this paper proposes an\n",
      "eXplainable Multimodal Mortality Predictor (X-MMP) approaching an efficient,\n",
      "explainable AI solution for predicting in-hospital mortality via multimodal ICU\n",
      "data. We employ multimodal learning in our framework, which can receive\n",
      "heterogeneous inputs from clinical data and make decisions. Furthermore, we\n",
      "introduce an explainable method, namely Layer-Wise Propagation to Transformer,\n",
      "as a proper extension of the LRP method to Transformers, producing explanations\n",
      "over multimodal inputs and revealing the salient features attributed to\n",
      "prediction. Moreover, the contribution of each modality to clinical outcomes\n",
      "can be visualized, assisting clinicians in understanding the reasoning behind\n",
      "decision-making. We construct a multimodal dataset based on MIMIC-III and\n",
      "MIMIC-III Waveform Database Matched Subset. Comprehensive experiments on\n",
      "benchmark datasets demonstrate that our proposed framework can achieve\n",
      "reasonable interpretation with competitive prediction accuracy. In particular,\n",
      "our framework can be easily transferred to other clinical tasks, which\n",
      "facilitates the discovery of crucial factors in healthcare research.\n",
      "\n",
      "Published: 2023-12-29 14:28:04+00:00\n",
      "\n",
      "arXiv ID: 2312.17624v1\n",
      "\n",
      "URL: http://arxiv.org/abs/2312.17624v1\n",
      "\n",
      "\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mComputational_Biology_Research_Analyst\u001b[0m (to chat_manager):\n",
      "\n",
      "Based on the output, we have identified a recent paper from arXiv:\n",
      "\n",
      "Title: **\"XAI for In-hospital Mortality Prediction via Multimodal ICU Data\"**\n",
      "\n",
      "Authors: **Xingqiao Li, Jindong Gu, Zhiyong Wang, Yancheng Yuan, Bo Du, Fengxiang He**\n",
      "\n",
      "Abstract Summary: The paper addresses the challenge of explainability in AI for predicting in-hospital mortality of ICU patients using multimodal data. They propose the eXplainable Multimodal Mortality Predictor (X-MMP), which combines multimodal learning and an explainability method called Layer-Wise Propagation to Transformer (LWP-T). This allows the model to make decisions with explanations across multimodal inputs, identifying important features for the prediction and visualizing the contribution of each modality to the clinical outcomes. This could assist clinicians in comprehending the AI's decision-making process.\n",
      "\n",
      "Published: **December 29, 2023**\n",
      "\n",
      "arXiv ID: **2312.17624v1**\n",
      "\n",
      "URL: [http://arxiv.org/abs/2312.17624v1](http://arxiv.org/abs/2312.17624v1)\n",
      "\n",
      "**Potential Applications in Medical Field**:\n",
      "\n",
      "1. **Improved decision-making in intensive care units (ICUs):** By providing explainability for its predictions regarding patient mortality, clinicians can understand the reasoning behind AI-driven prognoses and make more informed treatment decisions.\n",
      "\n",
      "2. **Enhanced clinician trust in AI technologies:** Explainable outputs can build clinician trust in AI systems, thereby potentially increasing the adoption of AI tools in critical care settings.\n",
      "\n",
      "3. **Identification of crucial health factors:** The framework assists in discovering important factors in healthcare research, possibly leading to new insights into patient care and management.\n",
      "\n",
      "4. **Education and training:** The visualizations and explanations provided by X-MMP could be used in medical education and training, helping healthcare professionals to better understand the factors influencing patient outcomes in the ICU.\n",
      "\n",
      "5. **Transferability to other clinical tasks:** The framework can be adapted to other clinical prediction tasks, making it a versatile tool for various applications within the healthcare domain.\n",
      "\n",
      "6. **Contribution analysis of multimodal data:** Understanding how various types of data (vitals, lab results, waveforms, etc.) influence predictions can lead to better multimodal data integration in clinical workflows.\n",
      "\n",
      "This paper showcases how explainable AI can directly impact healthcare by enhancing the transparency and interpretability of AI models, ultimately supporting clinical decision-making and patient care. The application of such technology could be pivotal in advancing personalized medicine and tailored treatment plans for patients in critical conditions. \n",
      "\n",
      "If this information satisfies the task requirements, please let me know, or if there are further inquiries, feel free to ask.\n",
      "\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "\u001b[33mComputer_Science_Literature_Review_Specialist\u001b[0m (to chat_manager):\n",
      "TERMINATE\n",
      "\n",
      "\n",
      "--------------------------------------------------------------------------------\n",
      "All agents have been cleared.\n"
     ]
    }
   ],
   "source": [
    "# Create a fresh builder; the builder and the generated agents use the same model.\n",
    "new_builder = AgentBuilder(\n",
    "    config_file_or_env=config_file_or_env, builder_model=\"gpt-4-1106-preview\", agent_model=\"gpt-4-1106-preview\"\n",
    ")\n",
    "# use_oai_assistant=True builds the agents on the OpenAI Assistants API.\n",
    "agent_list, agent_configs = new_builder.build(building_task, llm_config, use_oai_assistant=True)\n",
    "start_task(\n",
    "    execution_task=\"Find a recent paper about explainable AI on arxiv and find its potential applications in medical.\",\n",
    "    agent_list=agent_list,\n",
    ")\n",
    "# Clear the agents created by this builder once the task is finished.\n",
    "new_builder.clear_all_agents()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}