"<a href=\"https://colab.research.google.com/github/microsoft/autogen/blob/main/notebook/agentchat_video_transcript_translate_with_whisper.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"id": "a5b4540e-4987-4774-9305-764c3133e953",
"metadata": {},
"source": [
"<a id=\"toc\"></a>\n",
"# Auto Generated Agent Chat: Translating Video Audio Using Whisper and GPT-3.5-turbo\n",
"In this notebook, we demonstrate how to use Whisper and GPT-3.5-turbo with `AssistantAgent` and `UserProxyAgent` to recognize the speech in a video file, translate it,\n",
"and attach timestamps to each sentence in the style of a subtitle file. The example is based on [agentchat_function_call.ipynb](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_function_call.ipynb).\n"
]
},
{
"cell_type": "markdown",
"id": "4fd644cc-2b14-4700-8b1d-959fb2e9acb0",
"metadata": {},
"source": [
"## Requirements\n",
"AutoGen requires `Python>=3.8`. To run this notebook example, please install `openai`, `pyautogen`, `whisper` (published on PyPI as `openai-whisper`), and `moviepy`.\n",
"\n",
"The example below recognizes the speech in a [Peppa Pig cartoon video clip](https://drive.google.com/file/d/1QY0naa2acHw2FuH7sY3c-g2sBLtC2Sv4/view?usp=drive_link), originally in English, and translates it into Chinese.\n",
"\n",
"FFmpeg needs a local file for this example, so download the example video first, then change `your_file_path` to the path of your local video file."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ed549b75-b4ea-4ec5-8c0b-a15e93ffd618",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33muser_proxy\u001b[0m (to chatbot):\n",
"\n",
"For the video located in E:\\pythonProject\\gpt_detection\\peppa pig.mp4, recognize the speech and transfer it into a script file, then translate from English text to a Chinese video subtitle text. \n",
">>>>>>>> EXECUTING FUNCTION recognize_transcript_from_video...\u001b[0m\n",
"Detecting language using up to the first 30 seconds. Use `--language` to specify the language\n",
"Detected language: English\n",
"[00:00.000 --> 00:03.000] This is my little brother George.\n",
"[00:03.000 --> 00:05.000] This is Mummy Pig.\n",
"[00:05.000 --> 00:07.000] And this is Daddy Pig.\n",
"[00:07.000 --> 00:09.000] Pee-pah Pig.\n",
"[00:09.000 --> 00:11.000] Desert Island.\n",
"[00:11.000 --> 00:14.000] Pepper and George are at Danny Dog's house.\n",
"[00:14.000 --> 00:17.000] Captain Dog is telling stories of when he was a sailor.\n",
"[00:17.000 --> 00:20.000] I sailed all around the world.\n",
"[00:20.000 --> 00:22.000] And then I came home again.\n",
"[00:22.000 --> 00:25.000] But now I'm back for good.\n",
"[00:25.000 --> 00:27.000] I'll never forget you.\n",
"[00:27.000 --> 00:29.000] Daddy, do you miss the sea?\n",
"[00:29.000 --> 00:31.000] Well, sometimes.\n",
"[00:31.000 --> 00:36.000] It is Grandad Dog, Grandpa Pig and Grumpy Rabbit.\n",
"[00:36.000 --> 00:37.000] Hello.\n",
"[00:37.000 --> 00:40.000] Can Captain Dog come out to play?\n",
"[00:40.000 --> 00:43.000] What? We are going on a fishing trip.\n",
"[00:43.000 --> 00:44.000] On a boat?\n",
"[00:44.000 --> 00:45.000] On the sea!\n",
"[00:45.000 --> 00:47.000] OK, let's go.\n",
"[00:47.000 --> 00:51.000] But Daddy, you said you'd never get on a boat again.\n",
"[00:51.000 --> 00:54.000] I'm not going to get on a boat again.\n",
"[00:54.000 --> 00:57.000] You said you'd never get on a boat again.\n",
"[00:57.000 --> 01:00.000] Oh, yes. So I did.\n",
"[01:00.000 --> 01:02.000] OK, bye-bye.\n",
"[01:02.000 --> 01:03.000] Bye.\n",
"\u001b[33muser_proxy\u001b[0m (to chatbot):\n",
"\n",
"\u001b[32m***** Response from calling function \"recognize_transcript_from_video\" *****\u001b[0m\n",
"[{'sentence': 'This is my little brother George..', 'timestamp_start': 0, 'timestamp_end': 3.0}, {'sentence': 'This is Mummy Pig..', 'timestamp_start': 3.0, 'timestamp_end': 5.0}, {'sentence': 'And this is Daddy Pig..', 'timestamp_start': 5.0, 'timestamp_end': 7.0}, {'sentence': 'Pee-pah Pig..', 'timestamp_start': 7.0, 'timestamp_end': 9.0}, {'sentence': 'Desert Island..', 'timestamp_start': 9.0, 'timestamp_end': 11.0}, {'sentence': \"Pepper and George are at Danny Dog's house..\", 'timestamp_start': 11.0, 'timestamp_end': 14.0}, {'sentence': 'Captain Dog is telling stories of when he was a sailor..', 'timestamp_start': 14.0, 'timestamp_end': 17.0}, {'sentence': 'I sailed all around the world..', 'timestamp_start': 17.0, 'timestamp_end': 20.0}, {'sentence': 'And then I came home again..', 'timestamp_start': 20.0, 'timestamp_end': 22.0}, {'sentence': \"But now I'm back for good..\", 'timestamp_start': 22.0, 'timestamp_end': 25.0}, {'sentence': \"I'll never forget you..\", 'timestamp_start': 25.0, 'timestamp_end': 27.0}, {'sentence': 'Daddy, do you miss the sea?.', 'timestamp_start': 27.0, 'timestamp_end': 29.0}, {'sentence': 'Well, sometimes..', 'timestamp_start': 29.0, 'timestamp_end': 31.0}, {'sentence': 'It is Grandad Dog, Grandpa Pig and Grumpy Rabbit..', 'timestamp_start': 31.0, 'timestamp_end': 36.0}, {'sentence': 'Hello..', 'timestamp_start': 36.0, 'timestamp_end': 37.0}, {'sentence': 'Can Captain Dog come out to play?.', 'timestamp_start': 37.0, 'timestamp_end': 40.0}, {'sentence': 'What? We are going on a fishing trip..', 'timestamp_start': 40.0, 'timestamp_end': 43.0}, {'sentence': 'On a boat?.', 'timestamp_start': 43.0, 'timestamp_end': 44.0}, {'sentence': 'On the sea!.', 'timestamp_start': 44.0, 'timestamp_end': 45.0}, {'sentence': \"OK, let's go..\", 'timestamp_start': 45.0, 'timestamp_end': 47.0}, {'sentence': \"But Daddy, you said you'd never get on a boat again..\", 'timestamp_start': 47.0, 'timestamp_end': 51.0}, {'sentence': \"I'm not going to get on a boat again..\", 'timestamp_start': 51.0, 'timestamp_end': 54.0}, {'sentence': \"You said you'd never get on a boat again..\", 'timestamp_start': 54.0, 'timestamp_end': 57.0}, {'sentence': 'Oh, yes. So I did..', 'timestamp_start': 57.0, 'timestamp_end': 60.0}, {'sentence': 'OK, bye-bye..', 'timestamp_start': 60.0, 'timestamp_end': 62.0}, {'sentence': 'Bye..', 'timestamp_start': 62.0, 'timestamp_end': 63.0}]\n",
" }, # Please set use_docker=True if docker is available to run the generated code. Using docker is safer than running the generated code directly.\n",