llama-hub/loader_hub/youtube_transcript/base.py

"""Simple reader that loads the transcript of a YouTube video."""
from typing import Any, List, Optional

from llama_index.readers.base import BaseReader
from llama_index.readers.schema.base import Document


class YoutubeTranscriptReader(BaseReader):
    """YouTube transcript reader."""

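    def load_data(
        self,
        ytlinks: List[str],
        languages: Optional[List[str]] = None,
        **load_kwargs: Any,
    ) -> List[Document]:
        """Load transcripts for the given YouTube links.

        Args:
            ytlinks (List[str]): List of YouTube links
                for which transcripts are to be read.
            languages (Optional[List[str]]): Language codes to try, in
                descending order of priority. Defaults to ["en"].

        """
        from youtube_transcript_api import YouTubeTranscriptApi

        # Avoid a mutable default argument and tolerate an explicit None.
        if languages is None:
            languages = ["en"]

        results = []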
        for link in ytlinks:
            # Keep only the ID: drop any parameters after "v", e.g. "&t=42s".
            video_id = link.split("?v=")[-1].split("&")[0]
            srt = YouTubeTranscriptApi.get_transcript(video_id, languages=languages)
            # Each chunk is a dict whose "text" key holds one caption line.
            transcript = "".join(chunk["text"] + "\n" for chunk in srt)
            results.append(Document(transcript))
        return results
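
The `link.split("?v=")` step in `load_data` only recognizes standard watch links. As a rough sketch of a more tolerant approach, the helper below parses the URL with the standard library instead of splitting strings. The name `extract_video_id` and the set of recognized hosts and path prefixes are assumptions made for illustration; they are not part of the reader or of `youtube_transcript_api`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(link: str) -> Optional[str]:
    """Hypothetical helper: return the video ID from a YouTube link, or None.

    Assumes the link includes a scheme such as "https://"; without one,
    urlparse does not populate the hostname and None is returned.
    """
    parsed = urlparse(link)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # Short links carry the ID as the first path segment: youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in ("youtube.com", "m.youtube.com"):
        # Standard watch links carry the ID in the "v" query parameter.
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Embed and Shorts links carry the ID as the second path segment.
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            return parts[1]

    # Unrecognized host or layout: let the caller decide how to handle it.
    return None
```

Returning `None` for unrecognized links, rather than raising, lets a caller skip or report a bad link without aborting the whole batch; whether that suits this reader is a design choice left open here.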

Added all other files 2023-02-03 00:05:28 -08:00
Added new file readers 2023-02-03 20:12:03 -08:00
add option to pass languages to YoutubeTranscriptReader (#39) * add option to pass languages to YoutubeTranscriptReader * Add optional type --------- Co-authored-by: lukas frischknecht <lukas.frischknecht@srf.ch> Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com> 2023-02-16 17:49:54 +01:00
swap out gpt_index imports for llama_index imports (#49) * cr * cr * cr --------- Co-authored-by: Jerry Liu <jerry@robustintelligence.com> Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com> 2023-02-20 21:46:58 -08:00