AssemblyAI

LLMs work wonders on text data, but if you want to use audio or video files instead of text, things get a bit trickier. An easy solution is to transcribe the audio or video files. This works, but you lose valuable information, especially in multi-speaker situations, such as how many people were speaking and who said what.

In this video, we’ll learn how to build, in 10 minutes, an LLM application that takes multiple speakers into account when answering a question.
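
To keep the speaker information, the transcript needs speaker diarization. As a rough idea of what that looks like, here is a minimal sketch using the AssemblyAI Python SDK with `speaker_labels` enabled (the linked notebook uses the Haystack integration instead; the file name and environment variable below are placeholders):

```python
import os
import assemblyai as aai

# Placeholder: read the API key from an environment variable
aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

# speaker_labels=True enables speaker diarization, so each utterance
# is attributed to a speaker instead of being merged into one block of text
config = aai.TranscriptionConfig(speaker_labels=True)

transcript = aai.Transcriber().transcribe("meeting_recording.mp3", config=config)

# Build a speaker-aware transcript that an LLM can reason over,
# e.g. "Speaker A: ..." / "Speaker B: ..."
speaker_aware_text = "\n".join(
    f"Speaker {u.speaker}: {u.text}" for u in transcript.utterances
)
print(speaker_aware_text)
```

Feeding this speaker-labelled text into the LLM pipeline is what lets it answer questions like "how many people were speaking?" or "what did speaker B say about the deadline?".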

Colab notebook: github.com/deepset-ai/haystack-cookbook/blob/main/…
