Large scale video-search using CLIP

Using recent advances in connecting text and images, we helped a large media production company implement effective video search without the need for meta-data.

Abstract

Traditionally, searching through visual material requires meta-data. With the invention of CLIP, however, we can measure how well a text string matches an image, and that relationship is the basis for this project.
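The core idea can be sketched in a few lines. This is not the production pipeline: the embeddings below are random placeholders standing in for the vectors a CLIP text encoder and image encoder would produce (typically around 512 dimensions), and `cosine_rank` is a hypothetical helper name.

```python
import numpy as np

# Placeholder embeddings: in practice these come from CLIP's text and
# image encoders, which map both modalities into the same vector space.
rng = np.random.default_rng(0)
text_embedding = rng.normal(size=512)            # one query string
frame_embeddings = rng.normal(size=(1000, 512))  # one row per video frame

def cosine_rank(text_vec, image_vecs):
    """Rank frames by cosine similarity to a text query vector."""
    t = text_vec / np.linalg.norm(text_vec)
    v = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    scores = v @ t  # one dot product per frame
    return np.argsort(scores)[::-1], scores

order, scores = cosine_rank(text_embedding, frame_embeddings)
print(order[:5])  # indices of the five best-matching frames
```

Because text and images land in the same space, "search" reduces to ranking stored image vectors against one query vector.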

Background

About the customer

Our customer is a large media production company with a platform for creating and storing content. The platform already offered ways to search for content, but they all required tags and meta-data. With the invention of CLIP, we wanted to implement a scalable video-search feature that can search through thousands of videos quickly.

Challenges

Amount of data

The most challenging aspect of this project was the amount of data. Edisen is an established media production company that stores an enormous number of videos in its system. Processing everything with a complex model like CLIP requires a lot of computation, and searching, which boils down to vector multiplications against every stored embedding, demands a lot of memory.
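A back-of-the-envelope calculation shows why memory becomes a concern. The figures here are illustrative assumptions, not the customer's actual numbers: 512-dimensional float32 embeddings and one sampled frame per second of video.

```python
# Rough memory estimate for a brute-force embedding index,
# assuming 512-dimensional float32 vectors (an illustrative choice).
DIM = 512
BYTES_PER_FLOAT = 4

def index_size_gb(n_frames):
    """Size in GB of storing one embedding per frame."""
    return n_frames * DIM * BYTES_PER_FLOAT / 1e9

# One embedding per second, 10,000 hours of hypothetical footage:
frames = 10_000 * 3600
print(f"{index_size_gb(frames):.1f} GB")  # prints "73.7 GB"
```

At that scale the index no longer fits comfortably in the memory of a single modest machine, which is what motivates sampling fewer frames and using approximate search.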

Solution

Implementation

We delivered two microservices that were easy to deploy in the existing platform: one running the model and its post-processing, and one serving as the search engine. To deal with the challenge of scale, we implemented dynamic sampling, storing embeddings only for relevant frames, combined with approximate nearest-neighbor search.
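The dynamic-sampling idea can be illustrated with a minimal sketch. This is not the deployed logic: `sample_frames` is a hypothetical helper, and the 0.9 similarity threshold is an assumed value, not a tuned one.

```python
import numpy as np

def sample_frames(embeddings, threshold=0.9):
    """Keep a frame only when its cosine similarity to the last stored
    frame drops below `threshold` (0.9 is an assumption, not tuned).
    Near-duplicate frames within a shot are thus stored only once."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = [0]  # always store the first frame
    for i in range(1, len(normed)):
        if normed[i] @ normed[kept[-1]] < threshold:
            kept.append(i)
    return kept

# Five near-identical frames followed by five frames of a new shot:
frames = np.array([[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5)
print(sample_frames(frames))  # prints "[0, 5]"
```

Storing one embedding per shot rather than per frame shrinks the index dramatically, and an approximate nearest-neighbor structure over the remaining vectors keeps query latency low.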

Similar applications

You can apply the same solution to text-based visual search in another domain, such as fashion. Just replace the dataset with the images you want to search.

If you have something similar in mind, schedule a meeting, and let’s talk!
