Client: SAÉ 5C01 — BUT Informatique
Role: Data Engineer, Full-Stack Dev
Year: 2025

SeriesFlix

TV Series Search & Recommendation

SeriesFlix is a full-stack search and recommendation engine for TV series that applies Natural Language Processing (NLP) to more than 15,000 subtitles. Built with Flask, React, and PostgreSQL, it ranks search results with the BM25 algorithm and produces personalized recommendations through content-based filtering.
SeriesFlix Interface

The Challenge

The main challenge was processing and indexing a large volume of unstructured text from SRT files. I developed an automated ETL pipeline that unzips, parses, cleans, and indexes the subtitles, using Mistral AI for metadata enrichment and the TMDB API for artwork and cast information.
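To illustrate the parse and clean steps, here is a minimal sketch in Python using only the standard library. It is not the project's actual code: the names `Cue`, `parse_srt`, and `clean_for_index` are hypothetical, and the cleaning rules (stripping formatting tags, lowercasing, keeping alphanumeric tokens) are assumptions about what such a pipeline might do.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)
# HTML-like tags (<i>...</i>) and brace-style tags ({\an8}) found in subtitles.
TAG = re.compile(r"<[^>]+>|\{[^}]+\}")


@dataclass
class Cue:  # hypothetical container for one subtitle entry
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h, m, s, ms):
    """Convert hour/minute/second/millisecond strings to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(raw: str) -> list[Cue]:
    """Split raw SRT text into cues, dropping indices and formatting tags."""
    cues = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in re.split(r"\n\s*\n", raw.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is dialogue text.
                text = TAG.sub("", " ".join(lines[i + 1:]))
                text = re.sub(r"\s+", " ", text).strip()
                if text:
                    cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return cues


def clean_for_index(cues: list[Cue]) -> list[str]:
    """Lowercase the dialogue and keep word-like tokens for indexing."""
    tokens = []
    for cue in cues:
        tokens.extend(re.findall(r"[a-z0-9']+", cue.text.lower()))
    return tokens


sample = (
    "1\n00:00:01,000 --> 00:00:04,000\n<i>Winter is</i>\ncoming.\n\n"
    "2\n00:00:05,500 --> 00:00:07,000\nYou know nothing.\n"
)
cues = parse_srt(sample)
tokens = clean_for_index(cues)
```

The token list produced by `clean_for_index` is the kind of input a ranking function would consume. A real pipeline would also need to handle file encodings and malformed cues, which this sketch leaves out.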

Key Features

  • BM25 Search: Search series by actual dialogue content with relevance scoring.
  • AI Recommendations: Content-based filtering using TF-IDF and cosine similarity.
  • Metadata Ingestion: Automated pipeline using Mistral AI for series identification.
  • Admin Dashboard: Full CRUD management and subtitle upload processing.
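
The two ranking techniques named above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the SeriesFlix implementation: the function names, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variants, and the toy documents are all assumptions, and each document is taken to be a list of tokens extracted from one series' subtitles.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document against the query with Okapi BM25."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation penalises longer-than-average documents.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF."""
    n = len(docs)
    df = Counter(term for d in docs for term in set(d))
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({
            t: (c / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(index, docs, top_n=3):
    """Return (doc_index, similarity) pairs most similar to docs[index]."""
    vecs = tfidf_vectors(docs)
    sims = [(j, cosine(vecs[index], v)) for j, v in enumerate(vecs) if j != index]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy corpus standing in for per-series subtitle tokens (made up for the demo).
docs = [
    ["dragon", "throne", "winter", "dragon"],
    ["dragon", "knight", "throne"],
    ["lawyer", "court", "trial"],
]
scores = bm25_scores(["dragon"], docs)
similar = recommend(0, docs, top_n=1)
```

In this toy corpus, the first document ranks highest for the query "dragon" because the term appears twice, and the second document is the closest recommendation for the first because they share "dragon" and "throne". At the scale described above, a production system would more likely precompute an inverted index or use an existing library than rescore every document per query.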