client:
SAÉ 6C01 — BUT Informatique
role:
Project Architect, Project Manager
year:
2026

Yelp Data Engineering & NLP Analytics

A full-scale academic data project built on the Yelp Academic dataset. I served primarily as project architect and project manager, planning the whole project in Linear so that the 5-person team could collaborate efficiently. I designed the end-to-end workflow: ingestion of large JSONL sources, data quality cleaning, Parquet conversion, NLP preprocessing, and dashboard-driven analysis to extract actionable insights on reviewer behavior and review quality patterns.

Pipeline & Method

The main challenge was turning high-volume, heterogeneous raw Yelp data into a reliable analytics base while coordinating delivery across several contributors. I structured the roadmap, milestones, and task ownership in Linear for the 5-person team. On the execution side, the pipeline applies robust cleaning (duplicate removal, handling of missing critical keys, type and date normalization), persists the cleaned data as Parquet for performance, and prepares text for NLP (tokenization, stopword removal, lemmatization) before TF-IDF and Word2Vec-based analysis.
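The cleaning stage described above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual code: the function name `clean_reviews`, the list of critical keys, the choice to drop incomplete rows, and the date format are all assumptions, based on the field names in the public Yelp review file.

```python
import json
from datetime import datetime

# Assumed critical keys (field names from the public Yelp review file).
CRITICAL_KEYS = ("review_id", "user_id", "business_id", "stars", "date")
# Assumed timestamp layout; adjust if the source files differ.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def clean_reviews(lines):
    """Yield cleaned review dicts from an iterable of JSONL strings."""
    seen_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed rows
        if not isinstance(record, dict):
            continue
        # Drop rows where a critical key is absent or empty.
        if any(record.get(key) in (None, "") for key in CRITICAL_KEYS):
            continue
        # Drop duplicates, keeping the first occurrence.
        if record["review_id"] in seen_ids:
            continue
        # Normalize types: stars as float, date as datetime.
        try:
            record["stars"] = float(record["stars"])
            record["date"] = datetime.strptime(record["date"], DATE_FORMAT)
        except (TypeError, ValueError):
            continue
        seen_ids.add(record["review_id"])
        yield record
```

Because the function is a generator, it can stream a large JSONL file line by line (for example `clean_reviews(open(path, encoding="utf-8"))`) without loading it into memory; the cleaned records would then be handed to whatever Parquet writer the pipeline uses.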
Yelp - severity score by reviewer profile

Key Insights

  • Reviewer Profiles: clear severity and verbosity differences by user profile.
  • Review Dynamics: observable relation between review length and star ratings.
  • Satisfaction Split: sentiment distribution mapped from rating buckets.
  • Reusable Stack: notebook insights consolidated with reusable Python modules.
Yelp - review length versus stars
Yelp - satisfaction pie chart
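As an illustration of how a satisfaction split can be derived from rating buckets, here is a small sketch. The bucket thresholds (1-2 stars negative, 3 neutral, 4-5 positive) and the function names are assumptions for the example, not necessarily the mapping used in the project.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def rating_bucket(stars):
    """Map a star rating to a sentiment label (assumed thresholds)."""
    if stars <= 2:
        return "negative"
    if stars < 4:
        return "neutral"
    return "positive"


def satisfaction_split(star_ratings):
    """Return the share of each sentiment label among the given ratings."""
    counts = Counter(rating_bucket(stars) for stars in star_ratings)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}
```

The resulting shares can be passed directly to a pie chart; changing the thresholds in `rating_bucket` is enough to test how sensitive the split is to the bucket definition.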