Adam El Hirch | Yelp Data Analytics

Yelp Data Engineering & NLP Analytics

A full-scale academic data project built on the Yelp Academic dataset. I served primarily as project architect and project manager, planning the work in Linear so that a 5-person team could collaborate efficiently. I designed the end-to-end workflow: ingesting large JSONL sources, cleaning for data quality, converting to Parquet, NLP preprocessing, and dashboard-driven analysis aimed at actionable insights into reviewer behavior and review quality patterns.

Pipeline & Method

Main challenge: turning high-volume, heterogeneous raw Yelp data into a reliable analytics base while coordinating delivery across multiple contributors. On the planning side, I structured the roadmap, milestones, and task ownership in Linear. On the execution side, the pipeline applies robust cleaning (duplicate removal, handling of records with missing critical keys, type and date normalization), persists the results as Parquet for performance, and prepares the text for NLP (tokenization, stopword removal, lemmatization) ahead of TF-IDF and Word2Vec-based analysis.
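The cleaning step described above can be illustrated with a minimal, standard-library sketch. It is not the project's actual module: the function name `clean_reviews`, the list of critical keys, the date format, and the choice to drop (rather than repair) incomplete records are all assumptions made for illustration.

```python
import json
from datetime import datetime

# Assumed critical keys; the project's actual list is not given in the text.
CRITICAL_KEYS = ("review_id", "user_id", "business_id", "stars", "date")
# Assumed timestamp layout for review dates.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def clean_reviews(lines):
    """Parse JSONL lines, drop unusable or duplicate records, normalize types."""
    seen_ids = set()
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed rows
        if not isinstance(record, dict):
            continue  # skip valid JSON that is not an object
        # Drop records missing any critical key (assumption: drop, not repair).
        if any(record.get(key) in (None, "") for key in CRITICAL_KEYS):
            continue
        # Drop duplicates, keeping the first occurrence of each review_id.
        if record["review_id"] in seen_ids:
            continue
        # Normalize the rating to float and the date to a datetime object.
        try:
            record["stars"] = float(record["stars"])
            record["date"] = datetime.strptime(record["date"], DATE_FORMAT)
        except (TypeError, ValueError):
            continue
        seen_ids.add(record["review_id"])
        cleaned.append(record)
    return cleaned


# Toy input, invented for this example (not taken from the dataset).
sample = [
    '{"review_id": "r1", "user_id": "u1", "business_id": "b1", "stars": "4", "date": "2019-03-01 12:30:00", "text": "Great tacos"}',
    '{"review_id": "r1", "user_id": "u1", "business_id": "b1", "stars": "4", "date": "2019-03-01 12:30:00", "text": "Great tacos"}',
    '{"review_id": "r2", "user_id": null, "business_id": "b1", "stars": 5, "date": "2019-03-02 09:00:00", "text": "Nice"}',
    '{"review_id": "r3", "user_id": "u3", "business_id": "b2", "stars": 2, "date": "not a date", "text": "Meh"}',
    "not json at all",
]
reviews = clean_reviews(sample)
```

From here, the cleaned records could be loaded into a pandas DataFrame and written out with `DataFrame.to_parquet`, assuming pandas and a Parquet engine such as pyarrow are installed.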

Key Insights

Reviewer Profiles: clear severity and verbosity differences by user profile.
Review Dynamics: observable relation between review length and star ratings.
Satisfaction Split: sentiment distribution mapped from rating buckets.
Reusable Stack: notebook insights consolidated with reusable Python modules.
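As a rough illustration of how two of these insights might be computed, the sketch below maps star ratings to sentiment buckets and averages review length (in words) per rating. The bucket thresholds (1-2 negative, 3 neutral, 4-5 positive), the function names, and the toy reviews are assumptions for illustration. They are not the project's actual rules or results.

```python
from collections import Counter, defaultdict
from statistics import mean


def sentiment_bucket(stars):
    """Map a star rating to a coarse sentiment label (assumed thresholds)."""
    if stars <= 2:
        return "negative"
    if stars < 4:
        return "neutral"
    return "positive"


def summarize(reviews):
    """Return the sentiment distribution and mean word count per star rating."""
    buckets = Counter(sentiment_bucket(r["stars"]) for r in reviews)
    lengths = defaultdict(list)
    for r in reviews:
        # Whitespace word count as a simple proxy for review length.
        lengths[r["stars"]].append(len(r["text"].split()))
    mean_length = {stars: mean(vals) for stars, vals in sorted(lengths.items())}
    return buckets, mean_length


# Toy reviews, invented for this example (not project data).
toy_reviews = [
    {"stars": 5, "text": "Loved it"},
    {"stars": 4, "text": "Good food and friendly staff"},
    {"stars": 3, "text": "It was fine"},
    {"stars": 1, "text": "Cold food, slow service, and a rude host"},
    {"stars": 1, "text": "Never again"},
]
buckets, mean_length = summarize(toy_reviews)
```

Grouping the same statistics by `user_id` instead of by rating would give a starting point for the severity and verbosity comparison across reviewer profiles.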
