Main challenge: transforming high-volume, heterogeneous raw Yelp data into a reliable analytics base while coordinating a multi-contributor delivery. I structured the roadmap, milestones, and task ownership in Linear for a 5-person team. On execution, the pipeline includes robust cleaning (duplicates, missing critical keys, type/date normalization), Parquet persistence for performance, and NLP preparation (tokenization, stopwords, lemmatization) before TF-IDF/Word2Vec-based analysis.