Project Description
I have a CSV that logs product sales alongside TV, Radio, and Newspaper advertising spend. I want a clean, reproducible Python workflow that turns this data into an accurate sales-forecasting model and a concise set of insights for decision-makers.
The raw file contains a few gaps, so your first step is to inspect, clean, and appropriately impute missing values. Once the data is tidy, explore it with Pandas, NumPy, Matplotlib, and Seaborn to surface correlations and outliers.
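To make the cleaning step concrete, here is a minimal sketch of what I have in mind. It assumes the columns are named `TV`, `Radio`, `Newspaper`, and `Sales` (the real headers may differ), uses a small made-up sample in place of the actual CSV, and picks median imputation purely as an illustration. The helper name `clean_advertising` is hypothetical.

```python
import numpy as np
import pandas as pd


def clean_advertising(df: pd.DataFrame, target: str = "Sales") -> pd.DataFrame:
    """Drop rows with no target value, then median-fill gaps in the features."""
    # A row without a sales figure cannot be used for training, so drop it
    cleaned = df.dropna(subset=[target]).reset_index(drop=True)
    feature_cols = cleaned.columns.drop(target)
    # The median is less sensitive to outliers than the mean
    medians = cleaned[feature_cols].median()
    cleaned[feature_cols] = cleaned[feature_cols].fillna(medians)
    return cleaned


# Made-up rows standing in for the real file (assumed column names)
sample = pd.DataFrame({
    "TV": [230.1, 44.5, np.nan, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, np.nan, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
    "Sales": [22.1, 10.4, 9.3, 18.5, np.nan],
})

print(sample.isna().sum())   # gaps per column before cleaning
cleaned = clean_advertising(sample)
print(cleaned.corr())        # pairwise correlations for a first look
```

If a different imputation strategy suits the data better, use it and say why. Note that computing medians on the full dataset before a train/test split leaks a little information, so fitting the imputer on the training portion only may be the safer choice.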
For modelling, start with a well-documented Linear Regression approach using scikit-learn. If you see a clear uplift from adding regularisation or polynomial terms, feel free to include that comparison, but keep the final deliverable focused on a model that is simple to interpret and easy to retrain.
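The comparison described above could look roughly like the sketch below. The data is synthetic (generated with an assumed linear relationship, not taken from my file), and the candidate set, `alpha=1.0`, degree 2, and 5-fold cross-validation are illustrative choices rather than requirements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the real data (assumed features: TV, Radio, Newspaper)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 300, n),   # TV spend
    rng.uniform(0, 50, n),    # Radio spend
    rng.uniform(0, 100, n),   # Newspaper spend
])
# Assumed relationship for the demo only; Newspaper has no effect here
y = 3.0 + 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1.0, n)

# Each candidate is a pipeline so scaling is refit inside every CV fold
candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "poly2": make_pipeline(
        StandardScaler(),
        PolynomialFeatures(degree=2, include_bias=False),
        LinearRegression(),
    ),
}

# Mean R^2 across 5 folds for each candidate
cv_r2 = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
for name, score in cv_r2.items():
    print(f"{name}: mean CV R^2 = {score:.3f}")
```

If the regularised or polynomial variants do not beat plain Linear Regression by a clear margin, the plain model should be the final choice.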
Deliverables
• Google Colab notebook (Python 3) covering every step: loading, cleaning, EDA, model training, evaluation, and prediction examples
• Visualisations of key relationships and model performance, with metrics reported (R², MAE, or RMSE)
• Brief README or slide deck summarising findings, feature importance, and advertising channel effectiveness
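For the metrics and feature-importance items in the list above, a sketch along these lines would be enough. It again uses synthetic data with assumed column names, a 75/25 hold-out split chosen for illustration, and standardised coefficients as a rough proxy for channel importance (other measures are welcome).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset (assumed column names)
rng = np.random.default_rng(42)
n = 200
features = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
sales = 3.0 + 0.05 * features["TV"] + 0.2 * features["Radio"] + rng.normal(0, 1.0, n)

X_train, X_test, y_train, y_test = train_test_split(
    features, sales, test_size=0.25, random_state=0
)

# Scaling first puts the coefficients on a comparable footing across channels
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
pred = model.predict(X_test)

metrics = {
    "R2": r2_score(y_test, pred),
    "MAE": mean_absolute_error(y_test, pred),
    "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
}
print(metrics)

# Standardised coefficients, largest absolute effect first
coefs = pd.Series(
    model.named_steps["linearregression"].coef_, index=features.columns
).sort_values(key=abs, ascending=False)
print(coefs)
```

Each coefficient reads as the change in predicted sales for a one-standard-deviation increase in that channel's spend, which is easy to explain to decision-makers.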
Acceptance criteria
The notebook must run end-to-end on my dataset without manual tweaks and achieve error scores that you justify as reasonable in the README or slide deck.
If this sounds straightforward, let’s get started—I’m ready to share the dataset as soon as you come on board.