Project Description
I have a customer database that needs a careful round of data cleaning and validation. The main pain point is duplicate entries: I want every record to be unique so downstream reports stay reliable. Along the way, you are welcome to flag any other inconsistencies you spot, but de-duplication is the priority.
You will receive the raw spreadsheet and a short set of matching rules I currently follow. After you run your process, I’ll need the cleaned file back plus a brief log of what was removed, merged, or amended so I can audit the changes quickly.
Deliverables
• Cleaned customer data file with duplicates resolved
• Change log (CSV or XLSX) listing original row IDs and the action taken
Accuracy matters more than speed. Automated scripts, Excel functions, or tools like OpenRefine are all fine, as long as the final output is 100% error-free.
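For anyone taking the scripted route, here is a minimal Python sketch of how a de-duplication pass and its change log could fit together. It is an illustration only. The actual matching rules are not stated in this brief, so the sketch assumes a single hypothetical rule (two rows match when their email is identical after trimming and lowercasing) and hypothetical column names (`row_id`, `name`, `email`). Rows with no email are kept and flagged for manual review rather than guessed at.

```python
import csv
import io


def normalize_email(value):
    """Trim and lowercase an email so trivially different spellings compare equal."""
    return value.strip().lower()


def deduplicate(rows, key_field="email", id_field="row_id"):
    """Keep the first row seen for each normalized key and log what happened.

    Assumption: matching on one normalized field stands in for the real
    matching rules, which are not given in the brief.
    Returns (cleaned_rows, change_log).
    """
    seen = {}        # normalized key -> row ID of the record that was kept
    cleaned = []
    change_log = []
    for row in rows:
        key = normalize_email(row.get(key_field, ""))
        if not key:
            # No key to match on: keep the row, but flag it for review.
            cleaned.append(row)
            change_log.append({"row_id": row[id_field], "action": "flagged",
                               "detail": f"missing {key_field}"})
        elif key in seen:
            # Duplicate of an earlier row: drop it and record which row won.
            change_log.append({"row_id": row[id_field], "action": "removed",
                               "detail": f"duplicate of row {seen[key]}"})
        else:
            seen[key] = row[id_field]
            cleaned.append(row)
    return cleaned, change_log


def to_csv_text(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows, purely for illustration.
SAMPLE_ROWS = [
    {"row_id": "1", "name": "Ann Lee", "email": "ann@example.com"},
    {"row_id": "2", "name": "Ann Lee", "email": " ANN@example.com "},
    {"row_id": "3", "name": "Bo Chan", "email": ""},
    {"row_id": "4", "name": "Cy Diaz", "email": "cy@example.com"},
]

cleaned, change_log = deduplicate(SAMPLE_ROWS)
log_csv = to_csv_text(change_log, ["row_id", "action", "detail"])
```

Keeping the first occurrence and logging the surviving row's ID makes every removal traceable back to the original spreadsheet. A real pass would likely need a merge step and fuzzier matching on names or phone numbers, depending on the rules supplied.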