Project Description
I need a single, case-specific pipeline that can:
• Enter a specified supplier site with Playwright and collect every product description, all associated pictures, and any linked PDF datasheets.
• Pass the raw copy through a rewriting step so the final text follows our own standards: a different voice, structure, and formatting from the source, while preserving all technical details.
• Push the cleaned assets into our WooCommerce store as new products (or update existing ones), mapping categories, attributes, images, and PDFs automatically.
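As a rough illustration of the create-or-update step, the sketch below maps one scraped product to a WooCommerce REST API v3 request without sending it. The layout of the input dictionary, the `category_ids` and `existing_ids` lookups, and the `datasheet_urls` meta key are assumptions made for this example. Uploading the PDF files themselves (for instance through the WordPress media endpoint) is left out.

```python
from typing import Any

PRODUCTS_PATH = "/wp-json/wc/v3/products"


def build_payload(product: dict[str, Any], category_ids: dict[str, int]) -> dict[str, Any]:
    """Translate a scraped product (hypothetical layout) into a product payload."""
    return {
        "name": product["name"],
        "sku": product["sku"],
        "description": product["description"],
        # Only categories with a known store ID are mapped; others are dropped.
        "categories": [
            {"id": category_ids[name]}
            for name in product["categories"]
            if name in category_ids
        ],
        # WooCommerce fetches each image from its "src" URL.
        "images": [{"src": url} for url in product["image_urls"]],
        "attributes": [
            {"name": name, "options": [value], "visible": True}
            for name, value in product["attributes"].items()
        ],
        # Assumed convention: datasheet links are stored as product meta data.
        "meta_data": [{"key": "datasheet_urls", "value": product["pdf_urls"]}],
    }


def plan_request(
    product: dict[str, Any],
    category_ids: dict[str, int],
    existing_ids: dict[str, int],
) -> tuple[str, str, dict[str, Any]]:
    """Return (HTTP method, path, payload): POST for new SKUs, PUT for known ones."""
    payload = build_payload(product, category_ids)
    product_id = existing_ids.get(product["sku"])
    if product_id is None:
        return "POST", PRODUCTS_PATH, payload
    return "PUT", f"{PRODUCTS_PATH}/{product_id}", payload
```

Matching on SKU is one possible way to decide between create and update; any stable identifier from the supplier site would work the same way.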
The crawler must respect pagination, handle dynamic content, and leave behind a log so I can track what was imported or skipped. The rewriting module can rely on an LLM or any approach you prefer, as long as the end result is unique and ready for publishing without manual edits.
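The pagination and logging requirement could look roughly like the sketch below. It is a sketch under assumptions, not a finished crawler: `fetch_page` is a hypothetical callable that in the real scraper would wrap Playwright navigation and selectors, and the `LogEntry` fields are illustrative. The loop follows "next page" links, guards against circular pagination, and records for every product URL whether it was imported or skipped.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LogEntry:
    url: str
    status: str        # "imported" or "skipped"
    reason: str = ""


# fetch_page(url) returns (product URLs on that page, next page URL or None).
FetchPage = Callable[[str], tuple[list[str], Optional[str]]]


def crawl(
    start_url: str,
    fetch_page: FetchPage,
    already_imported: set[str],
    max_pages: int = 100,
) -> list[LogEntry]:
    """Walk listing pages and log each product as imported or skipped."""
    log: list[LogEntry] = []
    seen_pages: set[str] = set()
    url: Optional[str] = start_url
    # Stop on a missing next link, a revisited page, or the page limit.
    while url and url not in seen_pages and len(seen_pages) < max_pages:
        seen_pages.add(url)
        product_urls, next_url = fetch_page(url)
        for product_url in product_urls:
            if product_url in already_imported:
                log.append(LogEntry(product_url, "skipped", "already imported"))
            else:
                already_imported.add(product_url)
                log.append(LogEntry(product_url, "imported"))
        url = next_url
    return log
```

Keeping the page fetcher injectable means the loop can be exercised against canned data before it is pointed at the supplier site.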
Deliverables
1. Playwright-based scraper with config file for URL, credentials, and rate limits
2. Rewriter module with an easily adjustable style template
3. WooCommerce integration script using the REST API (create/update, image & PDF upload)
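For the first two deliverables, one possible shape for the config file and the adjustable style template is sketched below. All field names are assumptions made for illustration: the config is read from JSON, credentials are referenced by environment variable name rather than stored in the file, the rate limit is converted to a delay between requests, and the style template is plain text with a `$source_text` placeholder.

```python
import json
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class PipelineConfig:
    start_url: str
    username_env: str         # name of the environment variable with the login
    password_env: str         # name of the environment variable with the password
    requests_per_minute: int  # rate limit for the scraper
    style_template: str       # rewriting instructions containing $source_text

    @property
    def delay_seconds(self) -> float:
        """Minimum pause between two requests implied by the rate limit."""
        return 60.0 / self.requests_per_minute


def load_config(text: str) -> PipelineConfig:
    """Parse the JSON config text and reject a non-positive rate limit."""
    config = PipelineConfig(**json.loads(text))
    if config.requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return config


def build_prompt(config: PipelineConfig, raw_copy: str) -> str:
    """Fill the style template with the scraped source text."""
    return Template(config.style_template).substitute(source_text=raw_copy)
```

Changing the house style then only requires editing the `style_template` value in the config, not the code.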
I am looking for an efficient solution for getting this content into an online store. There are probably a couple hundred, possibly several hundred, products to process; the exact number will be clarified soon.