Project Description
SCRAPING PROJECT: PLAYWRIGHT, BEAUTIFULSOUP AND PYTHON
We are looking for a developer who has already built working scraping bots in Python. The developer must demonstrate these tools during the selection process. We are not starting from zero: the project will be built on the developer’s existing code and scraping frameworks.
Quality requirements:
1. The developer must have proven experience with Playwright for web scraping.
2. The developer must have proven experience with BeautifulSoup for HTML parsing.
3. The developer must be able to show existing scraping bots, including:
o a demonstration of working code
o an example of JSON output
o proof that the bot can successfully scrape real data
4. The developer must have experience building JSON pipelines for both input and output.
5. The developer must be able to implement SQL connections (MySQL or PostgreSQL) to retrieve input data.
6. The developer must have proven experience with anti-bot techniques, including:
o user-agent rotation
o proxy rotation
o headless detection bypass
o randomized time delays
o session handling
7. The developer must be able to adapt existing scraping code so that it fits our input and output structure (input via JSON and SQL, output via JSON).
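To illustrate the input and output structure described in points 4, 5 and 7, here is a minimal sketch of how input could be read from JSON and SQL and how results could be written as JSON. It is an assumption for discussion, not our final specification: the ScrapeTarget fields, the scrape_targets table and all function names are hypothetical, and the standard-library sqlite3 module stands in for MySQL or PostgreSQL so the example runs without a database server. The actual structure is defined in the enclosed document.

```python
import json
import sqlite3
from dataclasses import dataclass, asdict


@dataclass
class ScrapeTarget:
    """One unit of input work. Field names are hypothetical."""
    target_id: int
    url: str


def targets_from_json(text):
    """Parse a JSON array of {"target_id": ..., "url": ...} objects."""
    return [ScrapeTarget(int(row["target_id"]), row["url"])
            for row in json.loads(text)]


def targets_from_sql(conn, query="SELECT target_id, url FROM scrape_targets"):
    """Read targets through a DB-API 2.0 connection.

    sqlite3 is used in the demo below; MySQL and PostgreSQL drivers
    expose the same cursor()/execute()/fetchall() calls.
    """
    cur = conn.cursor()
    cur.execute(query)
    return [ScrapeTarget(int(tid), url) for tid, url in cur.fetchall()]


def results_to_json(results):
    """Serialize a list of result dictionaries as the JSON output."""
    return json.dumps(results, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # In-memory database with one hypothetical row, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scrape_targets (target_id INTEGER, url TEXT)")
    conn.execute("INSERT INTO scrape_targets VALUES (1, 'https://example.com/a')")

    targets = targets_from_sql(conn) + targets_from_json(
        '[{"target_id": 2, "url": "https://example.com/b"}]'
    )
    print(results_to_json([asdict(t) for t in targets]))
```

Keeping the JSON reader, the SQL reader and the JSON writer as separate functions means the scraping logic only ever sees ScrapeTarget objects, whichever source they came from.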
All code must be modular, extendable and maintainable. The scraping logic must use Playwright for retrieving HTML and BeautifulSoup for parsing it, with no other libraries in either role.
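As a rough sketch of the separation we have in mind, the example below keeps retrieval (Playwright) and parsing (BeautifulSoup) in separate functions and adds two of the anti-bot measures from point 6: user-agent rotation and randomized delays. Everything named here is an assumption for illustration: the USER_AGENTS pool, the function names, the delay bounds and the extracted fields (page title and links) are hypothetical. Headless detection bypass and session handling are not covered. The third-party imports sit inside the functions so that the helper functions can be used without a browser installed.

```python
import random
import time

# Hypothetical pool. A real deployment would keep these strings current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


def pick_user_agent(rng=random):
    """Return one user agent chosen at random from the pool."""
    return rng.choice(USER_AGENTS)


def random_delay(min_s=1.0, max_s=4.0, rng=random):
    """Return a delay in seconds drawn uniformly from [min_s, max_s]."""
    return rng.uniform(min_s, max_s)


def fetch_html(url, proxy=None):
    """Retrieve the rendered HTML of a page with Playwright's sync API."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        launch_args = {"headless": True}
        if proxy:
            # proxy is a server URL such as "http://host:port"
            launch_args["proxy"] = {"server": proxy}
        browser = p.chromium.launch(**launch_args)
        context = browser.new_context(user_agent=pick_user_agent())
        page = context.new_page()
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
    return html


def parse_html(html):
    """Extract the page title and all link targets with BeautifulSoup."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return {"title": title, "links": links}


def scrape(url, proxy=None):
    """Wait a randomized interval, fetch the page, then parse it."""
    time.sleep(random_delay())
    return parse_html(fetch_html(url, proxy))
```

Calling scrape("https://example.com") would return a dictionary that can be written directly to the JSON output. Proxy rotation could be added by passing a different proxy value on each call.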
Please read the enclosed document before bidding. Bids submitted without reading it will not be considered.