Project Description
I need a straightforward Python script that reliably pulls text content from websites I will specify once the project starts. The focus is strictly on text—no images or media files—so the routine should locate, extract, and save headings, paragraphs, or other copy in a clean, structured format such as CSV or JSON.
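To illustrate the kind of output meant here, the following is a minimal sketch of the extract-and-save step, not a required design. It uses only the Python standard library so it runs without installs; BeautifulSoup would make the same step shorter. The names TEXT_TAGS, SKIP_TAGS, TextExtractor, extract_text, save_json, and save_csv are hypothetical, the "tag"/"text" record layout is an assumption, and the parser assumes reasonably well-formed HTML with closed tags.

```python
import csv
import json
from html.parser import HTMLParser

# Hypothetical defaults: tags treated as text blocks, and subtrees to ignore.
TEXT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li"}
SKIP_TAGS = {"script", "style", "noscript"}


class TextExtractor(HTMLParser):
    """Collect one {"tag", "text"} record per heading- or paragraph-like element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.records = []
        self._current = None   # tag currently being captured, if any
        self._chunks = []      # text fragments seen inside that tag
        self._skip_depth = 0   # greater than 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in TEXT_TAGS and self._current is None:
            self._current, self._chunks = tag, []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == self._current:
            # Collapse runs of whitespace and drop elements with no visible text.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.records.append({"tag": tag, "text": text})
            self._current = None

    def handle_data(self, data):
        if self._current and not self._skip_depth:
            self._chunks.append(data)


def extract_text(html):
    """Return a list of {"tag", "text"} dicts found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


def save_json(records, path):
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)


def save_csv(records, path):
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["tag", "text"])
        writer.writeheader()
        writer.writerows(records)
```

Calling extract_text(page_html) and passing the result to save_json or save_csv would produce one row per heading, paragraph, or list item. Editing TEXT_TAGS is the single place to change what counts as text for a given site.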
Because the target sites may vary, the code must use easy-to-tweak selectors and robust error handling. Please use widely supported libraries (requests and BeautifulSoup, or Scrapy if you prefer) and include polite scraping practices such as user-agent rotation, rate limiting, and graceful retries on common HTTP errors (for example, 429 and 5xx responses).
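As a rough sketch of the fetching behaviour described above, the following shows one way to combine a pause before each request, a rotating User-Agent header, and retries with exponential backoff. It is an illustration under assumptions, not a prescribed implementation: it uses urllib from the standard library to stay dependency-free (requests.get with the same headers and timeout could replace http_get), and the names USER_AGENTS, RETRY_STATUSES, http_get, and polite_fetch, the placeholder user-agent strings, and the default delays are all hypothetical.

```python
import random
import time
import urllib.error
import urllib.request

# Placeholder user-agent strings (assumption: replace with real values).
USER_AGENTS = [
    "example-scraper/0.1 (profile A)",
    "example-scraper/0.1 (profile B)",
]
# Status codes treated as transient and worth retrying (assumption).
RETRY_STATUSES = {429, 500, 502, 503, 504}


def http_get(url, headers, timeout=10):
    """Return (status_code, body_text) for a GET request."""
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.status, resp.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; convert that into a plain status code.
        return err.code, ""


def polite_fetch(url, get=http_get, retries=3, delay=1.0, backoff=2.0,
                 sleep=time.sleep):
    """Fetch a URL politely; return the body text, or None if all attempts fail."""
    wait = delay
    for _ in range(retries):
        sleep(wait)  # rate limit: pause before every request
        headers = {"User-Agent": random.choice(USER_AGENTS)}  # rotate user agent
        try:
            status, body = get(url, headers)
        except OSError:
            # Network-level failure (DNS error, refused connection, timeout).
            status, body = None, ""
        if status == 200:
            return body
        if status is not None and status not in RETRY_STATUSES:
            return None  # e.g. 404: retrying will not help
        wait *= backoff  # wait longer before the next attempt
    return None
```

The get and sleep parameters exist so the retry logic can be exercised without touching the network; in normal use, polite_fetch(url) is enough.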
Deliverables:
• A well-commented .py file ready to run from the command line
• A brief README explaining any required packages, configuration steps, and how to point the script at a new site
I will test the script against a sample list of pages; acceptance is complete when the exported file faithfully captures the visible text with no blank rows or HTML residue. Let me know if you need clarification on any point—otherwise I look forward to seeing your approach.
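For reference, the acceptance check could be approximated with a small helper like the one below. This is a heuristic sketch under assumptions, not the exact test that will be run: the name find_problems, the "text" field name, and the two regular expressions are hypothetical, and the patterns only catch tag-like and entity-like residue.

```python
import re

# Heuristic patterns (assumptions): something that looks like an HTML tag,
# and something that looks like an undecoded entity such as &amp; or &#160;.
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")
ENTITY_RE = re.compile(r"&(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);")


def find_problems(rows, field="text"):
    """Return (row_index, reason) pairs for blank rows or rows with HTML residue."""
    problems = []
    for index, row in enumerate(rows):
        value = row.get(field) or ""
        if not value.strip():
            problems.append((index, "blank"))
        elif TAG_RE.search(value) or ENTITY_RE.search(value):
            problems.append((index, "html residue"))
    return problems
```

The rows argument can be any list of dicts, such as the result of csv.DictReader on the exported CSV or json.load on the exported JSON. An empty return value means no blank rows or obvious residue were found.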