Comprehensive Text & Image Scraping

Pending

💰 USD 30–250 👤 Unknown 🕒 7d ago status: new

Required Skills

JavaScript, Python, Web Scraping, Data Mining, Scrapy, Data Extraction, BeautifulSoup, Selenium

Project Description

I have a list of niche websites and need every relevant piece of text and its accompanying images extracted, organised, and delivered in a clean, ready-to-use format. I’ll share the URLs and the exact data points once we start, but expect a mix of article-style pages and media galleries.

Scope

• Capture both written content (headings, paragraphs, metadata) and all on-page images.
• Provide the text in CSV or JSON and store images in clearly named folders that map back to the records.
• Preserve basic structure, so each text record includes the image file name or path.
• Respect robots.txt and rate limits; the scrape must be discreet and repeatable.

What I’d like to see in your proposal

Please outline your end-to-end approach: preferred language or framework (e.g. Python with Scrapy/BeautifulSoup, Selenium for dynamic pages, or another stack you trust), handling of pagination/login barriers, deduplication strategy, and estimated turnaround time. A brief sample architecture diagram or code snippet showing how you handle image downloads would be a plus.

Deliverables

1. Scraper script(s) with clear setup instructions.
2. Final datasets (CSV/JSON) and corresponding image folders.
3. Short read-me explaining how to rerun the scrape and update the data.

I’m ready to move quickly once I see a detailed project proposal that convinces me you can gather both text and visual assets accurately and efficiently.
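
For illustration, the image-handling snippet requested above might look roughly like the sketch below. It is not a prescribed implementation. It uses only the Python standard library and assumes that each text record carries an `id` field. The user agent string, the two-second delay, the folder layout, and the function names (`allowed_by_robots`, `image_filename`, `fetch`, `save_images`) are all hypothetical choices made for the example.

```python
import hashlib
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper/0.1"  # hypothetical user agent string
DELAY_SECONDS = 2.0                 # assumed polite delay between requests


def allowed_by_robots(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Check a URL against robots.txt content that was already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def image_filename(record_id: str, index: int, url: str) -> str:
    """Build a stable name such as 'article-001_02.jpg' from the record id."""
    suffix = Path(urlparse(url).path).suffix.lower() or ".bin"
    return f"{record_id}_{index:02d}{suffix}"


def fetch(url: str) -> bytes:
    """Download one resource, then pause to respect the rate limit."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    time.sleep(DELAY_SECONDS)
    return data


def save_images(record: dict, images: list[tuple[str, bytes]],
                out_dir: Path, seen: dict[str, str]) -> dict:
    """Write images to out_dir/<record id>/ and list their paths on the record.

    `seen` maps a SHA-256 content hash to the path already written, so a
    byte-identical image is stored once and referenced from every record.
    """
    paths = []
    for index, (url, data) in enumerate(images, start=1):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            relative = Path(record["id"]) / image_filename(record["id"], index, url)
            target = out_dir / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
            seen[digest] = relative.as_posix()
        paths.append(seen[digest])
    return {**record, "images": paths}
```

In this sketch, a caller would filter image URLs with `allowed_by_robots`, download each with `fetch`, and pass the resulting `(url, bytes)` pairs to `save_images`. The returned record can then be written to JSON or CSV with its `images` paths intact. Hashing the image bytes is only one possible deduplication strategy; deduplicating by normalised URL is a simpler alternative when the same file is never served from two addresses.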