MahaRERA Data Extraction & Processing

Pending
💰 INR 12500–37500 👤 Unknown 🕒 20d ago status: new
PHP Data Processing Excel Web Scraping Data Mining Data Extraction Data Analysis NLP
I am an independent professional reaching out to enquire about your data extraction and Data-as-a-Service (DaaS) capabilities. I am looking for an expert partner to process approximately 48,000 specific project URLs from the MahaRERA (Maharashtra Real Estate Regulatory Authority) portal. I do not require the scraping scripts or source code; my goal is simply to procure the final, clean dataset (Excel/CSV) and a structured folder of the processed PDF documents. Below is a detailed overview of the project requirements.

Pipeline & Workflow Requirements:
- Captcha Bypass: Each URL is protected by a simple alphanumeric captcha. You will need to automate solving this (OCR/Proxies/Sessions) to access the project pages.
- Data Extraction: Scrape specific structured data points from the HTML tables on each page.
- Download & Merge PDFs: Under the "Promoter Documents" section, locate multi-part files labeled "Land Ownership Document" (e.g., REGISTERED EXCHANGE DEED Part 1, Part 2). Download all parts for each project, merge them into a single PDF, and name the final file [Registration_Number]_Land_Document.pdf.
- AI/NLP Document Analysis (Critical): Run the merged legal documents (which may be in English or Marathi) through an AI/NLP model to extract the "Consideration" or "Deal Structure" between the multiple parties.

Required Data Points:
- Primary ID: Registration Number & Date of Registration.
- Basic Details: Project Name, Project Type, and Proposed Completion Date.
- Area Details: Land Area for Project Applied (Sq. Mts.), Permissible Built-up Area, Sanctioned Built-up Area, and Aggregate Area of Recreational Open Space.
- Legal & Promoter Details: CC Date, Landowner Type, GSTIN Number, Promoter Name, and all individual names listed in the "Member Details" table.
- Joint Venture Flag: A True/False column (Mark 'True' if the "Promoter Name" contains a comma or lists multiple entities).
- Unit Details: Total Residential & Non-Residential Units.
- AI Consideration Category: Categorize the deal structure from the PDF as: Pure Monetary, Barter, Constructed Area Share, Revenue Share, or Mixed.
- AI Consideration Summary: A 1-2 sentence English summary of the commercial terms extracted from the PDF.

Project Deliverables:
- One clean .xlsx or .csv file containing all extracted data points and AI summaries for all 48,000 URLs.
- A .zip folder containing the correctly named, merged Land Ownership Document PDFs for every project.

Proposed Milestone Structure:
To ensure quality and alignment, I propose dividing this contract into two milestones:
- Milestone 1 (Proof of Concept): I will provide 100 URLs. You will deliver the CSV (including the AI extraction) and the 100 merged PDFs to prove the pipeline works accurately.
- Milestone 2 (Final Delivery): Upon approval of the sample, you will process the remaining URLs and deliver the final bulk files.

Request for Proposal:
If your team is capable of handling this workflow, please reply to this email with a proposal. To ensure you have read through the requirements, please start your response with the word "MahaRERA". Kindly include the following in your proposal:
- Your estimated total cost for the project in INR.
- Your expected turnaround time.
- A brief explanation of the AI/NLP stack you would utilize to read and summarize the Marathi/English property deeds.

I look forward to hearing from you.
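
Note on the derived columns: to make the Joint Venture Flag rule, the merged-PDF naming convention, and the five AI Consideration Category values described above less ambiguous, here is a minimal Python sketch of how they might be encoded. It is an illustration only, not a required implementation. The function and class names are hypothetical, and the `extra_separators` parameter is an assumption: the brief only names the comma explicitly, so any other marker of "multiple entities" would need to be agreed during the proof of concept.

```python
from enum import Enum


class ConsiderationCategory(Enum):
    """The five deal-structure labels named in the brief."""
    PURE_MONETARY = "Pure Monetary"
    BARTER = "Barter"
    CONSTRUCTED_AREA_SHARE = "Constructed Area Share"
    REVENUE_SHARE = "Revenue Share"
    MIXED = "Mixed"


def is_joint_venture(promoter_name: str, extra_separators=(";",)) -> bool:
    """Return True if the promoter name appears to list several entities.

    The comma rule comes from the brief. `extra_separators` is an assumed
    extension for "lists multiple entities"; adjust or empty it as needed.
    """
    name = promoter_name.strip()
    if "," in name:
        return True
    return any(sep in name for sep in extra_separators)


def land_document_filename(registration_number: str) -> str:
    """Build the merged-PDF name: [Registration_Number]_Land_Document.pdf."""
    return f"{registration_number.strip()}_Land_Document.pdf"


if __name__ == "__main__":
    # Placeholder values, not real MahaRERA records.
    print(is_joint_venture("ABC Developers, XYZ Realty"))
    print(land_document_filename("P00000000000"))
    print(ConsiderationCategory("Mixed"))
```

Validating the model's output against a fixed enumeration like this would reject any category label outside the five listed, which may help keep the final CSV column clean.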