AI-Driven B2B Intelligence Platform Development

Pending
💰 INR 12500–37500 👤 Unknown 🕒 11d ago status: new
Skills: Python, Cloud Computing, PostgreSQL, Web Crawling, Backend Development, OpenAI, AI Development

# B2B Intelligence Platform Development: Production-Grade AI + Data Pipeline

## Project Overview

We are building a **production-grade B2B intelligence platform** focused on large-scale public data acquisition, AI-powered document intelligence, and real-time business alerts.

The platform will crawl and process data from **50+ public-facing websites**, extract structured intelligence from multilingual PDFs (English, Hindi, Marathi), and deliver actionable insights through search, AI-generated reports, and multi-channel notifications.

This is a **long-term product engagement**, not a short-term prototype assignment.

The selected team/freelancer will work on a high-scale architecture designed for:

* Large-volume document ingestion
* AI-assisted extraction pipelines
* Search + semantic intelligence
* Risk scoring
* Real-time alerts
* Enterprise-grade observability
* Scalable AWS infrastructure

NDA is mandatory before sharing complete architecture, workflows, source mappings, schemas, and internal business logic.

---

# Required Technical Skills

## Backend & Core Stack

* Python 3.11
* FastAPI
* asyncio
* asyncpg
* Production-grade architecture
* Typed, tested, maintainable code

## Web Crawling & Data Acquisition

* Playwright
* httpx
* curl-cffi
* JS-rendered page handling
* Session management
* Queue-based distributed crawling
* Rate limiting & retry orchestration

## Document Processing & OCR

* pdfplumber
* PyMuPDF
* Tesseract 5
* Hindi + Marathi OCR language packs
* OCR fallback pipelines

## AI / LLM Integration

* Anthropic Claude API
* OpenAI API
* Structured JSON extraction
* Schema validation
* Confidence scoring
* Embedding pipelines
* OpenAI text-embedding-3
* BGE-M3

## Data & Search Infrastructure

* PostgreSQL 14+
* JSONB
* Query optimisation
* Table partitioning
* pgvector
* OpenSearch / Elasticsearch
* Custom analyzers for multilingual search
* Redis for queues, caching, throttling

## Cloud & DevOps

* AWS ap-south-1 (Mumbai only)
* ECS Fargate
* S3
* RDS
* IAM
* Secrets Manager
* Docker
* Terraform
* GitHub Actions

---

# Preferred / Bonus Skills

* Apache Airflow / MWAA
* Indic-language NLP experience
* React + TypeScript
* WhatsApp Cloud API
* Firebase Cloud Messaging
* AWS SES
* Sentry
* OpenTelemetry
* Grafana
* LLM cost optimisation strategies
* High-scale document processing systems
* DPDP Act 2023 compliance
* Experience handling 10,000+ documents/day pipelines

---

# Scope of Work

The selected developer/team will build the following production components:

### Core Pipeline

1. Distributed web crawler
2. Document acquisition engine
3. S3 document storage layer
4. OCR cascade pipeline
5. Section detection engine
6. Structured field extraction
7. Revision diff engine
8. Change classification layer
9. Intelligence/risk scoring engine
10. Hybrid search engine
11. AI-powered report generation
12. Multi-channel alerting engine
13. Admin dashboard
14. Operational tooling & monitoring

---

# Deliverables

## Code Deliverables

* 11 Dockerised microservices
* PostgreSQL schema with migrations
* React + TypeScript admin dashboard
* Public REST APIs
* OpenAPI 3.0 documentation

## Infrastructure Deliverables

* Terraform infrastructure-as-code
* AWS deployment architecture
* ECS deployment pipelines
* CI/CD workflows
* Observability stack
* Production deployment on AWS Mumbai region

## Quality Deliverables

* ≥75% test coverage on core extraction logic
* Integration testing
* Production-scale load testing
* Technical documentation
* Runbooks
* Monitoring dashboards
* Error tracking setup

---

# Important Constraints

* AWS Mumbai region only (ap-south-1)
* Indian data residency mandatory
* No cross-border data transfer
* No anti-bot bypassing
* Only compliant/public-access acquisition flows
* Production-quality engineering required
* Observability + testing mandatory
* IST timezone preferred (±2 hours)

---

# Engagement Model

We are open to:

* Fixed-price engagement
* Milestone-based delivery
* Hourly engagement
* Long-term retainer

Please propose the engagement structure best suited for your team.

---

# What We Are Looking For

We prefer teams/freelancers who:

* Have built production-scale platforms
* Understand distributed systems
* Can work independently
* Write maintainable code
* Think in systems, not just tasks
* Can support long-term product evolution

Please include the following in your proposal:

* Relevant project experience
* Team composition
* Architecture approach
* Deployment strategy
* Sample production systems
* Engagement model preference
* Post-launch support capability

NDA will be executed before detailed technical discussions.