Project Description
I’m looking for an experienced consultant to lead the architecture and build-out of a full-scale data lakehouse. The immediate goal is to integrate data from my current sources and make it straightforward to onboard future ones.
What I need from you:
• A clear, vendor-agnostic architecture blueprint that balances performance, cost, and governance.
• Hands-on implementation of the core storage layer (an open table format over Parquet files, such as Delta Lake or Apache Iceberg, or a comparable option), the compute engine (Spark or equivalent), and the ingestion pipelines.
• Robust security and data-quality controls baked in from day one, including role-based access, lineage, and monitoring.
• Documentation and a concise knowledge-transfer session so my internal team can extend and maintain the platform confidently.
Acceptance criteria:
1. A reproducible infrastructure-as-code template that deploys the lakehouse stack in my cloud environment.
2. At least one production-ready pipeline that ingests a relational source through the lakehouse's bronze, silver, and gold layers and is deployed through automated CI/CD.
3. Performance benchmarks demonstrating sub-second query latency on an analytical workload of my choosing, with the test conditions (data volume, concurrency) documented.
If you have a track record of delivering scalable lakehouse solutions (Databricks, Snowflake with Apache Iceberg tables, open-source Delta Lake, or a similar stack), let’s talk about timelines and next steps.