About LGND
LGND is an early-stage startup revolutionizing geospatial AI infrastructure. We bridge the gap between large Earth observation models and the developers building specific applications, enabling intuitive interaction with those models and with geospatial data. Our core mission is to empower decision-makers with rapid insights from vast, complex datasets. As part of our small, dynamic team, you will play a foundational role in building tools that have never existed before.
Role Summary
We’re looking for a seasoned and pragmatic Lead Database Engineer to own the design, scaling, and performance optimization of our core data storage infrastructure. This role spans database engineering and DevOps, with responsibility for the ingestion, indexing, and retrieval of billions of geospatial embeddings via pgvector and PostGIS on Amazon Aurora (the person in this role may choose to change the stack). You’ll also work closely with our cloud infrastructure partners at AWS to optimize throughput, costs, and system reliability.
As one of the earliest infrastructure hires, you’ll help establish best practices and operational systems that underpin our entire AI-native stack. You’ll collaborate across engineering, ML research, and product teams to ensure our systems scale with both user demand and model complexity.
This is a hybrid role, with a preference for candidates based in New York, the San Francisco Bay Area, or Copenhagen. Remote work is also a possibility.
Key Responsibilities
- Database Architecture & Management: Design, maintain, and optimize the core LGND database layer (Postgres with PostGIS and pgvector extensions) to support vector search and metadata queries at scale.
- Indexing & Search Services: Build and manage ingestion pipelines that stream embeddings into pgvector-backed similarity search indexes, ensuring near-real-time performance and flexibility in the search algorithms used.
- Infrastructure Optimization: Collaborate with AWS engineers to tune performance, minimize cost, and improve system resilience across Aurora, S3, and EventBridge.
- DevOps & Observability: Implement monitoring, logging, and alerting across ingestion and search services. Help bootstrap or improve our CI/CD pipelines and infrastructure-as-code (Terraform preferred).
- System Scalability: Architect for scale using techniques like partitioning, async ingestion, connection pooling, and background task management.
- MCP & Retrieval Systems: Optimize infrastructure to support Model Context Protocol (MCP) servers used for retrieval-augmented generation and high-speed inference tasks.
- Documentation & Collaboration: Maintain clear infrastructure diagrams and service contracts. Work closely with API and product teams to support new features and workflows.
- Cross-functional Integration: Coordinate with backend, API, and research teams to align database operations with product objectives and machine learning workflows.
Scope of Work: First 3 Months
- Finalize and implement architectural plans for LGND's streaming vector database system.
- Build out ingestion/storage/retrieval services to support billions of embeddings with high availability and minimal latency.
- Stand up monitoring and logging tools for embedding ingestion and query layers.
- Collaborate with API and AI research engineers to expose efficient metadata and similarity search endpoints.
- Work with AWS partners to benchmark and optimize performance and costs.
- Begin implementing scalable DevOps practices and CI/CD pipelines.
Required Technical Skills
- Database Systems: Expert-level experience with PostgreSQL, including advanced features like PostGIS and pgvector; schema design, indexing, and optimization. Experience with DuckDB.
- DevOps & Cloud Infrastructure: Proficiency with AWS (Aurora, RDS, S3, EventBridge), Docker, GitHub Actions, and IaC tools like Terraform or CloudFormation. Experience with cloud-native geospatial formats, such as COG (Cloud Optimized GeoTIFF) or GeoParquet.
- Performance Engineering: Deep understanding of indexing (e.g., HNSW), query optimization, streaming ingestion, and scaling real-time databases.
- CI/CD & Observability: Familiarity with logging and monitoring stacks (e.g., CloudWatch, Grafana, or similar) and automated deployment practices.
- Production Experience: 7+ years in backend infrastructure or database roles, ideally in startups or systems with high ingest/query volumes.
Preferred Experience
- Experience with geospatial data and libraries (e.g., GeoPandas, Rasterio).
- Understanding of vector search techniques and embedding infrastructure.
- Familiarity with RAG and transformer-based AI inference systems.
- Background in data infrastructure for machine learning or real-time analytics.
- Exposure to retrieval systems for unstructured data and geospatial metadata.
Note: Even if you don't meet all the criteria, we encourage you to apply. We welcome applicants from diverse backgrounds and nontraditional paths.
Soft Skills
- Builder Mentality: Loves building foundational systems from the ground up.
- Self-Led & Proactive: Capable of owning complex systems and improving them iteratively.
- Collaborative & Humble: Thrives in a highly cross-functional team and contributes constructively.
- Pragmatic Problem-Solver: Prioritizes solutions that are effective, maintainable, and impactful.
At LGND, we prize:
- Humility: Value collaboration and learning from others.
- Integrity: Uphold honesty and transparency.
- Effectiveness: Focus on what works and deliver results.
Benefits
- Competitive salary based on experience and location
- Equity options in a high-growth early-stage startup
- Health care and 401(k)
- Flexible work arrangements (hybrid/remote)
- Opportunity to shape LGND’s technology and company culture
- Work on cutting-edge problems in geospatial AI with real-world impact