Job Description
Infrastructure Lead (Agent Networks)
About this role
We are seeking an exceptional Infrastructure Lead to architect and build the foundational systems that will power the next generation of AI agent networks at Naptha AI. This is a rare opportunity to shape the future of AI agent infrastructure at a massively ambitious scale, backed by industry veterans and technical leaders through NVIDIA Inception, Google for Startups, and Microsoft for Startups.
We're building the foundational infrastructure for the next wave of AI companies, enabling frontier AI developers (many leaving labs like OpenAI, Anthropic, and DeepMind) to build products powered by enormous networks of highly capable next-generation AI agents. As our Infrastructure Lead, you'll design and implement the systems that will enable billions of AI agents to interact, coordinate, and scale efficiently across distributed environments.
Core Responsibilities
- Design and implement scalable infrastructure for massive agent networks
- Architect systems for efficient agent communication and coordination
- Build robust, distributed systems for agent deployment and execution
- Create monitoring, observability, and debugging systems for agent networks
- Develop performance optimization strategies for large-scale agent operations
- Design fault-tolerant systems for reliable agent interactions
- Lead technical decisions around infrastructure architecture
Technical Challenges You'll Tackle
- Designing distributed systems that can handle millions of concurrent agent interactions
- Building efficient communication protocols for agent-to-agent interactions
- Creating scalable orchestration systems for agent deployment
- Implementing robust monitoring and debugging tools for complex agent networks
- Optimizing resource utilization across distributed agent systems
- Developing infrastructure that can adapt to emerging AI capabilities
You're a good fit if you have:
- Deep expertise in distributed systems and scalable architecture
- Strong experience with high-performance computing or large-scale systems
- Track record of building reliable, production-grade infrastructure
- Experience with modern cloud platforms and containerization
- Strong coding abilities in systems programming
- Understanding of AI/ML deployment challenges
- Passion for solving complex infrastructure problems
Required Technical Experience:
- Proven experience building distributed systems at scale
- Expertise in performance optimization and system reliability
- Strong programming skills (Go, Rust, or a similar systems language)
- Experience with container orchestration (Kubernetes, etc.)
- Understanding of network protocols and distributed computing
- Experience with observability and monitoring systems
About the hiring process:
- Technical architecture discussion
- Systems design deep dive
- Coding and problem-solving session
- Team collaboration interview
- Infrastructure vision presentation
Compensation & Benefits:
- Highly competitive salary and significant equity stake
- Remote-first work environment
- Full medical, dental, and vision coverage
- Flexible PTO policy
- Learning and development budget
- Conference attendance support
This is a unique opportunity to shape the infrastructure that will power the next generation of AI systems. You'll be working at the intersection of distributed systems, AI, and platform design, creating the foundation for how future AI agents will interact and scale.