I design and ship production-grade AI-native infrastructure — autonomous Kubernetes platforms, agentic RAG pipelines, and multi-model orchestration systems that act with confidence and fail gracefully.
I'm a senior AI/Cloud engineer specialising in production-grade agentic systems, cloud-native infrastructure, and multi-agent orchestration. My focus is engineering software that can reason, decide, and act with minimal human intervention.
Every platform I build is infrastructure-as-code first: Terraform-managed, instrumented for observability, and documented from day one. Not proofs of concept. Deployable, maintainable systems.
My work sits at the intersection of AWS cloud architecture and modern AI frameworks: LangGraph, confidence-gated decision engines, and retrieval-augmented generation at production scale.
"Every remediation is a Git commit. The cluster never changes outside a reviewed pull request or a high-confidence auto-apply."
A production-grade AIOps platform for Kubernetes — intercepts infrastructure anomalies, routes them through a 5-agent reasoning pipeline, and acts only when confidence justifies it. Every fix is a Git commit; every rollback is automatic.
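The confidence gate described above can be illustrated with a minimal sketch. It is not the platform's actual code: the `Remediation` type, the `route` function, and the 0.9 threshold are hypothetical names and values chosen for illustration, since the real threshold and pipeline interfaces are not stated here.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the real platform's threshold is not stated in this text.
AUTO_APPLY_THRESHOLD = 0.9


@dataclass(frozen=True)
class Remediation:
    """A proposed fix produced by the reasoning pipeline (illustrative type)."""
    summary: str
    confidence: float  # expected to lie in [0.0, 1.0]


def route(remediation: Remediation, threshold: float = AUTO_APPLY_THRESHOLD) -> str:
    """Decide how a proposed fix reaches the cluster.

    High-confidence fixes are committed and applied automatically;
    everything else is opened as a pull request for human review.
    Either way the change lands as a Git commit.
    """
    if not 0.0 <= remediation.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if remediation.confidence >= threshold:
        return "auto_apply"
    return "pull_request"
```

For example, `route(Remediation("restart crashing pod", 0.95))` selects the auto-apply path, while a fix scored at 0.6 would go to review.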
A production agentic retrieval system for medical knowledge — streaming token-by-token responses, heuristic confidence scoring, and iterative relevance checking across three distinct knowledge sources.
A unified orchestration layer abstracting multiple LLM providers behind a single API — enabling intelligent model routing, per-provider cost tracking, and automatic failover across OpenAI, Anthropic, and AWS Bedrock.
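Automatic failover of this kind can be sketched in a few lines. This is an assumption-laden illustration rather than the layer's real API: each provider is modelled as a plain callable taking a prompt and returning text, and `complete_with_failover` is a hypothetical name. No real SDK calls are shown.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, callable) pair; the callable is a stand-in for a real
# client wrapper (OpenAI, Anthropic, Bedrock) and is purely illustrative.
Provider = Tuple[str, Callable[[str], str]]


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order and return (provider_name, response).

    Any exception from a provider triggers a fall-through to the next one.
    If every provider fails, the collected errors are raised together.
    """
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the response is one way to support per-provider cost tracking, since the caller knows which backend actually served the request.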
hello@isokandev.com · isokandev.com