DevOps / Platform Engineering Lead (Hands-On, Hybrid Infra)
Special needs require special people.
Wise Wolves Corporation runs a hybrid stack across cloud and on-prem: Kubernetes, containers, and a handful of performance-critical services on bare metal. We’re looking for a hands-on DevOps/Platform Lead to own the developer platform and deployment experience, building scalable, reliable, and automated infrastructure that accelerates product teams.
Responsibilities:
● Lead & grow the team: Coach 3–5 DevOps/Platform engineers; set standards, run 1:1s, drive roadmaps and delivery.
● Own the platform: Design, operate, and evolve Kubernetes-based environments (multi-cluster, multi-region) across cloud + on-prem.
● IaC & GitOps: Standardize with Terraform/Helm and GitOps (Argo CD/Flux). Create reusable modules, blueprints, and golden paths.
● CI/CD at scale: Build fast, reliable build/test/deploy pipelines; manage artifact registries, environment promotions, and preview environments.
● Observability: Ensure first-class monitoring, logging, and tracing (Prometheus/Grafana/ELK/OTel); tighten feedback loops for engineers.
● Networking for hybrid: Own service connectivity—ingress, LBs, CNI, east-west traffic, API gateways, and secure cloud↔on-prem peering.
● Stateful & storage: Operate CSI-backed storage and object/block integrations, and tune performance for stateful workloads where needed.
● Performance & scalability: Own capacity planning, autoscaling strategies (HPA/VPA/KEDA), and rollout strategies (blue/green, canary).
● Developer experience: Ship self-service internal developer platform (IDP) portals, templates, and CLIs so teams can provision infrastructure safely and quickly.
● Tooling & modernization: Evaluate and introduce tools that improve reliability, speed, or cost efficiency; measure their impact and adopt pragmatically.
Must-have experience:
● 7–10 years of experience in DevOps/Platform/SRE roles.
● Deep hands-on experience with Kubernetes (cluster lifecycle, upgrades, multi-cluster patterns) and containers.
● Strong IaC (Terraform) and Helm skills; experience running production GitOps workflows (Argo CD/Flux).
● Cloud (AWS/GCP/Azure) plus real exposure to on-prem/bare metal or virtualization (KVM/Proxmox/VMware).
● Solid networking fundamentals (VPCs/VNETs, VPNs/peering, DNS, L4/L7 load balancing, ingress).
● CI/CD design and operation (GitLab CI/GitHub Actions/Jenkins or similar); caching, parallelization, test orchestration.
● Observability stacks (Prometheus/Grafana, ELK/EFK, OpenTelemetry) and performance troubleshooting.
● Proficient in automation/scripting (Python or Go preferred; Bash a given) and comfortable with Git-centric workflows.
Nice to have:
● Experience with Cursor or similar AI coding tools.
● 2+ years leading a small engineering team.
● Multi-cluster management (Cluster API, Rancher, GKE Autopilot/EKS Blueprints).
● Service mesh experience (Istio/Linkerd) and traffic management for canary/blue-green.
● Cost awareness for infra (right-sizing, autoscaling, spot/RI/Savings Plans).
● Supply-chain hardening knowledge (SBOMs, provenance) from a platform perspective.
● Experience building an Internal Developer Platform (Backstage/Port or homegrown).
Soft skills:
● Pragmatic leadership: balances vision with hands-on delivery.
● Excellent communicator with product/dev teams.
● Bias for automation, simplification, and measurable outcomes.