Design, deploy, and maintain scalable, secure, and highly available infrastructure across on-premises and cloud environments.
Own CI/CD pipelines, ensuring fast, safe, and repeatable deployments.
Administer, harden, and troubleshoot Linux servers in production environments.
Manage and optimize containerized workloads (Docker) and, where applicable, container orchestration.
Operate and monitor critical systems, including databases, messaging systems, and storage clusters.
Ensure the reliability and performance of real-time and high-throughput services.
Implement monitoring, alerting, logging, and observability standards (metrics, traces, dashboards).
Lead incident response, root cause analysis, and post-mortems, driving long-term improvements.
Enforce security best practices across infrastructure, access control, secrets management, and backups.
Collaborate with development teams to improve system architecture, performance, and resilience.
Document infrastructure, operational procedures, and standards.