SRE - Site Reliability Engineer
Location: London/UK (Remote)
Contract: 12 Months Initial
Rate: £55 - £62 per hour, Inside IR35
Job Overview
We are looking for a Senior Site Reliability Engineer with strong experience in Observability, Monitoring and Distributed Systems to support large-scale cloud infrastructure serving millions of devices globally. The role focuses on building and scaling monitoring, logging and alerting platforms to ensure high availability and performance of cloud services.
Responsibilities
Design, deploy and scale observability platforms
Manage and scale Prometheus monitoring systems
Deploy and maintain large Elasticsearch clusters
Build and maintain data pipelines using Kafka
Develop alerting and monitoring frameworks
Automate infrastructure using Terraform and Ansible
Develop tools and scripts using Python, Go, Ruby or Bash
Work with Linux systems (Debian/Ubuntu)
Participate in on-call rotation
Improve system reliability, performance and scalability
Required Skills
5+ years' experience in Site Reliability Engineering / DevOps
Strong Linux systems experience
Observability and Monitoring tools experience
Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
Kafka
Terraform / Infrastructure as Code
Ansible / Configuration Management
Programming experience (Python, Go, Ruby or Bash)
Distributed systems and cloud infrastructure experience
This is an urgent vacancy, and the hiring manager is shortlisting for interview immediately. Please apply with a copy of your CV or send it to khushboo.pandey@randstad.co.uk
Randstad Technologies is acting as an Employment Business in relation to this vacancy