The Site Reliability Engineering Team helps the company's Engineering and Product Teams deliver secure, reliable and resilient products faster, more consistently and at a lower cost.
As part of their Site Reliability Engineering Team, you will work with various project teams, providing operational tools, platforms, guidance and support that Engineering and Product teams can easily integrate into their work. The aim is to help them build products and services that are secure, scale reliably, are supportable and recover quickly from issues or failures.
If you are committed to improving application performance, reliability and monitoring, and to ensuring the best user experience for their customers, then please get in touch!
They are looking for people with good interpersonal skills who enjoy working in a fast-paced, delivery-focused, agile environment. The position is fully remote.
As a Site Reliability Engineer at this company you will:
- Have and apply broad knowledge of core web and cloud technologies.
- Take responsibility for solving complex issues.
- Automate tasks, deployments, and tests by creating infrastructure as code, taking responsibility for the quality of code you produce.
- Implement resilient, highly available systems.
- Share knowledge of tools and techniques with your wider team.
- Act as a team ambassador, supporting recruitment, identifying good practices for the other Technical Teams to adopt and sharing experiences.
- Participate in their in-house (2nd line) support and the out-of-hours support rota.
- Share knowledge among the Tech and Product teams, ensuring that your team's function is understood by others and that you understand the workings of the wider organisation.
As a Site Reliability Engineer you will also:
- Provide technical leadership, advising and working with other Reliability Engineers and the Engineering and Product teams to identify the best solutions.
Ideally, you:
- Are experienced with UNIX-like operating systems and technologies used for web applications, e.g. Linux, databases, backups, CDNs.
- Are experienced with AWS and the use of orchestration tools such as Terraform and Ansible.
- Have experience with monitoring and synthetic testing tools such as Datadog, CloudWatch, Zabbix, Prometheus and Grafana.
- Understand software design principles and the use of development tracking and deployment tools such as Jira.
- Take a systematic approach to solving problems.
- Use testing to validate solutions.
- Understand agile environments and version control.
- Are familiar with web and coding security.
- Understand network services and protocols, e.g. HTTPS, TLS, SSH, TCP/IP.
- Have familiarity with working practices such as test-driven development, continuous integration and continuous delivery (tools include Jenkins and SonarQube).
Site Reliability Engineers should also have experience of:
- Owning project tasks, helping colleagues with their career development and coaching more junior staff members.