Site Reliability Engineer

at Decagon
Location	Dar es Salaam, Tanzania, United Republic of
Date Posted	August 16, 2022
Category	Engineering
Job Type	Full-time
Currency	TZS

Description

About the job

Decagon is Nigeria’s leading tech talent accelerator, connecting the top 0.5% of software engineers with global companies looking to scale their engineering teams.

Driven by a passion to make Nigeria a top-10 software engineering nation within a decade, we recruit and place only the brightest software engineers.

Founded in 2018, Decagon has a global network of 400+ software engineers and 120+ partner companies spread across Europe, the USA, and Africa, and is committed to connecting exceptional software engineers with global companies across that network.

The Role

Develop, deploy, and operate cloud-native infrastructure in support of a SaaS platform
Develop and improve instrumentation for understanding and troubleshooting the health and availability of services
Bring a mindset of standards and best practices to help create observability solutions that the team would want to adopt
Participate in an on-call rotation
Drive a culture of automation, both within the team and throughout the organization, in order to scale efficiently and reliably
Participate in technical discussions to aid system design, analysis, and troubleshooting
Help engineering teams to develop, test, debug and release scalable, resilient and highly available cloud-native applications

Ideal Profile

4+ years of experience with implementation, operations, and maintenance of cloud services
A drive to inspire adoption through enthusiasm
An understanding of the importance of a strong feedback loop with other teams and individuals across the organization
A deep understanding of cloud computing concepts and solutions, specifically with Google Cloud Platform
A solid understanding of Identity and Access Management, as well as setting and auditing access policies
Experience with cloud-native approaches to security concerns
Hands-on experience with container and container orchestration technologies: Kubernetes, Docker, Podman, etc.
Experience working with Infrastructure-as-Code tools
Intimate understanding of one or more of these monitoring and observability tools: Datadog, Prometheus, Grafana, Jaeger, Honeycomb
Very strong problem-solving and troubleshooting skills, including the ability to perform root cause analysis and preventative analysis

Nice To Have

Experience building systems in a microservice environment, with an understanding of the basic building blocks of resilient and scalable software
Experience with web applications developed in Python or Ruby
Knowledge of some or all of: web/network protocols, security, data persistence, and CI/CD pipelines
An understanding of modern software development practices: TDD/BDD, hexagonal design, etc.
An understanding of Linux primitives: process scheduling, signals, namespaces, authentication/authorization, etc.

What's on Offer?

Flexible working options
Excellent career development opportunities
Opportunity to make a positive impact
