Site Reliability Engineer

Information Technology - Systems


Life is short, work somewhere awesome.


The SRE team is the in-house expert on building reliable and maintainable systems. The team plans infrastructure capacity to meet high-availability and uptime goals for all DrFirst products.

The DevOps/Site Reliability team eliminates inefficiencies and incompatibilities that jeopardize service availability, so that DrFirst’s clients receive a reliable and scalable software service. Key aspects of this role include automation, configuration management, and tools development, along with collaborating with the engineering team on projects and products as the expert on reliability, performance, and efficiency.


As a part of the Systems team, you will:

  • Periodically assess all monitoring requirements and implement necessary enhancements to meet changing/growing business needs
  • Enhance current automation processes of managing capacity, safely deploying software and mitigating failures
  • Tune and troubleshoot full-stack software applications using object-oriented programming (OOP), Java, web services, Oracle Database, MongoDB, networking concepts, and virtualization techniques
  • Proactively review, recommend and implement changes to the live infrastructure after ensuring the right validation has been carried out
  • Assist in the rollout and deployment of new product features and installations to facilitate rapid iteration
  • Confidently make informed, data-driven decisions in a fast-paced environment with competing priorities
  • Create and maintain Chef recipes for instance configuration management
  • Participate in a 24/7 on-call rotation and after-hours deployments


To be successful in this role, you must have:

  • Bachelor’s degree in Computer Science or a related discipline (Master’s preferred)
  • At least 7 years of industry experience in managing and supporting SaaS applications (tools like Hazelcast, ELK, Zabbix, Nagios)
  • Strong working knowledge of Linux, including client-server interaction, system statistics, performance tuning, file systems, and I/O
  • Hands-on experience deploying enterprise applications from development to production environments (Jenkins, Travis CI, SonarQube)
  • Working knowledge of at least one scripting language (PHP, Python, Ruby, or shell)
  • Passion for automation and hands-on experience with at least one configuration management solution (Chef, Puppet)
  • 1 year of experience in application performance tuning and troubleshooting using Java, SQL, web services, networking, and virtualization tools
  • Strong understanding of network infrastructure (OSI, SMTP, HTTP, TCP/IP, REST APIs) and ability to troubleshoot software applications leveraging that knowledge
  • Self-motivated and technically curious
  • Ability to work independently and manage competing priorities

Nice to Have:

  • Experience working with AWS
  • Hands-on operational experience with caching services (Hazelcast or similar tools)
  • Experience with source control management (SVN and Git)