We are storming the marketplace with the highly skilled, experienced, and certified professionals that businesses need.

Site Reliability Engineer (Closed)

SkillStorm is seeking a Site Reliability Engineer for our client in Atlanta, GA; Charlotte, NC; Addison, TX; or Richmond, VA. Candidates must be able to work on SkillStorm's W2; this is not a C2C position. EOE, including disability/vets.

Job Description:

  • Must be hands-on
  • 10-15+ years of experience
  • Merchant Technology Engineering seeks a stellar, self-motivated problem solver and automator: a Site Reliability Engineer who is responsible for establishing and maintaining service level indicators (SLIs), objectives (SLOs), agreements (SLAs), and error budgets for their systems, and for making sure these are met. SREs are expected to spend a certain amount of their time on operational work (making sure systems work as expected and deploying code), on improving processes, and on reducing toil.
  • Merchant is a brand-new business for the bank, and we’re looking for a special person who is a leader and is dedicated to delivering successful releases in alignment with program goals, while constantly improving quality. Are you looking to build your career? We have a huge opportunity on this team. We are growing explosively, doubling our team size in the next few months, and scaling our next-generation platform by 17,000% in the next year.

Objectives of this Role:

  • Run all pipeline environments, monitoring availability and taking a holistic view of system health
  • Build software and systems to manage platform infrastructure and applications
  • Improve the reliability and quality of systems and applications
  • Measure and optimize system performance, with an eye toward pushing our capability forward
  • Provide primary operational support and engineering for multiple large distributed applications
  • Partner with development teams to improve services through rigorous testing and release procedures
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
  • Participate in system design consulting, platform management, and capacity planning
  • Create sustainable systems and services through automation and uplifts
  • Balance feature development speed and reliability with well-defined service level objectives

Required Skills and Qualifications:

  • Proficiency with SRE concepts such as SLOs, SLAs, and SLIs
  • Expertise with automation tools such as Ansible, Jenkins, XLR, CircleCI, and Terraform
  • Experience with logging and monitoring implementations such as Splunk, Dynatrace, and ELK
  • Ability to program (structured and OO) in one or more high-level languages, such as Python or Java
  • Experience with Bash and Linux shell scripting
  • Experience in Docker/Kubernetes orchestration and management in any cloud environment
  • Willingness to participate in the team’s on-call rotation to address complex problems in real time and keep services operational and highly available
  • Experience troubleshooting relational databases and distributed platforms
  • Experience with messaging and middleware products such as MuleSoft and Kafka
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Understanding of Jenkins, Python scripts, and Ansible
  • Experience automating manual tasks