Senior Support Engineer – Service Reliability

Apar Consultancy Services AB, Brockport, NY, United States

Overview

For our client, we are looking for a Senior Support Engineer who will own and improve the reliability and availability of the client's services. You will play a key role in a Dev/Ops team that manages and supports these services. You will also develop automation and tooling that allow our customer to continuously improve service as it scales. You will contribute to the full service lifecycle, from service development to live-service response, as our customer continuously deploys new and innovative functionality for its customers.

Responsibilities

Participate in the Support team on-call roster and respond with command-and-control incident management during High Priority Events while maintaining internal and external SLAs in a 24×7 SaaS operation
Drive Problem Management/Retrospectives (“post mortems”)
Contribute strongly to, and maintain, our knowledge base
Contribute to the development of new tools and automation that ensure the service can be optimized and tuned with minimal human intervention
Accountable for working upstream with developers on monitoring, tools and architecture to deliver security, reliability, manageability and availability at scale
Point of escalation/decision maker on response level of incidents
Analyze trends and make recommendations in the areas of monitoring, incident and change management, cloud orchestration and support
Contribute to the future growth of the team by conducting candidate screenings and assessments

Personal profile

Extraordinary judgment and composure in high-pressure situations
Excellent customer service skills
A real passion for developing strong partnerships with customers and partners
Enthusiasm for working with customers and/or supporting internal projects and senior leadership, bringing order and efficiency to critical initiatives

Technologies

Experience with Docker, SaltStack and Kubernetes orchestration tools
Knowledge of one of the following: Mongo, Influx, Postgres, Jenkins, Moshell and Artifactory
Knowledge of Cloud Platform systems and related APIs
Experience designing, setting up, maintaining and refining (noise reduction, auditing) monitoring tools such as Prometheus, Prometheus exporters, Grafana and Alertmanager
Demonstrable experience in one or more languages: Go, React, Node.js, Python, Java, Shell scripting, Groovy
Strong knowledge of TCP/IP networking, DNS, VPNs, HTTP and load-balancers
Knowledge of Atlassian suite (Jira, Confluence), Git
Familiarity with containers (e.g. Docker, rkt) and IaaS
Experience analyzing network, performance and application issues using tcpdump, Fiddler and Wireshark

Qualifications

Bachelor’s Degree in CS, MIS, or equivalent experience
3+ years of relevant experience with Windows/Unix systems fundamentals, monitoring, cloud services, networking, LTE systems, storage, databases and applications
Solid written and verbal communication skills. Able to tailor messaging effectively to different audiences: external customers, leadership, technical SMEs and other Support members
Previous experience in customer-facing roles during high-stress situations
Demonstrated skills as an influencer within a previous organization
In-depth knowledge of IT concepts, strategies, and methodologies; Agile knowledge a plus
In-depth knowledge of business operations, objectives, and strategies
