Site Reliability Engineer
Employee
100% Remote
3/11/24
Site Reliability Engineer (SRE)
Location: Brazil – Remote
Type: Full-time
Workplace: remote
Category: Product Strategy
Job Description:
Guidewire is searching for a Site Reliability Engineer who is hungry for a rare chance to transform insurance with the industry’s leading Analytics platform. As a member of the Analytics Reliability Team, you’ll be responsible for building and evolving our SRE practice for Analytics.
The Analytics team at Guidewire uses internet-scale data collection, adaptive machine learning, and insurance risk modeling capabilities to help insurers and other financial institutions model evolving risks, develop new products, and make better business decisions.
Downtime and failures are inevitable; what matters is how SREs deal with them. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our operating environments. Part of an SRE’s responsibility is to collaborate with developers to troubleshoot and solve problems and reduce customer impact where possible. After an incident, SREs go one step further: they document and examine what went wrong and develop measures, such as automated runbooks, to handle the issue moving forward.
Responsibilities:
This is an on-call position. When on-call, you will be responsible for:
Responding to any critical incidents and ticket escalations
Following and documenting our post-incident response and postmortem processes
Executing planned patching or improving related automation
Engineering to reduce toil, tune alerts, and improve documentation
When NOT on-call, you will be responsible for:
Engineering to re-platform or migrate layers of our infrastructure to Kubernetes ecosystems
Analyzing our AWS infrastructure and related applications for design and architectural opportunities to improve overall reliability
Creating observability patterns so that all alerts have consistent content and configuration, keeping triage short and continuously improving overall MTTR
Analyzing incident data to determine the next opportunity to improve reliability
Influencing engineers to improve application reliability and scalability to run efficiently
Documenting every action, if not captured as code, so your findings turn into repeatable actions and then into automation
Improving operational processes (such as deployments and upgrades) to make them as boring as possible
Required Skills:
Proven experience designing and deploying SLIs, SLOs, and error budgets
Proven experience triaging and debugging distributed systems on cloud infrastructure
Proven experience in designing and engineering CI/CD pipelines within Kubernetes (K8s) and legacy ecosystems
Proven experience in designing and engineering monitors, dashboards, and synthetic transactions in Datadog
Proven experience in building, deploying, and running scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud native approaches
Proven experience in managing infrastructure config at scale using multiple approaches and/or tools such as GitOps, Puppet, or Ansible
Good understanding of AWS cloud networking and security with hands-on experience remediating infrastructure vulnerabilities at scale
Comfortable with Linux system administration, with the ability to program/script using Python, Go, Java, shell, or equivalent
Preferred Skills:
AWS Certified in multiple categories
Proficiency with SQL, database administration, data pipelines, performance tuning, and schema design
Proficiency with multiple pipelining tools such as TeamCity, Bitbucket Pipelines, Jenkins, and GitHub Actions
Familiarity with distributed data processing frameworks such as Hadoop and Apache Spark, and data warehouses such as Amazon Redshift
Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process. We also pride ourselves on being a pay-for-performance company.
Please note that the compensation details listed in this posting reflect the base salary only, and do not include bonus, equity, or benefits.
About Guidewire
Guidewire is the platform P&C insurers trust to engage, innovate, and grow efficiently. We combine digital, core, analytics, and AI to deliver our platform as a cloud service. More than 540 insurers in 40 countries, from new ventures to the largest and most complex in the world, run on Guidewire.
As a partner to our customers, we continually evolve to enable their success. We are proud of our unparalleled implementation track record with 1600+ successful projects, supported by the largest R&D team and partner ecosystem in the industry. Our Marketplace provides hundreds of applications that accelerate integration, localization, and innovation.
For more information, please visit www.guidewire.com and follow us on Twitter: @Guidewire_PandC.
Guidewire Software Inc. provides equal employment opportunities to all applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws. All offers are contingent upon passing a criminal history check and other background checks where applicable to the position.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.