[Remote] Sr Site Reliability Engineer

Work from home · Full-time role · Hiring

Note: This is a remote position open to candidates in the USA. Commence focuses on data-centric transformation in healthcare, aiming to improve health outcomes through more efficient processes. The company is seeking a Senior Site Reliability Engineer to ensure the reliability and operational health of its healthcare data platform, collaborating with engineering teams and managing incident response.

Responsibilities

  • Design, implement, and own observability infrastructure including metrics, logging, tracing, and alerting across distributed systems
  • Define and enforce SLOs, SLIs, and error budgets in partnership with product and engineering teams
  • Lead incident response: triage, coordinate remediation, conduct blameless post-mortems, and drive systemic fixes
  • Build and maintain CI/CD pipelines that support rapid, safe delivery of changes to production
  • Collaborate with engineering teams on infrastructure changes; read, modify, and contribute to existing infrastructure-as-code (Terraform or CloudFormation)
  • Design and operate highly available, fault-tolerant systems—including auto-scaling, failover, and disaster recovery strategies
  • Reduce operational toil through automation; eliminate manual processes before they become habits
  • Collaborate with software engineers to establish reliability-first design patterns and review architectures for operational risk
  • Manage Kubernetes or container orchestration environments at scale
  • Ensure systems meet compliance and security requirements, particularly those applicable to healthcare data (HIPAA, SOC 2)
  • Provide technical mentorship and guidance to engineers across the organization on reliability practices
  • Participate in on-call rotation with a commitment to continuously reducing the need for it

Skills

  • 7+ years of experience in SRE, platform engineering, or DevOps roles
  • Exceptional problem-solving under pressure—demonstrated track record of diagnosing complex, high-stakes system failures and building durable solutions
  • Deep hands-on experience with AWS services including EC2, EKS/ECS, Lambda, RDS, S3, CloudWatch, and related tooling
  • Familiarity with infrastructure-as-code (Terraform or CloudFormation)—able to contribute to existing configurations
  • Experience designing and operating distributed systems with strict availability and latency requirements
  • Proficiency in at least one scripting or systems language (Python, Go, Bash, or similar) for automation and tooling
  • Experience with container orchestration (Kubernetes, ECS) in production environments
  • Expertise in observability tooling (OpenSearch, Prometheus/Grafana, or equivalent)
  • Hands-on experience with CI/CD platforms (GitHub Actions, Jenkins, CircleCI, or similar)
  • Proven ability to define and operationalize SLOs and error budgets
  • Experience with relational and NoSQL databases—performance tuning, replication, and backup strategies
  • Strong working knowledge of networking fundamentals: DNS, load balancing, VPCs, TLS
  • Excellent communication skills—able to translate technical risk into business impact for non-engineering stakeholders
  • AWS Certifications (Solutions Architect, DevOps Engineer, or SysOps Administrator)
  • Experience in healthcare technology or other regulated industries (HIPAA, SOC 2, FedRAMP)
  • Familiarity with chaos engineering practices and tooling
  • Experience with data pipeline reliability (ETL/ELT workflows, streaming systems)
  • Exposure to AI/ML infrastructure and the reliability challenges unique to model serving
  • Familiarity with additional cloud platforms (Azure, Google Cloud)
  • Contributions to open-source reliability or infrastructure tooling

Company Overview

  • Commence delivers an AI-driven healthcare data platform and clinical expertise that support analytics, decisions, and workflow improvement. It is headquartered in Virginia Beach, Virginia, USA, with a workforce of 501-1000 employees. Its website is https://commence.ai.