[Remote] Infrastructure Operations Engineer

Work from home Full-time role Hiring

Note: This is a remote position open to candidates in the USA. Lightning AI is the company behind PyTorch Lightning, building an end-to-end platform for developing AI systems. They are seeking an experienced Infrastructure Operations Engineer to help scale and operate their next-generation AI infrastructure platform, with a focus on reliability, automation, and operational efficiency.

Responsibilities

  • At the direction of the Manager of Infrastructure Operations, design, build, and roll out new platforms and patterns to minimize incidents and enable customer facing and internal features
  • Deploy updates and improvements to support both Voltage Park’s internal and end customer use cases
  • Collaborate with colleagues in Infrastructure Engineering, Network Operations, Customer Success and Software and Platform Development Teams
  • Participate in the on-call rotation, which is evenly distributed across all team members in a primary / secondary pattern: you serve as primary, then move to the secondary position

Skills

  • 8+ years working with Linux as a server / hosting platform, extra points for Ubuntu experience
  • 5+ years experience with AWS
  • 2+ years experience with Kubernetes and strong container fundamentals
  • 2+ years experience with Terraform and Ansible
  • 2+ years with network-attached storage management (via NFS, Ceph, or other protocols). Extra points for experience with VAST storage systems
  • Experience with monitoring systems (Prometheus, ELK stack)
  • Familiarity with the GitOps workflow
  • Software development experience using Python, Go, Bash, or other languages for automation and for connecting systems and APIs
  • Deep networking fundamentals, extra points for experience with datacenter-level networks, 400Gb Ethernet, and InfiniBand
  • Experience building and delivering complex systems
  • Effective at navigating tradeoffs between design, risk, cost, and outcomes
  • Comfortable with navigating ambiguity
  • Strong written and oral communication
  • Experience with bare metal hardware troubleshooting and provisioning, extra points for working with Dell hardware
  • Experience with GPU servers, either on bare metal or under virtualization
  • Deep experience with network switches, routers, and firewalls, particularly SONiC switches, Palo Alto firewalls, and Juniper Networks equipment
  • Experience with VAST storage systems

Benefits

  • Comprehensive medical, dental and vision coverage (U.S.); Private medical and dental insurance (U.K.)
  • Retirement and financial wellness support (U.S.); Pension contribution (U.K.)
  • Generous paid time off, plus holidays
  • Paid parental leave
  • Professional development support
  • Wellness and work-from-home stipends
  • Flexible work environment

Company Overview

  • The AI development platform - From idea to AI, Lightning fast ⚡️. Code together. Prototype. Train on GPUs. Scale. Serve. It was founded in 2019, and is headquartered in New York, New York, USA, with a workforce of 51-200 employees. Its website is https://www.pytorchlightning.ai.