Lead Systems Operations Engineer

  • Technology
  • Full time
  • R-540904

About this role:

Wells Fargo is seeking a Lead Systems Operations Engineer.

In this role, you will:

  • Lead complex, broad-impact initiatives, including provision of high-level systems consultation for the technology teams
  • Work as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area
  • Review and analyze complex technical challenges, as well as escalated support issues related to core business solutions that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
  • Make decisions on technical changes and enhancements
  • Consult with the engineering team on change design requiring a solid understanding of technical process controls or standards that influence and drive new initiatives
  • Collaborate and consult with technical peers, colleagues, and mid-level to more experienced managers to resolve systems support issues and achieve goals
  • Lead the transformation of traditional platform operations into a modern Site Reliability Engineering (SRE) model: drive reliability by design, elevate SLIs/SLOs, automate operational toil, strengthen observability, and mature incident and problem management. The role is hands-on and includes mentoring Ops and Engineering teams to adopt SRE practices at scale across the platform ecosystem.

Required Qualifications:

  • 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education

Job Expectations:

  • Reliability & Performance
    • Define and implement SLIs/SLOs and error budgets for critical platform services; drive SLO adoption across product and operations teams.
    • Build, enhance, and tune end-to-end observability (metrics, logs, traces) with a focus on the golden signals: latency, traffic, errors, and saturation.
    • Partner with performance engineering teams to run load, stress, soak, and failover tests; identify and eliminate performance bottlenecks.
  • Platform & Automation
    • Identify and eliminate operational toil; implement automation and AI-driven workflows for reliability and operational excellence.
    • Generate AI-based observability assessments, maturity scoring, and gap analysis for all platform applications.
    • Build self-service reliability tooling: automated runbooks, readiness checkers, golden paths, and standard reliability patterns.
  • Incident, Problem & Change
    • Lead major incidents as Incident Commander; ensure clear communication, rapid triage, and timely restoration.
    • Facilitate blameless postmortems, document corrective actions, and ensure follow-through.
    • Strengthen platform-level problem management through trend analysis, recurring issue elimination, and proactive risk reduction.
  • Culture & Enablement
    • Coach and mentor platform engineering, ops, and product teams on SRE principles and a reliability-first mindset.
    • Define and maintain SRE maturity models, track adoption, and provide continuous improvement recommendations.
    • Ensure documentation—runbooks, dashboards, readiness checklists, reliability reviews—remains current, actionable, and standardized.

Required Qualifications:

  • Experience: 5+ years in large-scale distributed systems; at least 5 years of hands-on experience in SRE, DevOps, or Platform Engineering.
  • Cloud: Expertise in one or more of AWS, Azure, or GCP (cloud certifications preferred).
  • IaC & Automation: Terraform, Ansible/Chef; strong Git and GitOps practices.
  • Observability: Hands-on experience with Prometheus, Grafana, OpenTelemetry, ThousandEyes, AppDynamics, Aternity.
  • CI/CD: Azure DevOps, GitHub Actions, Jenkins, or GitLab CI; strong understanding of artifact management & environment promotion workflows.
  • Programming: Proficiency in Python/Go/Java for scripting, automation, and API integrations.
  • Reliability Practices: SLIs/SLOs, error budgets, capacity planning, canary/blue‑green deployments, chaos engineering, DR testing.
  • Processes: Strong knowledge of Incident/Problem/Change management, blameless postmortems, on‑call operations, and runbook development.
  • Excellent communication, documentation, and cross-team collaboration skills.

Posting End Date: 

15 May 2026

*Job posting may come down early due to volume of applicants.

We Value Equal Opportunity

Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.

Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit’s risk appetite and all risk and compliance program requirements.

Candidates applying to job openings posted in Canada: Applications for employment are encouraged from all qualified candidates, including women, persons with disabilities, aboriginal peoples and visible minorities. Accommodation for applicants with disabilities is available upon request in connection with the recruitment process.

Applicants with Disabilities

To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.

Drug and Alcohol Policy

Wells Fargo maintains a drug-free workplace. Please see our Drug and Alcohol Policy to learn more.

Wells Fargo Recruitment and Hiring Requirements:

a. Third-Party recordings are prohibited unless authorized by Wells Fargo.

b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.

