SVP, Site Reliability Engineering Domain Lead, SRE & Governance, Group Technology
DBS Bank · Singapore · Full-time
Closed
Quick Summary
- Manage large team across multiple locations covering Applications and Infrastructure
- Drive automation and reduce manual interventions in Production support activities
- Lead Root Cause Analysis and coordinate Incident and Problem Management across Infra and Application domains
Job Insights
Time open: 9 days
Times reposted: 0
Full Description
Roles & Responsibilities
- Manage a large team of Production Support Personnel across multiple geographical locations covering Applications and Infrastructure
- Ensure SLAs on Alerts and Incidents (Application & Infra) are proactively managed and reduce Mean Time To Recover (MTTR) by 20%
- Ensure strict adherence to Standard Operating Procedures for recovery across Application and Infrastructure layers
- Deliver a playbook for onboarding new tasks / activities covering both Application and Infrastructure support models
- Identify opportunities to automate Production support activities (App & Infra) and reduce manual interventions
- Drive application and infrastructure improvements including performance, capacity, resilience, and operational stability; eliminate toil through automation
- Automate manual activities/processes and system health checks for Production Applications and Infrastructure; ensure SLIs/SLOs are defined and met
- Follow Production Support Processes and provide inputs to continuously strengthen them for App + Infra operations
- Provide status updates to leads and stakeholders, and work with vendors to review Infra/Application design, fixes, and production deployments
- Coordinate recurring issues and ensure long-term resolution through robust Incident and Problem Management across Infra and Application domains
- Work with Infrastructure, Development, and Platform teams for root cause analysis of complex issues and outages
- Drive strong stakeholder management with focus on service stability, continuous improvement, and delivery excellence across Infra and Applications
- Lead Root Cause Analysis with technology partners and facilitate RCA reviews post incident resolution
- Work with Risk teams to respond to Audit & Risk RFIs; manage audit walkthroughs covering Infrastructure and Application controls
Requirements
- 10–15 years of experience in Banking, including at least 5 years in a Run-the-Bank (RTB) Lead role covering Application and Infrastructure Support
- Strong track record of implementing Site Reliability Engineering (SRE) principles across Applications and Infrastructure, including performance, reliability, monitoring, alerting, and maintenance
- Proactive capacity monitoring and observability of Production Infrastructure (compute, storage, network, platform, MF and DB) with automated alerting and reporting
- Proven experience in automation of Infra & Application support tasks and reducing manual toil
- Build and maintain monitoring and automation solutions for Infrastructure and Application stacks
- Drive service improvements by tracking SLIs/SLOs/SLAs and improving system and infrastructure performance KPIs
- Strong technical understanding across RDBMS, Unix/Linux, Cloud platforms, and Infrastructure components (servers, network, middleware, containers)
- Hands-on knowledge of infrastructure technologies, especially Linux, databases, and OpenShift (or other container platforms)
- Solid understanding of BAU support, Incident/Problem Management, and escalation management across distributed Infra-App environments
- Good understanding of Infrastructure architecture, capacity planning, DR/BCP, IT security, and regulatory compliance
- Strong collaborator with experience working across global teams and vendors
- Ability to present recommendations effectively in both written and verbal formats
- Proactive, independent, resourceful, and team-oriented mindset
Location: DBS Asia Hub
Job: Technology
Schedule: Regular
Employee Status: Full time
This role has closed.