SRE Manager, OLAP Engine (Bytehouse) | 40HRS

Job type: Full-Time

Posted: 24-11-2023

Salary: Competitive

Email: linh.chu@40hrs.vn

Job Description

- Build and manage the global SRE team, including recruitment, training of new hires, system operation, maintenance and coordination, and team culture building.

- Improve cooperation mechanisms across teams, time zones and regions, and provide business-oriented SRE solutions that fit actual business scenarios.

- Take responsibility for SRE team organization and project management, guiding basic SRE work to be more effective and improving overall SRE efficiency.

- Develop process specifications and plans for compliant access, configuration, disaster recovery and fault handling of critical paths of overseas SRE services.

- Continuously improve the core SRE capabilities of the OLAP engine in efficiency, cost, quality, security, etc.

- Develop automation, data visualization and automated monitoring processes to facilitate the optimization of the cloud-native OLAP engine infrastructure.

- Drive the design and engineering of tools, as well as platform solutions, to optimize product engineering and operation efficiencies.

- Manage on-call processes to respond to performance and reliability issues, and establish best practices for coordinating escalation to resolve issues and minimize downtime.

Job Requirements

- Bachelor's degree or above in Computer Science or a related technical discipline, and good English communication skills.

- 6+ years of SRE experience; familiar with SRE-related processes, understands industry trends in SRE technology, and has a strong ability to build an SRE system. Big-data or OLAP engine SRE experience is preferred.

- Familiar with SRE technologies, including Kubernetes, Terraform, Ansible, Bash scripting, etc.

- Familiar with cloud computing technologies of Amazon Web Services, Google Cloud Platform and other suppliers.

- Expertise in the operation, deployment, troubleshooting, high availability and quality assurance of large-scale distributed systems, with a strong focus on stability and performance.

- Possesses a strong sense of responsibility, a proactive team spirit, and a strong ability to comprehensively analyze and solve problems.
