SRE Manager, OLAP Engine (Bytehouse)
Job type: Full-Time
Posted: 24-11-2023
Salary: Competitive
Email: linh.chu@40hrs.vn
Job Description
- Build and manage the global SRE team, including recruiting, training new hires, coordinating system operations and maintenance, and building team culture.
- Improve collaboration mechanisms across teams, time zones and regions, and provide SRE solutions aligned with real business scenarios and priorities.
- Own SRE team planning and project management, make foundational SRE work more effective, and improve overall SRE efficiency.
- Develop process specifications and plans for compliant access, configuration management, disaster recovery and fault handling on the critical paths of overseas SRE services.
- Continuously improve the core SRE capabilities of the OLAP engine in areas such as efficiency, cost, quality and security.
- Develop automation, data visualization and automated monitoring processes to help optimize the cloud-native OLAP engine infrastructure.
- Drive the design and engineering of tools and platform solutions that improve product engineering and operational efficiency.
- Manage on-call processes for responding to performance and reliability issues, and establish best practices for coordinating escalations to resolve issues and minimize downtime.
Job Requirements
- Bachelor's degree or higher in Computer Science or a related technical discipline, and good English communication skills.
- Familiar with SRE processes and industry trends in SRE technology, with a strong ability to build SRE systems; 6+ years of SRE experience, with big data or OLAP engine SRE experience a plus.
- Familiar with SRE technologies such as Kubernetes, Terraform, Ansible and Bash scripting.
- Familiar with cloud computing platforms from Amazon Web Services, Google Cloud Platform and other providers.
- Expertise in operating, deploying and troubleshooting large-scale distributed systems and in ensuring their high availability and quality, with a strong focus on stability and performance.
- A strong sense of responsibility, a proactive team spirit, and a strong ability to analyze problems comprehensively and solve them.