45 Site Reliability Engineer jobs in Thailand

Site Reliability Engineer

฿900000 - ฿1200000 Y 2C2P

Posted today

Job Description

Site Reliability Engineer (SRE)

Thailand, Bangkok | Full Time | Technology

Site reliability engineers are responsible for improving the quality of software processes and services in production. They design code to automate processes to improve the efficiency of deliverables and act as a bridge between development and operations. SREs are responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).

Working Location: Empire Tower (100% onsite, accessible via BTS Chong Nonsi)

Responsibilities:

Monitor the health of your services and work with developers to increase the velocity of changes using built-in support for service monitoring.
Select metrics for SLIs, set SLOs, and track error budgets to mitigate risk for the service.
Use powerful dashboards to aggregate metrics and logs, including golden signals to reduce MTTR and quickly answer questions about service health.
Take ownership of platform-related incident management and resolution, ensuring timely communication and effective problem-solving.
Automate various provisioning and maintenance tasks using scripts and automation tools

Qualifications:

year of experience as a software engineer or systems administrator and willing to become an SRE in the future, for Junior level.
Minimum 5 years of experience as an SRE for Senior level.
Experience coding in at least one language (Bash, Python, PowerShell, etc.)
Ability to use observability tools such as Datadog, Grafana, Elasticsearch, and Kibana
Ability to use cloud services (AWS, etc.)
Good command of English, both spoken and written

Nice to have:

Knowledge of best practices and IT operations in always-available and highly scalable services
Experience with automation and CI/CD tools (GitHub Actions, Jenkins, Ansible, Terraform, etc.)
Experience with containerization, container orchestration, and microservices: Docker, Kubernetes (K8s), Helm
Knowledge of IT service management (ITSM): incident management, problem management, change management

We offer an attractive remuneration package, a fast-paced and exciting working environment, and provide challenging opportunities for life-long learning and career development.

Interested candidates are invited to send a comprehensive resume with current and expected salary package via this job ad. Please note that only shortlisted candidates will be notified.

Please consult our Candidate Privacy Notice to know more about how we collect, use, transfer and disclose our candidates' information:

By submitting your resume and information, you understand, acknowledge, and consent that your personal data will be processed in accordance with our Candidate Privacy Notice. You consent to the collection, use, transfer and disclosure of your personal data as well as to receive email and/or other electronic messaging communication from 2C2P.

Site Reliability Engineer

฿104000 - ฿130878 Y SCB TechX

Posted today

Job Description

Job Summary :

SRE engineers are typically responsible for the availability and reliability of critical AWS cloud-based platform services and applications, ensuring they meet their SLI, SLO, and SLA requirements. SRE engineers also take part in on-call duties to resolve escalated support incidents, and collaborate with cross-functional teams to build and run sustainable product systems.

Job Responsibilities :

Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers with customer incidents.
Debug production issues across services.
Propose ideas and solutions within the infrastructure team to reduce workload through automation.
Measure and optimize system performance, create dashboards, carry out capacity planning, and innovate to improve continually.
Improve reliability, quality, and time-to-market of our suite of software solutions

Qualifications:

Bachelor's degree in computer science/engineering or other highly technical field
Ability to work under pressure.
1-3 years with AWS cloud services: EC2, EKS, RDS, AWS Batch, runbook scripts
1-3 years with DevOps tools, e.g. Jira, GitLab, Confluence, Terraform.
1-3 years with monitoring and dashboard tools, e.g. Prometheus, Grafana, ELK.
Good knowledge of Python or RPA is preferable.

The successful candidate will join a fully agile, cross-functional development team to deliver the brand-new SCB banking channel. He/She will be experienced with modern processes and technologies in the market such as Continuous Delivery, Docker, and Amazon Web Services, as well as many new testing techniques and tools. The candidate will also be challenged to work with many parties such as vendors and the SCB IT Department.

Only shortlisted candidates will be contacted.

Site Reliability Engineer

฿90000 - ฿120000 Y Tong Hua Holding Public Company Limited

Posted today

Job Description

Urgently Required: "Site Reliability Engineer / System Admin" (Tong Hua Group)

Responsibilities:

1. Maintain, monitor, and troubleshoot the company's cloud, blockchain, AI and associated business systems across on-premise and multi-cloud environments.
2. Deploy and manage applications on Linux platforms and virtualized infrastructure (Proxmox, VMware, OpenShift), handling system installations, configurations, and ongoing maintenance tasks.
3. Develop, implement, and manage CI/CD pipelines using tools such as GitHub Actions, Ansible, and Kubernetes to ensure seamless and efficient deployment workflows.
4. Design high-availability systems with load balancing (HAProxy, Nginx), caching (Redis), and failover configurations.
5. Conduct daily monitoring, data backup, and recovery using open-source monitoring tools (Prometheus, Grafana, Loki) for performance reporting, issue tracking, and proactive health checks.
6. Perform anomaly detection, root cause analysis, and automated alerting to address and prevent system failures and performance bottlenecks.
7. Automate operational tasks and improve system resilience through scripting (Bash, Python, or Golang) and configuration management tools.
8. Maintain and optimize infrastructure components such as Docker, Kubernetes, databases (PostgreSQL, MySQL), and distributed storage (Ceph, MinIO).
9. Set up VPN, VPC, and secure networking for client environments with proper isolation and security.
10. Collaborate with cross-functional teams to support infrastructure improvements, incident response, and operational resilience.

Requirements:

1. Bachelor's degree in Computer Science, Information Technology, or a related field, with 4+ years of relevant experience in DevOps, SRE, or similar roles.
2. Demonstrated experience with production-grade infrastructure in high-availability (e.g. load balancing) and high-performance environments (e.g. cache optimization). Proficiency in Linux administration and containerization (Docker, Kubernetes).
3. Strong knowledge of CI/CD processes and automation tools (Ansible, Terraform) and experience scripting (Python, Shell) for operational automation.
4. Solid understanding of networking protocols (TCP/IP, DNS, DHCP) and networking expertise (VPN, VPC, firewalls).
5. Hands-on experience with on-premise virtualization (VMware, Proxmox, OpenShift or similar) and cloud platforms.
6. Proficient in monitoring and logging solutions (Prometheus, Grafana, Loki) for proactive system management.
7. Familiarity with database management and distributed storage solutions, particularly PostgreSQL, YugabyteDB, Qdrant and MinIO.
8. Multi-cloud and hybrid environment experience.
9. Ability to communicate in English at a conversational level.

Tong Hua Group · MRT Hua Lamphong

If you are interested, please send your updated CV with current and expected salary to my email:

Tel

Site Reliability Engineer

฿1200000 - ฿2400000 Y Cathcart Associates Asia Recruitment Ltd.

Posted today

Job Description

About the Opportunity

Cathcart Technology is working with a leading international organisation on a large-scale Cloud transformation programme. As part of a pioneering project, they are migrating from traditional on-premise systems to Cloud.

As an SRE, you'll work closely with global teams while being the key presence locally — ensuring reliability, automation, and observability across critical projects.

Responsibilities

Support cloud migration projects, moving systems from on-premise to private and public Cloud (AWS)
Build and maintain monitoring solutions, automating detections and responses.
Define and implement SLI and SLO metrics to ensure service availability.
Deploy new application releases into pre-production and production environments.
Drive automation in deployments, system reconfiguration, and monitoring improvements.
Collaborate with development, DevOps, and testing teams on continuous delivery and quality assurance.
Document incidents, solutions, and best practices while sharing knowledge with the wider SRE community.

What We're Looking For

Bachelor's degree in Information Technology, Computer Engineering, or related field
5+ years' experience in DevOps, Cloud, System Engineering, or a related field
Hands on experience with Kubernetes/OpenShift.
Experience with public Cloud (AWS or Azure).
Solid knowledge of Linux, VMs, and shell scripting
Familiarity with CI/CD tools such as Jenkins.
Experience with monitoring/logging tools (Nagios, Splunk, or similar).
Good communication skills in English

This is an exciting opportunity to be part of a pioneering Cloud migration project, working on high-impact systems with international collaboration. If you're passionate about reliability, automation, and scalable infrastructure, this role is for you.

For more details, please contact Cathcart Technology

Site Reliability Engineer

฿70000 - ฿120000 Y SCB TECH X CO., LTD.

Posted today

Job Description

Job Responsibilities :

Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers with customer incidents.
Debug production issues across services.
Propose ideas and solutions within the infrastructure team to reduce workload through automation.
Measure and optimize system performance, create dashboards, carry out capacity planning, and innovate to improve continually.
Improve reliability, quality, and time-to-market of our suite of software solutions

Qualifications:

Bachelor's degree in computer science/engineering or other highly technical field
Ability to work under pressure. New graduates are welcome to apply.
1-3 years with AWS cloud services: EC2, EKS, RDS, AWS Batch, runbook scripts
1-3 years with DevOps tools, e.g. Jira, GitLab, Confluence, Terraform.
1-3 years with monitoring and dashboard tools, e.g. Prometheus, Grafana, ELK.
Good knowledge of Python or RPA is preferable.

The successful candidate will be part of a fully agile, cross-functional development team to deliver the brand-new SCB banking channel. He/She will be experienced with modern processes and technologies in the market such as Continuous Delivery, Docker, and Amazon Web Services, as well as many new testing techniques and tools. The candidate will also be challenged to work with many parties such as vendors and the SCB IT Department.

Our Benefits :

Bonus
Birthday Leave
Mobile Allowance / Internet Allowance
Life / Accident Insurance
SCB Tele Care
Flexible Benefit
Housing Loan
Provident Fund
Cooperative Fund
Near BTS Phaholyothin 24 and BTS Ratchayothin
Shuttle Bus to MRT / BTS
Car Parking
Sport club / Fitness / Sport activity
Co-working space

SCB Tech X Co., LTD

Human Resources Division

18 SCB Park Plaza, Tower West A, 2nd Floor, Ratchadapisek Rd.,

Chatujak, Bangkok 10900 Thailand

Site Reliability Engineer

฿600000 - ฿1200000 Y H LAB Co., Ltd.

Posted today

Job Description

Job Summary

We are looking for a Site Reliability Engineer (SRE) to own the reliability, security, and performance of our healthcare-critical application platform. This role combines hands-on infrastructure operations, automation, and vendor coordination for our on-premise hospital sites. You will be part of the team that ensures our systems run smoothly, securely, and with minimal downtime — whether in the cloud or at customer premises.

Key Responsibilities

Incident & Reliability Management

Participate in an on-call rotation to respond to incidents impacting system availability.
Support software engineers during customer incidents, providing expertise in root cause analysis and rapid mitigation.
Debug production issues across services, infrastructure, and network layers.
Build monitoring and alerting that triggers on symptoms, not just outages, to catch issues early.

Infrastructure & Automation

Run and maintain infrastructure with Pulumi, GitHub Actions, ArgoCD, and Kubernetes.
Implement automation for deployments, upgrades, and routine maintenance tasks.
Document operational actions to create repeatable processes and automated solutions.
Improve operational processes to enhance system uptime and reduce human intervention.

Security & Compliance

Manage cloud and on-premise environments in accordance with company security guidelines and healthcare regulations (HIPAA, GDPR).
Implement security automation from pre-commit to production stages.
Collaborate with security teams to ensure patching, configuration management, and access controls are in place.

Vendor & On-Premise Site Management

Coordinate with on-premise site vendors to ensure system reliability, upgrades, and maintenance are performed according to SLAs.
Define technical requirements, monitoring standards, and incident escalation procedures for vendors.
Review vendor performance, provide feedback, and ensure alignment with company operational standards.
Support new site deployments, including infrastructure validation, vendor onboarding, and handover to operations.

Collaboration & Enablement

Educate internal teams on new infrastructure tools, cloud capabilities, and operational best practices.
Work closely with product, engineering, and security teams to design resilient and scalable architectures.
Actively engage in capacity planning, performance tuning, and long-term infrastructure strategy.

Technology Stack Skills

Languages: GoLang, Python, Shell Script
Cloud: Azure Cloud
CI/CD: GitHub Actions, ArgoCD, Pulumi, Terraform
Kubernetes Ecosystem: Kubernetes, Kustomize, Helm
Monitoring & Observability: Prometheus, Grafana
Infrastructure: Linux/UNIX, Docker
Networking: TCP/IP, DNS, HTTP, SMTP, distributed networks
Databases: SQL and NoSQL (e.g., SQL Server, PostgreSQL, OpenSearch)

Qualifications

Required:

Bachelor's degree in Computer Science, Engineering, Information Technology, or equivalent experience.
2+ years of software development experience in Go, Python, or Java.
2+ years in a Cloud Engineer or SRE role, with hands-on experience in Linux/UNIX, Docker, and Microsoft Azure.
Strong understanding of microservices architecture and distributed systems.
Experience managing production Kubernetes clusters and cloud infrastructure.
Knowledge of monitoring, alerting, and incident management practices.

Preferred:

Experience in vendor management for IT infrastructure or on-premise deployments.
Healthcare IT experience with regulated environments.
Familiarity with HL7, FHIR, DICOM, and healthcare integration patterns.
Prior work with hybrid (cloud + on-premise) environments.

Site Reliability Engineer

฿900000 - ฿1200000 Y Bangkok Bank Public Company Limited

Posted today

Job Description

Skill set & experience required

• Expertise in High-Volume Transaction Systems:

- Proven experience in managing and optimizing high-volume transaction systems, preferably in banking or financial services.

• Strong Technical Background:

- Solid understanding of network, server, and application-level troubleshooting, with hands-on experience in using monitoring and observability tools (e.g., New Relic, Prometheus, ELK stack, Grafana).

• Proficiency in Programming and Scripting:

- Skills in programming and scripting languages (e.g., Python, Bash) to automate tasks and integrate systems.

• Experience with Cloud and Container Technologies:

- Knowledge of cloud service platforms (e.g., AWS, Azure, GCP) and container orchestration tools (e.g., Kubernetes, Docker) to deploy and manage services.

• Understanding of CI/CD Tools and Practices:

- Experience with CI/CD tools (e.g., Jenkins, Azure DevOps) and practices to facilitate rapid and safe deployments.

• Familiarity with Security Standards:

- Understanding of security best practices and compliance standards relevant to transaction processing and financial data.

• Analytical Skills and Problem-Solving:

- Strong analytical skills with the ability to solve complex problems under pressure.

Responsibilities

• Service Scalability and Optimization:

- Work closely with the cross-functional teams to define and establish service level objectives (SLOs) and service level agreements (SLAs) of the system

- Work closely with the development team to ensure the transaction services platform is scalable, identifying and addressing any scalability or performance limits.

- Work closely with the cross-functional teams to perform capacity planning and resource allocation to ensure optimal system performance and scalability.

- Optimize the performance of the transaction services to handle peak loads efficiently.

• Transaction services Performance and Reliability Monitoring:

- Work collaboratively to develop and maintain monitoring tools, alerts, dashboards and processes to provide visibility into health, performance and reliability of transaction services, ensuring they meet SLAs.

- Set up a monitoring system to measure key reliability metrics (e.g. MTTF, MTTR, MTBF, MTTD).

- Analyze transaction patterns and identify potential bottlenecks or failure points in the platform.

• Incident Response and Troubleshooting:

- Work collaboratively with the Bank Operations support team as the first responder for any issues within the transaction services platform, employing a systematic troubleshooting approach to resolve issues quickly

- Develop and refine incident response protocols to minimize downtime and transaction failures.

- Conduct post-incident analyses to identify root causes and implement preventive measures to avoid future incidents.

• Continuous Integration/Continuous Deployment (CI/CD) for Transaction Services:

- Implement and maintain CI/CD pipelines for transaction services, ensuring smooth and reliable deployments with minimal impact on live environments.

- Automate service deployment and rollback procedures to enhance operational efficiency.

- Automate repetitive tasks and processes to improve efficiency and reduce manual intervention

• Security and Compliance Assurance:

- Ensure that all aspects of the transaction services platform adhere to industry security standards and compliance requirements, particularly those related to financial transactions.

- Work with the security team to implement and maintain security measures, such as encryption and access controls, to protect transaction data.

Working Location: Saengthong Thani Tower (Near BTS Chong Nonsri), Bangkok

If you require more information, please contact K. Pongpon Suksai (พงศ์พล) Tel

Site Reliability Engineer

฿600000 - ฿1500000 Y Bangkok Bank Public Company Limited

Posted today

Job Description

Working Location: Saengthong Thani Tower (Near BTS Chong Nonsri), Bangkok

If you require more information, please contact K. Pongpon Suksai (พงศ์พล) Tel

Site Reliability Engineer

฿900000 - ฿1200000 Y LINE Company (Thailand)

Posted today

Job Description

Responsibilities

Maintain and scale LINE products and services to hundreds of millions of users from Thailand and around the world
Monitor and maintain health and availability of our production services in order to prevent outages and issues
Manage our continuous integration and continuous delivery platform all the way from development to production
Automate various provisioning and maintenance tasks using scripts and automation tools
Participate in our cross-functional product development teams and handle DevOps tasks in products
Help improve overall team productivity relating to development, testing and deployment

Qualifications

Bachelor's degree in any field
Strong background in Linux/Unix administration
Ability to use a wide variety of open-source technologies and cloud services (AWS, Google Cloud, OpenStack, etc.)
Strong grasp of automation and CI/CD tools (ArgoCD, GitHub Actions, Jenkins, Ansible, Terraform, etc.)
Working understanding of scripting languages (Shell script, Python, Lua, etc.)
Knowledge and experience in monitoring and troubleshooting tools (ELK, Grafana, Prometheus, Sentry, OpenTelemetry, etc.)
Experience working with Docker in production and container orchestration (Kubernetes, Rancher)
Knowledge of several database technologies (MySQL, Postgres, Redis, MongoDB, etc.)
Knowledge of message queue technologies (Kafka, RabbitMQ, etc.)
Knowledge of best practices and IT operations in always-available and highly-scalable services
Knowledge of programming languages (Golang, etc.) is a plus.

Location

LINE Thailand Head Office, Gaysorn Tower, Bangkok

Senior Site Reliability Engineer

฿90000 - ฿120000 Y Ascend Money

Posted today

Job Description

We are seeking a highly motivated and experienced Senior Site Reliability Engineer (SRE) to join our growing team. As a Senior SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our production systems. You will leverage your deep understanding of infrastructure, automation, and observability to champion operational excellence and build a resilient platform.

Key Responsibilities:

Manage and operate our Kubernetes platform, ensuring high availability, performance, and security.
Design, develop, and implement automation solutions for operational tasks, infrastructure provisioning, and application deployment.
Build and maintain a comprehensive observability stack (monitoring, logging, tracing) to proactively identify and resolve issues.
Implement and maintain proactive measures to ensure platform stability, performance optimization, and capacity planning.
Provide support and expertise for critical middleware tools such as RabbitMQ, Redis, and Kafka, ensuring their optimal performance and reliability.
Participate in our on-call rotation, troubleshoot and resolve production incidents efficiently, and implement preventative measures.
Collaborate effectively with development and other engineering teams.

Qualifications:

Positive attitude and empathy for others.
Passion for developing and maintaining reliable, scalable infrastructure.
A minimum of 3 years of working experience in relevant areas.
Experience in managing and operating Kubernetes in a production environment.
Experienced with cloud platforms like AWS or GCP.
Experienced with high availability, high-scale, and performance systems.
Understanding of cloud-native architectures.
Experienced with DevSecOps practices.
Strong scripting and automation skills using languages like Python, Bash, or Go.
Proven experience in building and maintaining CI/CD pipelines (e.g., Jenkins, GitLab CI).
Deep understanding of monitoring, logging, and tracing tools and techniques.
Experience with infrastructure-as-code tools (e.g., Terraform, Ansible).
Strong understanding of Linux systems administration and networking concepts.
Experience working with middleware technologies like RabbitMQ, Redis, and Kafka.
Excellent problem-solving and troubleshooting skills.
Excellent communication and collaboration skills.
Strong interest and ability to learn any new technical topic.
