Site Reliability Engineer

Join the ETP Growth Journey

At ETP Group, we deliver the next generation of AI-powered, cloud-native SaaS platforms that are transforming retail and e-Commerce operations across Asia Pacific. As we empower brands with the agility, intelligence, and innovation they need to grow in a dynamic market, we grow too. To do that, we need the right people on board. We’re always on the lookout for passionate professionals who are smart, self-motivated, and eager to make a real impact. If you love solving challenges, working with cutting-edge technology, and being part of a collaborative and fast-paced environment, ETP is the place for you. Here, you’ll find more than just a job. You’ll find the right opportunity to shape the future of unified commerce, whilst growing your career alongside a team that values innovation, ownership, and excellence.

Ready to be part of our success story? Email your resume to careers@etpgroup.com. Please include the position you’re applying for, a recent photograph, current and expected compensation, educational qualifications, work experience, and contact details.

Company Description

ETP Group is an AI-first SaaS company serving the Retail and e-Commerce industries across Asia Pacific. With 37 years of trust in the market, it supports 500+ brands in 17 countries through enterprise-grade platforms. ETP’s cloud-native solutions, ETP Unify and Ordazzle, cover POS, CRM, Inventory, Promotions, PIM, OMS, WMS, LMS, and seamless marketplace integration. For large-format retail, ETP V5 offers a hybrid omni-channel suite. Built on secure, scalable M.A.C.H architecture, ETP delivers frictionless, personalized experiences across channels. Its intuitive, asset-light platforms accelerate cloud transformation, reduce IT overhead, and help retailers enhance CX, drive growth, and lead in a fast-evolving commerce environment.

Here is a glimpse of what we do: http://www.etpgroup.com/Videos.html

Experience Required
1-2
Location
Mumbai
Role Type
Full Time

Key Responsibilities

  • Ensure uptime SLAs and overall reliability of production, staging, and test environments.
  • Continuously verify that all platform components are correctly configured, including instance sizes, memory allocation, thread pools, JVM tuning, and log levels.
  • Review and optimize API gateway, service registry, load balancer, and cache service configurations.
  • Implement and maintain the observability stack (metrics, logs, traces, dashboards, alerts).
  • Carry out capacity planning and implement autoscaling strategies.
  • Conduct performance, load, and stress testing to validate scalability and resilience.
  • Run chaos engineering experiments to ensure fault tolerance.
  • Collaborate with engineering teams to resolve performance bottlenecks and improve deployment practices.
  • Document configuration standards and operational best practices.

Skills & Qualifications

Must-Have:

  • Strong experience with Kubernetes / container orchestration in production.
  • Proficiency in cloud platforms (AWS / GCP / Azure) and autoscaling mechanisms.
  • Expertise in JVM-based service tuning (heap sizing, GC tuning, thread pool config).
  • Hands-on experience with API gateway technologies (e.g., Kong, Apigee, NGINX, Envoy).
  • Proficiency with observability tools (Prometheus, Grafana, ELK, Jaeger, OpenTelemetry).
  • Experience with load testing tools (k6, Gatling, JMeter) and chaos engineering tools (Gremlin, LitmusChaos).
  • Strong understanding of microservices performance patterns and distributed systems.

Nice-to-Have:

  • Familiarity with MACH architecture principles.
  • Experience in e-commerce SaaS or other high-scale transactional platforms.
  • Knowledge of service mesh (Istio, Linkerd).
  • Experience with infrastructure-as-code (Terraform, Helm, Ansible).

What We Offer

  • Opportunity to work on a large-scale MACH-based platform with a modern tech stack.
  • Collaborative, engineering-driven culture with a focus on innovation and reliability.
  • Competitive salary and benefits package.
  • Professional growth opportunities in cloud-native, microservices, and SRE best practices.