Senior Site Reliability Engineer (SRE) – Datadog Observability
Job Title: Senior Site Reliability Engineer (SRE) – Datadog Observability
Experience Required: 8+ years overall in SRE and Infrastructure Operations, including at least 3 years of hands-on experience with Datadog
Location: Hyderabad (preferred); Pune or remote also considered
Job Summary:
We are seeking an experienced Senior Site Reliability Engineer (SRE) to lead end-to-end SRE implementation initiatives, with a strong focus on Datadog Observability. The ideal candidate will bring deep technical expertise in building reliable, scalable, and observable systems, along with hands-on experience integrating enterprise applications and middleware.
Key Responsibilities:
 
- Drive end-to-end SRE implementation, ensuring system reliability, scalability, and performance.
- Design, configure, and manage Datadog dashboards, monitors, alerts, and APM for proactive issue detection and resolution.
- Use the Datadog Roles API to create and manage user roles, permissions, and access controls for various teams.
- Collaborate with product managers, engineering teams, and business stakeholders to identify observability gaps and design solutions using Datadog.
- Implement automation for alerting, incident response, and ticket creation to improve operational efficiency.
- Work closely with business and IT teams to support critical financial month-end, quarter-end, and year-end close activities.
- Leverage Datadog AI
- Provide technical leadership in observability, reliability, and performance engineering practices.
 
Required Skills and Experience:
 
- 8+ years of experience in Site Reliability Engineering and Observability.
- At least 3 years of hands-on experience with Datadog (dashboards, APM, alerting, log management, Roles API, and monitoring setup).
- Proven experience implementing SRE best practices: incident management, postmortems, automation, and reliability metrics.
- Excellent stakeholder management and communication skills; experience collaborating with business and IT teams.
- Strong problem-solving mindset and the ability to work in high-pressure production support environments.
 
Preferred Qualifications:
 
- Certification in Datadog or related observability platforms.
- Knowledge of CI/CD tools and automation frameworks.
- Experience with cloud platforms (AWS, Azure, or OCI).
- Exposure to ITIL-based production support processes.