EPAM

Senior/Lead AI DevOps/SRE

Lithuania or Remote

Senior/Lead AI DevOps/SRE Description

We are currently seeking an experienced Senior/Lead AI DevOps/SRE to join our team. In this pivotal role, you will collaborate closely with data scientists and software developers to ensure seamless integration and to optimize the operational efficiency of our AI deployments. Your expertise will be central to deploying, maintaining, and scaling our cutting-edge AI solutions, including large language models (LLMs) and retrieval-augmented generation (RAG) systems.

As a key team member, you will lead both traditional DevOps work and innovative approaches to MLOps. Your proactive involvement will be essential in driving the success of our AI initiatives and maximizing their impact across the organization.

What You’ll Do

  • Implement and maintain CI/CD pipelines for AI and machine learning projects, ensuring robust deployment strategies and continuous integration
  • Monitor and ensure the reliability, availability, and performance of AI applications, particularly those involving LLMs and RAG
  • Collaborate with AI research teams to operationalize machine learning models and systems efficiently
  • Develop and enforce best practices for version control, configuration management, and testing of AI-driven software solutions 
  • Utilize MLOps tools such as Kubeflow, MLflow, or TensorFlow Extended (TFX) to streamline the machine learning lifecycle from experimentation to production
  • Implement monitoring solutions that track both system metrics and model performance to facilitate proactive issue resolution
  • Participate in on-call rotations to support the operational health of critical systems, employing SRE principles to meet service-level objectives (SLOs) and reduce downtime

What You Have

  • Bachelor’s degree in Computer Science, Engineering, or a related field
  • Proven experience as a DevOps Engineer or SRE, with a strong background in software development and automation
  • Expertise in deploying and managing LLMs and related technologies such as RAG
  • Proficiency in CI/CD tools (Jenkins, GitLab CI, CircleCI) and infrastructure as code (Terraform, Ansible)
  • Solid knowledge of containerization and container orchestration technologies (Docker, Kubernetes)
  • Familiarity with MLOps tools and practices to support machine learning lifecycle management

Nice to have

  • Experience with cloud services (AWS, GCP, Azure), particularly in AI/ML deployments
  • Background in monitoring tools like Prometheus, Grafana, and ELK stack
  • Understanding of Python, particularly in data science and machine learning contexts
  • Certification in Kubernetes, AWS/GCP/Azure, or similar technologies

We Offer

  • Salary range 3800-6500 EUR gross, based on your experience and interview results
  • Outstanding career development roadmap to accelerate your journey
  • An engineering community of the industry’s top professionals
  • Certification and mentoring programs, training, and unlimited access to LinkedIn Learning
  • Opportunity to deliver innovative solutions to the world’s industry leaders
  • Regular assessments and salary reviews
  • Bonuses for participating in the referral program
  • Participation in the Employee Stock Purchase Plan
  • Flexible schedule and opportunity to work remotely from any place in Lithuania
  • Friendly team and enjoyable working environment
  • Relocation between offices in 50+ countries with thorough support for you and your family
  • Three additional vacation days and four trust days a year (sick leave without a medical certificate)
  • Private health insurance and corporate discounts for family members

About EPAM

  • As consultants, designers, architects, engineers and trainers, at EPAM we focus on building long-term partnerships with our clients enabling them to reimagine their businesses through a digital lens. We help our clients become faster, more agile and more adaptive enterprises, by delivering solutions through best-in-class engineering, strategy, design, consulting, education and innovation services