Gridmatic + Zencore | Customer Story: Monitoring GKE Batch Jobs

Zencore helped Gridmatic design a way to monitor GKE batch jobs that aligned with their existing use of Cloud Monitoring and Alerting.

Project Challenges

Gridmatic had GKE batch jobs that could continue running after they should have finished, causing additional pods to be created and to consume resources in the GKE cluster. Google Cloud does not provide metrics for Kubernetes jobs out of the box (at the moment they are available only on Anthos-enabled clusters).

“The quality of support and the guidance we have received from Zencore since day 1 has helped us immensely while growing on Google Cloud.”
Matt Wytock, CEO, Gridmatic
Project Location: USA
Industry: Energy, Financial services
Use case: Cloud Native Monitoring
Website: Gridmatic

Zencore built a solution using kube-state-metrics and Managed Service for Prometheus. Kube-state-metrics listens to the Kubernetes API server and generates metrics about object state. Prometheus scrapes the kube-state-metrics service and pushes those metrics into Cloud Monitoring. A custom query was developed to send alerts when jobs run beyond a specified threshold and to visualize those jobs in a dashboard.
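The query Zencore wrote is not reproduced in this story. As an illustration only, the sketch below shows the kind of logic such an alert might encode: a job is flagged when it still has active pods and its start time is older than a threshold. The metric names kube_job_status_start_time and kube_job_status_active come from kube-state-metrics; the PromQL string, the function names, and the one-hour threshold are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# A job is identified by (namespace, job_name), matching the labels
# kube-state-metrics attaches to its kube_job_* series.
JobKey = Tuple[str, str]


def build_query(threshold_seconds: int) -> str:
    """Return a hypothetical PromQL expression for long-running jobs.

    Assumption: this is one plausible shape for such a query, not the
    query used in the project described above.
    """
    return (
        f"((time() - kube_job_status_start_time) > {threshold_seconds}) "
        "and on (namespace, job_name) (kube_job_status_active > 0)"
    )


def long_running_jobs(
    start_times: Dict[JobKey, float],
    active_pods: Dict[JobKey, int],
    now: float,
    threshold_seconds: float,
) -> List[Tuple[JobKey, float]]:
    """Mirror the query in plain Python on already-scraped samples.

    start_times: job -> start time in Unix seconds
    active_pods: job -> number of currently active pods
    Returns (job, runtime_seconds) pairs, longest-running first.
    """
    overdue = []
    for key, started in start_times.items():
        # Skip jobs with no active pods; they are no longer running.
        if active_pods.get(key, 0) <= 0:
            continue
        runtime = now - started
        if runtime > threshold_seconds:
            overdue.append((key, runtime))
    return sorted(overdue, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample values for illustration.
    starts = {("batch", "etl"): 1_000.0, ("batch", "train"): 9_500.0}
    active = {("batch", "etl"): 1, ("batch", "train"): 2}
    print(build_query(3600))
    print(long_running_jobs(starts, active, now=10_000.0, threshold_seconds=3600))
```

In a setup like the one described, an expression of this kind would be evaluated by the alerting system rather than by application code; the Python function is only there to make the filtering logic explicit.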

Cloud Native Monitoring
High Performance Computing
Machine Learning

Gridmatic implemented the solution in their environment and now receives notifications when batch jobs run too long in their cluster. This reduced the operational burden on the infrastructure team, which can focus on other engineering work and only needs to act when alerted.

About Gridmatic

Gridmatic works on enabling the clean energy transition, leveraging artificial intelligence and cloud technology for electricity markets.