

Gridmatic: Efficiently Monitoring GKE Batch Jobs

Streamlining GKE Batch Job Monitoring: Zencore's Solution for Gridmatic's Cloud Monitoring and Alerting.

Gridmatic CUSTOMER SPOTLIGHT
 

Monitoring GKE Batch Jobs


Project Location:
Santa Barbara, CA

Industry:
Energy, Financial services
Use Case:
Cloud Native Monitoring
Website:
Gridmatic

Zencore helped Gridmatic design a way to monitor GKE batch jobs that fit their existing use of Cloud Monitoring and Alerting.

Project Challenges

Gridmatic had GKE batch jobs that could keep running after they should have finished, causing additional pods to be created and consuming resources in the GKE cluster. Metrics for Kubernetes Jobs are not available out of the box on Google Cloud (at the moment, only on Anthos-enabled clusters).

“The quality of support and the guidance we have received from Zencore since day 1 has helped us immensely while growing on Google Cloud.”

Matt Wytock, CEO | Gridmatic

Solution

Zencore built a solution using kube-state-metrics and Managed Service for Prometheus. Kube-state-metrics listens to the Kubernetes API server and generates metrics about object state. Prometheus scrapes the kube-state-metrics service and pushes those metrics into Cloud Monitoring. A custom query was developed to send alerts when jobs run beyond a specified threshold and to visualize job run times in a dashboard.
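The custom query itself is not published here, so the following is only a minimal sketch of what such a threshold check could look like. It assumes the standard kube-state-metrics series kube_job_status_start_time and kube_job_status_active; the helper names long_running_jobs_query, JobSample, and overdue_jobs, and the one-hour threshold in the usage note, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


def long_running_jobs_query(threshold_seconds: int) -> str:
    """Build a PromQL expression matching Jobs that still have active pods
    and started more than `threshold_seconds` ago.

    Assumption: the two kube-state-metrics series share the `namespace`
    and `job_name` labels, which is what the `on (...)` clause relies on.
    """
    return (
        f"(time() - kube_job_status_start_time) > {threshold_seconds} "
        "and on (namespace, job_name) kube_job_status_active > 0"
    )


@dataclass(frozen=True)
class JobSample:
    """One Job's state, mirroring the two metrics used in the query."""
    namespace: str
    job_name: str
    start_time: float  # Unix seconds, as in kube_job_status_start_time
    active_pods: int   # pod count, as in kube_job_status_active


def overdue_jobs(samples, now, threshold_seconds):
    """Local equivalent of the query, handy for checking the logic offline."""
    return [
        s for s in samples
        if s.active_pods > 0 and (now - s.start_time) > threshold_seconds
    ]
```

For example, long_running_jobs_query(3600) yields an expression that could be used as the condition of an alerting policy or as a dashboard chart query, and overdue_jobs applies the same rule to hand-written samples so the threshold can be sanity-checked before it is deployed.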

Expertise

  • Cloud Native Monitoring
  • GKE
  • Compute
  • High Performance Computing
  • Machine Learning

Results

Gridmatic implemented the solution in their environment and now receives notifications when batch jobs run too long in their cluster. This reduced the operational burden on the infrastructure team: they can focus on other engineering work and only need to act when alerted.

About Gridmatic

Gridmatic works on enabling the clean energy transition, leveraging artificial intelligence and cloud technology for electricity markets.
