March 14, 2021 · 9 min read
This article is part of a series of blog posts on designing a best practices Google Cloud environment using Terraform.
So you’ve decided to move to Google Cloud. Perhaps it was for live migration, which moves running virtual machines without stopping them; for global VPCs that let your VMs communicate across continents without touching the public internet; or simply because you wanted an easy-to-use, fully managed Kubernetes solution that automatically takes care of node updates and repairs. There are countless reasons to take advantage of what Google Cloud has to offer, but how do you go from deciding to use Google Cloud to actually running production workloads on it?
In our years of experience working as engineers and architects on Google Cloud implementations, we have seen many companies struggle with the onboarding process, balancing the need to move quickly against their limited Google Cloud experience. To avoid unnecessary tech debt in the future and ensure your migration runs as smoothly as possible, you’ll need to build a solid foundation. To help you with this, over the next few weeks we’ll be releasing a series of blog posts with tips on the various components that go into a best-practice Google Cloud environment:
- Resource Management
- Identity and Access Management & Security
Along the way we’ll also include some relevant Terraform tips. We’ll begin this week with arguably the most fundamental of all of the building blocks — Resource Management.
Google Cloud Resource Management Concepts
Organizations, folders, and projects are the building blocks of the Google Cloud resource hierarchy. They provide ownership, since each Google Cloud resource has a parent that controls its lifecycle. They also provide grouping, since resources can be assembled into projects and folders that logically represent services, applications, or organizational entities such as teams and departments.
Furthermore, they provide the foundation for access control and configuration policies, which can be attached at any level and propagate down the hierarchy, simplifying management and improving security.
Before we go into design tips, let’s do a quick review of the resource hierarchy elements in Google Cloud.
- Organization resource — this is your root node. It’s technically optional, but if you’re a multi-person company you should have an Organization node. This node centralizes administration and prevents projects from being deleted when employees leave your company. The Org node gets provisioned for you upon project creation if you have either a Google Workspace (formerly G Suite) or Cloud Identity account.
- Folder resource — folders can be seen as sub-organizations in your Org node in that they act as policy enforcement points for IAM and Organization Policies which we’ll discuss later.
- Project resource — like the Org and Folder nodes, projects are also policy enforcement points. Unlike these nodes, projects are where you enable billing, manage APIs, and ultimately deploy resources.
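As a rough illustration of how the three elements nest, here is a minimal, standalone Terraform sketch. The variable names, the folder name, and the project ID are placeholders of ours rather than values from a real environment.

```hcl
# Assumption: the organization ID and billing account are supplied by the caller.
variable "org_id" {
  type        = string
  description = "Numeric ID of the Organization resource."
}

variable "billing_account" {
  type        = string
  description = "ID of the billing account to attach to projects."
}

# A folder sits directly under the Organization node.
resource "google_folder" "shared" {
  display_name = "Shared"
  parent       = "organizations/${var.org_id}"
}

# A project lives inside the folder; billing is attached at the project level.
resource "google_project" "shared_networking" {
  name            = "Shared Networking"
  project_id      = "example-shared-net-0001" # placeholder; must be globally unique
  folder_id       = google_folder.shared.name
  billing_account = var.billing_account
}
```

Because the project refers to the folder and the folder refers to the organization, Terraform creates them in hierarchy order without any explicit dependency declarations.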
For more details, read the Google Cloud documentation. Now, let’s go over some tips.
Tips & Tricks
Each organization has a unique structure, culture, velocity and autonomy. And while there isn’t a predefined recipe that fits all scenarios, we wanted to share some insights we’ve gathered from our experience implementing Google Cloud resource management for enterprises.
Simpler folder structure is usually better
Before you start to build out your Google Cloud organization and hierarchy, take time to consider the following elements of your hierarchy design, along with the Google Cloud features you can use to implement them:
- Isolation: Where do you want to establish security boundaries? At the department and team level, at the application or service level, or between production, test and dev environments? Use Folders with their nested hierarchy and Projects to create isolation between your cloud resources. Set IAM policies, Organization Policy Constraints, and even firewall policies at the different levels of the hierarchy to determine who has access to which resources.
- Delegation: How do you balance autonomy with centralized control? Folders and IAM help you establish compartments where you can allow more freedom for developers to create and experiment while reserving other areas with stricter control. For example, you can create a Development Folder where users are allowed to create Projects, spin up virtual machines (VMs) and enable services. You can also safeguard your production workflows by collecting them in dedicated Projects and Folders where least privilege is enforced through IAM.
- Inheritance: How can inheritance optimize policy management? As we mentioned, you can define policies at every node of the hierarchy (the organization, folders nested up to 10 levels deep, and projects) and have them propagate down. IAM policies are additive: a child node inherits every role granted on its ancestors, and a role granted higher up cannot be revoked further down. Organization Policies, by contrast, can be overridden on child nodes. For example, if firstname.lastname@example.org is granted the Compute Admin role on a Folder, they will be able to start VMs in every Project under that Folder.
- Shared resources: Are there resources that need to be shared across your organization, such as networks, VM images, or service accounts? Use Projects and Folders to build central repositories for your shared resources. Limit administrative privileges over these resources to selected users, and apply the least privilege principle when granting access to everyone else.
Terraform Tip: You’re better off designing your folder structure around policy governance. This could allow you to have a flatter Folder hierarchy, which in turn will simplify developing and managing your Terraform code. Because of this, we recommend fleshing out your IAM requirements before diving into your folder structure. We’ll be covering IAM in a separate post.
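The delegation and inheritance points above could be sketched in Terraform roughly as follows. This is a standalone example under our own assumptions: the two folder names and the developers@example.com group are hypothetical, and your policy boundaries may call for a different split.

```hcl
variable "org_id" {
  type        = string
  description = "Numeric ID of the Organization resource."
}

# One folder per environment keeps the hierarchy flat and maps it directly
# onto policy boundaries.
resource "google_folder" "env" {
  for_each     = toset(["development", "production"])
  display_name = each.key
  parent       = "organizations/${var.org_id}"
}

# Delegation: developers may create projects, but only under the development folder.
resource "google_folder_iam_member" "dev_project_creators" {
  folder = google_folder.env["development"].name
  role   = "roles/resourcemanager.projectCreator"
  member = "group:developers@example.com" # hypothetical group
}

# Inheritance: a role granted on the folder applies to every project beneath it.
resource "google_folder_iam_member" "dev_compute_admins" {
  folder = google_folder.env["development"].name
  role   = "roles/compute.admin"
  member = "group:developers@example.com" # hypothetical group
}
```

Nothing is granted on the production folder here, so access there has to be added deliberately, which is one way to keep least privilege as the default.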
Conduct an in-depth assessment of your Organization Policy Constraints
Organization Policy Constraints are a collection of useful guard rails around different types of resources. For example, skipping the default network creation when creating projects is something we often recommend for security reasons. However, you need to carefully assess which constraints you enable on which folders as these are powerful rules that can also break things. To avoid any downtime, we recommend testing all of your Org Policies before making any broader changes to production environments. Here are some policies that we recommend alongside some caveats to watch out for:
Disable Automatic IAM Grants for Default Service Accounts
- Some Google Cloud services automatically create default service accounts. When a default service account is created, it is automatically granted the Editor role (roles/editor) on your project.
- Recommendation: To improve security, we recommend that you disable the automatic role grant as it comes with the extremely powerful Project Editor IAM role.
- Caveat: Make sure none of your deployments rely on the Default Service Account before disabling.
Require Shielded VMs
- Shielded VM offers verifiable integrity of your Compute Engine VM instances, so you can be confident your instances haven’t been compromised by boot- or kernel-level malware or rootkits.
- Recommendation: To improve security, we recommend that you enforce that all Compute Engine VM instances created are Shielded VM instances.
- Caveat: Before enabling this in production, boot representative test VMs to confirm that Shielded VM features such as Secure Boot don’t prevent them from starting.
Require OS Login
- OS Login lets you use Compute Engine IAM roles to grant or revoke SSH access to your Linux instances. OS Login is an alternative to managing instance access by adding and removing SSH keys in metadata.
- Recommendation: To improve security and simplify administration by linking Linux accounts to Google identities, we recommend that you enforce that all new VM instances in your organization have OS Login enabled.
- Caveat: At the time of writing (Oct ‘20), GKE instances do not support OS Login, so this constraint could break any running GKE instances in a project where it is enabled.
Skip default network creation
- Unless you choose to disable it, each new Google Cloud project starts with a default network. The default network is an auto mode VPC network with pre-populated firewall rules.
- Recommendation: To improve security and avoid using overly permissive firewall rules, we recommend skipping the default network creation.
- Caveat: As with Default Service Accounts, make sure none of your deployments rely on the default network.
Terraform Tip: Leverage the Terraform module contributed by Google Cloud, which includes an exclusion parameter for child folders and projects. With this, you should be able to quickly and systematically test any combination of Org Policy configurations.
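As a minimal sketch of testing the four recommendations above before touching production, the constraints can be enforced on a sandbox folder first. Note that this uses the provider’s plain google_folder_organization_policy resource rather than the contributed module, so the exclusion parameter is not shown; the test_folder_id variable is a name we made up for illustration.

```hcl
variable "test_folder_id" {
  type        = string
  description = "Hypothetical sandbox folder, in the form folders/FOLDER_NUMBER."
}

locals {
  # Boolean constraints behind the four recommendations in this section.
  boolean_constraints = [
    "iam.automaticIamGrantsForDefaultServiceAccounts",
    "compute.requireShieldedVm",
    "compute.requireOsLogin",
    "compute.skipDefaultNetworkCreation",
  ]
}

# Enforce each constraint on the sandbox folder only.
resource "google_folder_organization_policy" "guardrails" {
  for_each   = toset(local.boolean_constraints)
  folder     = var.test_folder_id
  constraint = each.key

  boolean_policy {
    enforced = true
  }
}
```

Trimming the list lets you trial one constraint at a time, and once the sandbox workloads behave as expected the same list can be pointed at a broader folder.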
Develop and document a clean, consistent labeling & naming system
Labels are key-value pairs used to keep things organized. At a minimum, we recommend using them to keep track of cost centers, teams, environments, and services. Make sure that your labeling and resource naming policy is documented clearly for your teams to use; otherwise, things can get messy. When drafting the policy, keep the following in mind:
Establish a naming pattern that works within your limitations:
- Different resources have different naming rules, which you can find by digging through the documentation. For example, GCE instance names, Project IDs, and GCS bucket names each have their own requirements that your naming policy needs to address.
- Most Google Cloud resource names need to be unique within some scope (Project IDs and GCS bucket names globally, for example), so you’ll probably need to add a random string suffix to resources such as projects and GCE instances.
Terraform Tip: The random_id resource is helpful in your quest for unique resource names. Also, custom validation rules are stable as of Terraform 0.13, so you can catch resource name length errors before ‘apply’ calls the Google Cloud APIs. Inside a variable’s validation block, a condition that combines length(VARIABLE) with can(regex(PATTERN, VARIABLE)) can accomplish this.
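Here is one way that tip might look in practice, as a sketch rather than a definitive naming policy. The project_prefix variable and the 21-character limit are our own choices: Project IDs may be at most 30 characters, and the hyphen plus the 8-character hex suffix from a 4-byte random_id use up 9 of them.

```hcl
variable "project_prefix" {
  type        = string
  description = "Hypothetical human-readable prefix for Project IDs."

  validation {
    # Reject the value at plan time, before any API call is made.
    condition = (
      length(var.project_prefix) <= 21 &&
      can(regex("^[a-z][a-z0-9-]*[a-z0-9]$", var.project_prefix))
    )
    error_message = "The project_prefix must be at most 21 characters, start with a lowercase letter, and contain only lowercase letters, digits, and hyphens."
  }
}

# 4 random bytes render as an 8-character hex string.
resource "random_id" "suffix" {
  byte_length = 4
}

locals {
  # For example "acme-billing-1a2b3c4d" (the suffix shown is illustrative).
  project_id = "${var.project_prefix}-${random_id.suffix.hex}"
}
```

The suffix is stored in state, so the generated name stays stable across applies until the random_id resource itself is replaced.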
Labels feed into Google Cloud’s billing system:
- That makes them particularly useful for connecting the dots between your DevOps and Finance teams, because they give you key-value pairs to slice your usage and billing data by. For example, let’s say you want to know how much the security team is spending on staging environments. If you diligently apply the key ‘environment’ with values ‘dev’, ‘test’, ‘stg’ and ‘prod’, you can isolate how much you are spending on staging environments. To narrow that down to the security team, you’d also need to apply a key for ‘team’ with values like ‘security’ or ‘appdev’. Just be careful when aggregating cost data for resources with multiple labels attached: you could inadvertently display multiples of your actual costs. We’ll cover this in more detail in a later post on Billing.
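To make the ‘environment’ and ‘team’ example concrete, the labels could be defined once and attached to a project as in the standalone sketch below. The variable names, the cost_center key and its value, and the project ID are illustrative assumptions on our part.

```hcl
variable "billing_account" {
  type        = string
  description = "ID of the billing account to attach to the project."
}

variable "folder_id" {
  type        = string
  description = "Hypothetical parent folder, in the form folders/FOLDER_NUMBER."
}

locals {
  # Label keys and values are lowercase; these values are illustrative.
  common_labels = {
    environment = "stg"
    team        = "security"
    cost_center = "cc-1234"
  }
}

resource "google_project" "security_staging" {
  name            = "Security Staging"
  project_id      = "example-sec-stg-0001" # placeholder; must be globally unique
  folder_id       = var.folder_id
  billing_account = var.billing_account
  labels          = local.common_labels
}
```

Keeping the map in a single local makes it easier to pass the same labels to other resources that accept them, so the keys stay consistent with your documented policy.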
Resource Hierarchy design is not as simple as it appears on the surface. But as we’ve discussed, a proper implementation makes both your Google Cloud environment and your Terraform code easier to manage.
In our next post, we’ll be covering IAM & Security in Google Cloud alongside some more Terraform tips.
If you’re looking for someone to design your Google Cloud environment, we’re here to help. We offer an end-to-end service that provides you with opinionated, customizable, best-practice Google Cloud environments managed via a GitOps pipeline. It includes everything you need to successfully automate deployment of production workloads (Resource Management, IAM, Networking, Security Policies, Terraform, Documentation). We also offer ongoing consultative support for our Reseller customers. Learn more about Cloud Foundations here.
Zencore was started in 2021 by former senior Google Cloud engineers, solution architects and developers. The consulting and services firm is focused on solving business challenges with Cloud Technology and Tools backed by a world class development, engineering, and data science team.