I may be risking my reputation with the following statement, but after years of being in the trenches, I'm ready to make an unsponsored assertion: Google Cloud Storage is a superior blob storage product to industry-standard alternatives (ahem, S3). Storage solutions such as S3 (and associated SDKs, such as Boto3) have maintained their market-share advantage partially by being early to market. S3 was so early, in fact, that we often forget the AWS ecosystem of vendor lock-in was born from the success of its first product. But are S3 and its tooling truly the best cloud-based storage solution available?

If we were to pick a cloud storage provider today, which would we pick? Vendor-locked organizations are urged to leave the room before we continue: this post isn't intended for you. This post is intended for the rare developers without shackles. I'm speaking to those who are in a position to weigh what actually matters: straightforward GUIs, simple permission handling, intuitive CORS configuration, and, most of all, how easily a provider's SDK lets you programmatically fetch, copy, and delete your objects & buckets.

I'm a Python-head like yourself, so you won't be disappointed to learn that the google-cloud-storage Python SDK has received my official "Not Completely Annoying" Stamp of Approval. Compared to the clunky and dated Boto3 library, google-cloud-storage is a library that was written with developer experience in mind.

💡
If you're a returning reader, you may recall we touched on google-cloud-storage in a previous tutorial, when we walked through managing files in GCP with Python. If you already feel comfortable setting up a GCP Storage bucket on your own, carry on.

Getting Set Up

Setting up a Google Cloud bucket is simple enough to skip the details, but there are a few things worth mentioning. First on our list: we need to set our bucket's permissions.

Setting Bucket-level Permissions

Making buckets publicly accessible is a big no-no in the vast majority of cases; we should never make a bucket containing sensitive information public (unless you're a contractor for the US government and you decide to store the personal information of all US voters in a public S3 bucket - that's apparently okay). Since I'm working with memes that I've stolen from other sources, I don't mind this bucket being publicly accessible.

Bucket-level permissions aren't enabled on new buckets by default (new buckets abide by object-level permissions). The setting to change this can be a bit tricky to find at first: we need to click into our bucket of choice and note the prompt at the top of the screen:

Bucket creation will prompt for resource permissions

Clicking enable will open a side panel on the right-hand side of your screen. To enable publicly viewable files, we need to attach the Storage Object Viewer role to a keyword called allUsers (allUsers is a reserved type of "member" meaning "everybody in the entire world").
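
If you'd rather skip the console clicking, the same binding can be applied with the client library we'll use throughout this post. Here's a minimal sketch (the bucket name is a placeholder, and this assumes uniform bucket-level access is enabled):

# Sketch: grant Storage Object Viewer to allUsers, making objects publicly readable
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("your-bucket-name")  # hypothetical bucket name

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {"role": "roles/storage.objectViewer", "members": {"allUsers"}}
)
bucket.set_iam_policy(policy)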

Finding Our Bucket Info

When we access our bucket programmatically, we'll need some information about it, like the bucket's URL (this tells us where items in the bucket will be stored). General information can be found under the "overview" tab; take this down:

GCP Storage Bucket Metadata

Generating a Service Key

Finally, we need to generate a JSON service key to grant permissions to our script. Check out the credentials page in your GCP console and download a JSON file containing your creds. Please remember not to commit this anywhere.

Configuring our Script

Let's start coding, shall we? Make sure the google-cloud-storage library is installed on your machine with pip3 install google-cloud-storage.

I'm going to set up our project with a config.py file containing relevant information we'll need to work with:

"""Google Cloud Storage Configuration."""
from os import environ, getenv, path
from dotenv import load_dotenv

# Resolve local directory
BASE_DIR: str = path.abspath(path.dirname(__file__))

# Google Cloud Storage Secrets
environ["GOOGLE_APPLICATION_CREDENTIALS"] = "gcloud.json"
load_dotenv(path.join(BASE_DIR, ".env"))

BUCKET_NAME = getenv("GCP_BUCKET_NAME")
BUCKET_URL = getenv("GCP_BUCKET_URL")
BUCKET_DIR = getenv("GCP_BUCKET_FOLDER_NAME")

# Example local files
LOCAL_DIR = path.join(BASE_DIR, "files")
SAMPLE_CSV = path.join(BASE_DIR, "sample_csv.csv")
SAMPLE_IMG = path.join(BASE_DIR, "sample_image.jpg")
SAMPLE_TXT = path.join(BASE_DIR, "sample_text.txt")

config.py

First, I set the environ["GOOGLE_APPLICATION_CREDENTIALS"] value to the path of the service key JSON file. This allows our app to authenticate any requests to interact with our bucket.

The next few variables are strictly related to Google Storage:

  • BUCKET_NAME: Our bucket's given name. The google-cloud-storage library can interact with any storage bucket you have access to simply by passing the bucket's unique name.
  • BUCKET_URL: The base URL of our bucket, as located on the "bucket details" page shown earlier. This is the "root" of our bucket, in which directories can be created or files can be uploaded.
  • BUCKET_DIR: For the sake of this tutorial, I've chosen to work within the confines of a single directory.

I've also decided to have some fun by setting a LOCAL_DIR variable pointing to a project directory: /files. This folder contains three sample files we'll use to upload, rename, and delete.
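
For reference, the .env file this config expects might look something like this (the values are placeholders that mirror the examples later in the post; the URL follows the standard https://storage.googleapis.com/<bucket> format):

GCP_BUCKET_NAME=hackers-data
GCP_BUCKET_URL=https://storage.googleapis.com/hackers-data
GCP_BUCKET_FOLDER_NAME=storage-tutorial

Example .env file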

Managing Files in a GCP Bucket

With all the boring configuration stuff done, we can finally get to the good stuff.

"""Programatically interact with a Google Cloud Storage bucket."""
from google.cloud import storage
from config import bucket_name, bucket_dir, local_dir

...

Import config values from config.py

Before we do anything, we need to create an object representing our bucket. I'm creating a global variable named bucket. This is created by calling the get_bucket() method on our storage client and passing the name of our bucket:

"""Programmatically interact with a Google Cloud Storage bucket."""
from google.cloud import storage
from config import BUCKET_NAME


# Initialize Google Cloud Storage client
storage_client = storage.Client()
bucket = storage_client.get_bucket(BUCKET_NAME)

...

Instantiate GCP storage clients

To demonstrate how to interact with Google Cloud Storage, we'll create 5 different functions to handle common tasks: uploading, downloading, listing, deleting, and renaming files.

Listing Files

Knowing which files exist in our bucket is a good start:

"""Programmatically interact with a Google Cloud Storage bucket."""
from typing import List, Optional, Tuple

from google.cloud import storage
from google.cloud.storage.blob import Blob

from config import BUCKET_NAME, BUCKET_DIR


# Initialize Google Cloud Storage client
storage_client = storage.Client()
bucket = storage_client.get_bucket(BUCKET_NAME)


def list_files() -> List[Optional[str]]:
    """
    List all objects with file extension in a GCP bucket.

    :returns: List[Optional[str]]
    """
    blobs = bucket.list_blobs(prefix=BUCKET_DIR)
    blob_file_list = [blob.name for blob in blobs if "." in blob.name]
    return blob_file_list

List file paths of blobs in a given Bucket directory

list_blobs() gets us a list of objects in our bucket. By default this returns everything in the bucket; we can restrict the results to a single directory by passing the prefix keyword argument:

[
    'storage-tutorial/sample_csv.csv',
    'storage-tutorial/sample_image.jpg',
    'storage-tutorial/sample_text.txt',
]

Output of listing files in a Bucket directory
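
Why the extension check in list_files()? Depending on how a directory was created, prefix matching may also return the zero-byte "folder" placeholder object itself (e.g., storage-tutorial/), which isn't a file we care about. A quick sketch of the unfiltered call for comparison:

# Sketch: without the extension filter, the folder placeholder may sneak in
blobs = bucket.list_blobs(prefix=BUCKET_DIR)
print([blob.name for blob in blobs])
# e.g. ['storage-tutorial/', 'storage-tutorial/sample_csv.csv', ...]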

Uploading Files

This next function looks at a local folder on our machine and uploads the contents of that folder:

from os import listdir
from os.path import isfile, join

...

def upload_files(bucket_name: str, bucket_dir: str, local_dir: str) -> None:
    """
    Upload files to GCP bucket.

    :param str bucket_name: Human-readable GCP bucket name.
    :param str bucket_dir: Bucket directory in which object exists.
    :param str local_dir: Local file path to upload/download files.

    :returns: None
    """
    files = [f for f in listdir(local_dir) if isfile(join(local_dir, f))]
    for file in files:
        local_file = f"{local_dir}/{file}"
        blob = bucket.blob(f"{bucket_dir}/{file}")
        blob.upload_from_filename(local_file)
    print(f"Uploaded {files} to '{bucket_name}' bucket.")

Upload a local file to a GCP Bucket

The first thing we do is fetch all the files we have living in our local folder using listdir(). We verify that each item we fetch is a file (not a folder) by using isfile().

We then loop through each file in our array of files. We set each file's destination using bucket.blob(), which accepts the file path where our file will live once uploaded to GCP. We then upload the file with blob.upload_from_filename(local_file).
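
For the output below I simply called the function with the config values we defined earlier (your bucket, directory, and file names will differ):

upload_files(BUCKET_NAME, BUCKET_DIR, LOCAL_DIR)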

Uploaded [
   'sample_csv.csv',
   'sample_text.txt',
   'sample_image.jpg',
] to "hackers-data" bucket.

Output of uploading files to a bucket directory.

Downloading Files

Let's get creative with the way we interact with files in our bucket. Instead of explicitly modifying single files we know the exact path of, we can mix things up with a simple pick_random_file() function to select a random file in our bucket directory.

With that twist, we'll download whichever file the function returns to us via a download_random_file() function:

from random import randint

...

def pick_random_file() -> Tuple[Blob, str]:
    """
    Pick a `random` file from GCP bucket.

    :returns: Tuple[Blob, str]
    """
    blobs = list_files()
    rand = randint(0, len(blobs) - 1)
    blob = bucket.blob(blobs[rand])
    return blob, blob.name


def download_random_file(local_dir: str) -> None:
    """
    Download random file from GCP bucket.

    :param str local_dir: Local file path to upload/download files.

    :returns: None
    """
    blob, blob_filename = pick_random_file()
    blob.download_to_filename(f"{local_dir}/{blob.name.split('/')[-1]}")
    print(f"Downloaded {blob_filename} to `{local_dir}`.")

Download a random file from a bucket directory

We leverage the list_files() function we already created to get a list of items in our bucket. We then select a random item by generating a random index using randint.

It's important to note here that .blob() returns a "blob" object as opposed to a string (inspecting our blob with type() results in <class 'google.cloud.storage.blob.Blob'>). This is why we see blob.name come into play when setting our blob's filename.

Finally, we download our target file to a local directory via download_to_filename(). Note how we split on the last slash of blob.name, as .name returns the full object path of a given file.
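
Putting it together, a call might look like this (which file gets downloaded depends on the random pick):

download_random_file(LOCAL_DIR)
# e.g. Downloaded storage-tutorial/sample_image.jpg to `/path/to/project/files`.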

Deleting Files

Deleting a file is as simple as calling bucket.delete_blob() with the blob's name:

...

def delete_file(bucket_name: str) -> None:
    """
    Delete file from GCP bucket.

    :param str bucket_name: Human-readable GCP bucket name.

    :returns: None
    """
    blob, blob_filename = pick_random_file()
    bucket.delete_blob(blob_filename)
    print(f"{blob_filename} deleted from bucket: {bucket_name}.")

Delete a file from a bucket

Renaming Files

To rename a file, we pass a blob object to bucket.rename_blob() and set the new name via the new_name keyword argument:

...

def rename_file(new_filename: str) -> None:
    """
    Rename a file in GCP bucket.

    :param str new_filename: New file name for Blob object.

    :returns: None
    """
    blob, blob_filename = pick_random_file()
    bucket.rename_blob(blob, new_name=new_filename)
    print(f"{blob_filename} renamed to {new_filename}.")

Rename an existing file
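
One gotcha worth noting: new_name is the blob's full object path, not just a base filename, so to keep the renamed file inside our bucket directory we include the prefix. A hypothetical call (the new filename is made up):

rename_file(f"{BUCKET_DIR}/renamed_file.csv")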

Managing Buckets

We can also use google-cloud-storage to interact with entire buckets:

  • storage_client.create_bucket('my_bucket_name') creates a new bucket with the given name (see the sketch below).
  • bucket.delete() deletes an existing bucket.
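
Both hang off the objects we've already created; here's a minimal sketch (the bucket name is hypothetical, and bucket names must be globally unique):

# Create a brand-new bucket...
new_bucket = storage_client.create_bucket("my-hypothetical-bucket-name")

# ...and delete it again (this fails if the bucket still contains objects)
new_bucket.delete()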

There are also ways to programmatically do things like access details about a bucket, or delete all the objects inside a bucket. Unfortunately, these actions are only supported by the REST API. I don't find these actions particularly useful anyway, so whatever.

The source code for this tutorial can be found below. That's all, folks!

GitHub - hackersandslackers/googlecloud-storage-tutorial: Tutorial for interacting with Google Cloud Storage via the Python SDK.

Github repository for this tutorial