I recently worked on a project which combined two of my life's greatest passions: coding and memes. The project was, of course, a chatbot: a fun imaginary friend who sits in your chatroom of choice and loyally waits at your beck and call, delivering memes whenever you might request them. In some cases, the bot would scrape the internet for freshly baked memes, but there were also plenty of instances where the desired memes should come from somewhere more predictable: a predetermined subset of memes hosted in the cloud which could be updated dynamically. This is where Google Cloud Storage comes in.

Google Cloud Storage is an excellent alternative to long-established storage solutions (such as S3), which seem to have had us shackled since before we can remember. In addition to providing a cleaner GUI (without subtle IAM permissions or CORS configuration nightmares peppered in), Google Cloud Storage provides a dead-simple way of programmatically fetching, copying, and deleting your data via simple client libraries in the language of your choice.

I'm a Python-head like yourself, so you won't be disappointed to learn that the google-cloud-storage Python SDK has received my official Not Completely Annoying Stamp-of-Approval. When compared to AWS' clunkier Boto3 equivalent, it's clear google-cloud-storage is a library that was written with intent and ease of use in mind.

If you're a returning reader, you may recall we actually touched on google-cloud-storage in a previous tutorial, when we walked through managing files in GCP with Python. If you feel comfortable setting up a GCP Storage bucket on your own, this first bit might get a little repetitive.

Getting Set Up

Setting up a Google Cloud bucket is simple enough to skip the details, but there are a couple of things worth mentioning. First on our list: we need to set our bucket's permissions.

Setting Bucket-level Permissions

Making buckets publicly accessible is a big no-no in the vast majority of cases; we should never make a bucket containing sensitive information public (unless you're a contractor for the US government and you decide to store the personal information of all US voters in a public S3 bucket - that's apparently okay). Since I'm working with memes which I've stolen from other sources, I don't mind this bucket being publicly accessible.

Bucket-level permissions aren't enabled on new buckets by default (new buckets abide by object-level permissions). Changing this can be a bit tricky to find at first: we need to click into our bucket of choice and note the prompt at the top of the screen:

New buckets should prompt you for bucket-level permissions.

Clicking "enable" will open a side panel on the right-hand side of your screen. To enable publicly viewable files, we need to attach the Storage Object Viewer role to a keyword called allUsers (allUsers is a reserved type of "member" meaning "everybody in the entire world).

Finding Our Bucket Info

When we access our bucket programmatically, we'll need some information about it, like the bucket's URL (we need this to know where items in our bucket will actually be stored). General information about our bucket can be found under the "overview" tab; take note of it:

To access the files we modify in our bucket, we'll need to know the bucket's URL.

Generating a Service Key

Finally, we need to generate a JSON service key to grant permissions to our script. Check out the credentials page in your GCP console and download a JSON file containing your creds. Please remember not to commit this anywhere.
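The storage.Client() object we create later will pick these credentials up automatically if the GOOGLE_APPLICATION_CREDENTIALS environment variable points at the downloaded JSON file. If you'd rather be explicit, the client can also be built straight from the key file - a minimal sketch, where the key path is just a placeholder:

from google.cloud import storage

# The path below is a placeholder - point it at your own (uncommitted!) service key.
storage_client = storage.Client.from_service_account_json('gcp-service-key.json')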

Configuring our Script

Let's start coding, shall we? Make sure the google-cloud-storage library is installed on your machine with pip3 install google-cloud-storage.

I'm going to set up our project with a config.py file containing relevant information we'll need to work with:

"""Google Cloud Storage Configuration."""
from os import environ


# Google Cloud Storage
bucketName = environ.get('GCP_BUCKET_NAME')
bucketFolder = environ.get('GCP_BUCKET_FOLDER_NAME')

# Data
localFolder = environ.get('LOCAL_FOLDER')
  • bucketName is our bucket's given name. The google-cloud-storage library interacts with buckets by looking for the bucket in your GCP account whose name matches this value.
  • bucketFolder is a folder within our bucket that we'll be working with.
  • localFolder is where I'm keeping a bunch of local files to test uploading and downloading to GCP.
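For reference, the three environment variables might look something like this (.env-style, with hypothetical values - note the trailing slashes, since both folder values get concatenated directly with filenames later on):

GCP_BUCKET_NAME=hackers-data
GCP_BUCKET_FOLDER_NAME=storage-tutorial/
LOCAL_FOLDER=/path/to/local/files/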

With that done, we can start our script by importing these values:

"""Programatically interact with a Google Cloud Storage bucket."""
from google.cloud import storage
from config import bucketName, localFolder, bucketFolder

...

Managing Files in a GCP Bucket

Before we do anything, we need to create an object representing our bucket. I'm creating a global variable named bucket. This is created by calling the get_bucket() method on our storage client and passing the name of our bucket:

"""Programatically interact with a Google Cloud Storage bucket."""
from google.cloud import storage
from config import bucketName, localFolder, bucketFolder

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucketName)

...

To demonstrate how to interact with Google Cloud Storage, we're going to create 5 different functions to handle common tasks: uploading, downloading, listing, deleting, and renaming files.

Upload Files

Our first function will look at a local folder on our machine and upload the contents of that folder:

from os import listdir
from os.path import isfile, join

...

def upload_files(bucketName):
    """Upload files to GCP bucket."""
    files = [f for f in listdir(localFolder) if isfile(join(localFolder, f))]
    for file in files:
        localFile = localFolder + file
        blob = bucket.blob(bucketFolder + file)
        blob.upload_from_filename(localFile)
    return f'Uploaded {files} to "{bucketName}" bucket.'

The first thing we do is fetch all the files we have living in our local folder using listdir(). We verify that each item we fetch is a file (not a folder) by using isfile().

We then loop through each file in our array of files. We set the desired destination of each file using bucket.blob(), which accepts the desired file path where our file will live once uploaded to GCP. We then upload the file with blob.upload_from_filename(localFile):
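Calling the function with our bucket name returns a quick confirmation of everything that made the trip:

print(upload_files(bucketName))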

Uploaded ['sample_csv.csv', 'sample_text.txt', 'peas.jpg', 'sample_image.jpg'] to "hackers-data" bucket.

Listing Files

Knowing which files exist in our bucket is obviously important:

def list_files(bucketName):
    """List all files in GCP bucket."""
    files = bucket.list_blobs(prefix=bucketFolder)
    fileList = [file.name for file in files if '.' in file.name]
    return fileList

list_blobs() gets us a list of the files in our bucket. By default this returns every file in the bucket; we can restrict the results to a single folder by passing that folder's path via the prefix keyword argument. The list comprehension then keeps only names containing a dot, which is a quick way of excluding "folder" placeholder objects from the results.
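Printing the result for my bucket returns the list below:

print(list_files(bucketName))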

['storage-tutorial/sample_csv.csv', 'storage-tutorial/sample_image.jpg', 'storage-tutorial/sample_text.txt', 'storage-tutorial/test.csv']

Looks like test.csv lives in our bucket, but not in our local folder!

Downloading Files

A feature of the chatbot I built was to fetch a randomized meme per meme keyword. Let's see how we'd accomplish this:

from random import randint

...

def download_random_file(bucketName, bucketFolder, localFolder):
    """Download random file from GCP bucket."""
    fileList = list_files(bucketName)
    rand = randint(0, len(fileList) - 1)
    blob = bucket.blob(fileList[rand])
    fileName = blob.name.split('/')[-1]
    blob.download_to_filename(localFolder + fileName)
    return f'{fileName} downloaded from bucket.'

We leverage the list_files() function we already created to get a list of items in our bucket. We then select a random item by generating a random index using randint.

It's important to note here that .blob() returns a "blob" object as opposed to a string (inspecting our blob with type() results in <class 'google.cloud.storage.blob.Blob'>). This is why we see blob.name come into play when setting our blob's filename.

Finally, we download our target file with download_to_filename().
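Since the bot usually just pipes the image straight to a chat client, saving a local copy isn't always necessary either. Recent versions of google-cloud-storage can pull a blob straight into memory with download_as_bytes() (older releases call it download_as_string()); here's a rough sketch of the idea, with a made-up helper name:

def fetch_random_file_bytes(bucketName):
    """Return the raw bytes of a random file in the bucket (no local copy)."""
    fileList = list_files(bucketName)
    rand = randint(0, len(fileList) - 1)
    blob = bucket.blob(fileList[rand])
    # Keeps the file entirely in memory - handy for posting an image
    # to a chat client without touching the filesystem.
    return blob.download_as_bytes()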

Deleting Files

Deleting a file is as simple as calling .delete_blob():

def delete_file(bucketName, bucketFolder, fileName):
    """Delete file from GCP bucket."""
    bucket.delete_blob(bucketFolder + fileName)
    return f'{fileName} deleted from bucket.'
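One gotcha: delete_blob() raises a NotFound exception if the target doesn't exist, so a slightly more defensive variant might wrap the call (just a sketch):

from google.cloud.exceptions import NotFound

...

def delete_file(bucketName, bucketFolder, fileName):
    """Delete file from GCP bucket, tolerating files that are already gone."""
    try:
        bucket.delete_blob(bucketFolder + fileName)
    except NotFound:
        return f'{fileName} was not found in bucket.'
    return f'{fileName} deleted from bucket.'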

Renaming Files

To rename a file, we pass a blob object to rename_blob() and set the new name via the new_name argument:

def rename_file(bucketName, bucketFolder, fileName, newFileName):
    """Rename file in GCP bucket."""
    blob = bucket.blob(bucketFolder + fileName)
    bucket.rename_blob(blob,
                       new_name=newFileName)
    return f'{fileName} renamed to {newFileName}.'
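Calling it looks like you'd expect (the new filename here is just an example):

print(rename_file(bucketName, bucketFolder, 'sample_image.jpg', 'meme_01.jpg'))

Worth knowing: under the hood, rename_blob() copies the blob to the new name and then deletes the original, so it's really a copy followed by a delete rather than an in-place rename.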

Managing Buckets

We can also use google-cloud-storage to interact with entire buckets:

  • create_bucket('my_bucket_name') creates a new bucket with the given name.
  • bucket.delete() deletes an existing bucket.
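Both calls hang off objects we already have. A quick sketch (the bucket name below is made up - bucket names are globally unique, so pick your own):

# Create a brand-new bucket, then remove it again.
new_bucket = storage_client.create_bucket('my-throwaway-bucket')

# delete() refuses to remove a non-empty bucket unless force=True is passed.
new_bucket.delete()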

There are also ways to programmatically do things like access details about a bucket, or delete all the objects inside a bucket. Unfortunately, at the time of writing these actions were only supported via the REST API. I don't find them particularly useful anyway, so whatever.

The source code for this tutorial can be found here. That's all, folks!