Manage Files in Google Cloud Storage With Python

    I recently worked on a project which combined two of my life's greatest passions: coding, and memes. The project was, of course, a chatbot: a fun imaginary friend who sits in your chatroom of choice and loyally waits on your beck and call, delivering memes whenever you might request them. In some cases, the bot would scrape the internet for freshly baked memes, but there were also plenty of instances where the desired memes should be more predictable, namely from a predetermined subset of memes hosted on the cloud which could be updated dynamically. This is where Google Cloud Storage comes in.

    Google Cloud Storage is an excellent alternative to S3 for any GCP fanboys out there. Google Cloud provides a dead-simple way of interacting with Cloud Storage via the google-cloud-storage Python SDK: a Python library I've found myself preferring over the clunkier Boto3 library.

    We've actually touched on google-cloud-storage briefly when we walked through interacting with BigQuery programmatically, but there's enough functionality available in this library to justify a post in itself.

    Getting Set Up

    Setting up a Google Cloud bucket is simple enough to skip the details, but there are a couple of things worth mentioning. First on our list: we need to set our bucket's permissions.

    Setting Bucket-level Permissions

    Making buckets publicly accessible is a big no-no in the vast majority of cases; we should never make a bucket containing sensitive information public (unless you're a contractor for the US government and you decide to store the personal information of all US voters in a public S3 bucket - that's apparently okay). Since I'm working with memes which I've stolen from other sources, I don't mind this bucket being publicly accessible.

    Bucket-level permissions aren't enabled on new buckets by default (new buckets abide by object-level permissions). Changing this can be a bit tricky to find at first: we need to click into our bucket of choice and note the prompt at the top of the screen:

    New buckets should prompt you for bucket-level permissions.

    Clicking "enable" will open a side panel on the right-hand side of your screen. To enable publicly viewable files, we need to attach the Storage Object Viewer role to a keyword called allUsers (allUsers is a reserved type of "member" meaning "everybody in the entire world").
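
    If you'd rather not click through the console, the same binding can be attached from Python. This is only a rough sketch: it assumes bucket is a Bucket object from google-cloud-storage, and the helper name make_bucket_public is my own invention rather than anything the library ships with.

```python
def make_bucket_public(bucket):
    """Attach the Storage Object Viewer role to allUsers on a bucket.

    `bucket` is assumed to be a google.cloud.storage Bucket object.
    The function name is hypothetical, not part of the SDK.
    """
    # Fetch the bucket's current IAM policy (version 3 format).
    policy = bucket.get_iam_policy(requested_policy_version=3)
    # roles/storage.objectViewer is the ID of the Storage Object Viewer role.
    policy.bindings.append(
        {"role": "roles/storage.objectViewer", "members": {"allUsers"}}
    )
    # Nothing changes on GCP until the policy is written back.
    bucket.set_iam_policy(policy)
    return policy
```

    Only run something like this against a bucket you genuinely want the whole world to read.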

    Finding Our Bucket Info

    When we access our bucket programmatically, we'll need some information about our bucket like our bucket's URL (we need this to actually know where items in our bucket will be stored). General information about our bucket can be found under the "overview" tab; take note of it:

    To access the files we modify in our bucket, you'll need to know the URL.
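
    For a publicly readable bucket, an object's URL follows a predictable pattern: https://storage.googleapis.com/, then the bucket name, then the object's path. A small sketch of building that URL (the helper name public_url is an assumption of mine):

```python
from urllib.parse import quote


def public_url(bucket_name, blob_name):
    """Build the public HTTPS URL for an object in a public bucket."""
    # quote() percent-encodes spaces and other unsafe characters,
    # while leaving the "/" folder separators untouched.
    return f"https://storage.googleapis.com/{bucket_name}/{quote(blob_name)}"


print(public_url("hackers-data", "storage-tutorial/peas.jpg"))
# https://storage.googleapis.com/hackers-data/storage-tutorial/peas.jpg
```

    Blob objects in google-cloud-storage also expose a public_url property, so this helper is mostly useful when you only have the names on hand.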

    Generating a Service Key

    Finally, we need to generate a JSON service key to grant permissions to our script. Check out the credentials page in your GCP console and download a JSON file containing your creds. Please remember not to commit this anywhere.
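
    There are two common ways to hand this key to the library: point the GOOGLE_APPLICATION_CREDENTIALS environment variable at the file (storage.Client() picks it up on its own), or pass the path explicitly. Below is a hedged sketch of the explicit route, with a quick sanity check on the file first. Both helper names are hypothetical.

```python
import json


def describe_service_key(path):
    """Return (client_email, project_id) from a service account key file."""
    with open(path, encoding="utf-8") as handle:
        key = json.load(handle)
    # Service account key files identify themselves via the "type" field.
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return key["client_email"], key["project_id"]


def client_from_key(path):
    """Create a storage client from an explicit key file path."""
    # Imported lazily so the check above works without the SDK installed.
    from google.cloud import storage

    return storage.Client.from_service_account_json(path)
```

    Checking the file up front tends to give a friendlier error than whatever surfaces later if the wrong JSON file gets downloaded.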

    Configuring our Script

    Let's start coding, shall we? Make sure the google-cloud-storage library is installed on your machine with pip3 install google-cloud-storage.

    I'm going to set up our project with a config.py file containing relevant information we'll need to work with:

    """Google Cloud Storage Configuration."""
    from os import environ
    
    
    # Google Cloud Storage
    bucketName = environ.get('GCP_BUCKET_NAME')
    bucketFolder = environ.get('GCP_BUCKET_FOLDER_NAME')
    
    # Data
    localFolder = environ.get('LOCAL_FOLDER')

    • bucketName is our bucket's given name. The google-cloud-storage library interacts with buckets by looking for buckets which match a name in your GCP account.
    • bucketFolder is a folder within our bucket that we'll be working with.
    • localFolder is where I'm keeping a bunch of local files to test uploading and downloading to GCP.
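
    One thing to watch: environ.get() quietly returns None when a variable isn't set, which only blows up later when we start gluing paths together. Here's a minimal sketch of failing early instead. The helper names require_env and as_prefix are assumptions of mine, not part of the project's config.py.

```python
from os import environ


def require_env(name, env=environ):
    """Return the value of an environment variable or fail loudly."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def as_prefix(folder):
    """Normalize a folder name into an object prefix ending in '/'."""
    folder = folder.strip("/")
    return f"{folder}/" if folder else ""


print(as_prefix("storage-tutorial"))
# storage-tutorial/
```

    Normalizing the folder name means it no longer matters whether GCP_BUCKET_FOLDER_NAME was set with or without a trailing slash.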

    With that done, we can start our script by importing these values:

    """Programmatically interact with a Google Cloud Storage bucket."""
    from google.cloud import storage
    from config import bucketName, localFolder, bucketFolder
    
    ...

    Managing Files in a GCP Bucket

    Before we do anything, we need to create an object representing our bucket. I'm creating a global variable named bucket. This is created by calling the get_bucket() method on our storage client and passing the name of our bucket:

    """Programmatically interact with a Google Cloud Storage bucket."""
    from google.cloud import storage
    from config import bucketName, localFolder, bucketFolder
    
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucketName)
    
    ...

    To demonstrate how to interact with Google Cloud Storage, we're going to create 5 different functions to handle common tasks: uploading, downloading, listing, deleting, and renaming files.

    Upload Files

    Our first function will look at a local folder on our machine and upload the contents of that folder:

    from os import listdir
    from os.path import isfile, join
    
    ...
    
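    def upload_files(bucketName):
        """Upload files to GCP bucket."""
        # Keep only regular files; sub-folders are skipped
        files = [f for f in listdir(localFolder) if isfile(join(localFolder, f))]
        for file in files:
            # join() adds the path separator that plain concatenation misses
            localFile = join(localFolder, file)
            # bucketFolder is expected to end with '/', e.g. 'storage-tutorial/'
            blob = bucket.blob(bucketFolder + file)
            blob.upload_from_filename(localFile)
        return f'Uploaded {files} to "{bucketName}" bucket.'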

    The first thing we do is fetch all the files we have living in our local folder using listdir(). We verify that each item we fetch is a file (not a folder) by using isfile().

    We then loop through each file in our list of files. We set the desired destination of each file using bucket.blob(), which accepts the desired file path where our file will live once uploaded to GCP. We then upload the file with blob.upload_from_filename(localFile):

    Uploaded ['sample_csv.csv', 'sample_text.txt', 'peas.jpg', 'sample_image.jpg'] to "hackers-data" bucket.
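
    The same loop can be written without leaning on global variables, which makes it easier to reuse and to test. A sketch under a few assumptions: bucket is a Bucket object from google-cloud-storage, and the function name upload_folder plus its parameters are hypothetical.

```python
from pathlib import Path


def upload_folder(bucket, local_folder, bucket_folder):
    """Upload every regular file in local_folder under bucket_folder."""
    prefix = bucket_folder.strip("/")
    uploaded = []
    for path in sorted(Path(local_folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        # Object names always use "/" regardless of the local OS.
        blob_name = f"{prefix}/{path.name}" if prefix else path.name
        bucket.blob(blob_name).upload_from_filename(str(path))
        uploaded.append(blob_name)
    return uploaded
```

    Calling upload_folder(bucket, localFolder, bucketFolder) would return the full object names rather than a formatted string, which is handier if another function needs them.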

    Listing Files

    Knowing which files exist in our bucket is obviously important:

    def list_files(bucketName):
        """List all files in GCP bucket."""
        files = bucket.list_blobs(prefix=bucketFolder)
        fileList = [file.name for file in files if '.' in file.name]
        return fileList

    list_blobs() returns an iterator over the files in our bucket. By default this covers every file; we can restrict the listing to a single folder by passing the prefix parameter.

    ['storage-tutorial/sample_csv.csv', 'storage-tutorial/sample_image.jpg', 'storage-tutorial/sample_text.txt', 'storage-tutorial/test.csv']

    Looks like test.csv lives in our bucket, but not in our local folder!
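
    Spotting that difference by eye gets old quickly. Here's a small sketch that compares a bucket listing against a local folder. It takes the list of names returned by list_files(), and the helper name only_in_bucket is my own.

```python
from os import listdir
from os.path import isfile, join


def only_in_bucket(blob_names, local_folder):
    """Return file names found in the bucket listing but not locally."""
    local = {f for f in listdir(local_folder) if isfile(join(local_folder, f))}
    # Names ending in "/" are folder placeholders, not real files.
    remote = {
        name.rsplit("/", 1)[-1]
        for name in blob_names
        if not name.endswith("/")
    }
    return sorted(remote - local)
```

    With the listing and local folder from this tutorial, only_in_bucket(list_files(bucketName), localFolder) should flag test.csv.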

    Downloading Files

    A feature of the chatbot I built was to fetch a randomized meme per meme keyword. Let's see how we'd accomplish this:

    from random import randint
    
    ...
    
    def download_random_file(bucketName, bucketFolder, localFolder):
        """Download random file from GCP bucket."""
        fileList = list_files(bucketName)
        # Pick a random index into the list of blob names
        rand = randint(0, len(fileList) - 1)
        blob = bucket.blob(fileList[rand])
        # Keep only the file name, dropping the folder prefix
        fileName = blob.name.split('/')[-1]
        # join() comes from os.path, imported earlier in the script
        blob.download_to_filename(join(localFolder, fileName))
        return f'{fileName} downloaded from bucket.'

    We leverage the list_files() function we already created to get a list of items in our bucket. We then select a random item by generating a random index using randint.

    It's important to note here that .blob() returns a "blob" object as opposed to a string (inspecting our blob with type() results in <class 'google.cloud.storage.blob.Blob'>). This is why we see blob.name come into play when setting our blob's filename.

    Finally, we download our target file with download_to_filename().
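
    The function above picks from every file in the folder. To narrow the pick to a keyword, one option is to filter the names before choosing. In this sketch, matching the keyword against the file name is purely my assumption for illustration (the chatbot's real matching logic isn't shown here), and pick_random_name is a hypothetical helper.

```python
import random


def pick_random_name(blob_names, keyword="", rng=random):
    """Pick a random blob name containing keyword, or None if none match."""
    matches = [name for name in blob_names if keyword.lower() in name.lower()]
    if not matches:
        # Avoids the error that choosing from an empty list would raise.
        return None
    return rng.choice(matches)
```

    The result can be passed straight to bucket.blob() and downloaded as before. Returning None for "no matches" also covers the empty-bucket case, where randint(0, -1) would otherwise raise a ValueError.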

    Deleting Files

    Deleting a file is as simple as calling .delete_blob() with the file's full path:

    def delete_file(bucketName, bucketFolder, fileName):
        """Delete file from GCP bucket."""
        bucket.delete_blob(bucketFolder + fileName)
        return f'{fileName} deleted from bucket.'
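
    Keep in mind that delete_blob() raises a NotFound error when the file doesn't exist. If that's a normal situation for your script, a gentler variant is to look the file up first. A sketch, assuming bucket is a Bucket object from google-cloud-storage (the helper name delete_if_exists is mine):

```python
def delete_if_exists(bucket, blob_name):
    """Delete a blob if present; return True if something was deleted."""
    # get_blob() returns None when no object has this name.
    blob = bucket.get_blob(blob_name)
    if blob is None:
        return False
    blob.delete()
    return True
```

    This costs one extra request per file, so it's a convenience for small jobs rather than something to run across thousands of objects.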

    Renaming Files

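    To rename a file, we pass a blob object to rename_blob() and set the new name via the new_name parameter. The new name is a full object path, so we include the folder prefix to keep the file in the same folder:

    def rename_file(bucketName, bucketFolder, fileName, newFileName):
        """Rename file in GCP bucket."""
        blob = bucket.blob(bucketFolder + fileName)
        # Without the prefix, the renamed file would land at the bucket root
        bucket.rename_blob(blob,
                           new_name=bucketFolder + newFileName)
        return f'{fileName} renamed to {newFileName}.'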

    Managing Buckets

    We can also use google-cloud-storage to interact with entire buckets:

    • storage_client.create_bucket('my_bucket_name') creates a new bucket with the given name.
    • bucket.delete() deletes an existing bucket.
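
    As a rough sketch of how bucket creation might fit into a script, the helper below only creates a bucket when one with that name can't be found. It assumes client is a storage.Client; the function name and the "US" default location are my own choices for illustration.

```python
def get_or_create_bucket(client, name, location="US"):
    """Return the named bucket, creating it first if it does not exist."""
    # lookup_bucket() returns None instead of raising when nothing is found.
    bucket = client.lookup_bucket(name)
    if bucket is None:
        bucket = client.create_bucket(name, location=location)
    return bucket
```

    Bucket names are shared across all of Google Cloud, so creation can still fail if somebody else already owns the name you picked.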

    There are also ways to programmatically do things like access details about a bucket, or delete all the objects inside a bucket. Details are exposed as attributes on the bucket object (bucket.location and bucket.storage_class, for example), and bucket.delete(force=True) empties a small bucket before deleting it. I don't find these actions particularly useful anyway, so whatever.

    The source code for this tutorial can be found here. That's all, folks!

    Todd Birchard
    New York City Website
    Engineer with an ongoing identity crisis. Breaks everything before learning best practices. Completely normal and emotionally stable.