I may be risking my reputation with the following statement, but after years of being in the trenches, I'm ready to make an unsponsored assertion: Google Cloud Storage is a superior blob storage product to industry-standard alternatives (ahem, S3). Storage solutions such as S3 (and associated SDKs, such as Boto3) have maintained their market-share advantage partly by being early to market. S3 was so early, in fact, that we often forget the AWS ecosystem of vendor lock-in was born from the success of that first product. But are S3 and its tooling truly the best cloud-based storage solution available?
If we were to pick a cloud storage provider today, which would we pick? Vendor-locked organizations are urged to leave the room before we continue: this post isn't intended for you. This post is intended for the rare developers without shackles: those in a position to choose based on what actually matters, like straightforward GUIs, simple permission handling, and intuitive CORS configuration. Most of all: how easy does the provided SDK make it to programmatically fetch, copy, and delete your objects and buckets?
I'm a Python-head like yourself, so you won't be disappointed to learn that the google-cloud-storage Python SDK has received my official "Not Completely Annoying" Stamp of Approval. Compared to the clunky and dated Boto3 library, google-cloud-storage is a library written with developer experience in mind.
Getting Set Up
Setting up a Google Cloud bucket is simple enough to skip the details, but there are a few things worth mentioning. First on our list: we need to set our bucket's permissions.
Setting Bucket-level Permissions
Making buckets publicly accessible is a big no-no in the vast majority of cases; we should never make a bucket containing sensitive information public (unless you're a contractor for the US government and you decide to store the personal information of all US voters in a public S3 bucket - that's apparently okay). Since I'm working with memes that I've stolen from other sources, I don't mind this bucket being publicly accessible.
Bucket-level permissions aren't enabled on new buckets by default (new buckets abide by object-level permissions). Changing this can be a bit tricky to find at first: we need to click into our bucket of choice and note the prompt at the top of the screen:
Clicking enable
will open a side panel on the right-hand side of your screen. To enable publicly viewable files, we need to attach the Storage Object Viewer role to a keyword called allUsers (allUsers
is a reserved type of "member" meaning "everybody in the entire world").
Finding Our Bucket Info
When we access our bucket programmatically, we'll need some information about it, like our bucket's URL (we need this to know where items in our bucket will be stored). General information about our bucket can be found under the "overview" tab; take this down:
Generating a Service Key
Finally, we need to generate a JSON service key to grant permissions to our script. Check out the credentials page in your GCP console and download a JSON file containing your creds. Please remember not to commit this anywhere.
Configuring our Script
Let's start coding, shall we? Make sure the google-cloud-storage library is installed on your machine with pip3 install google-cloud-storage.
I'm going to set up our project with a config.py file containing relevant information we'll need to work with:
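(The sketch below is a placeholder version of that file: swap the paths, bucket name, and URL for your own values.)

```python
"""config.py: project configuration (values below are placeholders)."""
from os import environ

# Point Google's auth machinery at our downloaded service key
environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service_key.json"

# Google Cloud Storage details
BUCKET_NAME = "my-meme-bucket"                                 # our bucket's unique name
BUCKET_URL = "https://storage.googleapis.com/my-meme-bucket"   # base URL from the bucket details page
BUCKET_DIR = "memes"                                           # the single directory we'll work within

# Local folder holding the sample files we'll upload
LOCAL_DIR = "files"
```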
First, I set the environ["GOOGLE_APPLICATION_CREDENTIALS"]
value to the path of the service key JSON file. This allows our app to authenticate any requests to interact with our bucket.
The next few variables are strictly related to Google Storage:
- BUCKET_NAME: Our bucket's given name. The google-cloud-storage library can interact with any storage bucket you have access to simply by passing the bucket's unique name.
- BUCKET_URL: The base URL of our bucket, as located on the "bucket details" page shown earlier. This is the "root" of our bucket, in which directories can be created or files can be uploaded.
- BUCKET_DIR: For the sake of this tutorial, I've chosen to work within the confines of a single directory.
I've also decided to have some fun by setting a LOCAL_DIR variable pointing to a project directory: /files. This folder contains three sample files we'll use to upload, rename, and delete.
Managing Files in a GCP Bucket
With all the boring configuration stuff done, we can finally get to the good stuff.
Before we do anything, we need to create an object representing our bucket. I'm creating a global variable named bucket. This is created by calling the get_bucket()
method on our storage client and passing the name of our bucket:
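(Roughly like so; this assumes the values from config.py above.)

```python
from google.cloud import storage

from config import BUCKET_NAME

# Creating the client picks up the GOOGLE_APPLICATION_CREDENTIALS we set in config.py
storage_client = storage.Client()

# Global object representing our bucket
bucket = storage_client.get_bucket(BUCKET_NAME)
```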
To demonstrate how to interact with Google Cloud Storage, we'll create 5 different functions to handle common tasks: uploading, downloading, listing, deleting, and renaming files.
Listing Files
Knowing which files exist in our bucket is a good start:
list_blobs() gets us a list of files in our bucket. By default this will return every file; we can restrict the results to files within a particular directory by specifying the prefix keyword argument.
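Here's a minimal sketch of such a helper (list_files is just what I'm calling it; it leans on the bucket object and BUCKET_DIR from our config):

```python
from config import BUCKET_DIR

def list_files(bucket_dir=BUCKET_DIR):
    """Return the names of all files under a directory in our bucket."""
    # `prefix` restricts the results to a single "folder" inside the bucket
    blobs = bucket.list_blobs(prefix=bucket_dir)
    return [blob.name for blob in blobs]
```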
Upload Files
Our first function will look at a local folder on our machine and upload the contents of that folder:
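(Something like this should do the trick; the name upload_files is mine, and it assumes the bucket object and config values from earlier.)

```python
from os import listdir
from os.path import isfile, join

from config import BUCKET_DIR, LOCAL_DIR

def upload_files():
    """Upload everything in our local folder to our bucket directory."""
    # Grab only the files (not folders) living in our local directory
    files = [f for f in listdir(LOCAL_DIR) if isfile(join(LOCAL_DIR, f))]
    for file in files:
        local_file = join(LOCAL_DIR, file)
        # Desired destination of the file inside the bucket
        blob = bucket.blob(f'{BUCKET_DIR}/{file}')
        blob.upload_from_filename(local_file)
    return f'Uploaded {files} to bucket.'
```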
The first thing we do is fetch all the files we have living in our local folder using listdir(). We verify that each item we fetch is a file (not a folder) by using isfile().
We then loop through each file in our array of files. We set the desired destination of each file using bucket.blob(), which accepts the file path where our file will live once uploaded to GCP. Finally, we upload the file with blob.upload_from_filename(local_file).
Downloading Files
Let's get creative with the way we interact with files in our bucket. Instead of explicitly modifying single files we know the exact path of, we can mix things up with a simple pick_random_file()
function to select a random file in our bucket directory.
With that twist, we'll download whichever file the function returns to us via a download_random_file()
function:
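(Here's a rough sketch of the pair; the helper names are mine, and list_files() is the function from the previous section.)

```python
from random import randint

from config import BUCKET_DIR, LOCAL_DIR

def pick_random_file(bucket_dir=BUCKET_DIR):
    """Select a random file name from our bucket directory."""
    files = list_files(bucket_dir)
    return files[randint(0, len(files) - 1)]

def download_random_file(local_dir=LOCAL_DIR):
    """Download whichever file pick_random_file() hands back."""
    blob = bucket.blob(pick_random_file())
    # .name is the full path inside the bucket; keep only the file name itself
    file_name = blob.name.split('/')[-1]
    blob.download_to_filename(f'{local_dir}/{file_name}')
    return f'{file_name} downloaded from bucket.'
```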
We leverage the list_files() function we already created to get a list of items in our bucket. We then select a random item by generating a random index using randint.
It's important to note here that .blob()
returns a "blob" object as opposed to a string (inspecting our blob with type()
results in <class 'google.cloud.storage.blob.Blob'>
). This is why we see blob.name
come into play when setting our blob's filename.
Finally, we download our target file to a local directory via download_to_filename(). Note how we split on the last slash of blob.name, as .name returns the full file path of a given file.
Deleting Files
Deleting a file is as simple as calling .delete_blob():
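(Here's a hypothetical little wrapper around it, using the bucket object from earlier.)

```python
def delete_file(blob_name):
    """Delete a single file from our bucket by its full path."""
    bucket.delete_blob(blob_name)
    return f'{blob_name} deleted from bucket.'
```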
Renaming Files
To rename a file, we pass a blob object to rename_blob() and set the new name via the new_name keyword argument:
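(A quick sketch; rename_file is my own wrapper name.)

```python
def rename_file(blob_name, new_name):
    """Rename a file in our bucket."""
    blob = bucket.blob(blob_name)
    # rename_blob() copies the blob to the new name, deletes the old one, and returns the new blob
    new_blob = bucket.rename_blob(blob, new_name=new_name)
    return f'{blob_name} renamed to {new_blob.name}.'
```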
Managing Buckets
We can also use google-cloud-storage to interact with entire buckets:
- create_bucket('my_bucket_name') creates a new bucket with the given name.
- bucket.delete() deletes an existing bucket.
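In practice, that might look something like this (reusing the storage_client from earlier; my_bucket_name is a placeholder):

```python
# Create a brand-new bucket; bucket names must be globally unique
new_bucket = storage_client.create_bucket('my_bucket_name')

# Delete an existing bucket (it must be empty, or the call will fail)
new_bucket.delete()
```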
There are also ways to programmatically do things like access details about a bucket, or delete all the objects inside a bucket. Unfortunately, these actions are only supported by the REST API. I don't find these actions particularly useful anyway, so whatever.
The source code for this tutorial can be found below. That's all, folks!