
I need something like Cloud Storage for Firebase: download metadata of all files, just not in Angular but in Python, and only for a single chosen file instead.

The aim is to return this information when the Cloud Function finishes, in its return statement, or simply to log it during the run of the Cloud Function as soon as the file is saved in the Google Cloud Storage bucket. With that information at hand, another job can be started after the given timestamp. The pipeline is synchronous.

I have found Q/A's on loading a file or its data into the Cloud Function in order to extract data stats from the external file inside the running Cloud Function.

Since I do not want to keep the large file or its data in memory at any time only to get some metadata, I want to download only the metadata of the file that is stored in a bucket in Google Cloud Storage, meaning its timestamp and size.

How can I fetch only the metadata of a CSV file in a Google Cloud Storage bucket into the Google Cloud Function?

  • Hey, have you tried some code? If you did then please provide it. Also let us know the error you are receiving in the code. – Zeenath S N Jan 21 '22 at 10:36
  • @ZeenathSN Hey, good question, I have postponed that up to now. For now, I take a workaround in Python: 1. `datetime.now()` 2. the counted written rows 3. the number of field_names as the column count (see the sketch below these comments). I put that in the logging and in the return statement. I have not yet tested anything similar that I would perhaps get as metadata from Google Cloud Storage. For now, I will not take the time to go further unless I get an answer here. – questionto42standswithUkraine Jan 21 '22 at 10:41
  • Perhaps, this [GitHub Link](https://github.com/googleapis/python-storage/blob/main/samples/snippets/storage_get_bucket_metadata.py) might help you? – Zeenath S N Jan 21 '22 at 10:45
  • @ZeenathSN Thanks, no, it loads the file from the bucket into the memory of the GCF container (if I am not mistaken, please correct me otherwise), see `bucket = storage_client.get_bucket(bucket_name)`. Doing that would be useless traffic since the file that I deal with is large. I save it directly to GCS to avoid having it in memory in the GCF container, and then I do not want to load the whole file only to catch its metadata. – questionto42standswithUkraine Jan 21 '22 at 10:50
  • @ZeenathSN I just realised that I made a mistake above: the bucket itself is not the file anyway, so that code does give metadata, yes, but not for a chosen file. Therefore the GitHub code does not solve it, but it goes in the right direction. – questionto42standswithUkraine Jan 24 '22 at 21:46
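
As a minimal sketch of the workaround mentioned in the comment above (the field names and rows are made up for the example, and how the CSV is actually written in the pipeline is left open): count the rows while writing, take the number of field_names as the column count, and use `datetime.now()` as the timestamp.

```python
import csv
from datetime import datetime

def write_csv_and_collect_stats(fileobj, field_names, rows):
    """Write rows as CSV to an already-open file object and collect
    stand-in 'metadata': timestamp, row count, column count."""
    writer = csv.DictWriter(fileobj, fieldnames=field_names)
    writer.writeheader()
    row_count = 0
    for row in rows:
        writer.writerow(row)
        row_count += 1                        # count rows while writing
    return {
        "finished_at": datetime.now().isoformat(),  # timestamp of the write
        "row_count": row_count,
        "column_count": len(field_names),           # number of field_names
    }
```

The returned dictionary can then be logged and put into the Cloud Function's return statement.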

1 Answer


There is a Google document that shows how to get the metadata, similar to the GitHub link that I provided in the comment. You can have a look at the library here.

It just gets the metadata and doesn't retrieve the object data until you call `download_to_filename()`.
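
As a minimal sketch with the Python `google-cloud-storage` client (the bucket and object names below are placeholders, not from the question), fetching only the metadata could look like this:

```python
from google.cloud import storage

def get_blob_metadata(bucket_name: str, blob_name: str) -> dict:
    """Fetch only the metadata of a blob; the object data is not downloaded."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)   # no API call yet
    blob = bucket.get_blob(blob_name)     # one GET request, metadata only
    if blob is None:
        raise FileNotFoundError(f"gs://{bucket_name}/{blob_name} not found")
    return {
        "size_bytes": blob.size,          # size of the stored object
        "updated": blob.updated,          # last-modified timestamp
        "created": blob.time_created,     # creation timestamp
        "content_type": blob.content_type,
    }

# Example call inside the Cloud Function (names are assumed):
# meta = get_blob_metadata("my-bucket", "exports/large_file.csv")
# print(meta)  # log it, or return it from the Cloud Function
```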

Alternatively, you can have a look at the objects: get API documentation, which shows that only the metadata is retrieved as long as `alt=media` is not specified, and try it out there.
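
For completeness, a sketch of calling the JSON API directly from Python (the bucket/object names and the read-only scope are assumptions for the example); leaving out `alt=media` means only the metadata document is transferred, never the file content:

```python
import urllib.parse

import google.auth
import google.auth.transport.requests
import requests

# Assumed names -- replace with your own bucket and object.
BUCKET = "my-bucket"
OBJECT = "exports/large_file.csv"

# Default credentials of the Cloud Function's service account.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/devstorage.read_only"]
)
credentials.refresh(google.auth.transport.requests.Request())

# objects.get without alt=media returns the metadata JSON, not the file data.
url = (
    "https://storage.googleapis.com/storage/v1/b/"
    f"{BUCKET}/o/{urllib.parse.quote(OBJECT, safe='')}"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {credentials.token}"})
resp.raise_for_status()
meta = resp.json()
print(meta["size"], meta["updated"])  # size in bytes (as a string) and timestamp
```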

Zeenath S N
  • You are right, `blob = bucket.get_blob(blob_name)` probably does not load the file into the Cloud Function, since it is used in the metadata example query: [View and edit object metadata](https://cloud.google.com/storage/docs/viewing-editing-metadata#view) --> "Code Samples" --> "Python". I had overlooked these code samples; I thought that I would have to use [`gsutil`, which is not available in a GCF](https://stackoverflow.com/questions/61795056/run-a-gsutil-command-in-a-google-cloud-function). My mistake. – questionto42standswithUkraine Jan 24 '22 at 22:00