Geek Logbook

Tech sea log book

Working with S3 Object Metadata: Understanding ETags and Last Modified Dates

When working with AWS S3, managing large amounts of data effectively involves understanding key metadata like the ETag and Last Modified date. These properties help track file changes and ensure data consistency.

In this post, we’ll explore how to retrieve the ETag and Last Modified attributes of files stored in S3, which can help identify whether a file has been updated and when it was last modified.

Why Are ETags and Last Modified Dates Important?

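  • ETag (Entity Tag): This is an identifier that S3 generates for a specific version of an object's contents. It is not guaranteed to be unique across objects: two objects with identical contents can share an ETag. For objects uploaded in a single request, either unencrypted or encrypted with SSE-S3, the ETag is the MD5 hash of the file's contents. For objects uploaded in parts (multipart uploads), or encrypted with SSE-KMS or SSE-C, the ETag is not a plain MD5 of the contents; multipart ETags also carry a suffix that reflects the number of parts used in the upload.
  • Last Modified: This is the timestamp of when the object was created or last overwritten in the S3 bucket. Objects in S3 are replaced rather than edited in place, so every new upload to the same key updates this value. It's useful for determining the freshness of a file.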
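To make the single-part case concrete, here is a minimal sketch that compares local bytes against an ETag. It assumes the object was uploaded in one request and is unencrypted or SSE-S3 encrypted; the helper names md5_hex and matches_single_part_etag are made up for illustration and are not part of boto3.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()

def matches_single_part_etag(data: bytes, etag: str) -> bool:
    """Check local bytes against an ETag from a single-part upload.

    Assumption: the object is unencrypted or SSE-S3 encrypted, so its
    ETag is a plain MD5 of the contents.
    """
    cleaned = etag.strip('"')  # S3 returns the ETag wrapped in double quotes
    if "-" in cleaned:
        # A hyphen marks a multipart ETag, which is not a plain MD5
        raise ValueError("multipart ETag cannot be compared to a plain MD5")
    return md5_hex(data) == cleaned.lower()

payload = b"hello world"
print(matches_single_part_etag(payload, '"' + md5_hex(payload) + '"'))
```

In practice the ETag string would come from S3 and the bytes from a local file; the check is only meaningful under the assumptions stated above.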

Let’s dive into the code!

Python Code: Retrieving ETag and Last Modified from S3

Here’s a simple Python function to list objects in an S3 bucket and extract their ETag and Last Modified date:

import boto3

def list_s3_objects(bucket_name, folder_prefix):
    # Initialize the S3 client (credentials and region come from your AWS configuration)
    s3_client = boto3.client('s3')

    # list_objects_v2 returns at most 1,000 keys per call,
    # so use a paginator to walk through every page of results
    paginator = s3_client.get_paginator('list_objects_v2')
    found_any = False

    for page in paginator.paginate(Bucket=bucket_name, Prefix=folder_prefix):
        # 'Contents' is missing from a page when no keys match the prefix
        for obj in page.get('Contents', []):
            found_any = True
            object_key = obj['Key']
            last_modified = obj['LastModified']  # timezone-aware datetime (UTC)
            etag = obj['ETag'].strip('"')  # ETag comes with extra double quotes

            # Print the object key, last modified date, and ETag
            print(f"Object: {object_key}")
            print(f"Last Modified: {last_modified}")
            print(f"ETag: {etag}\n")

    if not found_any:
        print(f"No objects found in {bucket_name}/{folder_prefix}")

# Example usage
bucket_name = "your-bucket-name"
folder_prefix = "your-folder/year/month/day/"
list_s3_objects(bucket_name, folder_prefix)

How It Works:

  1. boto3.client('s3'): Creates an S3 client to interact with the AWS S3 service.
  2. list_objects_v2: This method lists objects within the specified bucket. The Prefix parameter filters keys by their leading characters, which gives a folder-like structure. Each call returns at most 1,000 keys, so larger prefixes need pagination (a paginator or the continuation token).
  3. Response Metadata: The metadata for each object is returned under the 'Contents' key of the response. We extract the Key, LastModified, and ETag for each object.

Example Output:

Object: your-folder/2024/09/12/data-file.csv
Last Modified: 2024-09-12 13:45:10+00:00
ETag: c699f24dbb44ff83b22df8b6bb41f5e0
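Once you have this metadata, one way to spot updated files is to keep a snapshot of the ETags and compare it with a later listing. The following is a minimal sketch under that assumption: snapshot and changed_keys are hypothetical helpers, and the sample pages are made-up data shaped like list_objects_v2 responses rather than real output.

```python
from datetime import datetime, timezone

def snapshot(pages):
    """Map each key to (etag, last_modified) from list_objects_v2-shaped pages."""
    result = {}
    for page in pages:
        for obj in page.get("Contents", []):
            result[obj["Key"]] = (obj["ETag"].strip('"'), obj["LastModified"])
    return result

def changed_keys(old, new):
    """Return keys that are new, or whose ETag differs, between two snapshots."""
    return sorted(
        key for key, (etag, _) in new.items()
        if key not in old or old[key][0] != etag
    )

# Hypothetical sample data standing in for two listings taken a day apart
day1 = datetime(2024, 9, 11, tzinfo=timezone.utc)
day2 = datetime(2024, 9, 12, tzinfo=timezone.utc)
before = snapshot([{"Contents": [
    {"Key": "your-folder/a.csv", "ETag": '"aaa"', "LastModified": day1},
    {"Key": "your-folder/b.csv", "ETag": '"bbb"', "LastModified": day1},
]}])
after = snapshot([{"Contents": [
    {"Key": "your-folder/a.csv", "ETag": '"aaa"', "LastModified": day1},
    {"Key": "your-folder/b.csv", "ETag": '"b2b"', "LastModified": day2},
    {"Key": "your-folder/c.csv", "ETag": '"ccc"', "LastModified": day2},
]}])

print(changed_keys(before, after))
```

Comparing ETags rather than timestamps avoids flagging a file that was re-uploaded with identical contents in a single request, although that choice depends on what "changed" should mean for your workflow.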

ETag: What Does It Tell You?

The ETag can give you a sense of whether a file has been modified:

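  • If the ETag for a key changes, the object was overwritten, almost always with different contents. The reverse is less strict: the same bytes uploaded with a different part size or encryption setting can produce a different ETag.
  • Multipart uploads will have ETags that follow a pattern like c699f24dbb44ff83b22df8b6bb41f5e0-1, where the number after the hyphen (1 here) is the number of parts used in the upload. The hex portion of such an ETag is not a plain MD5 of the file's contents.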
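To illustrate the multipart pattern, here is a rough sketch that reads the part count from an ETag and reproduces the commonly observed multipart scheme: an MD5 over the concatenated binary MD5 digests of each part, followed by a hyphen and the part count. This scheme is widely reported for unencrypted and SSE-S3 objects but is not a documented guarantee, so treat it as an assumption; part_count and multipart_etag are hypothetical helper names.

```python
import hashlib

def part_count(etag: str) -> int:
    """Return the number of parts encoded in an ETag, or 0 if it is not multipart."""
    cleaned = etag.strip('"')
    _, sep, suffix = cleaned.rpartition("-")
    return int(suffix) if sep and suffix.isdigit() else 0

def multipart_etag(data: bytes, part_size: int) -> str:
    """Compute the ETag commonly observed for a multipart upload.

    Assumption: every part except the last has exactly part_size bytes,
    and the object is unencrypted or SSE-S3 encrypted.
    """
    digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"

print(part_count("c699f24dbb44ff83b22df8b6bb41f5e0-1"))
print(part_count("c699f24dbb44ff83b22df8b6bb41f5e0"))

data = b"a" * 10
print(multipart_etag(data, 4))  # three parts: 4 + 4 + 2 bytes
print(multipart_etag(data, 5))  # two parts: same bytes, different ETag
```

The last two lines show why a differing ETag does not always mean differing contents: the same bytes split into different part sizes give different values.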