Using Flask and boto to create a proxy to S3

I had a case today where I needed to serve files from S3 through my Flask app, essentially using the app as a proxy to an S3 bucket. There are a couple of tricky bits to get right here, and it took me a few minutes of digging and experimentation to work them out. There wasn’t much helpful on Google or Stack Overflow on the subject, so I thought I’d document it here.

First: you want to stream the contents of the files to the client; don’t wait for the whole thing to arrive on your server and then send it down in one big response. This is critical for large files, but it should help with latency in general, because your server can start sending data to the client as soon as the first chunk arrives from S3. Flask has a pretty neat and (from what I’ve seen) seldom-used feature for this kind of streaming: you can pass a generator, or any other iterable of byte chunks, into a Response object. See the “Streaming Contents” pattern in the Flask documentation. Further, and also little-known, boto’s Key object can be iterated over directly, so it can play that role. From the boto documentation:

By providing a next method, the key object supports use as an iterator. For example, you can now say:

for bytes in key:
    write bytes to a file or whatever

All of the HTTP connection stuff is handled for you.
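
As a rough illustration of the streaming idea, here is a minimal sketch of passing a generator to a Flask Response. The /demo route, the iter_chunks helper, and the in-memory payload are all made up for this example and stand in for a real remote file; nothing here talks to S3.

```python
import io

from flask import Flask, Response

app = Flask(__name__)


def iter_chunks(fileobj, chunk_size=8192):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


@app.route("/demo")
def demo():
    # Hypothetical payload standing in for a large file fetched from elsewhere.
    payload = io.BytesIO(b"x" * 20000)
    # Flask iterates over the generator and sends each chunk as it is produced.
    return Response(iter_chunks(payload), mimetype="application/octet-stream")
```

Because the response body is a generator, Flask never needs the whole payload in memory at once. A boto Key can take the generator’s place, since it is iterable in the same way.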

Second: you want to copy the HTTP response headers that you get from S3. After you’ve initiated the request for the key (via open_read()), these live in key.resp, which is the raw response object from S3. That response object has a method called getheaders(), which returns the headers as a list of (name, value) pairs.
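
To show what copying the headers amounts to, here is a small sketch that turns a list of (name, value) pairs, the shape getheaders() returns, into a dict suitable for a Flask Response. The proxy_headers helper and the sample header values are assumptions for illustration. Dropping hop-by-hop headers is an optional extra that a proxy may want; the code in this post simply copies everything.

```python
# Hop-by-hop headers (RFC 2616, section 13.5.1) describe a single connection
# and are generally not meant to be forwarded by a proxy.
HOP_BY_HOP = frozenset([
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
])


def proxy_headers(header_pairs, drop=HOP_BY_HOP):
    """Build a header dict from (name, value) pairs, skipping names in `drop`."""
    return {
        name: value
        for name, value in header_pairs
        if name.lower() not in drop
    }


# Made-up sample in the shape returned by getheaders().
sample = [
    ("Content-Type", "image/png"),
    ("Content-Length", "1024"),
    ("ETag", '"abc"'),
    ("Connection", "close"),
]
headers = proxy_headers(sample)
```

Passing drop=() keeps every header, which matches what dict(key.resp.getheaders()) does.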

And finally, two other small things: pass validate=False to get_bucket() so that boto skips the extra round-trip it would otherwise make to check that the bucket exists, and catch S3ResponseError so that you can return the raw S3 response in error cases.

So putting all of that together, here’s where I ended up:

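import boto.exception
import boto.s3.connection
import boto.s3.key
import flask

def static_proxy(path):
    conn = boto.s3.connection.S3Connection(ACCESS_KEY, SECRET_ACCESS_KEY)
    bucket = conn.get_bucket(BUCKET_NAME, validate=False)
    key = boto.s3.key.Key(bucket)
    key.key = path

    try:
        # Start the S3 request; key.resp holds the raw response afterwards
        key.open_read()
        headers = dict(key.resp.getheaders())
        # The key is iterable, so Flask streams it to the client in chunks
        return flask.Response(key, headers=headers)
    except boto.exception.S3ResponseError as e:
        # Pass the S3 error body, status, and headers straight through
        return flask.Response(e.body, status=e.status, headers=key.resp.getheaders())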

That’s it! I hope someone finds this helpful.


