www.sumologic.com

How To Organize S3 Data...

11/7/2022, updated 4/4/2026

Excerpt

S3 is highly scalable, so in principle, with a big enough pipe or enough instances, you can get arbitrarily high throughput. A good example is S3DistCp, which uses many workers and instances. But almost always you’re hit with one of two bottlenecks:

1. The size of the pipe between the source (typically a server on premises or an Amazon EC2 instance) and S3.
2. The level of concurrency used for requests when uploading or downloading (including multipart uploads).

…

Thirdly, and critically if you are dealing with lots of items, **concurrency matters**. Each S3 operation is an API request with significant latency (tens to hundreds of milliseconds), which adds up to pretty much forever if you have millions of objects and work with them one at a time: at 50 ms per request, a million sequential requests take roughly 14 hours. So what determines your overall throughput when moving many objects is the concurrency level of the transfer: how many worker threads (connections) run on one instance, and how many instances are used.

…

### How to use nested S3 folder organization and common problems

Newcomers to S3 are always surprised to learn that the request throughput of S3 operations depends on key names, because similar prefixes become a bottleneck at more than about 100 requests per second. If you need high volumes of operations, it is essential to consider naming schemes with more variability at the beginning of the key names, like alphanumeric or hex hash codes in the first 6 to 8 characters, to avoid internal hot spots within S3 infrastructure. (AWS has since raised these limits: since July 2018, S3 supports at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix, and AWS dropped its earlier guidance to randomize prefixes. Spreading requests across multiple prefixes still raises the aggregate ceiling.)

…

### Why you should avoid using AWS S3 locations in your code

This is pretty simple, but it comes up a lot. Don’t hard-code S3 locations in your code. Doing so ties your code to deployment details, which is almost guaranteed to hurt you later. You might want to deploy multiple production or staging environments. Or you might want to migrate all of one kind of data to a new location, or audit which pieces of code access certain data.
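
The concurrency point can be illustrated with a minimal Python sketch. It is not from the original post: `run_concurrently` and `fake_head_object` are hypothetical names, and the 50 ms `time.sleep` is an assumed stand-in for the latency of one S3 request. It uses the standard library's `ThreadPoolExecutor` to keep many requests in flight at once; because each call mostly waits on the network, threads overlap that waiting even under Python's GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_concurrently(items: Iterable[T], op: Callable[[T], R], max_workers: int = 32) -> List[R]:
    """Apply op to every item with up to max_workers calls in flight.

    Results are returned in input order. An exception raised by op
    propagates to the caller when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(op, items))


def fake_head_object(key: str) -> str:
    """Hypothetical stand-in for one S3 request: ~50 ms of pure latency."""
    time.sleep(0.05)
    return key


if __name__ == "__main__":
    keys = [f"logs/part-{i:04d}.gz" for i in range(40)]
    # Same 40 "requests", different concurrency levels.
    for workers in (1, 8, 40):
        start = time.perf_counter()
        run_concurrently(keys, fake_head_object, max_workers=workers)
        print(f"{workers:>2} workers: {time.perf_counter() - start:.2f} s")
```

Against real S3, `op` could wrap a boto3 client call such as `s3.download_file(bucket, key, local_path)`; boto3 documents low-level clients as thread-safe, so one client can be shared across the pool. The right `max_workers` depends on instance size and network, so the default of 32 here is only a starting point to tune.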
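
The hashed-prefix naming scheme from the nested-folder section can be sketched as follows. This assumes a SHA-256 hash of the logical key (the post does not name a hash function), and `hashed_key` is a hypothetical helper. Because the prefix is derived from the key itself, any reader can recompute the stored object key without a lookup table.

```python
import hashlib


def hashed_key(logical_key: str, prefix_len: int = 6) -> str:
    """Prepend a short hex hash so keys spread across many distinct prefixes.

    The prefix is derived from the key itself (SHA-256 here, an assumption),
    so the stored key can always be recomputed from the logical name.
    """
    if not 1 <= prefix_len <= 64:
        raise ValueError("prefix_len must be between 1 and 64")
    digest = hashlib.sha256(logical_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{logical_key}"


if __name__ == "__main__":
    # Date-ordered names share a long common prefix; hashed versions do not.
    for i in range(3):
        print(hashed_key(f"2022/11/07/app-{i}.log"))
```

The trade-off is that a time-ordered listing, such as everything under `2022/11/`, is no longer a single prefix scan; you would need to list every hash prefix or keep a separate index.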
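
One way to follow the advice against hard-coded S3 locations is to resolve the bucket and prefix from configuration at runtime. This minimal sketch assumes environment variables with a hypothetical `<DATASET>_S3_BUCKET` / `<DATASET>_S3_PREFIX` naming convention; `S3Location` and `location_from_env` are illustrative names, not part of any AWS library. Code asks for a logical dataset name, so a staging deploy or a data migration changes only configuration, and searching for calls like `location_from_env("reports")` shows which code touches that data.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3Location:
    """Where a logical dataset lives (hypothetical helper, not an AWS API)."""
    bucket: str
    prefix: str = ""

    def key_for(self, name: str) -> str:
        # Normalize slashes so "reports/v2/" and "reports/v2" behave the same.
        prefix = self.prefix.strip("/")
        return f"{prefix}/{name}" if prefix else name

    def uri_for(self, name: str) -> str:
        return f"s3://{self.bucket}/{self.key_for(name)}"


def location_from_env(dataset: str, env: Optional[Mapping[str, str]] = None) -> S3Location:
    """Resolve a dataset's S3 location from configuration instead of code.

    Assumes a hypothetical convention: <DATASET>_S3_BUCKET is required and
    <DATASET>_S3_PREFIX is optional, e.g. REPORTS_S3_BUCKET=acme-staging.
    """
    env = os.environ if env is None else env
    name = dataset.upper()
    bucket = env.get(f"{name}_S3_BUCKET")
    if not bucket:
        raise KeyError(f"{name}_S3_BUCKET is not set")
    return S3Location(bucket=bucket, prefix=env.get(f"{name}_S3_PREFIX", ""))
```

For example, with `REPORTS_S3_BUCKET=acme-staging` and `REPORTS_S3_PREFIX=reports/v2/`, `location_from_env("reports").uri_for("2022-11-07.csv")` yields `s3://acme-staging/reports/v2/2022-11-07.csv`.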

Source URL

https://www.sumologic.com/blog/things-know-aws-s3