
which naming scheme would give optimal performance on S3?


If an application is storing hourly log files from thousands of instances of a high-traffic
website, which naming scheme would give optimal performance on S3?

A. Sequential

B. HH-DD-MM-YYYY-log_instanceID

C. YYYY-MM-DD-HH-log_instanceID

D. instanceID_log-HH-DD-MM-YYYY

E. instanceID_log-YYYY-MM-DD-HH

Discussion

19 Responses to “which naming scheme would give optimal performance on S3?”

  1. Brian Smith says:

    Probably D

  2. Sandeep says:

    I agree with D.

    Thousands of Instance IDs + Hourly logs seems like the most random sequence option.

  3. seenagape says:

    I choose C

  4. Vijay says:

    I think B is the correct choice

  5. venkat sai says:

    Yes, B is the right option. The main reason is the random prefix, so performance would be higher in this case.

    A – Doesn’t make sense
    C – YYYY (this would be the same for every key, so good performance would be difficult to achieve)
    D & E – The instance ID would be the same for the first two characters (i-)

  6. Ashish Chaturvedi says:

    D

  7. Max says:

    D. Thousands of keys sharing the same “HH-” prefix within one hour does not seem like an optimized performance case.

  8. Duck Bro says:

    D
    Even if the first couple of characters are “i-”, the first 3-4 characters provide a more random
    prefix than HH-DD.

  9. BDA says:

    D. The random hostname prevents hammering a specific partition, and the HH-DD following the hostname is more random than in E

    B will hammer a partition once per day at HH-DD

    A changes i/o pattern, does not apply

    C is just as bad as A

    E is almost as good as D, but YYYY will not be as random as D

  10. Ryan says:

    D is the answer.
    A, B, C are all sequential.
    E is less random than D.
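
The disagreement in the comments above comes down to how varied the leading characters of the keys are under each scheme. A minimal sketch for checking that claim: it builds keys for options B to E and counts how many distinct leading prefixes appear. The instance IDs are made up ("i-" plus 8 hex characters), the helper names are hypothetical, and the count is only a rough proxy for the "randomness" being argued about, not a model of how S3 actually partitions keys.

```python
from datetime import datetime

# Key templates labelled as in the question. Option A is omitted because
# "Sequential" does not define a concrete key format.
SCHEMES = {
    "B": "{ts:%H-%d-%m-%Y}-log_{iid}",
    "C": "{ts:%Y-%m-%d-%H}-log_{iid}",
    "D": "{iid}_log-{ts:%H-%d-%m-%Y}",
    "E": "{iid}_log-{ts:%Y-%m-%d-%H}",
}


def fake_instance_ids(count):
    """Return `count` distinct made-up instance IDs (assumed format)."""
    # Multiplying by an odd constant modulo 2**32 maps each n to a
    # different 32-bit value, so the IDs never collide.
    return [f"i-{(n * 2654435761) % 2**32:08x}" for n in range(count)]


def make_keys(scheme, instance_ids, timestamps):
    """Build one key per (timestamp, instance) pair for the given scheme."""
    template = SCHEMES[scheme]
    return [template.format(ts=ts, iid=iid)
            for ts in timestamps
            for iid in instance_ids]


def distinct_prefixes(keys, length):
    """Count how many different leading substrings of `length` occur."""
    return len({key[:length] for key in keys})


if __name__ == "__main__":
    ids = fake_instance_ids(1000)
    one_hour = [datetime(2015, 3, 7, 9)]  # all instances writing in one hour
    for name in SCHEMES:
        keys = make_keys(name, ids, one_hour)
        print(name, distinct_prefixes(keys, 6))
```

Within a single hour, every key under B or C starts with the same timestamp, while D and E differ as soon as the instance IDs do. Changing the prefix length or the time window shows where D and E start to diverge from each other.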
