[ Forked from %g3hPVP… ]
Why base64?
- Digests are encoded with a URL-safe base64 (replacing + with - and / with _)
TL;DR: Is there a reason we chose base64 instead of simply using the same SHA256 hex digest that is used to store blobs on disk?
The only benefit I can see is saving 20 bytes per hash (44 characters instead of 64). The downside is that a blob is represented by one string in the UI and by another on disk. And while it's easy to write a script that converts a base64 representation into the actual filename on disk, so that one can access the file (inspection, deletion, statistics, …), it is counter-intuitive (and counter-productive for the casual user).
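For illustration, here is a minimal Python sketch of such a conversion script. It assumes the UI string is just the base64-encoded SHA256 digest (any surrounding prefix or suffix already stripped) and accepts both the standard and the URL-safe alphabet, padded or not. `blob_path` and its `sha256/<first two hex chars>/<remaining hex chars>` layout are hypothetical, shown only to illustrate mapping the UI string to a file name; adapt it to the store you are actually inspecting.

```python
import base64
from pathlib import Path


def b64_to_hex(b64: str) -> str:
    """Convert a base64 SHA256 digest (standard or URL-safe alphabet,
    with or without '=' padding) to the 64-character lowercase hex digest."""
    # Map the URL-safe characters back to the standard alphabet.
    standard = b64.translate(str.maketrans("-_", "+/"))
    # Re-add stripped padding: the encoded length must be a multiple of 4.
    standard += "=" * (-len(standard) % 4)
    raw = base64.b64decode(standard, validate=True)
    if len(raw) != 32:
        raise ValueError(f"expected a 32-byte SHA256 digest, got {len(raw)} bytes")
    return raw.hex()


def blob_path(blobs_dir: Path, b64: str) -> Path:
    """Hypothetical on-disk location: <blobs_dir>/sha256/<2 hex>/<62 hex>.
    This layout is an assumption for illustration only."""
    hex_digest = b64_to_hex(b64)
    return blobs_dir / "sha256" / hex_digest[:2] / hex_digest[2:]


if __name__ == "__main__":
    example = "_ty6mHZUMhD-3LqYdlQyEP7cuph2VDIQ_ty6mHZUMhA="
    print(b64_to_hex(example))
    print(blob_path(Path("blobs"), example))
```

The conversion itself is a one-liner in most languages; the friction the post describes is that a casual user has to know it is needed at all.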
I suggest we skip the whole base64 thing – URL-safe or not – and use the hex digest both in the UI and on disk.
If we're absolutely hell-bent on saving a few bytes, we could use base36 ([0-9a-z]), which weighs in at 50 bytes for a SHA256 hash.
Comparison
hex (64 bytes)
fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210
base36 (50 bytes)
6coenzgadpereqfqiweqlg6w51buz1ju66rak7toep311d2780
base64 (44 bytes)
_ty6mHZUMhD-3LqYdlQyEP7cuph2VDIQ_ty6mHZUMhA=
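The comparison can be reproduced with a short Python sketch. Python's standard library has no base36 encoder, so `to_base36` below is a hypothetical helper: it reads the digest as one big-endian integer and left-pads with 0 to a fixed 50 characters. That fixed width is an assumption (no base36 scheme has been specified); without it, a digest that happens to start with zero bytes would encode to fewer than 50 characters.

```python
import base64

BASE36_ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(data: bytes, width: int = 50) -> str:
    """Hypothetical base36 encoder: treat `data` as one big-endian unsigned
    integer, write it with digits [0-9a-z], and left-pad with '0' to `width`
    characters (50 suffices for any 32-byte SHA256 digest)."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(BASE36_ALPHABET[rem])
    return "".join(reversed(digits)).rjust(width, "0")


# The 32-byte example value from the comparison.
digest = bytes.fromhex("fedcba9876543210" * 4)

representations = {
    "hex": digest.hex(),
    "base36": to_base36(digest),
    "base64": base64.urlsafe_b64encode(digest).decode("ascii"),  # URL-safe, padded
}

for name, text in representations.items():
    print(f"{name} ({len(text)} bytes)")
    print(text)
```

The lengths printed are character counts; since all three alphabets are ASCII, they equal the byte counts above. Decoding base36 needs no custom code in Python: `int(text, 36)` accepts both upper- and lowercase digits.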
Stray thoughts
A pro for both hex and base64 is that they are readily available in most, if not all, programming languages.
As for the hex digest, no further conversion is necessary once the SHA256 hash has been calculated, and the representation is both URL and filesystem safe. Also, the hex digest is case-insensitive.
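In Python, for example, `hashlib` hands back the hex form directly. A short sketch, using the empty input only because its SHA256 digest is widely published:

```python
import hashlib

# SHA256 of the empty input, chosen because its digest is well known.
digest = hashlib.sha256(b"")

# hexdigest() already is the hex representation: no further encoding step.
hex_digest = digest.hexdigest()
print(hex_digest)  # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# Only [0-9a-f] appears, so the string is safe in URLs and file names.
assert set(hex_digest) <= set("0123456789abcdef")

# Hex is case-insensitive: both spellings decode to the same 32 bytes.
assert bytes.fromhex(hex_digest.upper()) == digest.digest()
```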
Base64 has been in use on disk in ~/.ssb/ebt/ for some time. Earlier versions included the padding character (=); the current implementation seems to use URL-safe base64 without padding.
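A minimal sketch of the unpadded URL-safe variant, using only Python's standard library; the helper names are hypothetical. Dropping the padding loses no information for a fixed 32-byte digest (it always encodes to 43 characters), but Python's decoder rejects unpadded input, so the = has to be re-added before decoding.

```python
import base64


def encode_unpadded(digest: bytes) -> str:
    """URL-safe base64 with the trailing '=' padding removed."""
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def decode_unpadded(text: str) -> bytes:
    """Decode URL-safe base64 that may lack its '=' padding.
    The padding is re-added first because Python's decoder requires it."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


digest = bytes.fromhex("fedcba9876543210" * 4)
name = encode_unpadded(digest)
print(name)  # _ty6mHZUMhD-3LqYdlQyEP7cuph2VDIQ_ty6mHZUMhA
assert decode_unpadded(name) == digest
```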
Some terminal emulators [on Linux] do not include base64's padding character by default when selecting a base64 hash by double-clicking it.
Base36 is short, neat and case-insensitive – and sometimes funny words turn up in unexpected places.
When I post a single 36 kB picture (the median blob size), I have already used up the 20-byte savings of more than 1,800 SHA256 hashes written as base64 instead of hex.
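The arithmetic behind this, as a quick Python check. Whether "36 kB" means 36,000 or 36,864 bytes is not stated, so both readings are computed; the binary reading is the one that lands above 1,800.

```python
HEX_LEN = 64     # characters in a hex SHA256 digest
BASE64_LEN = 44  # characters in padded base64
saving_per_hash = HEX_LEN - BASE64_LEN  # bytes saved per hash

blob_kb = 36 * 1000    # 36 kB with 1 kB = 1000 bytes
blob_kib = 36 * 1024   # 36 KiB, in case binary kilobytes are meant

print(saving_per_hash)            # 20
print(blob_kb / saving_per_hash)  # 1800.0
print(blob_kib / saving_per_hash) # 1843.2
```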
Base64 is ugly.