Published this on GitHub.
I decided to disallow rationals with a denominator of `0`, since allowing them would force readers to have a data structure that can represent them. In all the use cases I can imagine, simply using one of `NaN`, `Infinity` and `-Infinity` should suffice anyway.
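To illustrate the point about data structures, here is a minimal Python sketch. The function name `read_rational` and the idea that a reader receives the numerator and denominator as two integers are assumptions for illustration only, not part of the format. Python's `fractions.Fraction` cannot hold a value with a zero denominator, so a reader built on it would need an extra type if such rationals were allowed.

```python
from fractions import Fraction


def read_rational(numerator: int, denominator: int) -> Fraction:
    """Hypothetical reader hook: build a rational, rejecting a zero denominator."""
    if denominator == 0:
        # Fraction(numerator, 0) would raise ZeroDivisionError on its own;
        # checking first lets the reader report a format-level error instead.
        raise ValueError("rationals with a denominator of 0 are not allowed")
    return Fraction(numerator, denominator)
```

A writer that wants to express something like 1/0 would emit a float such as `Infinity` instead.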
Added a disclaimer about the conflict between human-friendly formats and canonical forms:
The canonical encoding somewhat defeats the point of a human-readable format, since it normalizes away all whitespace. For this reason, it might be better to use a more efficient (and in the case of floats less painful) binary encoding. For this to work, there has to be a bijection between valid canonical sdn encodings and valid canonical binary encodings.
And I settled on a canonical format for floats:
Floats must be encoded such that the resulting string:
- rounds to the correct float
- has a `0` left of the decimal point
- has a nonzero digit after the decimal point
- includes the exponent
- is a shortest possible string satisfying these criteria
- if there are multiple shortest strings satisfying these criteria, choose the one with the smaller exponent
- if there are multiple shortest strings satisfying these criteria with the same exponent, choose the smallest number among them
This may look complicated, but there are algorithms for it that are tuned for performance and used in real projects (e.g. there is, or at least was, code in V8 that can compute this canonical form).
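As a rough illustration of the criteria above, the following Python sketch rewrites Python's own `repr` of a float into the `0.<digits>e<exponent>` shape. It relies on `repr` producing a shortest digit string that round-trips, and it does not attempt the two tie-breaking rules, so its output is only a candidate and not necessarily the canonical string. The name `candidate_encoding`, the leading `-` for negative numbers, and the decision to reject zero and non-finite values are assumptions of this sketch, not part of the spec.

```python
import math
from decimal import Decimal


def candidate_encoding(x: float) -> str:
    """Return a '0.<digits>e<exponent>' string that parses back to x.

    Sketch only: the tie-breaking rules are NOT implemented, and zero and
    non-finite values are rejected because the rules above do not cover them.
    """
    if not math.isfinite(x) or x == 0:
        raise ValueError("zero and non-finite floats are outside this sketch")

    # repr() yields a shortest round-tripping decimal string; Decimal splits
    # it into a sign, a tuple of coefficient digits and a power-of-ten exponent.
    sign, digits, exponent = Decimal(repr(x)).as_tuple()
    digits = list(digits)

    # Drop trailing zeros (e.g. from '1000000000.0') without changing the value.
    while digits[-1] == 0:
        digits.pop()
        exponent += 1

    # D * 10**exponent == 0.D * 10**(exponent + number_of_digits)
    mantissa = "".join(str(d) for d in digits)
    text = f"{'-' if sign else ''}0.{mantissa}e{exponent + len(digits)}"

    # Sanity check: the string must round to the original float.
    assert float(text) == x
    return text
```

For example, `candidate_encoding(1.5)` returns `0.15e1` and `candidate_encoding(-0.0025)` returns `-0.25e-2`.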
This has now reached a point where I can step out of the rabbit hole and start implementing it. I won't implement the canonical formatting until I need it. I also have a binary representation sketched out, but that will have to wait until I want to actually use it as well.