I accidentally designed a human-readable data format today. It tries to deal with the canonical data representation that content-addressing requires, and with the associated problems of floating point numbers. It can also serve as the syntax of a #lisp. In case anyone is interested, here's the spec.
SDN
simple data notation
The billionth human-readable data format. Derived from and very similar to edn, but more minimalistic. It also attempts to specify precisely how to deal with floating point numbers.
General
SDN must be encoded as UTF-8.
The whitespace characters are \n (ASCII character 10) and the space character (ASCII character 32).
Whitespace is ignored other than to separate elements.
An sdn file consists of exactly one element.
Any line starting with ; outside of a string literal is considered whitespace.
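A reader can start by decoding the bytes and then skipping whitespace and comment lines. The sketch below is one possible reading of these rules, and its function names are hypothetical. It assumes that a comment is a line whose very first character is ; (so an indented ; does not start a comment) and that the comment ends at the next newline.

```python
def decode_sdn(data: bytes) -> str:
    """Decode an SDN file; Python's strict UTF-8 codec rejects malformed input."""
    return data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8


def skip_whitespace(src: str, pos: int) -> int:
    """Return the index of the first character at or after pos that is neither
    whitespace (newline or space) nor part of a comment line."""
    while pos < len(src):
        ch = src[pos]
        if ch in (" ", "\n"):
            pos += 1
        elif ch == ";" and (pos == 0 or src[pos - 1] == "\n"):
            # Assumption: only a ';' in the first column starts a comment line.
            end = src.find("\n", pos)
            pos = len(src) if end == -1 else end
        else:
            break  # tabs, carriage returns etc. are NOT whitespace in SDN
    return pos
```

Note that a tab is not whitespace here, so a tab-indented file would be rejected.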
Elements
nil
nil is an element; it is the single value of the unit type. It represents the absence of information.
Booleans
true and false are elements; they are the two values of the boolean type.
Strings
"A string, \\ \" \t \n \u11B3 "
(This description is mostly stolen from TOML)
Strings are surrounded by quotation marks. Any Unicode character may be used except those that must be escaped: quotation mark, backslash, and the control characters (U+0000 to U+001F, U+007F).
Escape sequences:
\t - tab (U+0009)
\n - linefeed (U+000A)
\" - quote (U+0022)
\\ - backslash (U+005C)
\uXXXX - unicode (U+XXXX)
\UXXXXXXXX - unicode (U+XXXXXXXX)
Any Unicode character may be escaped with the \uXXXX or \UXXXXXXXX forms.
The escape codes must be valid Unicode scalar values.
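These rules fit in a small decoding loop. The sketch below uses the hypothetical name parse_string. It assumes that the hex digits of \uXXXX and \UXXXXXXXX may be upper- or lower-case, which the text above does not pin down. A Unicode scalar value is any code point up to U+10FFFF except the surrogates U+D800 to U+DFFF.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"  # assumption: both cases accepted
SIMPLE_ESCAPES = {"t": "\t", "n": "\n", '"': '"', "\\": "\\"}


def parse_string(src: str, pos: int) -> tuple[str, int]:
    """Decode the SDN string literal that starts at src[pos] (a '"').

    Returns the decoded text and the index just past the closing quote.
    """
    if pos >= len(src) or src[pos] != '"':
        raise ValueError("expected an opening quote")
    out = []
    i = pos + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == "\\":
            esc = src[i + 1 : i + 2]
            if esc in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[esc])
                i += 2
            elif esc in ("u", "U"):
                width = 4 if esc == "u" else 8
                digits = src[i + 2 : i + 2 + width]
                if len(digits) != width or any(c not in HEX_DIGITS for c in digits):
                    raise ValueError("malformed unicode escape")
                code = int(digits, 16)
                # Scalar values exclude surrogates and anything above U+10FFFF.
                if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
                    raise ValueError("escape is not a Unicode scalar value")
                out.append(chr(code))
                i += 2 + width
            else:
                raise ValueError(f"invalid escape sequence: \\{esc}")
        elif ord(ch) <= 0x1F or ord(ch) == 0x7F:
            raise ValueError("control characters must be escaped")
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated string")
```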
Integers
An integer consists of one or more digits 0-9, optionally prefixed by a -. No integer other than 0 itself may begin with a 0. -0 is not a valid integer.
If an integer is suffixed by an N, it is an arbitrary precision integer. Otherwise, integers outside the range of a 64 bit two's complement integer (smaller than -9223372036854775808 or larger than 9223372036854775807) are invalid.
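The integer rules map onto a regular expression plus a range check. Here is a minimal sketch using the hypothetical name parse_int. It returns a flag next to the value because 64 bit and arbitrary precision integers are distinct. It also assumes -0N is rejected just like -0.

```python
import re

# [0-9] rather than \d, because \d also matches non-ASCII digits in Python.
INT_RE = re.compile(r"(0|-?[1-9][0-9]*)(N?)")
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def parse_int(token: str) -> tuple[int, bool]:
    """Return (value, is_arbitrary_precision) for an SDN integer token."""
    m = INT_RE.fullmatch(token)
    if m is None:
        # Rejects leading zeros ("007"), "-0", "+1", a lowercase "n", and so on.
        raise ValueError(f"not an SDN integer: {token!r}")
    value = int(m.group(1))
    big = m.group(2) == "N"
    if not big and not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"{token} is outside the 64 bit range; add an N suffix")
    return value, big
```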
Floats
A float consists of an integer (without an N suffix), followed by a dot ., followed by one or more digits 0-9. It may be prefixed by a -. It may be suffixed by an exponent. An exponent is an E, optionally followed by a -, followed by one or more digits 0-9.
NaN, Infinity, and -Infinity are valid floats. -0.0 and 0.0 designate two different floats. There is only a single NaN (no signaling, no sign bit, no payload, etc.).
Floats are IEEE 754 64 bit floats. Any float that cannot be represented exactly must be rounded to the nearest representable value, with ties rounded to even.
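On mainstream platforms, CPython's float() performs correctly rounded decimal-to-binary conversion (round to nearest, ties to even). A sketch can therefore validate the SDN grammar with a regular expression and delegate the rounding. The sketch makes two assumptions. First, the leading - is read as a single optional sign, because the text above could also be read as allowing the integer part's own - plus another one. Second, the canonical NaN bit pattern 0x7FF8000000000000 is an arbitrary choice, since the spec only says there is a single NaN. parse_float and float_bits are hypothetical names.

```python
import math
import re
import struct

# Assumption: at most one leading '-', then an integer part without leading zeros.
FLOAT_RE = re.compile(r"-?(0|[1-9][0-9]*)\.[0-9]+(E-?[0-9]+)?")
SPECIAL_FLOATS = {"NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf}
CANONICAL_NAN_BITS = 0x7FF8000000000000  # arbitrary choice for the single NaN


def parse_float(token: str) -> float:
    """Parse an SDN float token into an IEEE 754 binary64 value."""
    if token in SPECIAL_FLOATS:
        return SPECIAL_FLOATS[token]
    if FLOAT_RE.fullmatch(token) is None:
        raise ValueError(f"not an SDN float: {token!r}")
    # float() rounds to the nearest double, ties to even; overflow gives +-inf.
    return float(token)


def float_bits(x: float) -> int:
    """The 64 bit pattern of x, with every NaN collapsed to one canonical pattern."""
    if math.isnan(x):
        return CANONICAL_NAN_BITS
    return struct.unpack(">Q", struct.pack(">d", x))[0]
```

For example, 9007199254740993.0 lies exactly halfway between two doubles and rounds to the even one, 9007199254740992.0. Comparing float_bits rather than Python floats keeps -0.0 and 0.0 apart.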
Rationals
A rational consists of one or more digits 0-9, followed by a /, followed by one or more digits 0-9. It is an arbitrary precision rational number. Only the value of a rational matters, not its particular numerator and denominator, e.g. 2/6 may be stored internally as 1/3.
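Python's fractions.Fraction already reduces to lowest terms, which matches the note that 2/6 may be stored as 1/3. The sketch below uses the hypothetical name parse_rational. Rejecting a zero denominator is an assumption, since the text above does not mention that case (n/0 does not describe a rational number).

```python
import re
from fractions import Fraction

RATIONAL_RE = re.compile(r"([0-9]+)/([0-9]+)")  # no sign, per the rule above


def parse_rational(token: str) -> Fraction:
    """Parse an SDN rational such as 2/6 into a normalized Fraction."""
    m = RATIONAL_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"not an SDN rational: {token!r}")
    numerator, denominator = int(m.group(1)), int(m.group(2))
    if denominator == 0:
        raise ValueError("zero denominator")  # assumption: not covered by the text
    # Fraction reduces to lowest terms, so 2/6 and 1/3 become the same value.
    return Fraction(numerator, denominator)
```

One caveat: Fraction(4, 2) == 2 is true in Python. In SDN, however, the rational 4/2 and the integer 2 are elements of different types, so an implementation has to keep the type alongside the value.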
Symbols
Symbols represent identifiers; they consist of alphanumeric characters or any of # : / . * + ! - _ ? $ % & = < >. A symbol may not start with a numeric character, and nil, true, false, NaN, Infinity, and -Infinity are not valid symbols.
When a sequence of characters can be parsed as either a number or a symbol, it must be parsed as a number.
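Symbols may not start with a digit, so the overlap between numbers and symbols is essentially tokens that begin with -, such as -5 or -1.5 (-Infinity is excluded from symbols explicitly). Below is a sketch of a token classifier that tries the number grammars first. It assumes "alphanumeric" means ASCII letters and digits, and for floats it assumes a single optional leading -. classify is a hypothetical name.

```python
import re

SYMBOL_PUNCTUATION = set("#:/.*+!-_?$%&=<>")
NUMBER_RE = re.compile(
    r"(?:0|-?[1-9][0-9]*)N?"                        # integer
    r"|-?(?:0|[1-9][0-9]*)\.[0-9]+(?:E-?[0-9]+)?"   # float (single optional '-')
    r"|[0-9]+/[0-9]+"                               # rational
    r"|NaN|Infinity|-Infinity"                      # special floats
)


def is_symbol_char(ch: str) -> bool:
    # Assumption: "alphanumeric" means ASCII letters and digits only.
    return (ch.isascii() and ch.isalnum()) or ch in SYMBOL_PUNCTUATION


def classify(token: str) -> str:
    """Classify a bare token as 'nil', 'boolean', 'number' or 'symbol'."""
    if token == "nil":
        return "nil"
    if token in ("true", "false"):
        return "boolean"
    if NUMBER_RE.fullmatch(token):
        return "number"  # numbers take precedence over symbols
    if token and token[0] not in "0123456789" and all(map(is_symbol_char, token)):
        return "symbol"
    raise ValueError(f"neither a number nor a symbol: {token!r}")
```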
Lists
(x 14)
An ordered sequence of elements, represented as (, optional whitespace, any number of elements, optional whitespace, and ).
Set
#{
a b c
}
An unordered collection of elements, represented as #{, optional whitespace, any number of elements, optional whitespace, and }. Each element may appear at most once.
Map
{
a b
c d
}
An unordered collection of key-value pairs, represented as {, optional whitespace, any number of pairs of elements, optional whitespace, and }. Each key may appear at most once.
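The three collection forms and their uniqueness rules can be sketched as a small recursive reader. To stay short, it makes simplifications that the full format does not. An atom is any run of characters other than space, newline and brackets, and two atoms are compared by their text. Text comparison matches SDN equality for nil, booleans, symbols and integers, but not for strings, floats or rationals. Comments are not handled. tokenize, parse_items, parse and read are hypothetical names.

```python
def tokenize(src: str) -> list[str]:
    """Split src into '(', ')', '#{', '{', '}' and atom tokens."""
    tokens, i = [], 0
    while i < len(src):
        if src[i] in " \n":
            i += 1
        elif src.startswith("#{", i):
            tokens.append("#{")
            i += 2
        elif src[i] in "(){}":
            tokens.append(src[i])
            i += 1
        else:
            j = i
            while j < len(src) and src[j] not in " \n(){}":
                j += 1
            tokens.append(src[i:j])
            i = j
    return tokens


def parse_items(tokens: list[str], pos: int, close: str) -> tuple[list, int]:
    """Parse elements up to the matching close token; return them and the next index."""
    items = []
    while pos < len(tokens):
        if tokens[pos] == close:
            return items, pos + 1
        item, pos = parse(tokens, pos)
        items.append(item)
    raise ValueError(f"missing {close!r}")


def parse(tokens: list[str], pos: int) -> tuple[tuple, int]:
    tok = tokens[pos]
    if tok == "(":
        items, pos = parse_items(tokens, pos + 1, ")")
        return ("list", tuple(items)), pos  # order matters
    if tok == "#{":
        items, pos = parse_items(tokens, pos + 1, "}")
        if len(set(items)) != len(items):
            raise ValueError("duplicate set element")
        return ("set", frozenset(items)), pos  # order is irrelevant
    if tok == "{":
        items, pos = parse_items(tokens, pos + 1, "}")
        if len(items) % 2 != 0:
            raise ValueError("a map needs an even number of elements")
        keys, values = items[0::2], items[1::2]
        if len(set(keys)) != len(keys):
            raise ValueError("duplicate map key")  # values may repeat
        return ("map", frozenset(zip(keys, values))), pos
    if tok in (")", "}"):
        raise ValueError(f"unexpected {tok!r}")
    return ("atom", tok), pos + 1


def read(src: str) -> tuple:
    """Read exactly one element from src."""
    tokens = tokenize(src)
    if not tokens:
        raise ValueError("an SDN file must contain one element")
    value, pos = parse(tokens, 0)
    if pos != len(tokens):
        raise ValueError("an SDN file must contain exactly one element")
    return value
```

Representing sets and maps as frozensets makes two collections with the same entries compare equal regardless of their order in the source.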
Equality
To enforce the uniqueness of set entries and map keys, there has to be an equality relation.
nil is equal to nil, true is equal to true, and false is equal to false.
Two integers are equal if they consist of the same characters. 64 bit integers and arbitrary precision integers are never equal.
Two symbols are equal if they consist of the same characters.
Two strings are equal if they result in the same data after resolving escape sequences.
Two floats are equal if they result in the same IEEE 754 64 bit float after rounding. In particular, NaN is equal to NaN, while -0.0 and 0.0 are not equal.
Two rationals are equal if they both describe the same rational number, e.g. 2/6 equals 1/3.
Two lists are equal if they have the same length and all corresponding pairs of elements are equal.
Two sets are equal if they contain the same set of entries, independent of the order.
Two maps are equal if they contain the same set of pairs, independent of the order.
All other pairs of elements (in particular elements of different types) are unequal.
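One way to implement this relation, and with it the uniqueness checks for set entries and map keys, is to map every value to a hashable canonical key. The sketch assumes a hypothetical in-memory representation. BigInt, Symbol, SdnSet and SdnMap are invented wrappers, while None, bool, int, float, Fraction, str and list stand for the remaining types. The canonical NaN bit pattern is an arbitrary choice.

```python
import math
import struct
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class BigInt:
    """Hypothetical wrapper for N-suffixed integers."""
    value: int


@dataclass(frozen=True)
class Symbol:
    """Hypothetical wrapper so that symbols are never confused with strings."""
    name: str


@dataclass
class SdnSet:
    """Entries kept in a list so Python's own == (1 == 1.0) never merges them."""
    items: list


@dataclass
class SdnMap:
    """Hypothetical map representation: a list of (key, value) tuples."""
    pairs: list


def float_bits(x: float) -> int:
    if math.isnan(x):
        return 0x7FF8000000000000  # the single canonical NaN (arbitrary choice)
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def key(v) -> tuple:
    """A hashable key such that key(a) == key(b) exactly when a and b are SDN-equal."""
    if v is None:
        return ("nil",)
    if isinstance(v, bool):  # must come before int: bool is a subclass of int
        return ("bool", v)
    if isinstance(v, int):
        return ("int", v)
    if isinstance(v, BigInt):
        return ("bigint", v.value)  # never equal to a 64 bit int
    if isinstance(v, float):
        return ("float", float_bits(v))  # NaN == NaN, -0.0 != 0.0
    if isinstance(v, Fraction):  # Fraction is always in lowest terms
        return ("rational", v.numerator, v.denominator)
    if isinstance(v, Symbol):
        return ("symbol", v.name)
    if isinstance(v, str):
        return ("string", v)  # already unescaped
    if isinstance(v, list):
        return ("list", tuple(key(x) for x in v))
    if isinstance(v, SdnSet):
        return ("set", frozenset(key(x) for x in v.items))
    if isinstance(v, SdnMap):
        return ("map", frozenset((key(k), key(x)) for k, x in v.pairs))
    raise TypeError(f"not an SDN value: {v!r}")


def sdn_equal(a, b) -> bool:
    return key(a) == key(b)


def has_duplicates(elements: list) -> bool:
    """True if two of the elements are SDN-equal (so they cannot form a set)."""
    keys = [key(e) for e in elements]
    return len(set(keys)) != len(keys)
```

Keeping sets and maps as plain lists until their keys are computed matters here. A native Python set would already merge 1 and 1.0, and would keep two distinct NaN objects apart.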
Continued in the next post.