If the js implementation is just replacing invalid utf8 with U+FFFD, then only the transport encoding is affected, and there are two ways forward:
- change the current ssb implementations to treat non-utf8 json as invalid instead of silently patching it, and make that the spec
  - let's do that one, I'd hate to always say "it's a json subset except that invalid utf8 is ok for stupid reasons"
- make the replacement part of the spec, with the weird consequence that a string which actually contains U+FFFD can then also be encoded by substituting any non-utf8 byte sequence for it. Implementations would be required to e.g. decode `[34 255 34]` into the string `"�"`
...
To fix the js implementation (if it turns out to do the silent character replacement), use code such as `new TextDecoder('utf-8', { fatal: true }).decode(Buffer.from([34, 255, 34]))`, which throws a `TypeError` on invalid utf8.
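
To make the difference between the two options concrete, here is a rough sketch comparing lenient and strict decoding of the bytes from above. It only uses the standard `TextDecoder`; the helper name `decodeStrict` is made up for illustration and is not part of any ssb implementation, and whether the current js code actually decodes this way is still the open question.

```javascript
// The example bytes from above: a JSON string whose content is the lone invalid byte 0xFF.
const invalid = new Uint8Array([34, 255, 34]);
// The proper utf8 encoding of a JSON string containing U+FFFD (EF BF BD).
const valid = new Uint8Array([34, 0xef, 0xbf, 0xbd, 34]);

// Lenient decoding (the TextDecoder default): invalid bytes are replaced by U+FFFD.
const lenient = new TextDecoder('utf-8');
const fromInvalid = lenient.decode(invalid);
const fromValid = lenient.decode(valid);

// Two different byte sequences end up as the same string,
// which is the "weird consequence" of option two.
const ambiguous = fromInvalid === fromValid;

// Strict decoding: hypothetical helper that returns null instead of patching the input.
function decodeStrict(bytes) {
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (err) {
    if (err instanceof TypeError) return null; // invalid utf8
    throw err; // anything else is unexpected
  }
}

const strictInvalid = decodeStrict(invalid);
const strictValid = decodeStrict(valid);

console.log({ ambiguous, strictInvalid, strictValid });
```

With the strict variant, option one falls out naturally: a message whose bytes fail to decode would simply be rejected before it ever reaches `JSON.parse`.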