Thank you @regular.
Sbot replaced the invalid UTF-8 with the replacement character U+FFFD. So now we need to decide whether to codify this behavior in the spec, or whether to change sbot.
I see three main reasons for changing sbot to reject invalid UTF-8 instead of performing lossy replacement:
- If we allow invalid UTF-8, then the RPC protocol isn't valid JSON anymore. That goes against the JSON spec, which JS ssb would otherwise follow.
- Any string that actually contains U+FFFD has multiple valid encodings, namely the proper one plus all those that substitute an invalid UTF-8 sequence for the U+FFFD character. Conforming implementations have to accept and convert all of them.
- The legacy message format becomes even more ridiculous.
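
To make the second point concrete, here is a minimal Python sketch. It is not sbot's code; the byte strings and the helper names `decode_lossy` and `decode_strict` are made up for illustration, and Python's `errors="replace"` only stands in for whatever replacement logic sbot actually uses.

```python
# Hypothetical inputs: two different invalid UTF-8 byte sequences,
# plus the proper UTF-8 encoding of a string with a literal U+FFFD.
invalid_a = b"a\xffb"                      # 0xFF never appears in valid UTF-8
invalid_b = b"a\xfeb"                      # neither does 0xFE
proper = "a\ufffdb".encode("utf-8")        # b"a\xef\xbf\xbdb"


def decode_lossy(raw: bytes) -> str:
    """Silently replace invalid sequences with U+FFFD (lossy replacement)."""
    return raw.decode("utf-8", errors="replace")


def decode_strict(raw: bytes) -> str:
    """Reject invalid UTF-8 by raising UnicodeDecodeError."""
    return raw.decode("utf-8", errors="strict")


# With lossy replacement, three distinct byte strings collapse into one string.
lossy_results = {decode_lossy(b) for b in (invalid_a, invalid_b, proper)}

# With strict decoding, only the proper encoding is accepted.
strict_accepted = []
for candidate in (invalid_a, invalid_b, proper):
    try:
        decode_strict(candidate)
        strict_accepted.append(candidate)
    except UnicodeDecodeError:
        pass
```

After running this, `lossy_results` holds a single string while `strict_accepted` holds only `proper`, which is the ambiguity rejecting invalid input would avoid.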
Reasons for keeping the silent replacement behavior:
- No need to fix the sbot code base.
My preferred choice is clear, but I guess it is @Dominic who gets to decide this?