Forking the heated discussion about the use of ML on SSB data, I want to pose the question:
What is considered fair processing of SSB feeds?
@Geoffrey brings up a good point that all/most SSB apps have metadata extracting algorithms, either from metadata fields (root, branch, mentions) or from textual content like ssb-search2 does. In fact, without building database indexes (which is a type of data processing), it would be very hard to make any app at all.
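To make the indexing point concrete, here is a minimal sketch of the kind of processing almost every client does just to render threads and mentions. The `Msg` shape is a simplified assumption (real messages carry more fields), and `buildIndex` and `FeedIndex` are hypothetical names for illustration, not the API of any SSB library.

```typescript
// Simplified, assumed message shape; real SSB messages have more fields.
interface Mention {
  link: string;
  name?: string;
}

interface Msg {
  key: string;
  author: string;
  content: {
    type: string;
    text?: string;
    root?: string;
    branch?: string | string[];
    mentions?: Mention[];
  };
}

// Hypothetical index: plain lookup tables keyed by message or feed id.
interface FeedIndex {
  threads: Record<string, string[]>;     // root key -> keys of replies
  mentionedBy: Record<string, string[]>; // mentioned id -> keys of mentioning messages
}

function push(table: Record<string, string[]>, k: string, v: string): void {
  (table[k] = table[k] || []).push(v);
}

// Walk replicated messages once and extract metadata from their fields.
function buildIndex(msgs: Msg[]): FeedIndex {
  const index: FeedIndex = { threads: {}, mentionedBy: {} };
  for (const msg of msgs) {
    const { root, mentions } = msg.content;
    if (root) push(index.threads, root, msg.key);
    for (const m of mentions || []) push(index.mentionedBy, m.link, msg.key);
  }
  return index;
}

const sampleIndex = buildIndex([
  { key: "%a", author: "@alice", content: { type: "post", text: "hi" } },
  {
    key: "%b",
    author: "@bob",
    content: {
      type: "post",
      text: "hello",
      root: "%a",
      branch: "%a",
      mentions: [{ link: "@alice" }],
    },
  },
]);
```

Even this tiny index already answers "who replied to whom" and "who mentions whom", which is why it is hard to draw the line between plain indexing and analysis.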
There is also ssb-to-graphml that allows people to study the social graph. Also, @humberto has studied the traffic of private messages to see whether we could crack who is talking to whom (!!).
Going forwards, there could be local-first never-online ML, as well as online remote-first ML.
To repeat the question: which of these data processing practices are acceptable and which are not? I believe the answer will vary depending on who you ask. For this reason, I think users should provide a "license" stating the acceptable uses of their data and the consequences of deviating from those uses. Some kind of legal text or "code of conduct" for data usage.
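As a rough illustration of what a machine-readable version of such a "license" could look like, here is a sketch. Everything in it is an assumption: no `data-usage-policy` message type exists in SSB today, and the `Processing` categories are just the practices listed in this post. It is advisory only; nothing in it enforces the policy.

```typescript
// Hypothetical categories, taken from the practices discussed in this thread.
type Processing =
  | "indexing"
  | "search"
  | "graph-analysis"
  | "local-ml"
  | "remote-ml";

// Hypothetical message content; the type name is made up for this sketch.
interface DataUsagePolicy {
  type: "data-usage-policy";
  allowed: Processing[];
  consequence: string; // free text, e.g. what the author does on a violation
}

// A well-behaved client could check the author's policy before processing.
function isAllowed(policy: DataUsagePolicy, use: Processing): boolean {
  return policy.allowed.indexOf(use) !== -1;
}

const examplePolicy: DataUsagePolicy = {
  type: "data-usage-policy",
  allowed: ["indexing", "search", "local-ml"],
  consequence: "I will block feeds that ignore this policy.",
};
```

Under these assumptions, `isAllowed(examplePolicy, "remote-ml")` is false, so a cooperating app would skip this feed when collecting data for remote ML.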
One interesting thing about the current situation in SSB is that whoever replicates your data can run whatever data processing they like on their local computer and never tell you about it. You would have no idea whether they did it or not.