Spotted something a bit concerning on #wikipedia today: a user with 23 nonsensical-but-plausible-looking ChatGPT-created articles (all now deleted), and another with six. https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incidents#Suspected_hoax_content_and_LLM_use_by_User:Gyan.Know - we didn't have the immediate rush of nonsense we were expecting, but there's definitely some seeping in
Having said that, I plugged in one of my own articles, and I am not wildly sold on the ability of the public tools to distinguish AI-generated content from "human-written in neutral style".
Unless there's something no-one told me until now. Hell of a way to find out...
@generalising so, uh Andrew, there's something we need to tell you 😉
@generalising I hadn't thought of this but am sure others have...
It's more than "neutral pov". Because Wikipedia is such a massive trove of well-written, openly licensed text, it's heavily represented in the training data of a lot of these models. So... it's going to sound like AI, because *it's in the AI*.
I hadn't thought about this specifically creating a problem re detecting AI-generated submissions to Wikipedia with current token-detection methods, but it makes perfect sense that it would. ...ugh.
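(The "token-detection" idea mentioned here is, roughly, scoring how predictable each token is under a language model and flagging text that is suspiciously predictable. A toy sketch of that scoring step, using a unigram model as a stand-in for a real LLM; the corpus, sentences, and function names are all made up for illustration:)

```python
import math
from collections import Counter

def unigram_model(corpus_tokens):
    # Toy stand-in for an LLM: per-token probabilities estimated from
    # corpus counts, with add-one smoothing so unseen tokens get a
    # small nonzero probability instead of breaking the log below.
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1
    return lambda tok: (counts.get(tok, 0) + 1) / (total + vocab)

def perplexity(model, tokens):
    # exp of the average negative log-probability per token;
    # lower perplexity = more predictable text.
    nll = -sum(math.log(model(t)) for t in tokens) / len(tokens)
    return math.exp(nll)

# Hypothetical "training" text standing in for Wikipedia prose.
corpus = "the cat sat on the mat the dog sat on the rug".split()
model = unigram_model(corpus)

in_distribution = "the cat sat on the mat".split()
out_of_distribution = "quantum hedgehogs deplore bureaucracy".split()

# Text resembling the training data scores as more "machine-like"
# (lower perplexity) than text the model has never seen.
print(perplexity(model, in_distribution))
print(perplexity(model, out_of_distribution))
```

Which is exactly the failure mode in the thread: Wikipedia articles are *in* the training data, so their perplexity under these models is low, and detectors built on this signal will lean towards calling them AI-generated.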
@generalising ... content generated by AIs that were trained on "human-written in neutral style" texts ...
Checked this with different samples of the same article: the lead alone comes back as likely entirely AI, some early sections as possibly AI, some late sections as likely entirely human. Very messy.
(Interestingly the later sections are the ones noting some kind of critical commentary, which might be relevant?)