Spotted something a bit concerning on #wikipedia today: a user with 23 nonsensical-but-plausible-looking ChatGPT-created articles (all now deleted), and another with six. https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incidents#Suspected_hoax_content_and_LLM_use_by_User:Gyan.Know - we didn't have the immediate rush of nonsense we were expecting, but there's definitely some seeping in
Looking into it a bit more, the first user also "heavily expanded" a couple of dozen mainspace articles. Is any of this true? Who knows!
It is rather telling that anything it generates about a place sounds like it was written by a tourist site or an estate agent, though.
@generalising so, uh Andrew, there's something we need to tell you 😉
@generalising I hadn't thought of this but am sure others have...
It's more than "neutral POV". Because Wikipedia is such a massive trove of well-written, openly licensed text, it's heavily represented in the training data of a lot of these models. So... it's going to sound like AI, because *it's in the AI*.
I hadn't thought about this specifically creating a problem re detecting AI-generated submissions to Wikipedia with current token-detection methods, but it makes perfect sense that it would. ...ugh.
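A toy sketch of why that follows (the bigram model and sample strings here are invented for illustration, not any real detector): token-based detection tools flag text the underlying model finds *predictable*, so if the model was trained on Wikipedia, encyclopedic prose scores as predictable, i.e. "AI-like", even when a human wrote it.

```python
import math
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Character-bigram counts -- a stand-in for a real language model."""
    counts = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    return counts

def perplexity(model, text):
    """Per-character perplexity with add-one smoothing (ASCII vocab)."""
    vocab = 128
    log_prob = 0.0
    for a, b in zip(text, text[1:]):
        c = model[a]
        log_prob += math.log((c[b] + 1) / (sum(c.values()) + vocab))
    return math.exp(-log_prob / max(len(text) - 1, 1))

# "Training data": invented neutral, encyclopedic prose.
wiki = ("the city is located on the river and is known for its "
        "historic architecture and museums. the population was "
        "recorded in the most recent census.") * 20
model = train_bigram(wiki)

in_style = "the city is known for its historic museums."
off_style = "yo! zany quokkas juke past, vexing bored wizards."

# Prose matching the training distribution scores lower perplexity,
# which a naive detector reads as "machine-generated".
print(perplexity(model, in_style) < perplexity(model, off_style))  # True
```

Real detectors use far bigger models, but the failure mode is the same: low perplexity is evidence of "sounds like the training data", not of authorship.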
@generalising ... content generated by AIs that were trained on "human-written in neutral style" texts ...
Having said that, I plugged one of my own articles into the public detection tools, and I am not wildly sold on their ability to distinguish AI-generated content from "human-written in neutral style".
Unless there's something no-one told me until now. Hell of a way to find out...