I have a preprint out estimating how many scholarly papers are written using ChatGPT etc. I estimate upwards of 60k articles (>1% of global output) published in 2023. arxiv.org/abs/2403.16887

How can we identify this? Simple: there are certain words that LLMs love, and they suddenly started showing up *a lot* last year. Twice as many papers call something "intricate", with big rises for "commendable" and "meticulous".
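
For anyone who wants to play with the idea, here is a rough sketch of that kind of comparison: the share of papers containing a marker word in one year, divided by the share in a baseline year. It is only an illustration, not the method used in the preprint. The function names and the toy corpus are made up for the example, and a real analysis would need a proper bibliographic dataset and some control for ordinary year-to-year drift in word usage.

```python
import re


def share_containing(texts, word):
    """Fraction of texts that contain `word` as a whole word (case-insensitive)."""
    if not texts:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    hits = sum(1 for text in texts if pattern.search(text))
    return hits / len(texts)


def year_over_year_ratio(corpus_by_year, word, year, baseline_year):
    """Ratio of the word's share in `year` to its share in `baseline_year`.

    Returns None when the word never appears in the baseline year,
    because the ratio is undefined in that case.
    """
    baseline = share_containing(corpus_by_year[baseline_year], word)
    current = share_containing(corpus_by_year[year], word)
    if baseline == 0:
        return None
    return current / baseline


# Hypothetical toy data: invented abstracts, NOT real papers or real counts.
toy_corpus = {
    2022: [
        "We study an intricate network of trade routes.",
        "A survey of pottery finds.",
        "Dating methods for coastal sites.",
        "Notes on field drainage.",
    ],
    2023: [
        "An intricate interplay of factors is examined.",
        "We present a meticulous and intricate analysis.",
        "A survey of pottery finds.",
        "Dating methods for coastal sites.",
    ],
}

for marker in ("intricate", "meticulous"):
    ratio = year_over_year_ratio(toy_corpus, marker, 2023, 2022)
    print(marker, ratio)
```

A ratio near 1 means no change; a ratio well above 1 for a word with no obvious topical reason to spike is the kind of signal described above. Words absent from the baseline year come back as `None` rather than an infinite ratio.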

@generalising but if LLMs derive their models from papers that are already published, how can there be massive increases in such words? I like the idea, but this sounds strange.

@ArchaeoIain @generalising Overfitting to certain sources/authors in their training data, I suppose? "Notable" for example is classic Wikipedianese.

@joeroe @ArchaeoIain that's my best guess as well (as someone who doesn't understand the technicalities involved very well). And some of it may just be a slight mismatch between the tone of the text & the intended venue - maybe when using an LLM for copyediting papers we should ask "and make it 25% less enthusiastic"...
