I have a preprint out estimating how many scholarly papers are written using ChatGPT etc. I estimate upwards of 60k articles (>1% of global output) published in 2023. https://arxiv.org/abs/2403.16887
How can we identify this? Simple: there are certain words that LLMs love, and they suddenly started showing up *a lot* last year. Twice as many papers call something "intricate", with big rises for "commendable" and "meticulous".
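To make the idea concrete, here is a rough sketch of the kind of comparison involved: for each year, take the share of documents that use a marker word at least once, then look at the ratio between years. This is only an illustration, not the preprint's actual pipeline; the function names and the toy corpus are made up for the example.

```python
import re

def share_of_docs_with_word(docs, word):
    """Fraction of documents containing `word` at least once (whole word, case-insensitive)."""
    if not docs:
        return 0.0
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for doc in docs if pattern.search(doc)) / len(docs)

def year_over_year_ratio(docs_by_year, word, before, after):
    """Ratio of the `after` year's share to the `before` year's share."""
    base = share_of_docs_with_word(docs_by_year[before], word)
    new = share_of_docs_with_word(docs_by_year[after], word)
    # No baseline usage at all: report infinity rather than dividing by zero.
    return new / base if base else float("inf")

# Invented toy corpus (hypothetical data, not real abstracts).
toy_corpus = {
    2022: [
        "An intricate model of X.",
        "A simple method.",
        "We study Y.",
        "Results on Z.",
    ],
    2023: [
        "An intricate design.",
        "Intricate and meticulous work.",
        "We study Y.",
        "Results on Z.",
    ],
}

ratio = year_over_year_ratio(toy_corpus, "intricate", 2022, 2023)
print(f"'intricate' appears in {ratio:.1f}x the share of documents")
```

Counting documents rather than raw word occurrences keeps one very repetitive paper from dominating the figure.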
@generalising Fantastic work, Andrew!
Thank you so much. Now I can search web data for posts, searches, and media using the same token words. :)
@Wikisteff Credit where it's due - I took the sample list from an earlier study! https://arxiv.org/abs/2403.07183 (p 15, 16) I think this is a bit of an idiosyncratic list due to the peer-review context (hence it's all adjectives/adverbs, almost all positive) and there will definitely be other distinctive terms, some unpredictable - it would be quite interesting to do some larger analysis to try and find them.
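One simple way such a larger analysis could start, as a sketch only: compare how many documents use each term in an older and a newer corpus, and rank terms by a smoothed log ratio. This is an assumed, swapped-in technique for illustration, not the method of either preprint; `doc_freq`, `distinctive_terms`, and the toy texts are hypothetical.

```python
import math
import re
from collections import Counter

def doc_freq(docs):
    """Count, for each term, how many documents contain it at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(re.findall(r"[a-z]+", doc.lower())))
    return counts

def distinctive_terms(old_docs, new_docs, top_n=5):
    """Rank terms by the log ratio of their smoothed document rates (new vs old)."""
    old_df, new_df = doc_freq(old_docs), doc_freq(new_docs)
    scores = {}
    for term in set(old_df) | set(new_df):
        # Add-one smoothing so terms absent from one corpus still get a finite score.
        old_rate = (old_df[term] + 1) / (len(old_docs) + 2)
        new_rate = (new_df[term] + 1) / (len(new_docs) + 2)
        scores[term] = math.log(new_rate / old_rate)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

# Invented toy texts (hypothetical data).
old_docs = ["we study the method", "the method works", "we test the results"]
new_docs = [
    "we study the intricate method",
    "the intricate method works",
    "we test the meticulous results",
]

print(distinctive_terms(old_docs, new_docs, top_n=2))
```

On real data the top of such a list would also pick up topical shifts (new research subjects, events), so it would need manual filtering before any term is treated as an LLM marker.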
@generalising It's a fantastic idea!
I used fine-grained stylometrics to identify the unique-ish "fists" of posters and their proxy accounts in Twitter posts in 2022 to do some hypothesis testing of co-authorship amongst accounts in the aftermath of the 2022 Convoy Protest here in Ottawa, but I hadn't thought of using them for bibliometrics and AI!
It's a genius move! :)