I have a preprint out estimating how many scholarly papers are written using ChatGPT etc. I estimate upwards of 60k articles (>1% of global output) published in 2023. arxiv.org/abs/2403.16887

How can we identify this? Simple: there are certain words that LLMs love, and they suddenly started showing up *a lot* last year. Twice as many papers call something "intricate", with big rises for "commendable" and "meticulous".
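
The idea above can be illustrated with a minimal sketch. It is not the preprint's actual method (the post does not describe it in detail). It assumes a hypothetical input of `(year, text)` pairs and computes, for each marker word, the share of each year's papers that contain it. The function name `marker_share_by_year` and the toy records are made up for illustration.

```python
import re
from collections import defaultdict

# Marker words quoted in the post; a real list would be longer.
MARKERS = ("intricate", "commendable", "meticulous")


def marker_share_by_year(papers, markers=MARKERS):
    """Return {word: {year: fraction of that year's papers containing the word}}.

    `papers` is an iterable of (year, text) pairs (a hypothetical input format).
    """
    totals = defaultdict(int)                      # papers seen per year
    hits = {w: defaultdict(int) for w in markers}  # papers containing each word
    for year, text in papers:
        totals[year] += 1
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        for w in markers:
            if w in tokens:
                hits[w][year] += 1
    return {w: {y: hits[w][y] / totals[y] for y in totals} for w in markers}


# Toy, made-up records, only to show the shape of the input.
papers = [
    (2022, "We study an intricate network of proteins."),
    (2022, "A simple method for image segmentation."),
    (2022, "Results on graph colouring."),
    (2022, "A survey of soil chemistry."),
    (2023, "This meticulous study reveals an intricate mechanism."),
    (2023, "An intricate and commendable analysis of markets."),
    (2023, "A simple method for text classification."),
    (2023, "Notes on soil chemistry."),
]

shares = marker_share_by_year(papers)
for word, by_year in shares.items():
    print(word, dict(sorted(by_year.items())))
```

Counting each paper once per word, rather than counting every occurrence, keeps one repetitive paper from dominating the result. A sudden year-on-year jump in a word's share is the signal of interest.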

@generalising Fantastic work, Andrew!
Thank you so much. Now I can search web data for posts, searches, and media using the same token words. :)


@Wikisteff Credit where it's due - I took the sample list from an earlier study! arxiv.org/abs/2403.07183 (pp. 15-16) I think this is a bit of an idiosyncratic list due to the peer-review context (hence it's all adjectives/adverbs, almost all positive). There will definitely be other distinctive terms, some unpredictable, and it would be quite interesting to do some larger analysis to try and find them.
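
One possible shape for that larger analysis, as a rough sketch only: rank every word by how much its document frequency grew between a baseline year and a target year. Nothing here comes from either preprint. The function names, the add-one smoothing, and the `min_docs` threshold are all assumptions chosen for illustration, and the texts are made-up toy data.

```python
import re
from collections import Counter


def doc_frequencies(texts):
    """Count how many texts each word appears in (once per text)."""
    counts = Counter()
    for text in texts:
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    return counts


def rising_terms(base_texts, target_texts, min_docs=2):
    """Rank words by the ratio of target-year to base-year document rate."""
    base = doc_frequencies(base_texts)
    target = doc_frequencies(target_texts)
    n_base, n_target = len(base_texts), len(target_texts)
    ratios = {}
    for word, count in target.items():
        if count < min_docs:
            continue  # skip words too rare to say anything about
        # Add-one smoothing (an assumption) avoids dividing by zero
        # for words that never appear in the baseline year.
        base_rate = (base[word] + 1) / (n_base + 1)
        target_rate = (count + 1) / (n_target + 1)
        ratios[word] = target_rate / base_rate
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up texts for two years.
base_texts = [
    "the method is simple",
    "the results are clear",
    "the data is noisy",
]
target_texts = [
    "the intricate method is simple",
    "the intricate results are clear",
    "the data is noisy",
]

ranked = rising_terms(base_texts, target_texts)
print(ranked[:5])
```

On a real corpus the top of such a ranking would also contain topical words (new diseases, new technologies), so the output would need manual review before treating any term as an LLM marker.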


@generalising It's a fantastic idea!
I used fine-grained stylometrics to identify the unique-ish "fists" of posters and their proxy accounts in 2022 Twitter posts, to test hypotheses about co-authorship among accounts in the aftermath of that year's Convoy Protest here in Ottawa. I hadn't thought of using them for bibliometrics and AI, though!
It's a genius move! :)
