I have a preprint out estimating how many scholarly papers are written using ChatGPT etc. I estimate upwards of 60k articles (>1% of global output) published in 2023. arxiv.org/abs/2403.16887

How can we identify this? Simple: there are certain words that LLMs love, and they suddenly started showing up *a lot* last year. Twice as many papers call something "intricate", with big rises for "commendable" and "meticulous".
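The basic signal here can be sketched as a year-over-year comparison of per-paper word rates. The counts below are hypothetical placeholders for illustration, not figures from the preprint:

```python
# Sketch: year-over-year frequency ratios for candidate "LLM marker" words.
# All counts are hypothetical placeholders, not data from the preprint.
papers_with_word = {
    "intricate":   {2022: 31_000, 2023: 64_000},
    "commendable": {2022: 2_500,  2023: 6_800},
    "meticulous":  {2022: 11_000, 2023: 21_000},
}
total_papers = {2022: 5_100_000, 2023: 5_300_000}  # hypothetical totals

for word, counts in papers_with_word.items():
    # Normalise by total output so growth in publishing doesn't look like
    # growth in word use.
    rates = {y: counts[y] / total_papers[y] for y in counts}
    ratio = rates[2023] / rates[2022]
    print(f"{word}: {ratio:.2f}x rise in per-paper rate")
```

Normalising by total output matters: raw counts rise every year simply because more papers are published.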

@generalising Fantastic work, Andrew!
Thank you so much. Now I can search web data for posts, searches, and media using the same token words. :)

@generalising Do you have the data table for the 90 top words? I'd love to see how the also-rans performed vis-a-vis the top 10! :)

@Wikisteff No, these were all done by hand so I didn't want to spend a full week on doing all 100! Might be practical to test them all using the Dimensions API, though?

@generalising I computed the number of standard deviations that 2023-2024 sit above a baseline quadratic time-series model of language use fitted to the 2016-2022 data. All your control words came out significantly different from the model, except for "before" and "earlier".
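That test can be sketched as follows: fit a quadratic trend to pre-LLM frequencies, then measure how many residual standard deviations the 2023 value sits above the extrapolated baseline. The frequencies here are hypothetical, not the actual Dimensions data:

```python
import numpy as np

# Sketch of the z-score test: quadratic baseline fitted to 2016-2022,
# then distance of 2023 from the extrapolation, in units of the
# in-sample residual scatter. Frequencies are hypothetical.
years = np.array([2016, 2017, 2018, 2019, 2020, 2021, 2022])
freq = np.array([5.0, 5.2, 5.5, 5.7, 6.1, 6.4, 6.8])  # uses per 1,000 papers

x = years - 2016                       # centre for numerical stability
coeffs = np.polyfit(x, freq, deg=2)    # quadratic baseline
baseline = np.poly1d(coeffs)

resid_sd = np.std(freq - baseline(x))  # in-sample scatter

observed_2023 = 13.5                   # hypothetical post-LLM value
z = (observed_2023 - baseline(2023 - 2016)) / resid_sd
print(f"2023 is {z:.1f} standard deviations above the baseline")
```

A large z against a smooth pre-2023 trend is what flags a word as a likely LLM marker; control words should stay near the baseline.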

@Wikisteff this is interesting, thank you!

What I don't have a figure for is "what percentage of papers in any given year have full text", and it may not be constant over time. This was one of the reasons for including control words - they proxy for it and let us know what a reasonable bound for year-to-year change might be (I got ~5%). I'm not sure if that complicates your analysis?


@Wikisteff similarly, I think it's plausible the 2024 data is weird in interesting ways - e.g. over-representing certain types of paper in certain journals because they publish faster - and that might complicate analysis of it. Which is not to say the 2024 figures *aren't* going to be terrifyingly high whatever corrections we apply!

@generalising Yeah, this is a *great* point: with LLMs, it's easier to *publish* new papers, which will contaminate the sample in a parallel but different way. It would be great to sample a random subset of 100 "likelies", 100 "unlikelies", and 100 "unsures" and do more detailed stylometrics on them to see if you can pull out differences between the groups.

Future work! :) :)

@Wikisteff believe it or not, these extreme cases were what made me think about full-text digging originally. Except of course there's only a handful of these - peer review is pretty good at weeding the most blatant stuff out (and I'd assume a lot of it is editorially desk-rejected even before that step). So it was amusing but all pretty low-level.

Then the adjective list came out and I thought, hey, this might actually show up at scale! :-)

@generalising ...and you were *not* wrong!

As a futurist working for the Government of Canada on, among other things, the medium and long-term consequences of generative artificial intelligence in society, I am hugely interested in the time series here... and in ways to quantify the extent to which our productive capacity is colonized by AI.

In this context, your work, though early, is highly important!
