
I have a preprint out estimating how many scholarly papers are written using ChatGPT etc. I estimate upwards of 60k articles (>1% of global output) published in 2023. arxiv.org/abs/2403.16887

How can we identify this? Simple: there are certain words that LLMs love, and they suddenly start showing up *a lot* last year. Twice as many papers call something "intricate", big rises for "commendable" and "meticulous".

I looked at 24 words that were identified as distinctively LLMish (interestingly, almost all positive) and checked their presence in full text of papers - four showed very strong increases, six medium, and two relatively weak but still noticeable. Looking at the number of these published each year let us estimate the size of the "excess" in 2023. Very simple & straightforward, but striking results.
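A minimal Python sketch of the "excess" idea, assuming the expected 2023 count is a straight-line extrapolation of the pre-2023 yearly counts (the preprint may use a different baseline). The `linear_fit` and `excess` helpers and the yearly counts are hypothetical placeholders, not data or code from the paper.

```python
# Hypothetical sketch: estimate the 2023 "excess" of papers containing a marker word
# by extrapolating a linear trend fitted to earlier years. Counts are made up.

def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def excess(counts, target_year):
    """Observed count in target_year minus the trend predicted from earlier years."""
    years = sorted(y for y in counts if y < target_year)
    slope, intercept = linear_fit(years, [counts[y] for y in years])
    expected = slope * target_year + intercept
    return counts[target_year] - expected

# Placeholder yearly counts of papers containing one marker word (not real data).
papers_with_word = {2019: 1000, 2020: 1100, 2021: 1200, 2022: 1300, 2023: 2600}
print(excess(papers_with_word, 2023))  # 1200.0
```

Adding up excesses across several marker words would double-count papers that use more than one of them, so a real estimate has to deal with that overlap; this sketch ignores it.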

Can we say any one of those papers specifically was written with ChatGPT by looking for those words? No - this is just a high level survey. It's the totals that give it away.

Can we say what fraction of those were "ChatGPT generated" rather than just copyedited/assisted? No - but my suspicions are very much raised.

Isn't this all a very simplistic analysis? Yes - I just wanted to get it out in the world sooner rather than later. Hence a fast preprint.

Is it getting worse? You bet. Difficult to be confident for 2024 papers but I'd wildly guess rates have tripled so far. And it's *March*.

Is this a bad thing? You tell me. If it's a tell for LLM-generated papers, I think we can all agree "yes". If it's just widespread copyediting, a bit more ambiguous. But even if the content is OK, will very widespread ChatGPT-ification of papers start stylistically messing up later LLMs built on them? Maybe...

Is there more we could look at here? Definitely. Test for different tells - the list here was geared to distinctive words *on peer reviews*, which have a different expected style to papers. Test for frequency of those terms (not just "shows up once"). Figure out where they're coming from (there seems to be subject variance etc).
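To illustrate the difference between "shows up once" and frequency, here's a rough Python sketch that counts whole-word occurrences of a few of the words mentioned in this thread. The `MARKERS` list, the tokenizing regex and the example texts are assumptions for illustration only; it deliberately ignores inflections such as "intricately".

```python
import re
from collections import Counter

# Illustrative subset of marker words mentioned in this thread (not the full list of 24).
MARKERS = ["intricate", "commendable", "meticulous"]

def marker_counts(text, markers=MARKERS):
    """Count case-insensitive occurrences of each marker as a whole ASCII-letter token."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {m: tokens[m] for m in markers}

def corpus_summary(texts, marker):
    """Return (share of texts using marker at least once, mean uses per text)."""
    counts = [marker_counts(t, [marker])[marker] for t in texts]
    share_present = sum(c > 0 for c in counts) / len(texts)
    mean_uses = sum(counts) / len(texts)
    return share_present, mean_uses

# Made-up example texts, not real papers.
papers = [
    "An intricate, meticulous and truly intricate analysis. Intricate!",
    "A plain methods section.",
    "One intricate detail.",
    "Nothing to see here.",
]
print(marker_counts(papers[0]))             # {'intricate': 3, 'commendable': 0, 'meticulous': 1}
print(corpus_summary(papers, "intricate"))  # (0.5, 1.0)
```

Two sets of papers can have the same share using a word at least once but very different mean use per paper, which is the distinction worth testing.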

Glad I've got something out there for now, though.

huh, this is neat! someone did an AI-detector-tool based analysis looking at preprint platforms, and released it on exactly the same day as mine. Shows evidence for differential effects by discipline & country. biorxiv.org/content/10.1101/20

More on LLMs and peer reviews: 404media.co/chatgpt-looms-over

(Back to work tomorrow, & to revising the paper. I feel it's going to be a race to keep up.)

@generalising Interesting! I also suspect we'll see a tendency for humans to imitate the style of LLMs, as LLMese becomes a widely used, computer-endorsed, and thus relatively prestigious dialect. (I suspect I'm starting to see this among students already.)

It's good news at least for "outwith", though. I take that as some compensation for the war that spellcheckers have been waging on the word for years.

@ncdominie yes, I think this definitely seems plausible - but goodness knows what it will mean for all the people selling tools to detect LLM written student essays!

Can't decide what I think about "outwith". Good to see it being used, but a little disappointed it's not going to be a distinctive sign of human authorship any more...

@generalising We shall just have to increase our use of other Scottish shibboleths and stay one step ahead of the bots.

(I'm going to start using "furth" and "anent" more, and that's just the polite ones.)

@ncdominie or we could just accept the inevitable triumph of the Leal Leid Makars?

@generalising @ncdominie LLMs might destroy the world, I don't like it but fine. But what does "outwith" mean?

@ditol @generalising Antonym of "within"; English used to use "without" in that sense but has lost it in recent centuries.

@generalising @ncdominie "pivotal", "notable", and "intricate" are going off the *hook* in 2024!

@ncdominie @generalising LLMese will happen. And this is one way to describe what Grammarly is selling.

@ncdominie @generalising This is a depressing truth you have just revealed to me.

@generalising I wonder if there is a correlation between having these "LLM markers" and the first author being from a non-English-speaking country. I know a lot of people who use LLM-powered tools for translation, and in such cases the markers would show up, even if the content is totally original.

Just a passing thought, but seems to be a very interesting study, congrats!

@Jey_snow @generalising
As an author from a non-English-speaking country: absolutely. Not just translation though; the stuff I write in English I will often run through QuillBot for fluency, or ChatGPT to summarise. Helps tremendously, very meticulous and intricate.

(Also my whole academic career is built around tech law and privacy so very aware of how shady these LLMs can be)

@Jey_snow thanks - yes, I think that's very likely! Dimensions doesn't let me easily test for author affiliation location, but I think you'd be safe placing a small bet on it...

@generalising One point in your research is something I have noticed anecdotally, namely that some of the most obvious examples of ChatGPT use in scientific papers involved authors for whom English is not a first language. I suspect that these authors are using ChatGPT as a way of creating idiomatic English text, something that Google Translate does not always provide.

#ChatGPT #GoogleTranslate #linguistics

@michaelmeckler I wonder if part of the issue is that these words are not "wrong" but they are (in context) tonally "awkward" - a thing that is harder to spot and edit out for a second language speaker, if they're not looking for it?

(e.g. in my case I could look at an auto-translated French text and say "yeah, that sounds like what I was trying to get across", but probably not "hmm, that sounds subtly off")

@michaelmeckler @generalising

maybe this isn't obvious to people who speak English as a first language, but writing something in another language and then automatically translating it does not sound like a reasonable way to publish any kind of long-form writing.

@generalising Gonna be a link in tomorrow morning's ResearchBuzz, too. Thanks! 👍

@generalising I suspect it's mostly copyediting by people for whom English is a second (or later) language. LLM-generated ex nihilo is unlikely to pass peer review (in early 2024 at least!)

@Tom_Drummond yes, I think that's going to account for a lot of it - occasional horror stories aside, I wouldn't expect many pure-LLM papers are escaping into the wild. It's the middling grey area beyond "just polishing" that worries me...

@generalising So - it appears that my student’s paper has almost certainly just received an llm generated review. AC’s attention has been drawn to this. We’ll see how it unfolds!

@Tom_Drummond very curious to see how it develops! Also wondering what stood out - was it phrasing, or just a general lack of engagement with the topic?

@generalising the discussion of weaknesses was very shallow and mostly just a rehash of the limitations section in the paper - apparently LLMs do this. So the student put the review through GPTZero and got an 87% score (a weakly calibrated estimate of the likelihood that the review was LLM-generated)

@Tom_Drummond came across this today which definitely echoed your comments - "I went through the reports line by line, word by word: there was nothing there" - 404media.co/chatgpt-looms-over

@generalising Thanks for the link; his experience seems worse than ours (which was fortunately only one out of four reviews).

@generalising I'm always happy when someone does statistics and emphasizes that this only says something about the ensemble, not about the individual samples.

Nice piece of work!

@wesselvalk yes, there's definitely a lot of purely human papers out there that will be using these "normally"! (This one would score amazingly high, for one thing...)

@generalising When "word of the year" chosen by human editors converges on one chosen by LLMs...

@generalising

Oh, that is delightfully ingenious! Congratulations, & thanks for sharing!

I'd recommend not anthropomorphising LLMs by using the shorthand that they "love" anything, though. Lay people thinking of them as in some way sentient is a problem which needs no encouragement.

@unchartedworlds thanks - and fair point! I normally try to avoid it, but keep falling into the habit of anthropomorphising them to about the same level as recalcitrant lifts...

@generalising Gosh, I love this!! I also told my students that there's something wrong when words like 'plethora', which didn't exist in student #vocabulary only a few years back, are now coming up in every second essay I read. Although I also have to say that I think that not all these essays are #AI-generated. There's also the phenomenon that students themselves start to write like #ChatGPT because they read so much AI content and lack the critical #literacy to identify weird metaphors etc.

@wendinoakland @generalising I would hope that you know what it means... Because most of my undergraduates who are now using it don't. I should add that we are a Dutch university where 99.9% of my students are non-native speakers of English.

@mob @generalising Ahh, understood. I like words & it’s nice to have a plethora of ways to describe things. ;)

@generalising
Very cool!
From what I understand, if a paper contains both “intricate” and “meticulous”, or “intricate” and “notable”, there is a very high chance it is by an LLM (although, of course, that's not proof... especially now that your work will be noticed).

Is there hope of automatic LLM detection with some ~99% confidence? Of course the entire point is that LLMs mimic human language, but you are showing that language correlations appear and are strong indicators, and they may be hard to polish away.

@franco_vazza @generalising If you are interested in this, you should definitely look into the topic of stylometry, also measurements like perplexity as ways of trying to classify text as LLM-generated or not

@franco_vazza there's always a baseline of usage, but that pair together has suddenly become a lot more common.

I don't think this approach is great for detecting LLM involvement in any individual paper (there are *much* more sophisticated tools for that) but it works OK for estimation at a much broader scale.

@generalising We also have the discussion if our students should still be allowed to use Grammarly etc. as the AI-abilities of these tools increase. What I think should happen: raise overall expectations for writing quality and simply fail papers on grounds of imprecise writing etc. when they are full of flowery AI adjectives. If someone uses AI tools so smartly that a decent argumentation comes out, they probably did put in a lot of work and grey matter. At least that's still the case now.

@mob I wonder a lot about the grammarly thing, but also stuff like "AI assisted search tools". At what point does it cross the line between something we're happy for students to use, and something that's too much? Really hard to say, and very blurry sometimes.

@generalising I have searched for AI tools for learning and research & also came across this one: studycopilot.io/
This is the death of literature reviews unless we require a lot more comparison & transfer. I have not tested the tool yet, but what it supposedly does is summarise long research papers uploaded in PDF format. Especially if students use publications that I have not read, it will be impossible to say if they have reflected on the works themselves or let AI do it all for them.

@generalising As an arXiv moderator, I've noticed the rise in use of "meticulous". The increase in positive adjectives in general makes scientific papers sound like press releases. ArXiv has a rule against "drum banging". I rejected two submissions in the past week on this basis.

@tdietterich this one is really interesting, thank you! I had wondered if it would be visibly showing up in submitted material.

"Press release" is a good way of describing it - I think the first time I saw it it made me think of people writing travel guides, everywhere "vibrant" and with "stunning natural beauty"...

@generalising It's a simple correlation, corrected neither for changes in the composition of the body of literature nor for literary fads among academics.

@generalising If my papers are good for nothing else, at least they can help stave off model collapse.

@generalising There's one way to stop this dead in its tracks. All final papers must be written in longhand with 3 rough copies available should queries arise. Sorted (almost).

@generalising Any idea why LLMs like those words so much, if they don't show up with those frequencies in the training data?

@robinadams @generalising Probably to do with RLHF, the step after initial training where they are "humanised" into having a "personality". You can play with base models on the OpenAI playground if you have a dev account, and it's fun to see the difference in usability and style from RLHF.

@generalising TFW you realize you also use these words at an elevated rate because if you didn't in high school your hand got hit by a ruler.

@generalising Reminds me of this infamous Hacker News tool (news.ycombinator.com/item?id=3) which could correlate different pseudonymous accounts sharing an author. Searching for more on LLM stylometry, I found this article (arxiv.org/abs/2308.07305) on more fine-grained stylometry; that's harder to do at scale, but it still looks like an interesting rabbit hole.
