ArXiv to Ban Authors Who Let AI Write Entire Papers

ArXiv is tightening rules on AI-generated submissions, threatening bans of up to a year for authors who outsource entire papers to large language models. It is a major shift in how the research repository handles synthetic content.

ArXiv, the preprint server that has become the de facto publishing pipeline for AI, physics, and mathematics research, is drawing a hard line against fully AI-generated submissions. According to a new policy reported by TechCrunch, authors caught letting large language models do all the work on their papers will face suspensions of up to one year — a notable escalation in how the academic world is responding to synthetic content.

Why ArXiv Is Acting Now

ArXiv hosts more than 2 million preprints and serves as the primary distribution channel for cutting-edge machine learning research. Nearly every major paper from OpenAI, DeepMind, Anthropic, Meta AI, and academic labs appears there first. That centrality has made it a high-value target for low-effort, AI-generated submissions designed to pad publication counts or game citation metrics.

The repository has reportedly seen a surge in submissions with telltale LLM artifacts: boilerplate phrasing, fabricated citations, hallucinated equations, and survey papers that appear to be little more than ChatGPT summaries of existing work. Moderators, who are volunteer researchers, have been overwhelmed sorting legitimate work from machine-generated noise.

What the Policy Actually Says

ArXiv is not banning AI assistance outright. Tools like Grammarly, Copilot-style code suggestions, or LLM-aided editing remain acceptable. The new rule targets papers where generative AI is effectively the author — drafting the literature review, generating the methodology, producing results, and writing conclusions without meaningful human intellectual contribution.

Authors found in violation face suspension of submission privileges for up to a year. Repeat offenders could face longer bans. Crucially, the policy also targets review and position papers, a category ArXiv recently restricted because it was being flooded with LLM-generated “surveys” that recombined existing literature without adding insight.

The Authenticity Problem in Academic Publishing

This move sits squarely within the broader digital authenticity crisis driven by generative AI. Just as deepfake detection tools struggle to keep pace with improving video synthesis, academic gatekeepers are racing to identify synthetic text in a landscape where GPT-4-class models can produce grammatically flawless, plausibly structured scientific prose.

Detection is hard. Studies have repeatedly shown that AI-text classifiers — including OpenAI’s own discontinued tool — produce high false-positive rates, particularly for non-native English speakers. ArXiv’s approach sidesteps the technical detection problem by shifting to a policy-and-enforcement model: moderators flag suspicious submissions based on content quality, citation integrity, and reviewer reports, then act on patterns rather than relying on probabilistic classifiers.

Implications for AI Research Itself

There’s an irony worth noting: the field most affected by this policy is AI research. Machine learning papers represent one of ArXiv’s largest and fastest-growing categories. Many ML researchers routinely use LLMs to refine writing, generate code, or brainstorm experimental designs. The policy forces a clearer line between AI as a tool and AI as an author.

It also raises questions about training data integrity. If LLM-generated papers proliferate on ArXiv, future foundation models scraping the web for scientific text will train on synthetic content masquerading as human research — a feedback loop that could degrade model quality over time. By cleaning its own pipeline, ArXiv is implicitly protecting the data ecosystem that future AI systems will rely on.

A Template for Other Platforms?

ArXiv’s decision may serve as a template for journals, conferences, and other preprint servers wrestling with the same issue. NeurIPS, ICML, and ICLR have already issued guidelines requiring disclosure of LLM use. Springer Nature and Elsevier have banned listing AI tools as authors. But ArXiv’s enforcement mechanism — actual suspensions with teeth — goes further.

For the broader synthetic media debate, this is a useful case study in how institutions adapt when generative AI saturates a content domain. The answer isn’t purely technical detection; it’s a combination of community norms, moderator judgment, and consequences strong enough to change submitter behavior. Whether that approach scales to video, audio, or images remains to be seen — but ArXiv is offering one of the first concrete blueprints.

