Ten of the ideas, topics, and commonplaces that have been gaining steam on arXiv during the last few months (explainer)
Most (not all!) of them are related to AI; reasonable if unexciting. But each of these lists filters out the previous ones (it's not a leaderboard), so perhaps future installments will surface other things as these are cleared from the deck. On the other hand, if there's one thing the LLM industry has been great at, it's coming up with new biggest-things-ever at quite a quick pace.
1. GRPO: Group Relative Policy Optimization, an RL algorithm that, in the DeepSeek style, focuses not just on increasing performance but also on reducing training costs: instead of training a separate value (critic) model, it baselines each sampled output against the other outputs in its group. Used in DeepSeekMath (see paper).
Some recent articles:
- Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
- When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
- Effective Reinforcement Learning for Reasoning in Language Models
- On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
- InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO
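The cost-saving idea is easy to sketch. Here is a minimal toy illustration of GRPO's group-relative advantage computation (names and numbers are my own, not from the DeepSeekMath implementation): sample a group of completions for one prompt, score them, and normalize each reward against the group's mean and standard deviation, so no learned critic is needed.

```python
# Toy sketch of GRPO's group-relative advantages (illustrative only).
# For one prompt, we sample G completions, score each with a scalar
# reward, and use the group's own statistics as the baseline instead
# of a learned value model.

from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each completion's reward against its group mean/std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One prompt, four sampled completions with rewards 1/0/0/1:
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Above-average completions get positive advantages, below-average
# ones get negative advantages; the policy gradient then shifts
# probability mass toward the above-average group members.
```

The point of the exercise: the baseline comes for free from the group, which is where the training-cost savings relative to critic-based PPO come from.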
2. Large Reasoning Models: The more LLMs are deployed in knowledge work, and not just for NLP tasks, the clearer it becomes to stakeholders that talking convincingly about something isn't the same as being able to think about it. So it's not surprising that there's a surge in research into figuring out how to make LLMs reason with reasonable reliability.
Some recent articles:
- When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
- TrimR: Verifier-based Training-Free Thinking Compression for Efficient Test-Time Scaling
- ReasoningShield: Content Safety Detection over Reasoning Traces of Large Reasoning Models
- Language Matters: How Do Multilingual Input and Reasoning Paths Affect Large Reasoning Models?
- Reasoning Meets Personalization: Unleashing the Potential of Large Reasoning Model for Personalized Generation
3. DeepSeek-R1: Still cooking!
Some recent articles:
- TrendFact: A Benchmark for Explainable Hotspot Perception in Fact-Checking with Natural Language Explanation
- RRTL: Red Teaming Reasoning Large Language Models in Tool Learning
- Mitigating Cyber Risk in the Age of Open-Weight LLMs: Policy Gaps and Technical Realities
- Foundation Models for Geospatial Reasoning: Assessing Capabilities of Large Language Models in Understanding Geometries and Topological Spatial Relations
- Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN
4. Verifiable rewards: The term is picking up; look out for Reinforcement Learning with Verifiable Rewards (RLVR) as the relevant acronym. It's a slightly fancier way of saying "reward functions you can be certain about" (e.g., when you are training a system to produce answers to problems whose solutions you can check), which, it turns out, can in some cases allow for cheaper or more robust training. See here for a good explainer of how this fits into the DeepSeek-R1 training process.
Some recent articles:
- ManipLVM-R1: Reinforcement Learning for Reasoning in Embodied Manipulation with Large Vision-Language Models
- SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning
- VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models
- Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
- Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
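In its simplest form a verifiable reward is just an exact check against a known answer. The following toy example (my own illustration, not taken from any specific RLVR paper) rewards a completion only if its boxed final answer matches the ground truth, so there is no learned reward model for the policy to game:

```python
# Toy verifiable reward for math-style answers (illustrative only).
# Reward is 1.0 iff the completion's \boxed{...} answer exactly
# matches the known ground truth; otherwise 0.0.

import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 iff the boxed final answer equals the ground truth."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parseable final answer, so no reward
    return 1.0 if match.group(1).strip() == ground_truth else 0.0

# e.g. verifiable_reward(r"... so the answer is \boxed{42}", "42")
```

Because the check is deterministic, the reward signal is exactly as trustworthy as the answer key, which is what makes this kind of training cheaper and harder to reward-hack than using a learned preference model.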
5. Small Language Models: Sometimes useful for research, sometimes the only thing you can build with the data you have. Sometimes they do work better.
Some recent articles:
- DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
- Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
- ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection
- Smaller, Smarter, Closer: The Edge of Collaborative Generative AI
- Edge-First Language Model Inference: Models, Metrics, and Tradeoffs
6. SemEval-2025 task: From the workshop's page: SemEval is a series of international natural language processing (NLP) research workshops whose mission is to advance the current state of the art in semantic analysis and to help create high-quality annotated datasets in a range of increasingly challenging problems in natural language semantics. Each year's workshop features a collection of shared tasks in which computational semantic analysis systems designed by different teams are presented and compared.
Some recent articles:
- keepitsimple at SemEval-2025 Task 3: LLM-Uncertainty based Approach for Multilingual Hallucination Span Detection
- Duluth at SemEval-2025 Task 7: TF-IDF with Optimized Vector Dimensions for Multilingual Fact-Checked Claim Retrieval
- JNLP at SemEval-2025 Task 11: Cross-Lingual Multi-Label Emotion Detection Using Generative Models
- NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SelfCheckGPT
- UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation
7. Intermediate reasoning steps: See Discourse on the Method of Rightly Conducting One's Reason and of Seeking Truth in the Sciences (Descartes, 1637).
Some recent articles:
- LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization
- FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
- CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays
- CoT Information: Improved Sample Complexity under Chain-of-Thought Supervision
- NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
8. Interpretability: Because we can't always tell what the AI is doing.
Some recent articles:
- Q-Insight: Understanding Image Quality via Visual Reinforcement Learning
- Physics-Guided Learning of Meteorological Dynamics for Weather Downscaling and Forecasting
- SurvUnc: A Meta-Model Based Uncertainty Quantification Framework for Survival Analysis
- Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models
- iLOCO: Distribution-Free Inference for Feature Interactions
9. DESI: As usual, a data release from one of the big astrophysics surveys comes with a host of interesting papers based on it. Here it's Data Release 1 of the Dark Energy Spectroscopic Instrument, tasked with minor questions of scientific detail such as understanding dark energy, which by the appropriate measure makes up around two-thirds of everything in the universe.
Some recent articles:
- A Morphological Model to Separate Resolved--unresolved Sources in the DESI Legacy Surveys: Application in the LS4 Alert Stream
- The Backup Program of the Dark Energy Spectroscopic Instrument's Milky Way Survey
- DESI Data Release 1: Stellar Catalogue
- Model-Independent Measurement of the Matter-Radiation Equality Scale in DESI 2024
- Confirming HSC strong lens candidates with DESI Spectroscopy. I. Project Overview
10. Vietnamese: There's been particular growth in LLM work related directly or indirectly to Vietnamese. I don't really know why (hints welcome), but it's always good news when language technologies expand their cultural scope.
Some recent articles:
- Position of Uncertainty: A Cross-Linguistic Study of Positional Bias in Large Language Models
- WriteViT: Handwritten Text Generation with Vision Transformer
- ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
- Towards Cultural Bridge by Bahnaric-Vietnamese Translation Using Transfer Learning of Sequence-To-Sequence Pre-training Language Model
- MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder