Ten of the ideas, topics, and commonplaces that have been gaining steam on arXiv during the last few months (explainer)
1. Claude 3.7 Sonnet: The top 3 terms in the list were AI models. I'm keeping the first one as a token (pun not intended) of intellectual honesty but, come on.
Some recent articles:
- Assessing the Capability of LLMs in Solving POSCOMP Questions
- Describe Anything in Medical Images
- VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation
- Efficient Agent Training for Computer Use
- ACSE-Eval: Can LLMs threat model real-world cloud infrastructure?
2. Solar Terrestrial Relations Observatory: Solar physics using data from the STEREO (alas, no longer appropriately named) mission.
Some recent articles:
- Ensemble Modeling of the Solar Wind Flow with Boundary Conditions Governed by Synchronic Photospheric Magnetograms. I. Multi-point Validation in the Inner Heliosphere
- Deciphering the Formation and Dynamics of Double-decker Filament Through Component Magnetic Reconnection
- Estimating the lateral speed of a fast shock driven by a coronal mass ejection at the location of solar radio emissions
- Observations of a New Form of Partial Filament Eruption
- The high-energy protons of the ground level enhancement (GLE74) event on 11 May 2024
3. Large Deviation Properties: The Wikipedia page is clearer than I could be, and includes this fantastic quote:
Any large deviation is done in the least unlikely of all the unlikely ways! — Frank den Hollander, Large Deviations, p. 10
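To make the quote slightly more concrete, here is a sketch of the standard textbook statement (Cramér's theorem), not something taken from the papers below. The symbols are my own choice: $X_1, X_2, \dots$ are i.i.d. random variables with a finite cumulant generating function $\Lambda$, $S_n$ is their sum, and $x$ is any value above the mean.

```latex
\[
  \lim_{n\to\infty} \frac{1}{n}\,\log P\!\left(\frac{S_n}{n} \ge x\right) = -I(x),
  \qquad
  I(x) = \sup_{\theta\in\mathbb{R}} \bigl(\theta x - \Lambda(\theta)\bigr),
  \qquad
  \Lambda(\theta) = \log \mathbb{E}\!\left[e^{\theta X_1}\right].
\]
% Fair coin (X_i = 1 for heads, 0 for tails), for 0 < x < 1:
\[
  I(x) = x\log x + (1-x)\log(1-x) + \log 2,
  \qquad
  I(0.6) \approx 0.0201 .
\]
```

So the chance of seeing at least 60% heads in $n$ fair tosses decays roughly like $e^{-0.0201\,n}$. For a whole set of rare outcomes, the exponent is set by the smallest value of $I$ over that set, which is the "least unlikely of all the unlikely ways" in the quote.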
Some recent articles:
- A simple estimator of the correlation kernel matrix of a determinantal point process
- Martingale approach for first-passage problems of time-additive observables in Markov processes
- LDP for the covariance process in fully connected neural networks
- Journey from the Wilson exact RG towards the Wegner-Morris Fokker-Planck RG and the Carosso field-coarsening via Langevin stochastic processes
- Stochastic Resetting and Large Deviations
4. Mu-SHROOM: A shared task for people working on the problem of hallucination detection in generative AI (the general *SHROOM* naming convention is a good pun).
Some recent articles:
- MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection
- NCL-UoR at SemEval-2025 Task 3: Detecting Multilingual Hallucination and Related Observable Overgeneration Text Spans with Modified RefChecker and Modified SeflCheckGPT
- UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output
- SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared Task on Hallucinations and Related Observable Overgeneration Mistakes
- HalluSearch at SemEval-2025 Task 3: A Search-Enhanced RAG Pipeline for Hallucination Detection
5. Open-Source Solver: Probably a random spike in how often the term is used, but it's a good idea to browse those papers. The most powerful trick in computing (in a real sense, the whole trick of computing) isn't "get more data" but "translate your problem into something we already have a good compiler or solver for."
Some recent articles:
- Inverse Dynamics Trajectory Optimization for Contact-Implicit Model Predictive Control
- Fast Online Movement Optimization of Aerial Base Stations Based on Global Connectivity Map
- Decouple and Decompose: Scaling Resource Allocation with DeDe
- On Solving the Minimum Spanning Tree Problem with Conflicting Edge Pairs
- On Solving the Set Covering Problem with Conflicts on Sets
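As a toy illustration of the "translate your problem" trick from the blurb above (not taken from any of these papers): ordering pipeline steps that depend on each other is a topological sort in disguise, and Python's standard library already ships a solver for that in `graphlib`. The `plan_order` helper and the step names are hypothetical, and real work in this area would more likely target an LP, MIP, or SAT solver.

```python
from graphlib import CycleError, TopologicalSorter


def plan_order(depends_on):
    """Return a valid execution order, or None if the dependencies are cyclic.

    depends_on maps each task to the set of tasks that must finish first,
    which is exactly the input format TopologicalSorter expects.
    """
    try:
        # The "translation" is already done: hand the graph to the solver.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Hypothetical pipeline: each step lists the steps it has to wait for.
steps = {
    "deploy": {"test"},
    "test": {"build"},
    "build": {"fetch"},
    "fetch": set(),
}

print(plan_order(steps))  # ['fetch', 'build', 'test', 'deploy']
```

None of the code is about scheduling as such. All the effort goes into phrasing the input so that an existing, well-tested algorithm can take it from there.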
6. Humanoid Locomotion: The topic gains renewed interest every now and then. There's been a lot of progress! Still, we don't quite have the software or hardware for it yet. Don't let anybody tell you that anything in robotics is easy or "just needs AI"; the moment you leave a controlled laboratory/factory environment you're literally in a world of trouble.
Some recent articles:
- One Policy but Many Worlds: A Scalable Unified Policy for Versatile Humanoid Locomotion
- REvolve: Reward Evolution with Large Language Models using Human Feedback
- TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion
- Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
- Model Tensor Planning
7. Kling: Alright, fine, another model, this one for video generation. Glancing at the abstracts, the papers seem to use it as one of several reference models for benchmarking, testing tools, and so on.
Some recent articles:
- LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
- AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era
- VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
- Jailbreaking the Text-to-Video Generative Models
- Co-Developing Causal Graphs with Domain Experts Guided by Weighted FDR-Adjusted p-values
8. GUI Agents: Four terms above we referred to ongoing efforts to figure out how to detect when LLMs hallucinate, something they do with alarming regularity. Here's a set of papers on the increasingly relevant (read: proposed as the technology's killer app) task of asking them to do things for you by tapping on apps, filling out forms, etc.
Some recent articles:
- MAPLE: A Mobile Assistant with Persistent Finite State Machines for Recovery Reasoning
- UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
- Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
- TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments
- Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent
9. Textual Semantics: Not unrelated to the issues and use cases above.
Some recent articles:
- SEMMA: A Semantic Aware Knowledge Graph Foundation Model
- MIND-Edit: MLLM Insight-Driven Editing via Language-Vision Projection
- Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling
- Learning Joint ID-Textual Representation for ID-Preserving Image Synthesis
- Training-Free Reasoning and Reflection in MLLMs
10. Spatial Reasoning: You know what, I give up. Maybe I'll just modify my tool to filter out the whole LLM field next time.
Some recent articles:
- Grounded Reinforcement Learning for Visual Reasoning
- Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
- ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks
- GET: Goal-directed Exploration and Targeting for Large-Scale Unknown Environments
- MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents