🤖 Context Engineering Daily

Your daily dose of AI context engineering news and research

May 19, 2026 • Generated 20:37 UTC

48 Articles • 7 Categories • 3 Sources

🔥 Trending Keywords

model arxiv LLM framework reasoning prompt large language model context RAG API

📜 Research Papers (15)

Beyond Sentiment Classification: A Generative Framework for Emotion Intensity Evaluation in Text

arXiv:2605.16613v1 Announce Type: new Abstract: We introduce a novel approach to emotion modeling that shifts the focus from identification to evaluation, addressing the limitations of discrete classification in applied domains such as finance. By constructing a dataset of emotional intensity ...

arXiv • May 19, 2026
fine-tuning model analysis
100%

SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs

arXiv:2605.16650v1 Announce Type: new Abstract: Evaluating multi-turn dialogue systems remains challenging because response quality depends not only on the current prompt, but also on previously established entities, claims, and conversational commitments. Existing automatic evaluators, including L...

arXiv • May 19, 2026
LLM reasoning prompt
100%

A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research

arXiv:2605.16654v1 Announce Type: new Abstract: Manner and result verbs encode different aspects of event structure and have been discussed in developmental work as a potentially informative distinction for studying early verb learning. However, this distinction remains difficult to measure at scal...

arXiv • May 19, 2026
tool RAG study
100%

Language Acquisition Device in Large Language Models

arXiv:2605.16758v1 Announce Type: new Abstract: Large Language Models (LLMs) remain substantially less data-efficient than humans. Pre-pretraining (PPT) on synthetic languages has been proposed to close this gap, with prior work emphasizing highly expressive formal languages such as $k$-Shuffle Dyc...

arXiv • May 19, 2026
LLM transformer model
100%

Exploring Lightweight Large Language Models for Court View Generation

arXiv:2605.16770v1 Announce Type: new Abstract: Criminal Court View Generation (CVG) is a critical task in Legal Artificial Intelligence (Legal AI), involving the generation of court view based on case facts. In this work, we systematically explore the capabilities of lightweight (smaller than 2B) ...

arXiv • May 19, 2026
LLM experiment study
100%

Counterparty Modeling is Not Strategy: The Limits of LLM Negotiators

arXiv:2605.16575v1 Announce Type: new Abstract: Negotiation requires more than inferring what the other side wants: it requires using that information to make advantageous offers and counteroffers over multiple turns. We study whether large language model (LLM) agents do this in a controlled multi-...

arXiv • May 19, 2026
LLM reasoning RAG
100%

PRISMat: Policy-Driven, Permutation-Invariant Autoregressive Material Generation

arXiv:2605.16612v1 Announce Type: new Abstract: Rapid identification of candidate materials with target properties has become a key task in materials science. Machine learning has emerged as an alternative to physics-based simulation, offering a faster and cheaper way to filter materials based on t...

arXiv • May 19, 2026
LLM API model
100%

Mirror Descent-Type Algorithms for the Variational Inequality Problem with Functional Constraints

arXiv:2605.16262v1 Announce Type: new Abstract: Variational inequalities play a key role in machine learning research, such as generative adversarial networks, reinforcement learning, adversarial training, and generative models. This paper is devoted to the constrained variational inequality proble...

arXiv • May 19, 2026
experiment model analysis
100%

Forecasting Medium-Horizon Alzheimer's Disease Progression: Residual Gap-Aware Transformers for 24-Month CDR-SB Change from ADNI Clinical and Biomarker Histories

arXiv:2605.16319v1 Announce Type: new Abstract: Medium-horizon Alzheimer's disease progression prediction is difficult because future clinical scores can remain tied to baseline severity, while biomarker histories are irregular and incompletely observed. We develop an anchor-based analysis of 24-mo...

arXiv • May 19, 2026
transformer attention model
100%

AgentWall: A Runtime Safety Layer for Local AI Agents

arXiv:2605.16265v1 Announce Type: new Abstract: The safety of autonomous AI agents is increasingly recognized as a critical open problem. As agents transition from passive text generators to active actors capable of executing shell commands, modifying files, calling APIs, and browsing the web, the ...

arXiv • May 19, 2026
API model alignment
80%

From Prompts to Protocols: An AI Agent for Laboratory Automation

arXiv:2605.16552v1 Announce Type: new Abstract: Automating science laboratories enables faster, safer, more accurate, and more reproducible execution of protocols, accelerating the discovery and testing of new materials, drugs, and more. However, setting up and running autonomous labs requires coor...

arXiv • May 19, 2026
experiment prompt model
80%

Systematic Optimization of Real-Time Diffusion Model Inference on Apple M3 Ultra

arXiv:2605.16259v1 Announce Type: new Abstract: While real-time image generation using diffusion models has advanced rapidly on NVIDIA GPUs, systematic optimization research on non-CUDA platforms such as Apple Silicon remains extremely limited. In this study, we conducted comprehensive optimization...

arXiv • May 19, 2026
demonstration experiment study
80%

Scaling Accessible Mathematics on arXiv: HTML Conversion and MathML 4

arXiv:2605.16562v1 Announce Type: new Abstract: We report on the ongoing development of arXiv's HTML Papers offering, available on every new TeX/LaTeX submission since its initial release in 2023. The main highlights from 2025 and early 2026 are: (i) community-driven improvements to HTML fideli...

arXiv • May 19, 2026
release paper experiment
60%

Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents

arXiv:2605.16671v1 Announce Type: new Abstract: Rapid biodiversity loss underscores the urgency of effective monitoring, yet manual surveys remain resource-intensive. While on-device AI offers a scalable alternative, its performance in the wild is often challenged by environmental variability. Curre...

arXiv • May 19, 2026
reasoning API knowledge base
40%

When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning

arXiv:2605.16312v1 Announce Type: new Abstract: We study adversarial action masking in self-play reinforcement learning: an attacker selectively removes legal actions from a victim's action set. Unlike observation or action perturbations, removal eliminates decision options before the agent acts. A...

arXiv • May 19, 2026
study arxiv
40%

🎨 Prompt Engineering (10)

Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

arXiv:2605.16767v1 Announce Type: new Abstract: Multi-label legal annotation requires assigning multiple labels from large, evolving taxonomies to long, fact-intensive documents, often under limited supervision. Parametric encoders typically require task-specific training and retraining when the la...

arXiv • May 19, 2026
fine-tuning prompting retrieval
100%

Context-Engineering

"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy. A frontier, first-principles handbook inspired by Karpathy and 3Blue1Brown for moving beyond prompt engineering to the wider discipline of context design, orchestration, and optimization.

GitHub • Jun 29, 2025
context window prompt context
100%

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

arXiv:2605.16551v1 Announce Type: new Abstract: Evaluating LLM-based agents remains challenging because identifying meaningful failure cases often requires substantial human effort to design realistic test scenarios. Prior works primarily focus on automatically discovering agent failures induced by...

arXiv • May 19, 2026
framework prompt arxiv
60%

ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning

arXiv:2605.16309v1 Announce Type: new Abstract: LLM-based agents can recover from individual execution errors, yet they repeatedly fail on the same fault when the underlying process knowledge--operator schemas, preconditions, and constraints--remains unrepaired. Existing self-evolving approaches ad...

arXiv • May 19, 2026
LLM prompt memory
60%

Language Game: Talking to Non-Human Systems

arXiv:2605.16321v1 Announce Type: new Abstract: Language carries thought and coordination among humans but rarely reaches further along the spectrum of diverse intelligence. Yet non-neural systems -- from gene regulatory networks and microbial consortia to fungi -- are increasingly recognized as su...

arXiv • May 19, 2026
prompt memory model
60%

AlphaCodium

Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"

GitHub • Jan 14, 2024
prompt prompt engineering paper
60%

Claude-Code-Everything-You-Need-to-Know

The ultimate all-in-one guide to mastering Claude Code. From setup, prompt engineering, commands, hooks, workflows, automation, and integrations, to MCP servers, tools, and the BMAD method—packed with step-by-step tutorials, real-world examples, and expert strategies to make this the global go-to repo for Claude mastery.

GitHub • Aug 17, 2025
step-by-step tool prompt
60%

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

...

Hugging Face Blog • May 18, 2026
fine-tuning
40%

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

...

Hugging Face Blog • May 18, 2026
transformer
40%

optillm

Optimizing inference proxy for LLMs

GitHub • Aug 22, 2024
LLM
40%

🔗 Chain-of-Thought (7)

TTE-Flash: Accelerating Reasoning-based Multimodal Representations via Think-Then-Embed Tokens

arXiv:2605.16638v1 Announce Type: new Abstract: Recent research has demonstrated that Universal Multimodal Embedding (UME) benefits significantly from Chain-of-Thought (CoT) reasoning. In this paradigm, a generative model produces explicit reasoning traces for a multimodal query, with the final rep...

arXiv • May 19, 2026
multimodal LLM reasoning
100%

Reducing Credit Assignment Variance via Counterfactual Reasoning Paths

arXiv:2605.16302v1 Announce Type: new Abstract: Reinforcement learning for multi-step reasoning with large language models (LLMs) often relies on sparse terminal rewards, leading to poor credit assignment conditions where the final feedback is evenly propagated across all intermediate decisions. Th...

arXiv • May 19, 2026
LLM reasoning model
100%

ThinkSound

[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.

GitHub • Jun 27, 2025
audio reasoning CoT
100%

Cline-Recursive-Chain-of-Thought-System-CRCT-

A framework designed to manage context, dependencies, and tasks in large-scale Cline projects within VS Code

GitHub • Feb 18, 2025
framework context chain-of-thought
100%

PageIndex

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

GitHub • Apr 01, 2025
reasoning RAG vector
100%

cosmos-reason1

Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.

GitHub • Mar 02, 2025
model chain-of-thought reasoning
80%

Scalable Uncertainty Reasoning in Knowledge Graphs

arXiv:2605.16568v1 Announce Type: new Abstract: Knowledge Graphs are pivotal for semantic data integration. The real-world data they model is often inherently uncertain. Within knowledge graphs, uncertainty manifests in three distinct levels: imprecise attribute values, probabilistic triple existen...

arXiv • May 19, 2026
reasoning embedding model
60%

🏢 Industry News (4)

LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning

arXiv:2605.16675v1 Announce Type: new Abstract: We introduce LinAlg-Bench, a diagnostic benchmark evaluating 10 frontier large language models on structured linear algebra computation across a strict dimensional gradient of 3x3, 4x4, and 5x5 matrices. Spanning 9 task types and 660 SymPy-certified p...

arXiv • May 19, 2026
release LLM tool
100%

excel-mcp-server

A Model Context Protocol server for Excel file manipulation

GitHub • Feb 12, 2025
context model
60%

mcp-agent

Build effective agents using Model Context Protocol and simple workflow patterns

GitHub • Dec 18, 2024
context model
60%

OlmoEarth v1.1: A more efficient family of models

...

Hugging Face Blog • May 19, 2026
model
20%

🔧 Tools & Frameworks (5)

awesome-prompt-engineering

A curated list of resources, tools, papers, and platforms for prompt engineering in large language models (LLMs) and generative AI.

GitHub • Jun 28, 2025
LLM tool prompt
100%

fastapi_mcp

Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth!

GitHub • Mar 08, 2025
context tool model
80%

The Scaling Laws of Skills in LLM Agent Systems

arXiv:2605.16508v1 Announce Type: new Abstract: As agent systems scale, skills accumulate into large reusable libraries, yet their scaling laws remain poorly understood. Across 15 frontier LLMs, 1,141 real-world skills, and over 3M routing or execution decisions, we identify two coupled laws. Routi...

arXiv • May 19, 2026
arxiv library model
60%

SignMuon: Communication-Efficient Distributed Muon Optimization

arXiv:2605.16311v1 Announce Type: new Abstract: Distributed training of large neural networks is bottlenecked by full-precision gradient communication and by coordinatewise optimizers that ignore the matrix structure of weight tensors. We propose Sign-Muon, a 1-bit, matrix-aware optimizer that comb...

arXiv • May 19, 2026
framework GPT model
60%

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

arXiv:2605.16679v1 Announce Type: new Abstract: End-to-end automation of realistic healthcare operations stresses three capabilities underrepresented in current benchmarks: policy density: decisions must be grounded in a large library of medical, insurance, and operational rules; multi-role composi...

arXiv • May 19, 2026
library tool model
40%

🔍 RAG & Retrieval (6)

KAG

KAG is a logical form-guided reasoning and retrieval framework based on the OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge bases. It can effectively overcome the shortcomings of the traditional RAG vector similarity calculation model.

GitHub • Sep 21, 2024
vector LLM retrieval
100%

LightRAG

[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"

GitHub • Oct 02, 2024
retrieval RAG augmented
100%

graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system

GitHub • Mar 27, 2024
retrieval RAG augmented
100%

Kiln

Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuning, synthetic data generation, dataset management, MCP, and more.

GitHub • Jul 23, 2024
RAG fine-tuning
100%

R2R

SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.

GitHub • Feb 12, 2024
retrieval RAG augmented
100%

airweave

Open-source context retrieval layer for AI agents

GitHub • Dec 24, 2024
context retrieval
100%

📋 Context Management (1)

OpenViking

OpenViking is an open-source context database designed specifically for AI agents (such as openclaw). It unifies the management of the context that agents need (memory, resources, and skills) through a file system paradigm, enabling hierarchical context delivery and self-evolution.

GitHub • Jan 05, 2026
context memory
60%