Your daily dose of AI context engineering news and research
October 29, 2025 • Generated at 20:06 UTC
arXiv:2510.23730v1 Announce Type: new Abstract: In order for large language models to achieve true conversational continuity and benefit from experiential learning, they need memory. While research has focused on the development of complex memory systems, it remains unclear which types of memory ar...
arXiv:2510.23822v1 Announce Type: new Abstract: Long-horizon tasks requiring multi-step reasoning and dynamic re-planning remain challenging for large language models (LLMs). Sequential prompting methods are prone to context drift, loss of goal information, and recurrent failure cycles, while hiera...
arXiv:2510.23824v1 Announce Type: new Abstract: Coordinating multiple autonomous agents in shared environments under decentralized conditions is a long-standing challenge in robotics and artificial intelligence. This work addresses the problem of decentralized goal assignment for multi-agent path p...
arXiv:2510.23631v1 Announce Type: new Abstract: Alignment of large language models (LLMs) has predominantly relied on pairwise preference optimization, where annotators select the better of two responses to a prompt. While simple, this approach overlooks the opportunity to learn from richer forms o...
arXiv:2510.23633v1 Announce Type: new Abstract: Pretrained diffusion models have demonstrated strong capabilities in zero-shot inverse problem solving by incorporating observation information into the generation process of the diffusion models. However, this presents an inherent dilemma: excessive ...
"Context engineering is the delicate art and science of filling the context window with just the right information for the next step." — Andrej Karpathy. A frontier, first-principles handbook inspired by Karpathy and 3Blue1Brown for moving beyond prompt engineering to the wider discipline of context design, orchestration, and optimization.
arXiv:2510.23870v1 Announce Type: new Abstract: We present OraPlan-SQL, our system for the Archer NL2SQL Evaluation Challenge 2025, a bilingual benchmark requiring complex reasoning such as arithmetic, commonsense, and hypothetical inference. OraPlan-SQL ranked first, exceeding the second-best syst...
Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"
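As a rough illustration of what "flow engineering" means in contrast to a single prompt, here is a generic generate-test-repair loop in Python. It is not AlphaCodium's pipeline; the two callables stand in for an LLM call and a sandboxed test runner.

```python
from typing import Callable, Tuple


def solve_with_flow(problem: str,
                    generate: Callable[[str, str, str], str],
                    run_public_tests: Callable[[str, str], Tuple[bool, str]],
                    max_rounds: int = 5) -> str:
    """Generic generate-test-repair loop (not AlphaCodium's actual flow)."""
    code, feedback = "", ""
    for _ in range(max_rounds):
        # Ask the model for a new or repaired solution given prior feedback.
        code = generate(problem, code, feedback)
        passed, feedback = run_public_tests(problem, code)
        if passed:
            break  # stop as soon as the public tests pass
    return code
```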
An optimizing inference proxy for LLMs
arXiv:2510.23766v1 Announce Type: new Abstract: The pursuit of efficient Large Language Models (LLMs) has led to increasingly complex techniques like extreme quantization and dynamic routing. While individual benefits of these methods are well-documented, their compositional effects remain poorly u...
arXiv:2510.23828v1 Announce Type: new Abstract: We present a comprehensive evaluation of the ability of large language models (LLMs) to process culturally grounded language, specifically to understand and pragmatically use figurative expressions that encode local knowledge and cultural nuance. Usin...
arXiv:2510.23842v1 Announce Type: new Abstract: Most state-of-the-art sign language models are trained on interpreter or isolated vocabulary data, which overlooks the variability that characterizes natural dialogue. However, human communication dynamically adapts to contexts and interlocutors throu...
arXiv:2510.23853v1 Announce Type: new Abstract: Large language model agents are increasingly used in multi-turn conversational settings to interact with and execute tasks in dynamic environments. However, a key limitation is their temporal blindness: they, by default, operate with a stationary cont...
arXiv:2510.23854v1 Announce Type: new Abstract: In modern industry systems like multi-turn chat agents, Text-to-SQL technology bridges natural language (NL) questions and database (DB) querying. The conversion of tabular DB results into NL representations (NLRs) enables the chat-based interaction. ...
arXiv:2510.23746v1 Announce Type: new Abstract: Tandem Mass Spectrometry enables the identification of unknown compounds in crucial fields such as metabolomics, natural product discovery and environmental analysis. However, current methods rely on database matching from previously observed molecule...
arXiv:2510.23856v1 Announce Type: new Abstract: Agents are rapidly advancing in automating digital work, but enterprises face a harder challenge: moving beyond prototypes to deployed systems that deliver measurable business value. This path is complicated by fragmented frameworks, slow development,...
arXiv:2510.23630v1 Announce Type: new Abstract: Large language models (LLMs) have recently demonstrated impressive multimodal reasoning capabilities, yet their understanding of purely numerical time-series signals remains limited. Existing approaches mainly focus on forecasting or trend description...
arXiv:2510.23632v1 Announce Type: new Abstract: The rapid growth of high-resolution scientific simulations and observation systems is generating massive spatiotemporal datasets, making efficient, error-bounded compression increasingly important. Meanwhile, decoder-only large language models (LLMs) ...
arXiv:2510.23691v1 Announce Type: new Abstract: We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored to human-aligned native keyboard-mouse inputs. Unlike API- or GUI-based approaches, this paradigm enables large-scale continual pre-training across he...
arXiv:2510.23622v1 Announce Type: new Abstract: Adversarial attacks pose a severe risk to AI systems used in healthcare, capable of misleading models into dangerous misclassifications that can delay treatments or cause misdiagnoses. These attacks, often imperceptible to human perception, threaten p...
arXiv:2510.23734v1 Announce Type: new Abstract: This paper examines the role of artificial intelligence in scientific problem-solving, with a focus on its implications for disciplinary creativity. Drawing on recent work in the philosophy of creativity, I distinguish between creative approaches and ...
arXiv:2510.23772v1 Announce Type: new Abstract: The rapid advancement of Generative AI has raised significant questions regarding its ability to produce creative and novel outputs. Our recent work investigates this question within the domain of chess puzzles and presents an AI system designed to ge...
arXiv:2510.23884v1 Announce Type: new Abstract: We explore a lightweight framework that adapts frozen large language models to analyze longitudinal clinical data. The approach integrates patient history and context within the language model space to generate accurate forecasts without model fine-tu...
A Model Context Protocol (MCP) Gateway & Registry. Serves as a central management point for tools, resources, and prompts that can be accessed by MCP-compatible LLM applications. Converts REST API endpoints to MCP, composes virtual MCP servers with added security and observability, and converts between protocols (stdio, SSE, Streamable HTTP).
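To make "converts REST API endpoints to MCP" concrete, the sketch below shows roughly what a registered REST endpoint could look like once described as an MCP tool. The endpoint URL, field names, and wrapper function are hypothetical; only the descriptor shape (name, description, inputSchema) follows the MCP specification.

```python
import requests  # any HTTP client would do

# Hypothetical REST endpoint described as an MCP tool. Only the descriptor
# shape (name/description/inputSchema) follows the MCP spec; the URL and
# parameters are invented for illustration.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Fetch current weather for a city via a /weather REST endpoint.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def call_weather_tool(arguments: dict) -> dict:
    """Roughly what a gateway does when an MCP client invokes the tool."""
    resp = requests.get("https://api.example.com/weather", params=arguments, timeout=10)
    resp.raise_for_status()
    return resp.json()
```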
An easy-to-use tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets.
Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 50+ LLM Providers, VectorDBs, Agent Frameworks and GPUs.
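Because the platform is described as OpenTelemetry-native, a plain OTel span around an LLM call is the kind of telemetry it would ingest. The sketch below uses only the standard opentelemetry-api package; the attribute names and the `complete()` stub are assumptions, not the platform's own instrumentation.

```python
from opentelemetry import trace

# Uses only the opentelemetry-api package; with no SDK configured the tracer is
# a no-op, so this runs as-is. Attribute names and complete() are illustrative
# assumptions, not this platform's own instrumentation.
tracer = trace.get_tracer("llm-demo")


def complete(prompt: str) -> str:
    """Stand-in for a real LLM provider call."""
    return "stub completion for: " + prompt


def traced_completion(prompt: str, model: str = "example-model") -> str:
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("llm.model", model)
        span.set_attribute("llm.prompt_chars", len(prompt))
        output = complete(prompt)
        span.set_attribute("llm.completion_chars", len(output))
        return output
```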
Expose your FastAPI endpoints as Model Context Protocol (MCP) tools, with Auth!
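To make the "with Auth" part concrete, here is the FastAPI side of such a setup: an ordinary endpoint guarded by an API-key dependency. This is standard FastAPI only; the header name and key check are hypothetical, and the step that actually mounts the app's endpoints as MCP tools is left to the repo.

```python
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()


def require_api_key(x_api_key: str = Header(...)) -> str:
    """Hypothetical auth dependency; swap in whatever the deployment uses."""
    if x_api_key != "secret-demo-key":
        raise HTTPException(status_code=401, detail="invalid API key")
    return x_api_key


@app.get("/items/{item_id}", operation_id="get_item")
def get_item(item_id: int, _: str = Depends(require_api_key)) -> dict:
    # A plain REST endpoint; the repo's premise is that endpoints like this,
    # auth dependency included, can be exposed as MCP tools.
    return {"item_id": item_id, "name": f"item-{item_id}"}
```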
arXiv:2510.23881v1 Announce Type: new Abstract: While Generative AI rapidly advances in various domains, generating truly creative, aesthetic, and counter-intuitive outputs remains a challenge. This paper presents an approach to tackle these difficulties in the domain of chess puzzles. We start by ...
KAG is a logical form-guided reasoning and retrieval framework built on the OpenSPG engine and LLMs. It is used to build logical reasoning and factual Q&A solutions for professional domain knowledge bases, and is designed to overcome the shortcomings of vector-similarity retrieval in traditional RAG.
arXiv:2510.23896v1 Announce Type: new Abstract: Text embeddings are an essential building component of several NLP tasks such as retrieval-augmented generation which is crucial for preventing hallucinations in LLMs. Despite the recent release of massively multilingual MTEB (MMTEB), African language...
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
A modular graph-based Retrieval-Augmented Generation (RAG) system
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.
arXiv:2510.23617v1 Announce Type: new Abstract: Multimodal Sentiment Analysis (MSA) seeks to understand human emotions by jointly analyzing data from multiple modalities, typically text and images, offering a richer and more accurate interpretation than unimodal approaches. In this paper, we first pr...
arXiv:2510.23626v1 Announce Type: new Abstract: Social media user-generated content (UGC) provides real-time, self-reported indicators of mental health conditions such as depression, offering a valuable source for predictive analytics. While prior studies integrate medical knowledge to improve pred...
arXiv:2510.23807v1 Announce Type: new Abstract: In non-medical domains, foundation models (FMs) have revolutionized computer vision and language processing through large-scale self-supervised and multimodal learning. Consequently, their rapid adoption in computational pathology was expected to deli...
arXiv:2510.23629v1 Announce Type: new Abstract: Building robust and general reasoning ability is a central goal in the development of large language models (LLMs). Recent efforts increasingly turn to code as a rich training source, given its inherent logical structure and diverse reasoning paradigm...
[NeurIPS 2025] PyTorch implementation of ThinkSound, a unified framework for generating audio from any modality, guided by Chain-of-Thought (CoT) reasoning.
A framework designed to manage context, dependencies, and tasks in large-scale Cline projects within VS Code
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
arXiv:2510.23624v1 Announce Type: new Abstract: Random Forest ensembles are a strong baseline for tabular prediction tasks, but their reliance on hundreds of deep trees often results in high inference latency and memory demands, limiting deployment in latency-sensitive or resource-constrained envir...
A Model Context Protocol server for Excel file manipulation
Build effective agents using Model Context Protocol and simple workflow patterns
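One of the simplest of those workflow patterns is prompt chaining, where each step's output feeds the next step's prompt. The generic Python sketch below illustrates the pattern only; `call_model` is a stand-in for an LLM (or MCP tool) invocation, and nothing here is the repo's own API.

```python
from typing import Callable, List


def run_chain(task: str, steps: List[str], call_model: Callable[[str], str]) -> str:
    """Generic prompt-chaining workflow: each step's output feeds the next prompt."""
    result = task
    for step in steps:
        prompt = f"{step}\n\nInput:\n{result}"
        result = call_model(prompt)  # pass the previous output forward
    return result


# Example with a trivial stand-in "model" that just echoes the last input line:
demo = run_chain("Summarize MCP in one line.",
                 ["Draft an answer.", "Tighten the draft to one sentence."],
                 lambda prompt: prompt.splitlines()[-1])
```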