
2026-02-02 AI Research Briefing

A one-stop digest of the latest VLM, sLLM, and on-device AI papers and research blog posts. Items are tracked by URL to avoid duplicate entries.

18 items in total · summaries auto-generated

AI News & Research

Key announcements and blog updates from companies and research institutes

ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Paper · Hugging Face Papers

English summary unavailable.

View original

THINKSAFE: Self-Generated Safety Alignment for Reasoning Models

Paper · Hugging Face Papers

English summary unavailable.

View original

TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Paper · Hugging Face Papers

English summary unavailable.

View original

PaperBanana: Automating Academic Illustration for AI Scientists

Paper · Hugging Face Papers

English summary unavailable.

View original

ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

Paper · Hugging Face Papers

English summary unavailable.

View original

VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation

Paper · arXiv cs.CV (recent)

While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference Alignment), a data-efficient self-supervised framework that leverages a geometry foundation model to automatically derive dense preference signals that guide VDMs via Direct Preference Optimization (DPO). This approach effectively steers the generative distribution toward inherent 3D consistency without requiring human annotations. VideoGPA significantly enhances temporal stability, physical plausibility, and motion coherence using minimal preference pairs, consistently outperforming state-of-the-art baselines in extensive experiments.
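The training signal described above is Direct Preference Optimization over pairs of generations scored by a geometry model. As a hedged illustration of the standard DPO objective itself (not the authors' implementation; the scalar log-probabilities, variable names, and `beta` value here are stand-ins), a minimal sketch:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for a single preference pair.

    logp_*     : policy log-likelihood of the preferred (w) / rejected (l) sample
    ref_logp_* : frozen reference-model log-likelihoods of the same samples
    The loss rewards widening the preferred-vs-rejected margin relative
    to the reference model, scaled by beta.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid)
```

In the setting the abstract describes, a geometry foundation model would score two video generations for 3D consistency, label the higher-scoring one "preferred", and this loss would be averaged over such pairs.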

View original

From Cold Start to Active Learning: Embedding-Based Scan Selection for Medical Image Segmentation

Paper · arXiv cs.CV (recent)

Accurate segmentation annotations are critical for disease monitoring, yet manual labeling remains a major bottleneck due to the time and expertise required. Active learning (AL) alleviates this burden by prioritizing informative samples for annotation, typically through a diversity-based cold-start phase followed by uncertainty-driven selection. We propose a novel cold-start sampling strategy that combines foundation-model embeddings with clustering, including automatic selection of the number of clusters and proportional sampling across clusters, to construct a diverse and representative initial training set. This is followed by an uncertainty-based AL framework that integrates spatial diversity to guide sample selection. The proposed method is intuitive and interpretable, enabling visualization of the feature-space distribution of candidate samples. We evaluate our approach on three datasets spanning X-ray and MRI modalities. On the CheXmask dataset, the cold-start strategy outperforms random selection, improving Dice from 0.918 to 0.929 and reducing the Hausdorff distance from 32.41 to 27.66 mm. In the AL setting, combined entropy and diversity selection improves Dice from 0.919 to 0.939 and reduces the Hausdorff distance from 30.10 to 19.16 mm. On the Montgomery dataset, cold-start gains are substantial, with Dice improving from 0.928 to 0.950 and Hausdorff distance decreasing from 14.22 to 9.38 mm. On the SynthStrip dataset, cold-start selection slightly affects Dice but reduces the Hausdorff distance from 9.43 to 8.69 mm, while active learning improves Dice from 0.816 to 0.826 and reduces the Hausdorff distance from 7.76 to 6.38 mm. Overall, the proposed framework consistently outperforms baseline methods in low-data regimes, improving segmentation accuracy.
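The cold-start step allocates the annotation budget proportionally to cluster sizes. A minimal sketch of that allocation, assuming cluster labels have already been produced (e.g. by k-means over foundation-model embeddings, which is omitted here; the function name and quota rule are illustrative, not the paper's exact procedure):

```python
import random
from collections import defaultdict

def proportional_cold_start(cluster_labels, budget, rng=None):
    """Pick an initial annotation set sized proportionally to cluster sizes.

    cluster_labels: cluster id per unlabeled scan
    budget:         total number of scans to annotate
    Returns indices of the selected scans.
    """
    rng = rng or random.Random(0)
    clusters = defaultdict(list)
    for idx, c in enumerate(cluster_labels):
        clusters[c].append(idx)
    n = len(cluster_labels)
    selected = []
    for c, members in sorted(clusters.items()):
        # Quota proportional to cluster size, with at least one scan per cluster
        quota = max(1, round(budget * len(members) / n))
        rng.shuffle(members)
        selected.extend(members[:quota])
    return selected[:budget]
```

Sampling within each cluster keeps the initial set representative of the feature-space distribution rather than concentrated in one region, which is the stated goal of the cold-start phase.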

View original

User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments

Paper · arXiv cs.CV (recent)

Open-set object detection (OSOD) localizes objects while identifying and rejecting unknown classes at inference. While recent OSOD models perform well on benchmarks, their behavior under realistic user prompting remains underexplored. In interactive XR settings, user-generated prompts are often ambiguous, underspecified, or overly detailed. To study prompt-conditioned robustness, we evaluate two OSOD models, GroundingDINO and YOLO-E, on real-world XR images and simulate diverse user prompting behaviors using vision-language models. We consider four prompt types: standard, underdetailed, overdetailed, and pragmatically ambiguous, and examine the impact of two enhancement strategies on these prompts. Results show that both models exhibit stable performance under underdetailed and standard prompts, while they suffer degradation under ambiguous prompts. Overdetailed prompts primarily affect GroundingDINO. Prompt enhancement substantially improves robustness under ambiguity, yielding gains exceeding 55% mIoU and 41% average confidence. Based on the findings, we propose several prompting strategies and prompt enhancement methods for OSOD models in XR environments.

View original

Denoising the Deep Sky: Physics-Based CCD Noise Formation for Astronomical Imaging

Paper · arXiv cs.CV (recent)

Astronomical imaging remains noise-limited under practical observing constraints, while standard calibration pipelines mainly remove structured artifacts and leave stochastic noise largely unresolved. Learning-based denoising is promising, yet progress is hindered by scarce paired training data and the need for physically interpretable and reproducible models in scientific workflows. We propose a physics-based noise synthesis framework tailored to CCD noise formation. The pipeline models photon shot noise, photo-response non-uniformity, dark-current noise, readout effects, and localized outliers arising from cosmic-ray hits and hot pixels. To obtain low-noise inputs for synthesis, we average multiple unregistered exposures to produce high-SNR bases. Realistic noisy counterparts synthesized from these bases using our noise model enable the construction of abundant paired datasets for supervised learning. We further introduce a real-world multi-band dataset acquired with two twin ground-based telescopes, providing paired raw frames and instrument-pipeline calibrated frames, together with calibration data and stacked high-SNR bases for real-world evaluation.
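The noise components listed above compose into a straightforward per-pixel simulator. A toy sketch in that spirit, assuming all parameter values (PRNU sigma, dark rate, read noise, hot-pixel probability) are illustrative placeholders rather than the paper's calibrated model:

```python
import math
import random

def knuth_poisson(lam, rng):
    """Sample from Poisson(lam) via Knuth's method (adequate for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def synthesize_ccd_noise(clean_electrons, rng=None, prnu_sigma=0.01,
                         dark_rate=2.0, read_sigma=3.0, hot_prob=1e-3):
    """Toy per-pixel CCD noise model over a noise-free high-SNR base.

    clean_electrons: list of noise-free pixel values in electrons.
    """
    rng = rng or random.Random(0)
    noisy = []
    for e in clean_electrons:
        gain = 1.0 + rng.gauss(0.0, prnu_sigma)   # photo-response non-uniformity
        signal = knuth_poisson(e * gain, rng)     # photon shot noise
        signal += knuth_poisson(dark_rate, rng)   # dark-current noise
        signal += rng.gauss(0.0, read_sigma)      # readout noise
        if rng.random() < hot_prob:               # cosmic-ray hit / hot pixel
            signal += rng.uniform(500, 5000)
        noisy.append(signal)
    return noisy
```

Applying such a model to stacked high-SNR bases yields the kind of synthetic noisy/clean pairs the abstract describes for supervised denoiser training.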

View original

PaperBanana: Automating Academic Illustration for AI Scientists

Paper · arXiv cs.CV (recent)

Despite rapid advances in autonomous AI scientists powered by language models, generating publication-ready illustrations remains a labor-intensive bottleneck in the research workflow. To lift this burden, we introduce PaperBanana, an agentic framework for automated generation of publication-ready academic illustrations. Powered by state-of-the-art VLMs and image generation models, PaperBanana orchestrates specialized agents to retrieve references, plan content and style, render images, and iteratively refine via self-critique. To rigorously evaluate our framework, we introduce PaperBananaBench, comprising 292 test cases for methodology diagrams curated from NeurIPS 2025 publications, covering diverse research domains and illustration styles. Comprehensive experiments demonstrate that PaperBanana consistently outperforms leading baselines in faithfulness, conciseness, readability, and aesthetics. We further show that our method effectively extends to the generation of high-quality statistical plots. Collectively, PaperBanana paves the way for the automated generation of publication-ready illustrations.
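The orchestration the abstract describes (render, then iteratively refine via self-critique) is a generic agentic loop. A minimal sketch of that loop, with hypothetical `render` and `critique` callables standing in for the paper's actual image-generation and VLM-critic agents:

```python
def refine_illustration(spec, render, critique, max_rounds=3):
    """Render-and-critique refinement loop.

    spec:     textual description of the figure to draw
    render:   callable spec -> image (e.g. an image-generation model; stubbed)
    critique: callable (spec, image) -> (score, revised_spec)
    Stops when the critic proposes no further edits or the round budget
    is spent; returns the best-scoring image seen.
    """
    best_image, best_score = None, float("-inf")
    for _ in range(max_rounds):
        image = render(spec)
        score, revised = critique(spec, image)
        if score > best_score:
            best_image, best_score = image, score
        if revised == spec:  # critic has no further edits
            break
        spec = revised
    return best_image, best_score
```

Keeping the best-scoring result rather than the last one guards against a critique round that makes the spec worse, a common design choice in self-refinement loops.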

View original

Towards a science of scaling agent systems: When and why agent systems work

News · Google Research Blog

Posted January 28, 2026 under Generative AI · Machine Intelligence.

View original

Algorithms & Theory

News · Google Research Blog

An update in the Algorithms & Theory category.

View original

Climate & Sustainability

News · Google Research Blog

An update in the Climate & Sustainability category.

View original

Conferences & Events

News · Google Research Blog

An update in the Conferences & Events category.

View original

Data Management

News · Google Research Blog

An update in the Data Management category.

View original

Microsoft Research blog

News · Microsoft Research Blog

An update from the Microsoft Research blog.

View original

UniRG: Scaling medical imaging report generation with multimodal reinforcement learning

News · Microsoft Research Blog

An update on UniRG: Scaling medical imaging report generation with multimodal reinforcement learning.

View original

Multimodal reinforcement learning with agentic verifier for AI agents

News · Microsoft Research Blog

An update on multimodal reinforcement learning with an agentic verifier for AI agents.

View original

Referenced sources