NVIDIA Nemotron-OCR-v2: Synthetic Data Powers Fast OCR
NVIDIA's Nemotron-OCR-v2 leverages large-scale synthetic data to deliver fast, multilingual document OCR, pushing the envelope on efficient vision-language models for structured text extraction.
Chinese AI lab DeepSeek is reportedly raising $300M to accelerate frontier model development and compete with OpenAI, Anthropic, and Google DeepMind. The move signals intensifying geopolitical stakes in the AI race.
A technical deep dive into how LLMs manage memory during inference, what happens when the KV cache exceeds GPU limits, and the strategies engineers use to keep long-context generation viable.
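The GPU-memory pressure that teaser refers to is dominated by the KV cache, whose size grows linearly with context length. A back-of-envelope sketch of that growth (all model dimensions below are hypothetical, chosen only for illustration):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size=1, bytes_per_elem=2):
    """Estimate KV-cache size for a decoder-only transformer:
    two tensors (K and V) per layer, each of shape
    [batch, kv_heads, seq_len, head_dim], stored at bytes_per_elem
    (2 for fp16/bf16)."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# A hypothetical 32-layer model with 8 KV heads (grouped-query
# attention), head_dim 128, fp16 values, at a 128k-token context:
size = kv_cache_bytes(num_layers=32, num_kv_heads=8,
                      head_dim=128, seq_len=128_000)
print(f"{size / 2**30:.2f} GiB per sequence")
```

Even this modest hypothetical configuration needs roughly 16 GiB of cache per sequence at 128k tokens, which is why long-context serving leans on techniques like grouped-query attention, quantized caches, and paged cache management.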
Mimi is a low-bitrate neural audio codec designed to tokenize speech for large language models, enabling real-time speech generation and the next wave of voice AI systems like Moshi.
Researchers are fighting fire with fire: generating synthetic deepfakes at scale to train more robust detection models capable of keeping pace with rapidly evolving generative AI threats.
Canva's major AI 2.0 update introduces prompt-based editing across its design platform, bringing generative AI image and video tools to over 200 million users in a bid to reshape creative workflows.
MIT Technology Review examines why meaningful human oversight of autonomous AI weapons systems may be impossible, as machine-speed decision cycles outpace human cognition and create only the illusion of control.
UCSD and Together AI introduce Parcae, a stable looped transformer architecture that achieves the quality of models twice its size, potentially reshaping how efficient AI systems are built and deployed.
New research reveals a troubling disconnect in large language models: they can produce logically valid reasoning chains yet arrive at incorrect final answers, raising questions about AI reliability and trust.
A new research paper proposes bi-predictability as a real-time signal for detecting compromised or manipulated LLM interactions, offering a lightweight approach to monitoring conversational integrity without access to model internals.
Real-time deepfakes and voice cloning are turning video conferencing into a new attack surface. Here's how AI-powered impersonation exploits trust in virtual meetings and what security gaps remain.
Adobe is integrating Claude Code-style agentic AI into Creative Cloud, enabling AI-driven creative workflows that could reshape how professionals produce and manipulate visual media at scale.