Back

arxiv.org

What Challenges Do Developers Face in AI Agent Systems? An ...

Updated 2/22/2026
https://arxiv.org/html/2510.25423v1

We construct a taxonomy of developer challenges through tag expansion and filtering, apply LDA-MALLET for topic modeling, and manually validate and label the resulting themes. Our analysis reveals seven major areas of recurring issues encompassing 77 distinct technical challenges related to runtime integration, dependency management, orchestration complexity, and evaluation reliability. We further quantify topic popularity and difficulty to identify which issues are most common and hardest to resolve, map the tools and programming languages used in agent development, and track their evolution from 2021 to 2025 in relation to major AI model and framework releases. … |Topic 5 — Installation & Dependency Conflicts|Topic 6 — RAG Engineering|Topic 7 — Prompt & Output Engineering| |--|--|--| |*Topic share: 20.9% of posts*|*Topic share: 9.8% of posts*|*Topic share: 17% of posts*| |Subtopic % LangChain/LlamaIndex Version Drift (API Churn) 31.88 Python/Pydantic/Typing Compatibility 14.49 Third-Party SDK Surface Changes 14.49 Non-Python Platform/Library Incompatibility 7.25 Vendor SDK/Client Mismatch (OpenAI/Azure/Groq/Ollama) 7.25 OS/Binary Environment Crashes 5.80 Vector Store Client\leftrightarrowServer/API Mismatch 5.80 Data Encoding/Serialization Breakages 4.35 Missing Extras/Optional Deps 4.35 Frontend Loader & Worker Versioning (pdf.js) 1.45 Observability/Tracing Setup Issues 1.45 Transformers Pipeline Interface Changes 1.45|Subtopic % Ingestion & Document Processing (PDF/XML/Images) 12.12 Sca==ling, Concurrency & Throughput 12.12 Evaluation, Logging & Traceability (RAGAS; sources) 9.09 Prompting & Query Strategy (multi-query, guardrails) 9.09 Semantic Caching & Memoization 9.09 Session State & Multi-tenant Memory 9.09 Tokenization, Budgets & Cost Control 9.09 Architecture & Framework Choices 6.06 Metadata & Splitter Control 6.06 RAG for Classification / Structured Data 6.06 Structured Outputs & Schema-Aware RAG 6.06 Temporal & Freshness-Aware RAG 3.03 Vector Stores & Index Ops 3.03|Subtopic % Prompt composition & context injection (condense vs. answer; context-only answers) 18.87 Agents & tool/function calling (incl. output parsers) 13.21 Memory prompts & context-window control (buffers/windows/summarization) 11.32 Prompt templating & variable injection 11.32 Chat templates & role prompting (Ollama/Llama) + stop sequences 9.43 Determinism, sampling & output length control 7.55 LCEL composition & chaining patterns 7.55 Structured outputs (JSON, schemas, regex) 3.77| … Within this topic, the top challenge is *Tool-Use Coordination Policies* (23.26%), which is related to configuring when and how agents invoke tools, including disabling or sequencing parallel use to avoid conflicts, e.g., Q79332599. The second most-frequent challenge is *Observability* (11.63%), related to capturing execution traces and operational errors in complex runs to diagnose control issues and architecture-related deadlocks, e.g., Q79363673.

Related Pain Points5

Frequent breaking changes and unstable API

9

LangChain releases updates at an aggressive pace with frequent breaking changes and backward incompatibility, forcing developers to constantly refactor existing code. The break-first, fix-later approach has destroyed developer trust in upgrading packages.

compatibilityLangChain

Lack of observability makes it impossible to trust agents in production

8

94% of organizations with agents in production have implemented observability tooling because agents cannot be trusted without visibility into execution traces and reasoning. Observability is a blocker for production deployment despite 89% adoption attempts.

monitoringobservabilitytracinglogging

Building RAG systems for AI chatbots requires massive engineering investment

8

Raw GPT models have no knowledge of a company's specific business, products, or policies. Developers must build complex Retrieval-Augmented Generation (RAG) systems to dynamically fetch and feed the right information from help centers, tickets, and documentation in real-time, requiring significant ongoing maintenance.

architectureOpenAI APIGPTRetrieval-Augmented Generation

Tool/function calling coordination and agent orchestration complexity

7

Configuring when, how, and in what order agents invoke tools is the top agent orchestration challenge (23.26% of issues). Developers struggle with disabling/sequencing parallel tool use to avoid conflicts and managing control flow in complex workflows.

architectureAI agentsfunction callingtool use

Prompt composition, templating, and context window management across complex workflows

6

Developers struggle with prompt composition strategies, variable injection, chat templating, deterministic output control, and managing memory/context windows across multi-step agent interactions. This represents 17% of agent development issues.

dxprompt engineeringLCELLangChain