community.openai.com
Challenges and Concerns with OpenAI's Assistant API
As a Ph.D. student engaged in AI-in-education research, I’ve been using OpenAI’s Assistant API for my project. My experience has led to some important observations and concerns:

1. **Retrieval Charges**: Despite OpenAI stating that retrieval is free until 1/12/2024, I was charged for each retrieval of my PDF, which significantly inflated my costs.
2. **Token Count Discrepancy**: The API seems to read the raw PDF data, resulting in inflated token counts and higher costs. In my case, I computed 3,566 tokens while the Assistant API retrieved around 13k tokens.
3. **Tokenization Limitation**: The API appends the entire conversation thread, including any PDFs (when retrieval is active), to each message, and keeps appending until it accumulates approximately 128k tokens (the GPT-4 token limit).
4. **Context Window Management**: OpenAI’s current setup does not allow users to control the length of the context window. While OpenAI is considering enabling this feature, there’s no definitive timeline or update.
5. **Documentation Clarity on Threads**: The official documentation lacks clear guidance on the cost per thread. Questions about thread creation costs, management, deletion, and whether these can be controlled via the API remain unanswered.

**Cost Analysis**:

- **Expected Cost**: Based on OpenAI’s pricing and official tokenizer, I calculated the expected cost for my usage as $26.07.
- **Incurred Cost**: The actual cost tallied to $189.40, significantly higher than expected. This includes charges for failed attempts, which are not clearly outlined in OpenAI’s pricing model.

The inflated costs were incurred mainly due to the re-retrieval of the document for every message and the appending of the whole conversation thread to each new message. I conducted a few preliminary tests before proceeding to a full run. ...
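To make the inflation concrete, here is a rough model of why re-sending the document with every message multiplies input-token charges. This is my own sketch, not OpenAI’s billing formula; the per-turn size and dollar rate are illustrative assumptions, and only the ~13k document figure comes from the post.

```python
# Rough model of input-token inflation when each new message re-sends the
# retrieved document plus the whole thread history (capped at 128k tokens).
DOC_TOKENS = 13_000       # tokens the API reportedly counted for the PDF
TURN_TOKENS = 200         # assumed average tokens per user/assistant turn
INPUT_RATE = 0.01 / 1000  # assumed $ per input token (illustrative only)

def thread_input_tokens(n_messages: int) -> int:
    """Total input tokens billed across a thread where every prompt
    includes the document plus all prior turns."""
    total = 0
    history = 0
    for _ in range(n_messages):
        prompt = min(DOC_TOKENS + history + TURN_TOKENS, 128_000)
        total += prompt
        history += 2 * TURN_TOKENS  # user turn + assistant reply join history

    return total

naive = DOC_TOKENS + 10 * TURN_TOKENS   # what you'd expect paying for the doc once
actual = thread_input_tokens(10)
print(naive, actual, round(actual * INPUT_RATE, 2))  # → 15000 150000 1.5
```

Even over a short 10-message thread, the re-appended document accounts for a ~10x blow-up in billed input tokens, which is consistent in shape (if not in exact magnitude) with the $26 expected vs. $189 incurred gap above.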
However, due to time constraints in my research, I soon progressed to looping over the prompt and wasn’t able to monitor the cost during the run. It was in this phase that the significant costs, previously unnoticed in the shorter tests, became apparent.

In summary, my experience with the Assistant API has been financially burdensome, contradicting OpenAI’s claims of cost-efficiency. The lack of transparency in pricing and the apparent hidden costs have made it challenging to continue using OpenAI’s GPT models.

...

Hey champ, and welcome to the community. It’s the file *storage* that’s free. The tokens used by the Assistant API are billed at the chosen language model’s per-token input/output rates.

...

Thanks for your response. So far, due to numerous hidden costs and the lack of detail in the documentation, it only looks good on paper.

...

Casual use cases will not necessarily need the full chat history up to 128k; it’s just expensive.

...

Costs are way up and I’ve been getting a ton of failed runs lately. I will probably switch back tonight and wait until it’s a bit more mature.

`Rate limit reached for gpt-4-1106-preview in organization X on tokens_usage_based per day: Limit 500000, Used 497557, Requested 4096. Please try again in 4m45.638s. Visit https://platform.openai.com/account/rate-limits to learn more.`

Might revisit this API again soon.

...

It is pretty absurd that the tokens for instructions and data are counted with each message. The way Retrieval is handled and charged today kills most business cases.
Related Pain Points (4)
Unexpected retrieval charges despite free tier claims
OpenAI charged for PDF retrieval in the Assistant API despite stating the feature would be free until January 12, 2024. Developers incurred significantly inflated costs from repeated retrieval charges that contradicted official pricing claims.
No control over context window length in Assistant API
The Assistant API automatically appends the entire conversation thread and PDFs to each message up to the 128k token limit, with no user control over context window management. OpenAI is considering this feature but has provided no timeline.
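Until the Assistant API exposes context-window control, one workaround is to manage history yourself via the Chat Completions endpoint and truncate old turns before each call. The sketch below is my own; `token_len` is a crude ≈4-characters-per-token stand-in, so use a real tokenizer (e.g. tiktoken) in practice.

```python
# Workaround sketch: keep the system message plus only the most recent
# turns that fit a token budget, instead of the full thread.

def token_len(msg: dict) -> int:
    """Crude token estimate: ~4 characters per token (stand-in only)."""
    return max(1, len(msg["content"]) // 4)

def truncate_history(messages: list[dict], budget: int = 8_000) -> list[dict]:
    """Return the system message plus the newest turns that fit the budget."""
    system, rest = messages[:1], messages[1:]
    kept, used = [], token_len(system[0])
    for msg in reversed(rest):      # walk newest -> oldest
        cost = token_len(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + kept[::-1]      # restore chronological order
```

Dropping turns changes what the model can see, so this trades recall of early conversation for a bounded, predictable per-message cost.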
Inefficient token usage and hidden API costs
LangChain's abstractions hide what happens with prompts and model calls, resulting in more tokens consumed than hand-optimized solutions. The framework exhibits inefficient context management and a broken cost tracking function that often showed $0.00 when real charges were accumulating.
Opaque cost metrics and unpredictable platform expenses
Vercel's usage dashboard shows metrics like 'Fluid Active CPU' and 'ISR Writes' without clear documentation on how they impact costs or how to optimize them. Developers pay subscription fees but lack visibility into what drives spending, making budgeting impossible.