Why does Parlant consume so many tokens? #563
Replies: 2 comments · 1 reply
-
Same feeling. If the scenario is complex, multiple rounds of interaction with the language model occur during app startup, which is a relatively token-intensive step. Once the conversation begins, tracing shows that as the conversation grows, the tokens passed to the LM each round also keep growing to maintain context, which is another token-intensive step. When the scenario is complex, the response time for model interactions also seems to increase. I think it might be necessary to compress the context sensibly to reduce token consumption, or to use a locally deployed model, though of course that has its own costs.
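For illustration, here's a minimal sketch of the kind of context compression suggested above: keep the system prompt, keep the most recent turns, and drop older ones once a token budget is exceeded. This is not Parlant's mechanism; the OpenAI-style message format, the `tiktoken` encoding, and the budget value are all assumptions for the example:

```python
# Minimal sketch of budget-based context truncation (not Parlant's actual
# mechanism). Assumes OpenAI-style {"role", "content"} messages and uses
# tiktoken only to approximate token counts.
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")  # assumed encoding for the example

def count_tokens(message: dict) -> int:
    """Rough per-message token estimate (ignores per-message overhead)."""
    return len(ENC.encode(message["content"]))

def compress_context(messages: list[dict], budget: int = 4000) -> list[dict]:
    """Keep the system prompt plus the most recent turns that fit in `budget`."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(count_tokens(m) for m in system)
    kept: list[dict] = []
    # Walk history from newest to oldest, keeping turns until the budget runs out.
    for m in reversed(rest):
        cost = count_tokens(m)
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

A smarter variant would summarize the dropped turns instead of discarding them, trading one small summarization call for a shorter prompt on every subsequent round.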
-
This is an important question and discussion, thank you for raising it.

We set out to solve the reliability and behavioral-consistency problems of customer-facing agents, while letting you install a large number of instructions that actually get followed. This surprisingly difficult challenge requires many intricate design decisions, some of which involve trade-offs. In particular, filtering guidelines, which is what allows you to install so many of them while delivering consistency, is a nuanced task that can currently only be performed with a language model. That translates to tokens. Incidentally, even with a substantially complex agent, costs will still be only a fraction of a human rep call's.

Nonetheless, efficiency is an important consideration and the main thing our team is currently pursuing behind the scenes: we're training our own SLMs which, while they might use the same number of tokens, are expected to yield about 10x cost savings and a 2-3x latency reduction. We have an early-access program live right now, and we welcome commercial partners to it. If anyone wants to hear more, please contact us on our Discord. In the very near future, we aim to make these models available to everyone.

So, to say it briefly and clearly: we are hard at work on bringing operating costs to the floor, while maintaining all of the accuracy and behavioral consistency you're getting from Parlant. We expect these improvements to be deployed sooner rather than later! 🙏
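To make the token math concrete, here's a rough back-of-envelope sketch of where the input tokens of an LLM-based guideline-filtering pass might go. Every constant below (guideline count, tokens per guideline, prompt overhead, history size, and price) is an illustrative assumption, not a measured Parlant figure:

```python
# Back-of-envelope estimate of per-turn input-token cost for LLM-based
# guideline filtering. Every constant is an illustrative assumption.
GUIDELINES = 80              # assumed number of installed guidelines
TOKENS_PER_GUIDELINE = 60    # assumed condition + action text per guideline
PROMPT_OVERHEAD = 1_500      # assumed fixed prompt (instructions, schema)
HISTORY_TOKENS = 2_000       # assumed conversation context at this turn
PRICE_PER_1M_INPUT = 0.15    # assumed $/1M input tokens (cheap-model tier)

input_tokens = PROMPT_OVERHEAD + HISTORY_TOKENS + GUIDELINES * TOKENS_PER_GUIDELINE
cost_per_turn = input_tokens / 1_000_000 * PRICE_PER_1M_INPUT

print(f"~{input_tokens:,} input tokens per filtering pass")    # ~8,300
print(f"~${cost_per_turn:.4f} per turn at the assumed price")  # ~$0.0012
```

At these assumed numbers, a 100-turn conversation comes out around $0.12 of filtering cost, which is the sense in which even a complex agent stays a fraction of the cost of a human call; the SLM work described above attacks the price-per-token factor rather than the token count itself.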
-
I feel it's very costly, even when using the lowest-cost LLM. Is there any solution to lower the token usage?