langchain
20690db4 - core[minor]: Add BaseModel.rate_limiter, RateLimiter abstraction and in-memory implementation (#24669)

Commit

1 year ago

core[minor]: Add BaseModel.rate_limiter, RateLimiter abstraction and in-memory implementation (#24669)

This PR proposes to create a rate limiter in the chat model directly, and would replace: https://github.com/langchain-ai/langchain/pull/21992

It resolves most of the constraints that the Runnable rate limiter introduced:

1. It's not annoying to apply the rate limiter to existing code; i.e., it is possible to roll out the change at the location where the model is instantiated, rather than at every location where the model is used! (Which is necessary if the model is used in different ways in a given application.)
2. Batch rate limiting is enforced properly.
3. The rate limiter works correctly with streaming.
4. The rate limiter is aware of the cache.
5. The rate limiter can take into account information about the inputs into the model (we can add optional inputs to it down the road, together with outputs!).

The only downside is that information will not be properly reflected in tracing, as we don't have any metadata events about a rate limiter. So the total time spent on a model invocation will be:

* time spent waiting for the rate limiter
* time spent on the actual model request

## Example

```python
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_groq import ChatGroq

groq = ChatGroq(rate_limiter=InMemoryRateLimiter(check_every_n_seconds=1))
groq.invoke('hello')
```
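
## Sketch: limiter inside the model

To make the cache-awareness point and the "wait time plus request time" breakdown concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `TokenBucketSketch` and `CachedModelSketch` are hypothetical names, the token-bucket strategy is an assumption about how an in-memory limiter could work, and only `check_every_n_seconds` is taken from the example above. The injectable `clock` and `sleep` arguments exist purely so the behaviour can be checked without real waiting.

```python
import threading
import time
from typing import Callable, Dict, Optional


class TokenBucketSketch:
    """Hypothetical token-bucket limiter; NOT the langchain_core class."""

    def __init__(
        self,
        requests_per_second: float = 1.0,
        check_every_n_seconds: float = 0.1,
        max_bucket_size: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.requests_per_second = requests_per_second
        self.check_every_n_seconds = check_every_n_seconds
        self.max_bucket_size = max_bucket_size
        self._clock = clock
        self._sleep = sleep
        self._tokens = 0.0  # start empty so the first call cannot burst
        self._last: Optional[float] = None
        self._lock = threading.Lock()

    def _consume(self) -> bool:
        """Refill the bucket from elapsed time, then try to take one token."""
        with self._lock:
            now = self._clock()
            if self._last is None:
                self._last = now
            elapsed = now - self._last
            self._last = now
            self._tokens = min(
                self.max_bucket_size,
                self._tokens + elapsed * self.requests_per_second,
            )
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False

    def acquire(self, blocking: bool = True) -> bool:
        """Return True once a token is taken; poll while blocking."""
        if not blocking:
            return self._consume()
        while not self._consume():
            self._sleep(self.check_every_n_seconds)
        return True


class CachedModelSketch:
    """Hypothetical model wrapper: the cache is consulted before the limiter."""

    def __init__(self, rate_limiter: TokenBucketSketch) -> None:
        self.rate_limiter = rate_limiter
        self.cache: Dict[str, str] = {}
        self.provider_calls = 0

    def invoke(self, prompt: str) -> str:
        if prompt in self.cache:
            # Cache hit: no provider request, so no token is spent.
            return self.cache[prompt]
        self.rate_limiter.acquire()  # time spent waiting for the rate limiter
        self.provider_calls += 1     # stand-in for the actual model request
        result = f"echo: {prompt}"
        self.cache[prompt] = result
        return result
```

Because the limiter is attached where the model is constructed, every call path through `invoke` is throttled without touching the call sites, and repeated prompts served from the cache do not consume any rate budget.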

References

#24669 - core[minor]: Add BaseModel.rate_limiter, RateLimiter abstraction and in-memory implementation

Author

eyurtsev

Parents

c623ae66
