Migrate to new model + proxy from server (#5)
* use fork of hf.js to support fully custom endpoints for testing purposes
* proxy textGenerationStream call to the backend to hide token from client
* migrate to patch-package instead of pnpm
* fix issue after merge conflict
* use env var instead of hardcoded value for endpoint
* fix messages not being split between assistant/user
* fix stream response sometimes not split by token
* remove PUBLIC_ from private env variables + rename ENDPOINT to MODEL_ENDPOINT
* only set hf token as private, model can stay public
* move HF_TOKEN to a dynamic env
* fix env var import typo
* remove @microsoft/fetch-event-source
* update parameters to be identical to Python demo
* small refactor to avoid a typing issue
* clean up while loop
* clarify comment on how stream chunks are split
* fix chunk splitting not being handled properly
* clean up model tokens sometimes containing "<|endoftext|>" text
* refactor how we proxy from the server to simplify logic
* use latest version of hf.js
* use .env + .env.local instead of .env.example
* rewrite logic to trim "<|endoftext|>" artifact properly
* update to latest hf.js
* expose env var to Docker during build time for deployment
* remove patch-package
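The trickiest fix above is trimming the "<|endoftext|>" artifact when it arrives split across stream chunks. A minimal sketch of one way to handle it (hypothetical names, not the actual implementation; assumes the stop token only ever appears at the very end of the stream):

```typescript
// Hypothetical sketch: the stop token can arrive split across stream chunks
// (e.g. "<|endo" then "ftext|>"), so we hold back any chunk suffix that could
// be the start of the stop token and only flush it once we know it is not.
const STOP = "<|endoftext|>";

function makeTrimmer() {
  let pending = ""; // buffered suffix that might be the beginning of STOP
  return {
    // Returns the portion of the accumulated text that is safe to emit.
    push(chunk: string): string {
      let buf = pending + chunk;
      // Find the longest suffix of buf that is a prefix of STOP and buffer it.
      for (let k = Math.min(STOP.length, buf.length); k > 0; k--) {
        if (STOP.startsWith(buf.slice(buf.length - k))) {
          pending = buf.slice(buf.length - k);
          buf = buf.slice(0, buf.length - k);
          // A complete match is the stop token itself: drop it entirely.
          if (pending === STOP) pending = "";
          return buf;
        }
      }
      pending = "";
      return buf;
    },
    // Flush any held-back text once the stream closes (it was a false alarm).
    end(): string {
      const rest = pending;
      pending = "";
      return rest;
    },
  };
}
```

A literal `replace("<|endoftext|>", "")` on each chunk misses the split case, which is why the buffering above keys on prefixes of the stop token rather than the full string.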
---------
Co-authored-by: Julien Chaumond <julien@huggingface.co>