Fix private to public (#582)
* docs: ✏️ add missing argument in docstring
* feat: 🎸 change the logic of the webhook
We now use the webhook only to get the name of the dataset to handle.
Then, if the dataset is supported (exists, is public, can be gated or
not), update the cache; otherwise, delete the cache.
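A minimal sketch of the webhook rule described above, not the actual service code. The names `process_webhook`, `is_supported`, and `refresh` are hypothetical, and the cache is modeled as a plain dict:

```python
from typing import Callable, Dict


def process_webhook(
    dataset: str,
    is_supported: Callable[[str], bool],  # hypothetical check against the Hub
    refresh: Callable[[str], dict],       # hypothetical cache refresh function
    cache: Dict[str, dict],
) -> str:
    """Use the webhook only for the dataset name, then decide what to do."""
    if is_supported(dataset):
        # Supported (exists, is public, gated or not): update the cache.
        cache[dataset] = refresh(dataset)
        return "updated"
    # Not supported (e.g. private or deleted): delete the cache entry, if any.
    cache.pop(dataset, None)
    return "deleted"
```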
* feat: 🎸 create a new job if a splits/ response is a cache miss
When a request comes to splits/, if no cache entry exists for the
dataset, and if no job is pending either, we now check on the Hub
whether the dataset exists and is supported (i.e. not private).
If so, it means that we somehow missed a webhook. We add a job to
the queue, and we respond with a "NotReady" error.
Otherwise, we respond with "NotFound".
This change is a quick fix for https://github.com/huggingface/datasets-server/issues/380#issuecomment-1254740105
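The cache-miss handling for splits/ can be sketched roughly as follows. This is an illustration under assumptions: `respond_to_splits`, `is_supported_on_hub`, and the in-memory cache, pending set, and queue are hypothetical stand-ins, and answering "NotReady" while a job is already pending is assumed rather than stated above:

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


def respond_to_splits(
    dataset: str,
    cache: Dict[str, dict],
    pending: Set[str],
    queue: List[str],
    is_supported_on_hub: Callable[[str], bool],  # hypothetical Hub check
) -> Tuple[str, Optional[dict]]:
    if dataset in cache:
        # Cache hit: serve the stored response.
        return "OK", cache[dataset]
    if dataset in pending:
        # Assumption: a job is already pending, so the response is not ready yet.
        return "NotReady", None
    if is_supported_on_hub(dataset):
        # We somehow missed a webhook: add a job and ask the client to retry.
        queue.append(dataset)
        pending.add(dataset)
        return "NotReady", None
    # The dataset does not exist or is not supported (e.g. private).
    return "NotFound", None
```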
Also: to be able to test this, we replaced the test library "responses"
with a proper HTTP server ("pytest-httpserver" library), which
gives us more flexibility to mock the Hub APIs (both the auth endpoint
and the API datasets endpoint) and lets us test even when the code
uses the huggingface_hub lib.
Also: the tests are a bit more precise (more cases)
* refactor: 💡 dataset_name -> dataset
* refactor: 💡 add intermediate function in dataset.py
also: dataset_name -> dataset. also: fix style
* feat: 🎸 add (/splits) jobs if a /first-rows resp is cache miss
If a request is made to /first-rows and there is no entry in the
cache, we check whether there should be one.
Various cases have to be managed, which makes the logic a bit (too)
convoluted.
We return a FirstRowsNotReady error if:
- the /first-rows response for the split is already in process
- the /splits response for the dataset is still in process
We return a FirstRowsNotReady error, AND we add a /splits job because
something has failed before, if:
- the /splits response for the dataset exists, and the split is part
of the splits of the dataset
- the /splits response for the dataset does not exist, is not in
process, but it should exist because the dataset is supported.
Note that I didn't add a unit test for each of these cases.
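The cases above can be summarized as a small decision function. This is a sketch under assumptions, not the code in this change: the function name, its parameters, and the "FirstRowsNotFound" fallback (used when none of the listed cases applies) are hypothetical:

```python
from typing import Optional, Sequence, Tuple


def decide_first_rows_cache_miss(
    split: str,
    first_rows_in_process: bool,
    splits_in_process: bool,
    cached_splits: Optional[Sequence[str]],  # None if no /splits response exists
    dataset_supported: bool,
) -> Tuple[str, bool]:
    """Return (error name, whether to add a /splits job)."""
    if first_rows_in_process or splits_in_process:
        # Work is already under way: just ask the client to retry later.
        return "FirstRowsNotReady", False
    if cached_splits is not None:
        if split in cached_splits:
            # The split is known but its rows are missing: something failed before.
            return "FirstRowsNotReady", True
        # Assumption: an unknown split is reported as not found.
        return "FirstRowsNotFound", False
    if dataset_supported:
        # No /splits response and none in process, although there should be one.
        return "FirstRowsNotReady", True
    # Assumption: unsupported or missing datasets are reported as not found.
    return "FirstRowsNotFound", False
```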
* feat: 🎸 update the docker image
* fix: 🐛 pass HF_TOKEN to the API service