dataset-viewer
b7a34ece - Fix private to public (#582)

Commit
3 years ago
Fix private to public (#582)

* docs: ✏️ add missing argument in docstring

* feat: 🎸 change the logic of the webhook

  We now use the webhook only to get the name of the dataset to handle. Then, if the dataset is supported (it exists and is public, gated or not), we update the cache; otherwise, we delete the cache.

* feat: 🎸 create a new job if a /splits response is a cache miss

  When a request comes to /splits, if no cache entry exists for the dataset and no job is pending either, we now check on the Hub whether the dataset exists and is supported (i.e. not private). If so, it means we somehow missed a webhook: we add a job to the queue and respond with a "NotReady" error. Otherwise, we respond with "NotFound". This change is a quick fix for https://github.com/huggingface/datasets-server/issues/380#issuecomment-1254740105. A sketch of this decision follows this message.

  Also: to be able to test this, we replaced the "responses" test library with a real HTTP server (the "pytest-httpserver" library), which gives us more flexibility to mock the Hub APIs (both the auth endpoint and the API datasets endpoint) and to test even though the code uses the huggingface_hub library. Also: the tests are a bit more precise (they cover more cases).

* refactor: 💡 dataset_name -> dataset

* refactor: 💡 add an intermediate function in dataset.py

  Also: dataset_name -> dataset. Also: fix style.

* feat: 🎸 add /splits jobs if a /first-rows response is a cache miss

  If a request is made to /first-rows and there is no entry in the cache, we check whether there should be one. Various cases have to be managed, which makes the logic a bit (too) convoluted; see the second sketch after this message.

  We return a FirstRowsNotReady error if:
  - the /first-rows response for the split is already in process
  - the /splits response for the dataset is still in process

  We return a FirstRowsNotReady error AND add a /splits job (because something has failed before) if:
  - the /splits response for the dataset exists, and the split is part of the splits of the dataset
  - the /splits response for the dataset does not exist and is not in process, although it should exist because the dataset is supported

  Note that I didn't add a unit test for each of these cases.

* feat: 🎸 update the docker image

* fix: 🐛 pass HF_TOKEN to the API service
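For reference, here is a minimal sketch of the /splits cache-miss decision described above. It is an illustration under assumptions, not the code from this commit: the cache and job queue are replaced by in-memory stand-ins, and the helper and exception names (`get_cache_entry`, `is_job_in_process`, `add_job`, `SplitsResponseNotReadyError`, `SplitsResponseNotFoundError`) are made up. The only real external call is huggingface_hub's `HfApi.dataset_info`, used here to check whether a dataset exists and is public.

```python
from typing import Any, Optional

from huggingface_hub import HfApi
from huggingface_hub.utils import RepositoryNotFoundError


class SplitsResponseNotReadyError(Exception):
    """No response yet: a job is pending or has just been queued."""


class SplitsResponseNotFoundError(Exception):
    """The dataset does not exist on the Hub or is not supported (e.g. private)."""


# In-memory stand-ins for the real cache and job queue (illustration only).
CACHE: dict = {}           # maps a (kind, dataset, ...) tuple to a cached response
PENDING_JOBS: set = set()  # (kind, dataset, ...) tuples of queued or running jobs


def get_cache_entry(*key: str) -> Optional[Any]:
    """Placeholder for the cache lookup."""
    return CACHE.get(key)


def is_job_in_process(*key: str) -> bool:
    """Placeholder for the queue lookup."""
    return key in PENDING_JOBS


def add_job(*key: str) -> None:
    """Placeholder for enqueueing a worker job."""
    PENDING_JOBS.add(key)


def is_supported(dataset: str, hf_token: Optional[str] = None) -> bool:
    """True if the dataset exists on the Hub and is public (gated or not)."""
    try:
        info = HfApi().dataset_info(dataset, token=hf_token)
    except RepositoryNotFoundError:
        return False
    return not info.private


def get_splits_response(dataset: str, hf_token: Optional[str] = None) -> Any:
    cached = get_cache_entry("/splits", dataset)
    if cached is not None:
        return cached
    if is_job_in_process("/splits", dataset):
        raise SplitsResponseNotReadyError()
    if is_supported(dataset, hf_token):
        # The dataset is supported but uncached: a webhook was probably missed.
        add_job("/splits", dataset)
        raise SplitsResponseNotReadyError()
    raise SplitsResponseNotFoundError()
```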
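And a second sketch, in the same spirit, of the more convoluted /first-rows cache-miss handling. It reuses the hypothetical helpers and the `is_supported` check from the sketch above, and it assumes for illustration that the cached /splits payload has a "splits" list of dicts with "config" and "split" keys; the real payload shape may differ.

```python
class FirstRowsResponseNotReadyError(Exception):
    """No response yet: a /first-rows or /splits job is pending or has been queued."""


class FirstRowsResponseNotFoundError(Exception):
    """The dataset, config or split is unknown or not supported."""


def get_first_rows_response(
    dataset: str, config: str, split: str, hf_token: Optional[str] = None
) -> Any:
    cached = get_cache_entry("/first-rows", dataset, config, split)
    if cached is not None:
        return cached
    # Cache miss: "not ready" if work is already in progress...
    if is_job_in_process("/first-rows", dataset, config, split):
        raise FirstRowsResponseNotReadyError()
    if is_job_in_process("/splits", dataset):
        raise FirstRowsResponseNotReadyError()
    splits_response = get_cache_entry("/splits", dataset)
    if splits_response is not None:
        # ...or "not ready" plus a new /splits job if the split is known but its
        # /first-rows response is missing (something failed earlier).
        known = any(
            s["config"] == config and s["split"] == split
            for s in splits_response["splits"]
        )
        if known:
            add_job("/splits", dataset)
            raise FirstRowsResponseNotReadyError()
        raise FirstRowsResponseNotFoundError()
    if is_supported(dataset, hf_token):
        # The /splits response should exist but does not: re-queue it.
        add_job("/splits", dataset)
        raise FirstRowsResponseNotReadyError()
    raise FirstRowsResponseNotFoundError()
```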