dataset-viewer
7f69df04 - feat: 🎸 make the queue agnostic to the types of jobs (#608)

Commit
3 years ago
feat: 🎸 make the queue agnostic to the types of jobs (#608)

Important changes:
- the queue is now agnostic to the types of jobs. libqueue now also provides a Worker class that manages the jobs. The service `services/worker` no longer exists; it is replaced with two small projects: `workers/splits` and `workers/first_rows`. This will make it easier to contribute a new worker.
- upgrade the datasets library
- for simplicity, we remove the concept of job "retries" because we never really evaluated whether it helps
- remove the `created_at` field from /admin/pending_jobs for coherence
- docker: install libsndfile1 from the Ubuntu repository instead of building it from source
- tests: fix a small issue with the authentication, see https://github.com/huggingface/datasets/issues/4875#issuecomment-1280744233

---

all commits:

* feat: 🎸 make the queue agnostic to the types of jobs
  Before, we had two collections: one for splits jobs and one for first-rows jobs. Now there is only one collection, named "jobs", with a field "type". Note that the job arguments are still restricted to dataset (required) and, optionally, config and split.
  BREAKING CHANGE: 🧨 two collections are removed and a new one is created. The function names have changed too.
* feat: 🎸 publish new version
* feat: 🎸 upgrade to libqueue 0.3.0
* feat: 🎸 remove created_at field in pending_jobs
* style: 💄 fix style
* feat: 🎸 upgrade libqueue to 0.3.0
* refactor: 💡 use an enum to prevent typos
* fix: 🐛 fix mypy
* feat: 🎸 upgrade libqueue to 0.3.0 and datasets
* test: 💍 fix test
* refactor: 💡 pack the queue functions into a Queue class
* refactor: 💡 use relative imports
* feat: 🎸 upgrade to libqueue 0.3.1
* refactor: 💡 use relative imports
* feat: 🎸 upgrade to libqueue 0.3.1
* refactor: 💡 use relative imports
* feat: 🎸 upgrade to libqueue 0.3.1
* refactor: 💡 use a common Worker class for the loop logic
* refactor: 💡 simplify the code
* refactor: 💡 factor process_job in Worker, and remove refresh
  The refresh... functions are now the "compute" abstract method.
* test: 💍 temporarily disable unrelated failing tests
* test: 💍 fix tests
* refactor: 💡 add Worker to libqueue
* chore: 🤖 install types
* feat: 🎸 upgrade to libqueue 0.3.2
  Also: move the types-requests dependency to the dev dependencies.
* refactor: 💡 new project isolating the /first-rows worker
  Note: we removed apache-beam for now because of an issue with the installation. It must be added again later.
* feat: 🎸 create a new project: worker_splits
  It only contains the splits/ worker.
* chore: 🤖 add the commented-out dependency as a reminder to reinstall it
* feat: 🎸 replace services/workers with the workers/
  Beware: the docker images don't exist yet; we will have to update them.
* ci: 🎡 fix argument name
* fix: 🐛 fix details and upgrade docker images for admin and api
* feat: 🎸 upgrade docker images for the two workers
* fix: 🐛 reinstall apache beam, pinned to 2.41.0
* fix: 🐛 use absolute imports, not relative imports
* fix: 🐛 upgrade httplib2 to remove safety alert
* test: 💍 hack the tests order to fix the CI?
* test: 💍 restore the tests order to show the problem
  If the tests fail, it means that a side effect occurs somewhere.
* test: 💍 force datasets to patch csv for streaming every time
  See https://github.com/huggingface/datasets/issues/4875#issuecomment-1280821172
* style: 💄 fix style
* test: 💍 don't store the HF token on the disk
  We explicitly pass it as an argument, so there is no need to store it on the disk.
* chore: 🤖 install libsndfile1 from repos instead of building it
  The current package version is now 1.0.31, so there is no need to build it from source.
* chore: 🤖 add a missing package to have ICU work
* feat: 🎸 update the docker images
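The central change (a single "jobs" collection whose documents carry a "type" field, with arguments still limited to a required dataset and an optional config and split) can be illustrated with a minimal in-memory sketch. This is not libqueue's actual API: `Job`, `Status`, `Queue`, `add_job`, `start_job`, `finish_job` and the "/splits" and "/first-rows" type values are hypothetical names chosen for illustration, and a Python list stands in for the database collection.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Iterator, List, Optional


class Status(Enum):
    WAITING = "waiting"
    STARTED = "started"
    SUCCESS = "success"
    ERROR = "error"


@dataclass
class Job:
    """One document in the single, shared "jobs" collection (hypothetical schema)."""

    job_id: int
    type: str  # which worker handles the job, e.g. "/splits" (hypothetical value)
    dataset: str  # required argument
    config: Optional[str] = None  # optional argument
    split: Optional[str] = None  # optional argument
    status: Status = Status.WAITING


# Stand-in for the database collection: jobs of every type live in the same list.
JOBS: List[Job] = []
_NEXT_ID: Iterator[int] = count(1)


class Queue:
    """Hypothetical queue bound to one job type; it filters JOBS on `type`."""

    def __init__(self, job_type: str) -> None:
        self.job_type = job_type

    def add_job(
        self, dataset: str, config: Optional[str] = None, split: Optional[str] = None
    ) -> Job:
        job = Job(next(_NEXT_ID), self.job_type, dataset, config, split)
        JOBS.append(job)
        return job

    def start_job(self) -> Optional[Job]:
        # Oldest waiting job of this queue's type; jobs of other types are skipped.
        for job in JOBS:
            if job.type == self.job_type and job.status is Status.WAITING:
                job.status = Status.STARTED
                return job
        return None

    def finish_job(self, job: Job, success: bool) -> None:
        job.status = Status.SUCCESS if success else Status.ERROR
```

Because `Queue("/splits")` and `Queue("/first-rows")` read from the same list, supporting a third kind of job only needs a new `type` value, not a new collection or a new set of functions.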
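The Worker refactor (a common class holding the loop logic, `process_job` factored into it, and the former refresh functions turned into an abstract `compute` method, plus an enum to prevent typos in job types) can be pictured as follows. Only the names `Worker`, `process_job` and `compute` come from the commit; their signatures and bodies, the `JobType` enum values, `SplitsWorker`, the `loop` method and the `deque` standing in for the libqueue queue are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from collections import deque
from enum import Enum
from typing import Deque, List, Optional, Tuple


class JobType(Enum):
    # Hypothetical values. Writing JobType.SPLITS instead of the bare string
    # "/splits" makes a typo fail loudly (AttributeError) instead of silently
    # matching no job.
    SPLITS = "/splits"
    FIRST_ROWS = "/first-rows"


# Hypothetical job payload: (dataset, config, split).
JobArgs = Tuple[str, Optional[str], Optional[str]]


class Worker(ABC):
    """Shared loop logic; a concrete worker only implements `compute`."""

    def __init__(self, job_type: JobType, queue: Deque[JobArgs]) -> None:
        self.job_type = job_type
        self.queue = queue  # stand-in for the shared queue, already filtered on job_type
        self.results: List[Tuple[str, bool]] = []

    def process_job(self) -> bool:
        """Run `compute` on the next waiting job; return False if there is none."""
        if not self.queue:
            return False
        dataset, config, split = self.queue.popleft()
        try:
            self.compute(dataset=dataset, config=config, split=split)
            self.results.append((dataset, True))
        except Exception:
            # No retries: a failed job is recorded once and the loop moves on.
            self.results.append((dataset, False))
        return True

    def loop(self) -> None:
        # This sketch stops when the queue is empty; a long-running worker
        # would wait and poll the queue again instead.
        while self.process_job():
            pass

    @abstractmethod
    def compute(
        self, dataset: str, config: Optional[str] = None, split: Optional[str] = None
    ) -> None:
        """Job-specific work, e.g. computing the splits of a dataset."""


class SplitsWorker(Worker):
    def __init__(self, queue: Deque[JobArgs]) -> None:
        super().__init__(JobType.SPLITS, queue)

    def compute(
        self, dataset: str, config: Optional[str] = None, split: Optional[str] = None
    ) -> None:
        # Placeholder: a real worker would compute and store the splits here.
        if not dataset:
            raise ValueError("dataset is required")
```

With this split, a new worker project only has to subclass `Worker` and implement `compute`; polling, error handling and status bookkeeping stay in one place.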