Add Presidio scan (#2763)
* add presidio dep
* add SplitPresidioEntitiesScanJobRunner
* implement presidio_scan_entities
* handle nested data + simple cache
* fix analyze
* update lock file
* limit scan to 10k rows
* pin presidio to dev version to fix overflow error
* set max text length
* mask
* run only for most liked datasets or email/PII datasets
* add to graph
* style
* add wip tests
* finish tests
* count num rows per entity type
* fix test
* fix test again