mlflow
Improve error message if user tries to set tracking URI to UC
#7896
Merged

smurching 2 years ago

Related Issues/PRs

This PR is similar to #7863, but for the MLflow tracking client

What changes are proposed in this pull request?

Similar to how #7863 reserved the databricks-uc URI scheme for the registry client, this PR reserves the databricks-uc URI scheme for the MLflow tracking client, so that users see a clearer, more actionable error message if they call mlflow.set_tracking_uri("databricks-uc").
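
The idea of "reserving" a scheme can be sketched as follows: register a builder for the scheme whose only job is to raise a descriptive error, so the lookup never falls through to the generic unsupported-URI path. This is a minimal illustration, not MLflow's code: ToyStoreRegistry, UnsupportedURIError, and make_uc_builder are hypothetical names, and the message text is abbreviated.

```python
UC_SCHEME = "databricks-uc"  # the scheme being reserved


class UnsupportedURIError(Exception):
    """Hypothetical stand-in for the exception MLflow raises."""


class ToyStoreRegistry:
    """Maps URI schemes to store builders (illustrative only)."""

    def __init__(self):
        self._registry = {}

    def register(self, scheme, builder):
        self._registry[scheme] = builder

    def get_store(self, store_uri):
        # A bare name such as "databricks-uc" is treated as its own scheme.
        scheme = store_uri.split("://", 1)[0]
        try:
            builder = self._registry[scheme]
        except KeyError:
            # Generic path: the scheme is simply unknown.
            raise UnsupportedURIError(
                f"Unsupported URI '{store_uri}'. "
                f"Supported schemes: {list(self._registry)}"
            ) from None
        return builder(store_uri)


def make_uc_builder(registry):
    """Return a builder that always raises an actionable error."""

    def builder(store_uri):
        # Do not suggest the reserved scheme as a supported one.
        supported = [s for s in registry._registry if s != UC_SCHEME]
        raise UnsupportedURIError(
            f"Detected Unity Catalog tracking URI '{store_uri}'. "
            "Setting the tracking URI to a Unity Catalog backend is not supported. "
            f"Use one of the supported schemes: {supported}."
        )

    return builder


registry = ToyStoreRegistry()
registry.register("file", lambda uri: ("file-store", uri))
registry.register(UC_SCHEME, make_uc_builder(registry))
```

Because the reserved scheme has its own builder, the error is raised from a code path that knows the user asked for Unity Catalog and can point them at the registry URI instead.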

How is this patch tested?

Unit tests, also manually verified the updated error message.

Before this PR:

>>> import mlflow; mlflow.set_tracking_uri("databricks-uc"); mlflow.log_param("a", "b")
Traceback (most recent call last):
  File "/Users/sid.murching/mlflow/mlflow/tracking/registry.py", line 77, in get_store_builder
    store_builder = self._registry[scheme]
KeyError: 'databricks-uc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sid.murching/mlflow/mlflow/tracking/fluent.py", line 544, in log_param
    run_id = _get_or_start_run().info.run_id
  File "/Users/sid.murching/mlflow/mlflow/tracking/fluent.py", line 1552, in _get_or_start_run
    return start_run()
  File "/Users/sid.murching/mlflow/mlflow/tracking/fluent.py", line 278, in start_run
    client = MlflowClient()
  File "/Users/sid.murching/mlflow/mlflow/tracking/client.py", line 69, in __init__
    self._tracking_client = TrackingServiceClient(final_tracking_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/client.py", line 51, in __init__
    self.store
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/client.py", line 55, in store
    return utils._get_store(self.tracking_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/utils.py", line 189, in _get_store
    return _tracking_store_registry.get_store(store_uri, artifact_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/registry.py", line 39, in get_store
    return self._get_store_with_resolved_uri(resolved_store_uri, artifact_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/registry.py", line 48, in _get_store_with_resolved_uri
    builder = self.get_store_builder(resolved_store_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/registry.py", line 79, in get_store_builder
    raise UnsupportedModelRegistryStoreURIException(
mlflow.tracking.registry.UnsupportedModelRegistryStoreURIException:  Model registry functionality is unavailable; got unsupported URI 'databricks-uc' for model registry data storage. Supported URI schemes are: ['', 'file', 'databricks', 'http', 'https', 'postgresql', 'mysql', 'sqlite', 'mssql', 'file-plugin']. See https://www.mlflow.org/docs/latest/tracking.html#storage for how to run an MLflow server against one of the supported backend storage locations.

After this PR:

>>> import mlflow; mlflow.set_tracking_uri("databricks-uc"); mlflow.log_param("a", "b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sid.murching/mlflow/mlflow/tracking/fluent.py", line 544, in log_param
    run_id = _get_or_start_run().info.run_id
  File "/Users/sid.murching/mlflow/mlflow/tracking/fluent.py", line 1552, in _get_or_start_run
    return start_run()
  File "/Users/sid.murching/mlflow/mlflow/tracking/fluent.py", line 278, in start_run
    client = MlflowClient()
  File "/Users/sid.murching/mlflow/mlflow/tracking/client.py", line 69, in __init__
    self._tracking_client = TrackingServiceClient(final_tracking_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/client.py", line 51, in __init__
    self.store
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/client.py", line 55, in store
    return utils._get_store(self.tracking_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/utils.py", line 207, in _get_store
    return _tracking_store_registry.get_store(store_uri, artifact_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/registry.py", line 39, in get_store
    return self._get_store_with_resolved_uri(resolved_store_uri, artifact_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/registry.py", line 49, in _get_store_with_resolved_uri
    return builder(store_uri=resolved_store_uri, artifact_uri=artifact_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/utils.py", line 177, in _get_databricks_uc_rest_store
    raise MlflowException(
mlflow.exceptions.MlflowException: Detected Unity Catalog tracking URI 'databricks-uc'. Setting the tracking URI to a Unity Catalog backend is currently unsupported. Please specify a different tracking URI via mlflow.set_tracking_uri, with one of the following supported schemes: ['', 'file', 'databricks', 'databricks-uc', 'http', 'https', 'postgresql', 'mysql', 'sqlite', 'mssql', 'file-plugin']. If you're trying to access models in the Unity Catalog, please upgrade to the latest version of the MLflow Python client, then specify a Unity Catalog model registry URI via mlflow.set_registry_uri('databricks-uc') or mlflow.set_registry_uri('databricks-uc://profile_name'), where 'profile_name' is the name of the Databricks CLI profile to use for authentication. Be sure to leave the tracking URI configured to use one of the supported schemes listed above.
smurching Improve error message if user tries to set tracking URI to UC
aa2a399d
mlflow-automation 2 years ago (edited 2 years ago)

Documentation preview for c3ecf40 will be available when this CircleCI job completes successfully.
smurching line length
4fc91485
smurching requested a review from harupy 2 years ago
smurching added rn/none
smurching Tweak message
069232ff
harupy commented on 2023-02-27
mlflow/tracking/_tracking_service/utils.py

    global _tracking_store_registry
    raise MlflowException(
        f"Detected Unity Catalog tracking URI '{store_uri}'. "
        f"Setting the tracking URI to a Unity Catalog backend is not supported in the current "

harupy 2 years ago 👍 1

Suggested change
-        f"Setting the tracking URI to a Unity Catalog backend is not supported in the current "
+        "Setting the tracking URI to a Unity Catalog backend is not supported in the current "

Can we remove useless f-strings?
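
A quick way to see why the prefix is unnecessary: adjacent string literals are concatenated at compile time whether or not each piece is an f-string, so only the pieces that contain a placeholder need the f prefix. The sketch below uses shortened message text and a sample store_uri value chosen for illustration.

```python
store_uri = "databricks-uc"  # sample value for illustration

# Every piece carries an f prefix, even the one with no placeholder.
with_prefix = (
    f"Detected Unity Catalog tracking URI '{store_uri}'. "
    f"Setting the tracking URI to a Unity Catalog backend is not supported."
)

# Only the piece that interpolates store_uri keeps the prefix.
without_prefix = (
    f"Detected Unity Catalog tracking URI '{store_uri}'. "
    "Setting the tracking URI to a Unity Catalog backend is not supported."
)

# Both spellings build the same string.
same_message = with_prefix == without_prefix
```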

harupy commented on 2023-02-27
mlflow/tracking/_tracking_service/utils.py

        f"version of the MLflow client ({VERSION}). "
        f"Please specify a different tracking URI via mlflow.set_tracking_uri, with "
        f"one of the supported schemes: "
        f"{list(_tracking_store_registry._registry.keys())}. "

harupy 2 years ago

The error message contains 'databricks-uc'. Should we remove it if it's unsupported?


mlflow.exceptions.MlflowException: Detected Unity Catalog tracking URI 'databricks-uc'. Setting the tracking URI to a Unity Catalog backend is currently unsupported. Please specify a different tracking URI via mlflow.set_tracking_uri, with one of the following supported schemes: ['', 'file', 'databricks', 👉 'databricks-uc', 'http', 'https', 'postgresql', 'mysql', 'sqlite', 'mssql', 'file-plugin']. If you're trying to access models in the Unity Catalog, please upgrade to the latest version of the MLflow Python client, then specify a Unity Catalog model registry URI via mlflow.set_registry_uri('databricks-uc') or mlflow.set_registry_uri('databricks-uc://profile_name'), where 'profile_name' is the name of the Databricks CLI profile to use for authentication. Be sure to leave the tracking URI configured to use one of the supported schemes listed above.

smurching 2 years ago

Nice catch! Yes, good call

smurching 2 years ago (edited 2 years ago)

Verified after the latest commit that the message doesn't suggest "databricks-uc" as a supported scheme:

~/mlflow reserve-datag-uri-scheme $ python
Python 3.8.13 (default, Jan 25 2023, 19:47:36) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import mlflow; mlflow.set_tracking_uri("databricks-uc"); mlflow.log_param("a", "b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sid.murching/mlflow/mlflow/tracking/fluent.py", line 544, in log_param
    run_id = _get_or_start_run().info.run_id
  File "/Users/sid.murching/mlflow/mlflow/tracking/fluent.py", line 1552, in _get_or_start_run
    return start_run()
  File "/Users/sid.murching/mlflow/mlflow/tracking/fluent.py", line 278, in start_run
    client = MlflowClient()
  File "/Users/sid.murching/mlflow/mlflow/tracking/client.py", line 69, in __init__
    self._tracking_client = TrackingServiceClient(final_tracking_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/client.py", line 51, in __init__
    self.store
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/client.py", line 55, in store
    return utils._get_store(self.tracking_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/utils.py", line 218, in _get_store
    return _tracking_store_registry.get_store(store_uri, artifact_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/registry.py", line 39, in get_store
    return self._get_store_with_resolved_uri(resolved_store_uri, artifact_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/registry.py", line 49, in _get_store_with_resolved_uri
    return builder(store_uri=resolved_store_uri, artifact_uri=artifact_uri)
  File "/Users/sid.murching/mlflow/mlflow/tracking/_tracking_service/utils.py", line 185, in _get_databricks_uc_rest_store
    raise MlflowException(
mlflow.exceptions.MlflowException: Detected Unity Catalog tracking URI 'databricks-uc'. Setting the tracking URI to a Unity Catalog backend is not supported in the current version of the MLflow client (2.1.2.dev0). Please specify a different tracking URI via mlflow.set_tracking_uri, with one of the supported schemes: ['', 'file', 'databricks', 'http', 'https', 'postgresql', 'mysql', 'sqlite', 'mssql', 'file-plugin']. If you're trying to access models in the Unity Catalog, please upgrade to the latest version of the MLflow Python client, then specify a Unity Catalog model registry URI via mlflow.set_registry_uri('databricks-uc') or mlflow.set_registry_uri('databricks-uc://profile_name'), where 'profile_name' is the name of the Databricks CLI profile to use for authentication. Be sure to leave the tracking URI configured to use one of the supported schemes listed above.
smurching Update, and remove use of databricks-uc hardcoded string
6daf825b
smurching requested a review from harupy 2 years ago
harupy commented on 2023-02-27
mlflow/tracking/_tracking_service/utils.py

    from mlflow.version import VERSION

    global _tracking_store_registry
    supported_schemes = list(
        filter(
            lambda scheme: scheme != _DATABRICKS_UNITY_CATALOG_SCHEME,
            list(_tracking_store_registry._registry.keys()),
        )
    )

harupy 2 years ago

I think list comprehension is simpler:

Suggested change
-    supported_schemes = list(
-        filter(
-            lambda scheme: scheme != _DATABRICKS_UNITY_CATALOG_SCHEME,
-            list(_tracking_store_registry._registry.keys()),
-        )
-    )
+    supported_schemes = [
+        scheme
+        for scheme in _tracking_store_registry._registry.keys()
+        if scheme != _DATABRICKS_UNITY_CATALOG_SCHEME
+    ]
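
The two spellings in this suggestion produce the same list, which can be checked in isolation. In the sketch below, a plain dict stands in for _tracking_store_registry._registry, and its keys are sample values picked for illustration.

```python
_DATABRICKS_UNITY_CATALOG_SCHEME = "databricks-uc"

# Stand-in for _tracking_store_registry._registry (sample keys only).
registry = {
    "": None,
    "file": None,
    "databricks": None,
    "databricks-uc": None,
    "http": None,
}

# Original form: filter with a lambda, wrapped in list().
via_filter = list(
    filter(
        lambda scheme: scheme != _DATABRICKS_UNITY_CATALOG_SCHEME,
        list(registry.keys()),
    )
)

# Suggested form: a list comprehension with the same predicate.
via_comprehension = [
    scheme
    for scheme in registry.keys()
    if scheme != _DATABRICKS_UNITY_CATALOG_SCHEME
]
```

Both forms preserve the dict's insertion order, so the resulting error message lists the schemes in the same order either way.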
harupy approved these changes on 2023-02-27
harupy 2 years ago ❤ 1
smurching Update mlflow/tracking/_tracking_service/utils.py
3ec655f3
smurching Fix pylint
c3ecf40e
smurching merged 8cf6202e into master 2 years ago
jswetzen 1 year ago (edited 1 year ago)

@smurching or @harupy I'm running into this error when trying to log to Unity Catalog from my laptop. Does this mean that UC logging is only supported from Databricks compute nodes and not externally? I've gathered from the documentation that UC is the recommended place to store models but I would then also like to do this externally.

smurching 1 year ago

@jswetzen good question. You should be able to access models in UC externally (curious if you can share more details about the use case for doing so). MLflow tracking in UC is not supported, but you can target models in UC for the model registry using mlflow.set_registry_uri("databricks-uc"). Make sure to use a sufficiently new version of the MLflow client (versions 2.4.1 and above should suffice). Let me know if that works, and thanks!
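
The tracking/registry split described here might look like the sketch below, which follows the error message from this PR in using 'databricks-uc' for the registry URI only. is_uc_uri and configure_mlflow are hypothetical helpers, not MLflow APIs; the only MLflow calls used are mlflow.set_tracking_uri and mlflow.set_registry_uri, and the default URIs assume Databricks credentials are already configured.

```python
UC_SCHEME = "databricks-uc"


def is_uc_uri(uri):
    """Return True for 'databricks-uc' or 'databricks-uc://profile_name'."""
    return uri == UC_SCHEME or uri.startswith(UC_SCHEME + "://")


def configure_mlflow(tracking_uri="databricks", registry_uri="databricks-uc"):
    """Point tracking at the workspace and the model registry at Unity Catalog."""
    if is_uc_uri(tracking_uri):
        # Fail early with the same advice the MLflow error message gives.
        raise ValueError(
            f"'{tracking_uri}' is a Unity Catalog URI; pass it as registry_uri "
            "and keep tracking_uri on a supported scheme such as 'databricks'."
        )
    import mlflow  # imported lazily so the check above runs without MLflow installed

    mlflow.set_tracking_uri(tracking_uri)  # runs, params, metrics
    mlflow.set_registry_uri(registry_uri)  # registered models
```

Calling configure_mlflow() once at the start of a script keeps the two URIs from being swapped by accident.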

jswetzen 1 year ago

@smurching I haven't gotten the Databricks extension running in VS Code yet, so I can't run remotely, but I would like to track my training in Databricks while developing.
I had mixed up set_tracking_uri and set_registry_uri and thought my problem when using databricks-uc was local vs. Databricks execution instead of tracking vs. registry storage. Thanks for the swift clarification, I see now that there's feature parity when running locally.
