trino
Allow using GCS without providing credentials
#25811
Merged

Allow using GCS without providing credentials #25811

patricklucas
patricklucas32 days ago (edited 30 days ago)

Description

If a JSON key or key file path are not provided, and using access tokens is not enabled, do not attempt to get the Application Default Credentials, instead allowing the GCS client library to use its own default behavior.

This will allow using Trino with custom GCS-compatible endpoints without auth, such as during development and testing.

Additional context and related issues

Fixes #25810

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Add support for using GCS without credentials (#25810)
cla-bot
cla-bot32 days ago

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

patricklucas
patricklucas32 days ago

I will take a look at the CLA once I get positive indication this change would be accepted.

findinpath
findinpath commented on 2025-05-16
Conversation is marked as resolved
Show resolved
lib/trino-filesystem-gcs/src/main/java/io/trino/filesystem/gcs/GcsStorageFactory.java
9494 }
9595 }
9696 else {
97 credentials = jsonGoogleCredential.orElseGet(() -> {
98 try {
99 return GoogleCredentials.getApplicationDefault();
100 }
101 catch (IOException e) {
102 throw new UncheckedIOException(e);
103 }
104 });
97
credentials = jsonGoogleCredential.orElse(null);
findinpath32 days ago

ideally the PR would have an integration test showcasing the use-case without any credentials.

patricklucas32 days ago

Sure, I'll look into using fake-gcs-server for this. I see some prior art for the S3 FS using MinIO.

ebyhr32 days ago (edited 32 days ago)

Add support for using GCS without credentials

Does real GCS allow accessing without credentials? Is fake-gcs-server for production use? Why don't you fix fake-gcs-server instead? Allowing empty credentials just for tests doesn't make sense to me.

patricklucas31 days ago

Does real GCS allow accessing without credentials?

Yes, buckets can be open to the public and allow anonymous access. For example, the datasets listed on this page: https://cloud.google.com/storage/docs/public-datasets

Someone might, for example, make an Iceberg table available to users without authentication, and I would argue Trino should support that.

This snippet demonstrates accessing one of those buckets with null credentials:

import com.google.cloud.storage.StorageOptions

fun main() {
  val storage = StorageOptions.newBuilder().build().service
  println("Credentials: ${storage.options.credentials}")
  val bucket = storage.get("gcp-public-data-arco-era5")
  println(bucket)
}

Output:

Credentials: null
Bucket{name=gcp-public-data-arco-era5}

However, a call to GoogleCredentials.getApplicationDefault() fails with an IOException.

Is fake-gcs-server for production use?

No, it is not a complete implementation, but very useful for local/offline development and testing.

Why don't you fix fake-gcs-server instead?

There's not really anything to "fix" in fake-gcs-server here. The issue is that Trino requires connectivity to Google's OAuth servers due to how it creates Credentials objects to pass to the SDK.

Trino is the only software with GCS integration I have used where it is not possible to leave credentials unconfigured, allowing the GCS client library to apply its default behavior.

patricklucas31 days ago

Also, I would highlight that the GCS client library itself calls GoogleCredentials.getApplicationDefault() if no Credentials object is provided, meaning the behavior is nearly identical to what Trino does now.

The only difference is that the client library allows anonymous access by suppressing the IOException thrown when no credentials are found:

ServiceOptions.java:

  protected ServiceOptions(
      Class<? extends ServiceFactory<ServiceT, OptionsT>> serviceFactoryClass,
      Class<? extends ServiceRpcFactory<OptionsT>> rpcFactoryClass,
      Builder<ServiceT, OptionsT, ?> builder,
      ServiceDefaults<ServiceT, OptionsT> serviceDefaults) {
...
    credentials = builder.credentials != null ? builder.credentials : defaultCredentials();
...
  }

ServiceOptions.java:

  private static GoogleCredentials defaultCredentials() {
    try {
      return GoogleCredentials.getApplicationDefault();
    } catch (Exception ex) {
      return null;
    }
  }

Whereas Trino throws an exception:

GcsStorageFactory.java:

                    try {
                        return GoogleCredentials.getApplicationDefault();
                    }
                    catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
patricklucas31 days ago๐Ÿ‘ 1

ideally the PR would have an integration test showcasing the use-case without any credentials.

Thinking about it further, I don't think it's appropriate to add an integration test with fake-gcs-server here, since the purpose of this change is not to add support for it in particular. Instead, what I would do is add a unit test that verifies the behavior that if no credentials options are configured, the Storage instance returned from GcsStorageFactory#create has credentials equal to a default Storage instance:

    @Test
    void testDefaultCredentials()
            throws IOException
    {
        Credentials expectedCredentials = StorageOptions.newBuilder().build().getCredentials();

        // No credentials options are set
        GcsFileSystemConfig config = new GcsFileSystemConfig();

        GcsStorageFactory storageFactory = new GcsStorageFactory(config);

        Credentials actualCredentials;
        try (Storage storage = storageFactory.create(ConnectorIdentity.ofUser("test"))) {
            actualCredentials = storage.getOptions().getCredentials();
        }
        catch (Exception e) {
            throw new RuntimeException(e);
        }

        assertThat(actualCredentials)
                .as("if no GCS credentials configured, should let default flow from the GCS client library")
                .isEqualTo(expectedCredentials);
    }
wendigo30 days ago

Can we do the same and return null when IOException is catched? The proposed unit test looks fine.

patricklucas30 days ago๐Ÿ‘ 1

Sure, will also include a comment with the rationale.

patricklucas patricklucas force pushed from 53938004 to 2cb84cf4 31 days ago
cla-bot
cla-bot31 days ago

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

ebyhr
ebyhr commented on 2025-05-18
Conversation is marked as resolved
Show resolved
lib/trino-filesystem-gcs/src/test/java/io/trino/filesystem/gcs/TestGcsStorageFactory.java
24
25import static org.assertj.core.api.Assertions.assertThat;
26
27
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
ebyhr31 days ago

Why do we need this annotation?

patricklucas29 days ago

We don't; I erroneously thought it was a convention from looking at other test classes. Fixed.

Conversation is marked as resolved
Show resolved
lib/trino-filesystem-gcs/src/test/java/io/trino/filesystem/gcs/TestGcsStorageFactory.java
25import static org.assertj.core.api.Assertions.assertThat;
26
27@TestInstance(TestInstance.Lifecycle.PER_CLASS)
28
public class TestGcsStorageFactory
ebyhr31 days ago๐Ÿ‘ 1

https://trino.io/docs/current/develop/tests.html

Test classes should be defined as package-private and final.

Conversation is marked as resolved
Show resolved
lib/trino-filesystem-gcs/src/test/java/io/trino/filesystem/gcs/TestGcsStorageFactory.java
42 try (Storage storage = storageFactory.create(ConnectorIdentity.ofUser("test"))) {
43 actualCredentials = storage.getOptions().getCredentials();
44 }
45
catch (Exception e) {
46
throw new RuntimeException(e);
47
}
ebyhr31 days ago๐Ÿ‘ 1

Remove this catch and replace throws IOException with throws Exception.

patricklucas Allow using GCS without providing credentials
58698d51
patricklucas patricklucas force pushed from 2cb84cf4 to 58698d51 29 days ago
cla-bot
cla-bot29 days ago

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

patricklucas
patricklucas29 days ago

CLA submitted.

wendigo
wendigo27 days ago

@cla-bot check

cla-bot
cla-bot27 days ago

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot
cla-bot27 days ago

The cla-bot has been summoned, and re-checked this pull request!

Praveen2112
Praveen2112 commented on 2025-05-22
Praveen211226 days ago (edited 26 days ago)

What is the default mode for these Credentials ? The default behaviour is to use the ApplicationDefaultCredentials right

Praveen2112
Praveen2112 commented on 2025-05-22
lib/trino-filesystem-gcs/src/main/java/io/trino/filesystem/gcs/GcsStorageFactory.java
100100 }
101101 catch (IOException e) {
102 throw new UncheckedIOException(e);
102 // This is consistent with the GCP SDK when no credentials are available in the environment
103
return null;
Praveen211226 days ago

Or should we use NoCredentials ?

patricklucas
patricklucas26 days ago

What is the default mode for these Credentials ? The default behaviour is to use the ApplicationDefaultCredentials right

The default behavior of the library as linked to above/in the original issue is to call GoogleCredentials.getApplicationDefault(), but importantly, to return null if it throws an exception (which signifies no application default credentials are available in the environment). This is different from the current behavior of Trino which calls that method, but propagates the exception.

Passing NoCredentials is a way for a user of the library to explicitly indicate that they do not want to authenticate. I don't think that Trino should assume that in any case, but a possible route would be to add a configuration option to do so.

My preference is what I had originallyโ€”if no credentials options are set, simply fall back to the default behavior of the library by not calling setCredentials at all, rather than trying to emulate its behavior by calling GoogleCredentials.getApplicationDefault() ourselves.

github-actions
github-actions5 days ago

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

github-actions github-actions added stale
patricklucas
patricklucas4 days ago

@cla-bot check

cla-bot
cla-bot4 days ago

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot
cla-bot4 days ago

The cla-bot has been summoned, and re-checked this pull request!

patricklucas
patricklucas4 days ago

@wendigo I submitted the CLA to cla@trino.io 25 days ago (19 May)โ€”does it usually take this long, or is it worth checking with someone about getting that verified?

wendigo
wendigo4 days ago

@martint can you check whether you have received @patricklucas CLA?

wendigo
wendigo4 days ago

@patricklucas I can see that the last time CLAs were processed was 2.05: https://github.com/trinodb/cla/commits/master

patricklucas
patricklucas4 days agoโค 1

@cla-bot check

cla-bot cla-bot added cla-signed
cla-bot
cla-bot4 days ago๐Ÿ‘€ 1

The cla-bot has been summoned, and re-checked this pull request!

github-actions github-actions removed stale
wendigo
wendigo approved these changes on 2025-06-13
wendigo wendigo merged 0581cf0d into master 4 days ago
github-actions github-actions added this to the 477 milestone 4 days ago

Login to write a write a comment.

Login via GitHub

Assignees
No one assigned
Labels
Milestone