I am trying to transfer a public dataset to our project. I am specifying a Scheduled Query with a service account. In my project (the destination project), I have given this service account the BigQuery Admin role. I have also made sure the BigQuery Data Transfer service agent has the Service Account Token Creator role and the bigquerydatatransfer.serviceAgent role.
Using Terraform I have specified the following data transfer config -
resource "google_bigquery_data_transfer_config" "indications_query_config" {
depends_on = [google_project_iam_member.bq_data_transfer_permissions]
project = module.gcp_project.project_id
display_name = "Open Targets Platform Data Transfer - Indications"
location = "us-west1"
data_source_id = "scheduled_query"
schedule = "14 of month 02:00"
destination_dataset_id = module.bigquery.bigquery_dataset.dataset_id
service_account_name = google_service_account.bigquery_data_transfer_sa.email
params = {
destination_table_name_template = "indication"
write_disposition = "WRITE_APPEND"
query = "SELECT * FROM `bigquery-public-data.open_targets_platform.indication`"
}
}
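For reference, the IAM bindings referenced in depends_on, along with the service agent grants described above, look roughly like the following. This is a simplified sketch; the resource names and the project_number output are illustrative rather than my exact code.

# Service account that runs the scheduled query (referenced in the config above).
resource "google_service_account" "bigquery_data_transfer_sa" {
  account_id   = "bq-data-transfer"
  display_name = "BigQuery Data Transfer SA"
}

# BigQuery Admin on the destination project for that service account.
resource "google_project_iam_member" "bq_data_transfer_permissions" {
  project = module.gcp_project.project_id
  role    = "roles/bigquery.admin"
  member  = "serviceAccount:${google_service_account.bigquery_data_transfer_sa.email}"
}

# The Data Transfer service agent needs to mint tokens for that service account...
resource "google_service_account_iam_member" "dts_token_creator" {
  service_account_id = google_service_account.bigquery_data_transfer_sa.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = "serviceAccount:service-${module.gcp_project.project_number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}

# ...and holds the service agent role on the destination project.
resource "google_project_iam_member" "dts_service_agent" {
  project = module.gcp_project.project_id
  role    = "roles/bigquerydatatransfer.serviceAgent"
  member  = "serviceAccount:service-${module.gcp_project.project_number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}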
When this transfer runs, I get the following error message:
Error code 7 : Failed to start BigQuery job. Error: Access Denied: Table bigquery-public-data:open_targets_platform.indication: User does not have permission to query table bigquery-public-data:open_targets_platform.indication, or perhaps it does not exist. User: SA_NAME@PROJECT_ID.iam.gserviceaccount.com.
I went ahead and created a service account key for this service account. I set the env var GOOGLE_APPLICATION_CREDENTIALS to the path of this key and ran the following Python code:
from google.cloud import bigquery
import google.auth

SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform'
]

# Load the service account credentials pointed to by GOOGLE_APPLICATION_CREDENTIALS.
CREDENTIALS, GCP_PROJECT = google.auth.default(scopes=SCOPES)
client = bigquery.Client(credentials=CREDENTIALS, project=GCP_PROJECT)

# Perform a query against the public dataset.
QUERY = (
    'SELECT * FROM `bigquery-public-data.open_targets_platform.indication` '
    'LIMIT 100')
query_job = client.query(QUERY)  # API request
rows = query_job.result()        # Waits for query to finish

for row in rows:
    print(row.id)
When I do this, I am able to query the public dataset. I can also use the debugger to confirm that the bigquery client is indeed using the service account as credentials.
Any idea what's going wrong here?
The main issue is that the permissions the BigQuery Data Transfer service needs to read the public dataset have not been granted to the right identity. Although your service account can query the dataset directly, as the Python script demonstrates, the BigQuery Data Transfer service itself requires explicit authorization.
To resolve this, specific steps must be taken to grant the necessary access to the Data Transfer service account. First, navigate to the Google Cloud Console, go to IAM & Admin -> IAM, and locate the BigQuery Data Transfer service account, which typically looks like service-[project-number]@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com. Edit this service account's roles to include the BigQuery Data Viewer role, which allows the service to read data from BigQuery, including public datasets. After adding this role, save the changes.
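Equivalently, you can grant the role from the command line (substitute your own project ID and project number):

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"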
For additional verification, you can check the IAM policy for the public dataset itself by going to BigQuery -> Open Targets Platform -> indication, clicking on the Details tab, and then Permissions. Ensure that the BigQuery Data Transfer service account has the "BigQuery Data Viewer" role here as well.
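You can also inspect the dataset's access entries from the command line, for example:

bq show --format=prettyjson bigquery-public-data:open_targets_platform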
Your Terraform configuration appears to be correctly set up, and no changes are necessary there. The key point is that while your service account has the required permissions, the Data Transfer service needs its own explicit authorization to access the dataset.
Granting the "BigQuery Data Viewer" role directly to the Data Transfer service account gives it the read permissions it needs to perform the transfer from the public dataset. This distinction is crucial because the Data Transfer service operates as a separate identity from your regular service account.
For troubleshooting, consider temporarily adjusting the scheduled query interval to a shorter timeframe, such as every hour, to facilitate quicker testing and validation. Additionally, reviewing the logs in Cloud Logging for the BigQuery Data Transfer service can provide further insights if issues persist.
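To avoid waiting on the schedule at all, you can also trigger a one-off run of the transfer config from the command line. A sketch (the location, project number, and config ID below are placeholders):

# Find the transfer config's resource name.
bq ls --transfer_config --transfer_location=us-west1

# Trigger a single run of that config at the current time.
bq mk --transfer_run \
    --run_time="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    projects/PROJECT_NUMBER/locations/us-west1/transferConfigs/CONFIG_ID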
Thank you for your answer. I have added the BigQuery Data Viewer role to my project IAM. However, when I go to the public dataset's permissions, I do not see it appear.
When I rerun the scheduled query I get the same error message:
Error code 7 : Failed to start BigQuery job. Error: Access Denied: Table bigquery-public-data:open_targets_platform.indication: User does not have permission to query table bigquery-public-data:open_targets_platform.indication, or perhaps it does not exist. User: [SA_NAME]@[PROJECT].iam.gserviceaccount.com.
If I subscribe to this dataset and create a linked dataset in my project, then I am able to see the Data Transfer service agent listed under the BigQuery Data Viewer role. However, if I reconfigure my Scheduled Query transfer to use the following query:
SELECT * FROM `PROJECT.open_targets_platform.indication`
I get the following error:
Error code 5 : Not found: Dataset PROJECT:open_targets_platform was not found in location us-west1; JobID: PROJECT:66885aee-0000-2baa-88b5-d4f547ec4fb4
Which makes sense, because the linked dataset is not a regional dataset and its location is listed as "US". Is there a way I can specify the source dataset's location as US and the destination as "us-west1"?
So then I tried to use Dataset Copy. I created a Dataset Copy configuration, but I find that this also fails with a NotFound error.
However, if I use a multi-region destination dataset in the same location as the source dataset, the Scheduled Query works as expected. Which should be enough to get me unblocked for now. It would be nice if the documentation mentioned such a limitation for these services. Unless they already do and I missed it 😛
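For anyone hitting the same thing, the destination dataset just needs to live in the same location as the linked dataset (the US multi-region in this case). In Terraform, a sketch of such a destination dataset (the names here are illustrative) looks like:

resource "google_bigquery_dataset" "open_targets_destination" {
  project    = module.gcp_project.project_id
  dataset_id = "open_targets"
  location   = "US"  # must match the linked dataset's multi-region location
}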
TL;DR - Give the service agent BigQuery Data Viewer in the project. Subscribe to the public dataset to create a linked dataset in your project. Create/ensure a destination dataset has the same location as the linked dataset (regional vs. multi-regional). Create a scheduled query.
Hi @CaliWee, it seems like your setup is mostly correct, but the issue might be related to permissions for accessing the public dataset rather than the destination project. Here are a few steps you can try to resolve the problem:
1. Check Access Permissions for the Public Dataset
Although public datasets are open for querying, your service account still needs explicit permissions—specifically, the roles/bigquery.dataViewer role—on the bigquery-public-data project.
To grant this role, you can run a command like the following:
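# Replace SA_NAME and PROJECT_ID with your service account's name and project.
gcloud projects add-iam-policy-binding bigquery-public-data \
    --member="serviceAccount:SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"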
2. Validate the BigQuery Transfer Service Agent Role
Ensure that the BigQuery Data Transfer Service Agent role is correctly assigned in the destination project. It's great that you've already added the bigquerydatatransfer.serviceAgent and Service Account Token Creator roles. Just make sure these roles are applied specifically to the BigQuery Data Transfer Service Agent account, rather than to your custom service account.
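For example, you can list which roles the service agent currently holds on the destination project (the project ID and project number are placeholders):

gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:service-PROJECT_NUMBER@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com" \
    --format="table(bindings.role)"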
3. Test the Query with the Service Account
You mentioned that you successfully ran the query manually using the Python client, which confirms your service account credentials are valid. However, Scheduled Queries via the Data Transfer Service might require stricter permissions than direct queries.
As a test, try assigning the roles/bigquery.dataViewer role to your service account on the bigquery-public-data project and see if this resolves the issue.
4. Verify the Table Path and Existence
Double-check the table path bigquery-public-data.open_targets_platform.indication to ensure it’s accurate. Public datasets sometimes undergo changes or become deprecated. You can list the tables in the dataset with the following command:
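bq ls bigquery-public-data:open_targets_platform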
5. Explore Alternative Solutions
If the issue persists, you might consider using a different ETL tool to transfer the data. Platforms like Windsor.ai offer flexible connectors for moving data between sources and BigQuery, which can help navigate around strict permissions or operational limitations in certain cases.
Hope this helps!