-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: avoid policy tags 403 error in load_table_from_dataframe
#557
fix: avoid policy tags 403 error in load_table_from_dataframe
#557
Conversation
} | ||
if mode is not None: | ||
self._properties["mode"] = mode.upper() | ||
if description is not _DEFAULT_VALUE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shollyman This is one of the key changes: we no longer set the resource value for "description" if it's not explicitly set.
We already omit policy_tags
from the resource if none (though arguably it should get the same treatment so that someone can unset policy tags from Python)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My default inclination would be for special handling for None values to happen at the places where it's significant, like when calling tables.update. It's also the case that schema fields can't be manipulated individually, so perhaps I'm simply just not thinking this through properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I called that out as a possibility in #558, but that'd require updating our field mask logic to support sub-fields, which gets into some hairy string parsing (perhaps not all that hairy, as it could be as simple as split on '.'
, but is definitely a departure from what we've been doing).
Also, it might mean that we'd have to introduce a field mask to our load job methods. Based on the error message we're seeing, it sounds like it's possible to make updates to fields like policy tags from a load job.
field for field in table.schema if field.name in columns_and_indexes | ||
# Field description and policy tags are not needed to | ||
# serialize a data frame. | ||
SchemaField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual bug fix. Rather than populate all properties of schema field from the table schema, just populate the minimum we need to convert to parquet/CSV and then upload
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to revisit this for parameterization constraints, but that's a problem for future Tim.
Also, check that sent schema matches DataFrame order, not table order
} | ||
if mode is not None: | ||
self._properties["mode"] = mode.upper() | ||
if description is not _DEFAULT_VALUE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My default inclination would be for special handling for None values to happen at the places where it's significant, like when calling tables.update. It's also the case that schema fields can't be manipulated individually, so perhaps I'm simply just not thinking this through properly.
field for field in table.schema if field.name in columns_and_indexes | ||
# Field description and policy tags are not needed to | ||
# serialize a data frame. | ||
SchemaField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to revisit this for parameterization constraints, but that's a problem for future Tim.
For posterity, here is the error you get if you include
|
In internal issue 182204971, as customer is encountering a 403 error for missing permissions to set policy tags on a table when trying to append a dataframe to a table with
load_table_from_dataframe
. This is because we get the schema from the table and then pass it back to the API. We only need to set the field names (and maybe type + mode) in this case.Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes internal bug 182204971 🦕