fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` #786
…frame` To override this behavior, specify the types for the desired columns with the `dtype` argument.
I'll take a closer look at #776 before finishing this one, as it might mean fewer code paths to cover. I think the BQ Storage API will always be used for
I did a little bit of experimentation to see what the intermediate

It appears https://issuetracker.google.com/144712110 was fixed for FLOAT columns in #314 as of google-cloud-bigquery >= 2.2.0. (That was technically a breaking change [oops].)

I might still keep this open so that we can have some explicit tests for different data types. Also, we're relying on PyArrow -> Pandas to pick the right data types, so maybe there are some dtype defaults we can still help with.
…python-bigquery into b144712110-nullable-pandas-types
Re: system test failure:
Didn't we increase the default deadline to 10 minutes? Maybe the v3 branch needs a sync?
Two nits, but not essential, looks good.
@@ -14,12 +14,12 @@ First, ensure that the :mod:`pandas` library is installed by running:

    pip install --upgrade pandas

-Alternatively, you can install the BigQuery python client library with
+Alternatively, you can install the BigQuery Python client library with
(nit)
Since we're already at it, there's at least one other occurrence of "python" not capitalized (line 69), which can also be fixed.
    loss-of-precision.

    Returns:
        Dict[str, str]: mapping from column names to dtypes
(nit) Can be expressed as the annotation of the function return type.
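For illustration, a minimal sketch of what moving that "Returns:" line into the signature could look like. The function name, the `(name, type)` schema shape, and the mapping table are all hypothetical stand-ins, not the code under review.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical lookup from BigQuery type names to pandas nullable dtypes.
_NULLABLE_DTYPES = {
    "INTEGER": "Int64",
    "INT64": "Int64",
    "BOOLEAN": "boolean",
    "BOOL": "boolean",
}


def default_dtypes(schema: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Pick default pandas dtypes for columns that benefit from nullable types.

    The return annotation carries the type, so the docstring no longer
    needs a separate "Returns: Dict[str, str]" entry.
    """
    return {
        name: _NULLABLE_DTYPES[field_type]
        for name, field_type in schema
        if field_type in _NULLABLE_DTYPES
    }


print(default_dtypes([("id", "INT64"), ("name", "STRING"), ("ok", "BOOL")]))
```

Columns whose type is not in the table are left out, so they keep whatever dtype the conversion would otherwise choose.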
…' into b144712110-nullable-pandas-types
    ("max_results",), ((None,), (10,),)  # Use BQ Storage API.  # Use REST API.
)
def test_list_rows_nullable_scalars_dtypes(bigquery_client, scalars_table, max_results):
    df = bigquery_client.list_rows(
Note to self: I'll need to exclude the INTERVAL column next time we sync with master
deps!: BigQuery Storage and pyarrow are required dependencies (#776)
fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (#786)
feat!: destination tables are no-longer removed by `create_job` (#891)
feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (#972)
fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (#972)
feat!: mark the package as type-checked (#1058)
feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (#1061)
feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (#967)
fix: improve type annotations for mypy validation (#1081)
feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (#1117)
docs: Add migration guide from version 2.x to 3.x (#1027)
Release-As: 3.0.0
To override this behavior, specify the types for the desired columns with the `dtype` argument.

BREAKING CHANGE: uses Int64 type by default to avoid loss-of-precision in results with large integer values
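To make the loss-of-precision concern concrete, here is a small sketch in plain pandas (no BigQuery call involved). It assumes the old behavior can be modeled as an integer column with a NULL falling back to `float64`; the column name `n` and the sample value are made up for illustration.

```python
import numpy as np
import pandas as pd

# 2**53 + 1 is the smallest positive integer float64 cannot represent exactly.
big = 2**53 + 1

# Modeling the old default: integers plus a NULL stored as float64,
# which silently rounds the large value.
as_float = pd.Series([big, None], dtype="float64")

# Modeling the new default: nullable Int64. Built from explicit values and
# a mask so that no float conversion is involved at any point.
as_int64 = pd.Series(
    pd.arrays.IntegerArray(
        np.array([big, 0], dtype="int64"),  # 0 is a placeholder under the mask
        np.array([False, True]),            # True marks the missing entry
    )
)

print(int(as_float[0]) == big)  # False: the value was rounded
print(int(as_int64[0]) == big)  # True: the exact value survives
print(as_int64.dtype)

# Opting one column back into float64, here with plain pandas astype.
df = pd.DataFrame({"n": as_int64})
legacy = df.astype({"n": "float64"})
print(legacy["n"].dtype)
```

The last step only mimics the effect of an explicit per-column dtype override; callers who depend on `float64` with `NaN` would request that type for the affected columns instead of relying on the default.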
Fixes https://issuetracker.google.com/144712110 🦕
Fixes #793