cc @itholic
@itholic @HyukjinKwon I didn't find an issue about this on Jira but I hope it's fine to propose a solution. It's a minor bug I encountered while setting up automated testing on Databricks at work.
You can create an account and create a JIRA ticket.
Let's not change the existing tests. Otherwise looks fine.
  self.assertEqual(psdf.at[3, "b"], 6)
  self.assertEqual(psdf.at[3, "b"], pdf.at[3, "b"])
  self.assert_eq(psdf.at[9, "b"], np.array([0, 0, 0]))
- self.assert_eq(psdf.at[9, "b"], pdf.at[9, "b"])
+ self.assert_eq(psdf.at[9, "b"], pdf.at[9, "b"].to_numpy())
I believe the previous one looks more reasonable.
  exception=pe.exception,
  error_class="INVALID_TYPE_DF_EQUALITY_ARG",
  message_parameters={
-     "expected_type": f"{ps.DataFrame.__name__}, "
-     f"{pd.DataFrame.__name__}, "
-     f"{ps.Series.__name__}, "
-     f"{pd.Series.__name__}, "
-     f"{ps.Index.__name__}"
-     f"{pd.Index.__name__}, ",
-     "arg_name": "expected",
+     "expected_type": "Pandas or Pandas-on-Spark DataFrame, Series, or Index",
+1, this one looks clearer to me.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
What changes were proposed in this pull request?
- Fixed the `AttributeError` (see examples below) raised when mixing a Spark DataFrame with a Pandas or Pandas-on-Spark DataFrame in `assertDataFrameEqual`, by flowing into `_assert_pandas_almost_equal` / `_assert_pandas_equal` instead of `assertAlmostEqual` / `assertEqual`.
- In `PandasOnSparkTestUtils.assert_eq`, applied the Pandas-on-Spark flow for both params `left` and `right`, instead of only `left`, and clarified the error to specify that a Pandas or Pandas-on-Spark object is expected, which is not immediately obvious from the current error: `DataFrame, DataFrame, Series, Series, IndexIndex`
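To illustrate where the `DataFrame, DataFrame, Series, Series, IndexIndex` text comes from, here is a minimal sketch. It assumes hypothetical stub classes in place of the real `pd` and `ps` modules so that it runs without pandas or PySpark installed. Only the class names matter: `__name__` carries no module prefix, and adjacent f-strings are concatenated implicitly, so the missing `", "` after the first `Index` fuses the last two names.

```python
from types import SimpleNamespace


def _stub(name):
    # Hypothetical stand-in class; only its __name__ is used below.
    return type(name, (), {})


# Stand-ins for the pandas (pd) and pandas-on-Spark (ps) modules (assumption:
# the real classes share the same bare names, which is what the old message shows).
pd = SimpleNamespace(DataFrame=_stub("DataFrame"), Series=_stub("Series"), Index=_stub("Index"))
ps = SimpleNamespace(DataFrame=_stub("DataFrame"), Series=_stub("Series"), Index=_stub("Index"))

# Same shape as the old message parameter: adjacent f-strings are joined
# at compile time with nothing inserted between them.
old_expected_type = (
    f"{ps.DataFrame.__name__}, "
    f"{pd.DataFrame.__name__}, "
    f"{ps.Series.__name__}, "
    f"{pd.Series.__name__}, "
    f"{ps.Index.__name__}"
    f"{pd.Index.__name__}, "
)

# The replacement is a plain literal that names both libraries once.
new_expected_type = "Pandas or Pandas-on-Spark DataFrame, Series, or Index"

print(repr(old_expected_type))
print(repr(new_expected_type))
```

The old string cannot tell a reader which `DataFrame` is which, which is why a hand-written description reads better here than one built from class names.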
Why are the changes needed?
`assertDataFrameEqual` results in `AttributeError` when providing a Spark DataFrame as the first argument and a Pandas DataFrame or a Pandas-on-Spark DataFrame as the second argument (when not inheriting from unittest).

Does this PR introduce any user-facing change?
- `PySparkAssertionError` with a message is raised instead of `AttributeError`.
- The `PySparkAssertionError` raised when mixing Spark & Pandas-on-Spark DataFrames is now raised consistently in `PandasOnSparkTestUtils.assert_eq`, regardless of which one is left or right.
- `Expected type DataFrame, DataFrame, Series, Series, IndexIndex, for ...` -> `Expected type Pandas or Pandas-on-Spark DataFrame, Series, or Index for ...`
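The behavioral difference described above can be sketched roughly as follows. This is not the PR's actual code: every class and function name here is a hypothetical stand-in (no PySpark or pandas is imported), and the error wording is illustrative only. The point is that validating both arguments up front turns a late `AttributeError` into an assertion-style error that does not depend on argument order.

```python
class FakeSparkDataFrame:
    """Hypothetical stand-in for a Spark DataFrame (no pandas-style attributes)."""


class FakePandasDataFrame:
    """Hypothetical stand-in for a Pandas or Pandas-on-Spark DataFrame."""

    def __init__(self, values):
        self.values = values


class FakeAssertionError(AssertionError):
    """Hypothetical stand-in for PySparkAssertionError."""


EXPECTED = "Pandas or Pandas-on-Spark DataFrame, Series, or Index"


def assert_eq_left_only(left, right):
    # Chooses the pandas flow from `left` alone, so a Spark-like `right`
    # only fails when `.values` is read, with an AttributeError.
    if isinstance(left, FakePandasDataFrame):
        assert left.values == right.values


def assert_eq_both(left, right):
    # Checks both arguments first; the wording below is illustrative only.
    for arg_name, arg in (("left", left), ("right", right)):
        if not isinstance(arg, FakePandasDataFrame):
            raise FakeAssertionError(
                f"Expected type {EXPECTED} for {arg_name}, got {type(arg).__name__}"
            )
    assert left.values == right.values


pdf = FakePandasDataFrame([1, 2, 3])
sdf = FakeSparkDataFrame()

try:
    assert_eq_left_only(pdf, sdf)
except AttributeError as exc:
    print("left-only check:", type(exc).__name__)

for args in ((pdf, sdf), (sdf, pdf)):
    try:
        assert_eq_both(*args)
    except FakeAssertionError as exc:
        print("both checked:", exc)
```

In the second loop, both argument orders produce the same kind of error, which mirrors the "regardless of which one is left or right" point.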
Setup:
Before:
After:
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No