Problem with PySpark and Delta Lake Table Unit Tests
The integration of Spark and Delta Lake tables is seamless for the most part. However, I ran into issues with the unit tests that cover creating and updating the tables: they fail when run together with all the other existing unit tests. The failure looks like this:
    @classmethod
    @since(0.4)
    def isDeltaTable(cls, sparkSession, identifier):
        """
        Check if the provided `identifier` string, in this case a file path,
        is the root of a Delta table using the given SparkSession.

        :param sparkSession: SparkSession to use to perform the check
        :param path: location of the table
        :return: If the table is a delta table or not
        :rtype: bool

        Example::

            DeltaTable.isDeltaTable(spark, "/path/to/table")
        """
        assert sparkSession is not None
>       return sparkSession._sc._jvm.io.delta.tables.DeltaTable.isDeltaTable(
            sparkSession._jsparkSession, identifier)
E       TypeError: 'JavaPackage' object is not callable

../../../anaconda3/envs/rfa/lib/python3.8/site-packages/delta/tables.py:433: TypeError
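The call to isDeltaTable fails. With py4j, a "'JavaPackage' object is not callable" error usually means the JVM could not resolve io.delta.tables.DeltaTable to a class, which typically happens when the Delta Lake JAR is not on the classpath of the running Spark session.
The Spark session is global to the entire running process, which in this case is started by pytest. Below is the relevant Spark configuration for Delta Lake tables: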
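One way to confirm what the traceback suggests is to check how py4j resolves the Delta class name before calling it. The sketch below is only an illustration: the helper name delta_jvm_class_loaded is hypothetical, it reaches into the same private _sc._jvm attribute that delta/tables.py uses, and it relies on py4j handing back a JavaPackage placeholder for names it cannot resolve to a class.

```python
def delta_jvm_class_loaded(spark_session):
    """Return True if py4j resolves io.delta.tables.DeltaTable to a JVM class.

    Hypothetical diagnostic helper; it uses the same private attributes
    (_sc._jvm) that appear in the traceback above.
    """
    jvm = spark_session._sc._jvm
    candidate = jvm.io.delta.tables.DeltaTable
    # py4j yields a JavaClass for a resolvable class and a JavaPackage
    # placeholder otherwise; comparing type names avoids importing py4j here.
    return type(candidate).__name__ == "JavaClass"
```

Calling this at the start of a failing test shows whether the session that test received was started with the Delta classes available.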
.config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
.config("spark.databricks.delta.schema.autoMerge.enabled", "true") \
.config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
Side note: adding the following, or replacing spark.sql.catalog.spark_catalog with it, yields varying results.
.config("spark.sql.catalog.local", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
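For context, here is a minimal sketch of how a session with the configuration above might be built in one place. It is not the original test setup. The helper names, the application name and the local[2] master are assumptions. It also assumes a delta-spark pip release that ships configure_spark_with_delta_pip, which adds the Delta package to spark.jars.packages.

```python
# The three options are taken from the configuration shown above.
DELTA_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.databricks.delta.schema.autoMerge.enabled": "true",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def apply_delta_conf(builder, conf=DELTA_CONF):
    """Apply each option to a SparkSession builder and return the builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


def build_delta_session(app_name="delta-unit-tests"):
    """Create (or reuse) a local session with the Delta options applied.

    Hypothetical helper. The imports are lazy so the module can be imported
    on machines without Spark installed.
    """
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master("local[2]")
    builder = apply_delta_conf(builder)
    # getOrCreate() returns an existing session if one is already running in
    # this process, so options that must be set before the JVM starts (such
    # as spark.jars.packages) may have no effect in that case.
    return configure_spark_with_delta_pip(builder).getOrCreate()
```

If another test creates a session first, getOrCreate() hands back that session, which would be consistent with the failure only appearing when the whole suite runs in one process.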
Workaround
The current workaround is to run the Delta Lake specific unit tests in a separate pytest call, so that they get their own process and their own Spark session:
pytest . --ignore=path\to\test\delta_lake_tests.py
pytest path\to\test\delta_lake_tests.py
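The two invocations can also be wrapped in a small runner so that a single command drives both. This is only a sketch: the function names are hypothetical and the path to the Delta Lake test module has to be supplied by the caller. Each subprocess.run call starts a fresh Python process, so the Delta tests do not share a Spark session with the rest of the suite.

```python
import subprocess
import sys


def build_pytest_commands(delta_test_path, root="."):
    """Return the two pytest command lines used by the workaround."""
    base = [sys.executable, "-m", "pytest"]
    return [
        # First run: everything except the Delta Lake tests.
        base + [root, f"--ignore={delta_test_path}"],
        # Second run: only the Delta Lake tests, in their own process.
        base + [delta_test_path],
    ]


def run_split_suite(delta_test_path, root="."):
    """Run both invocations; return the first non-zero exit code, else 0."""
    exit_code = 0
    for command in build_pytest_commands(delta_test_path, root):
        result = subprocess.run(command)
        if result.returncode != 0 and exit_code == 0:
            exit_code = result.returncode
    return exit_code
```

A CI job could then call run_split_suite with the real test path and pass the result to sys.exit.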
Note that adding custom pytest markers to categorize the tests and running them by marker (-m) also fails, even though only the selected tests are run. The collection of the tests seems to "pollute" the Spark session.
pytest . -m delta_lake_tests
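To check whether collection really leaves a Spark context behind, the process can be inspected before the Delta tests build their session. The helper below is a hypothetical sketch. It reads the private SparkContext._active_spark_context attribute, which is an implementation detail of PySpark, and accepts the class as a parameter so it can be exercised without Spark.

```python
def active_context_conf(spark_context_cls=None):
    """Return the conf of an already-running SparkContext as a dict, or None.

    Hypothetical diagnostic helper; by default it inspects
    pyspark.SparkContext, imported lazily.
    """
    if spark_context_cls is None:
        from pyspark import SparkContext as spark_context_cls
    active = spark_context_cls._active_spark_context
    if active is None:
        return None
    # getConf().getAll() returns a list of (key, value) pairs.
    return dict(active.getConf().getAll())
```

If this returns a dict without the Delta settings at the start of a Delta test, some other module created the context first, which would match the behaviour described above.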
References
- docs.databricks.com/delta/quick-start.html
- docs.pytest.org/en/6.2.x/example/markers.html