Class VortexScan
- All Implemented Interfaces:
org.apache.spark.sql.connector.read.Scan, org.apache.spark.sql.connector.read.SupportsReportStatistics
Scan over a table of Vortex files.
Implements SupportsReportStatistics to surface both the row count that Vortex records in each file footer and a
Spark scan-size estimate. The byte estimate starts from the on-storage file sizes collected by
MultiFileDataSource, then follows Spark's file scan convention: it applies the SQL file-compression factor
and scales by the default size of the pushed read schema relative to that of the full table schema. When the
listing did not return a size for one or more files, the file-byte total is extrapolated before the Spark
scaling is applied.
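The arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions, not the VortexScan implementation: the class and method names are hypothetical, a negative size is an assumed encoding for "the listing returned no size", and the extrapolation assumes that files of unknown size match the average of the known ones. The source does not say how the real extrapolation is done.

```java
import java.util.OptionalLong;

/** Hypothetical helper for illustration; not part of the Vortex or Spark API. */
final class ScanSizeSketch {

    /**
     * @param fileSizes         on-storage size per file; a negative value marks a file
     *                          whose size is unknown (assumed encoding)
     * @param compressionFactor stand-in for Spark's SQL file-compression factor
     * @param readSchemaSize    default size of the pushed read schema
     * @param tableSchemaSize   default size of the full table schema
     * @return the scaled estimate, or empty when no file size is known
     */
    static OptionalLong estimateSizeInBytes(long[] fileSizes, double compressionFactor,
                                            long readSchemaSize, long tableSchemaSize) {
        long knownBytes = 0;
        int knownFiles = 0;
        for (long size : fileSizes) {
            if (size >= 0) {
                knownBytes += size;
                knownFiles++;
            }
        }
        // Nothing to extrapolate from: leave the estimate empty.
        if (knownFiles == 0 || tableSchemaSize <= 0) {
            return OptionalLong.empty();
        }
        // Assumption: files of unknown size match the average of the known ones.
        double totalBytes = (double) knownBytes * fileSizes.length / knownFiles;
        // Apply the compression factor, then scale by the read/table schema ratio.
        double scaled = totalBytes * compressionFactor * readSchemaSize / tableSchemaSize;
        return OptionalLong.of((long) scaled);
    }
}
```

For example, with known sizes 100 and 300, two files of unknown size, a factor of 1.0, and schema default sizes of 8 and 32, the extrapolated total is 800 bytes and the sketch returns 200.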
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.sql.connector.read.Scan
org.apache.spark.sql.connector.read.Scan.ColumnarSupportMode -
Constructor Summary
Constructors
VortexScan(List<String> paths, List<org.apache.spark.sql.connector.catalog.Column> tableColumns, List<org.apache.spark.sql.connector.catalog.Column> readColumns, org.apache.spark.sql.connector.expressions.filter.Predicate[] pushedPredicates, Map<String, String> formatOptions)
    Creates a new VortexScan for the specified file paths and columns.
-
Method Summary
Modifier and Type, Method, Description
org.apache.spark.sql.connector.read.Scan.ColumnarSupportMode columnarSupportMode()
    Returns the columnar support mode for this scan.
String description()
    Logging-friendly readable description of the scan source.
org.apache.spark.sql.connector.read.Statistics estimateStatistics()
    Returns statistics for this scan.
org.apache.spark.sql.types.StructType readSchema()
    Returns the schema for the data that will be read by this scan.
org.apache.spark.sql.connector.read.Batch toBatch()
    Converts this scan to a Batch for execution.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.sql.connector.read.Scan
reportDriverMetrics, supportedCustomMetrics, toContinuousStream, toMicroBatchStream
-
Constructor Details
-
VortexScan
public VortexScan(List<String> paths, List<org.apache.spark.sql.connector.catalog.Column> tableColumns, List<org.apache.spark.sql.connector.catalog.Column> readColumns, org.apache.spark.sql.connector.expressions.filter.Predicate[] pushedPredicates, Map<String, String> formatOptions)
Creates a new VortexScan for the specified file paths and columns. The caller is responsible for passing immutable collections; the constructor does not copy them.
- Parameters:
paths - the list of Vortex file paths to scan
tableColumns - the full table columns before projection pushdown
readColumns - the list of columns to read from the files
pushedPredicates - predicates pushed down by Spark; null or empty means no pushdown
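Because the constructor does not copy its arguments, a caller may want to take immutable snapshots before handing collections over. The sketch below shows one way to do that with the standard List.copyOf and Map.copyOf methods. The helper class and its method names are hypothetical and exist only for illustration; they are not part of the Vortex API.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Vortex API. */
final class ImmutableArgsSketch {

    /** Returns an unmodifiable snapshot that later changes to the source do not affect. */
    static List<String> freezePaths(List<String> paths) {
        return List.copyOf(paths);
    }

    /** Returns an unmodifiable snapshot of the format options. */
    static Map<String, String> freezeOptions(Map<String, String> options) {
        return Map.copyOf(options);
    }
}
```

Both methods reject null elements, keys, and values with a NullPointerException, so a caller would need to clean the input first if nulls are possible.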
-
-
Method Details
-
readSchema
public org.apache.spark.sql.types.StructType readSchema()
Returns the schema for the data that will be read by this scan.
The schema is constructed from the read columns that were specified when this scan was created.
- Specified by:
readSchema in interface org.apache.spark.sql.connector.read.Scan
- Returns:
- the StructType representing the schema of the read data
-
description
public String description()
Logging-friendly readable description of the scan source.
- Specified by:
description in interface org.apache.spark.sql.connector.read.Scan
-
toBatch
public org.apache.spark.sql.connector.read.Batch toBatch()
Converts this scan to a Batch for execution.
Creates a VortexBatchExec that will handle the actual reading of the specified files and columns.
- Specified by:
toBatch in interface org.apache.spark.sql.connector.read.Scan
- Returns:
- a Batch implementation for executing this scan
-
columnarSupportMode
public org.apache.spark.sql.connector.read.Scan.ColumnarSupportMode columnarSupportMode()
Returns the columnar support mode for this scan.
Vortex always provides columnar data access, so this method always returns SUPPORTED.
- Specified by:
columnarSupportMode in interface org.apache.spark.sql.connector.read.Scan
- Returns:
- ColumnarSupportMode.SUPPORTED
-
estimateStatistics
public org.apache.spark.sql.connector.read.Statistics estimateStatistics()
Returns statistics for this scan.
Opens the Vortex DataSource on first invocation and caches the result. The row count is taken from the data source (the sum of file-footer row counts, extrapolated from the first opened file when other files are deferred). Statistics.sizeInBytes() is derived from the per-file sizes reported by the filesystem listing, then adjusted by Spark's compression factor and by the ratio between the default sizes of the pushed read schema and the full table schema. When the listing did not return a size for some files, the file-byte total is extrapolated. When no file size is known at all, the value is left empty so that Spark falls back to its default heuristic.
- Specified by:
estimateStatistics in interface org.apache.spark.sql.connector.read.SupportsReportStatistics
- Returns:
- statistics with row-count and Spark scan-size estimates
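The "open on first invocation and cache the result" behavior is a form of lazy memoization. A minimal sketch of that pattern is shown below, under the assumption that a single expensive loader runs at most once. The class name is hypothetical, and the use of synchronized is an assumption of this sketch; the source does not state how, or whether, VortexScan guards concurrent calls.

```java
import java.util.function.Supplier;

/** Hypothetical lazy holder for illustration; not the VortexScan implementation. */
final class CachedStatsSketch<T> {
    private final Supplier<T> loader; // the expensive step, e.g. opening a data source
    private T cached;
    private boolean loaded;

    CachedStatsSketch(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Runs the loader on the first call only; later calls return the cached value. */
    synchronized T get() {
        if (!loaded) {
            cached = loader.get();
            loaded = true;
        }
        return cached;
    }
}
```

A separate loaded flag is used instead of a null check so that a loader returning null is still invoked only once.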
-