Apache Spark supports processing various types of data. Not all expressions support all data types. The RAPIDS Accelerator for Apache Spark has further restrictions on what types are supported for processing. This document describes which operations are supported and which data types each operation supports. Apache Spark is under active development, and this document was generated against version 3.0.1 of Spark. Most of this should still apply to other versions of Spark, but there may be slight changes.
General limitations
Decimal
The `Decimal` type in Spark supports a precision up to 38 digits (128 bits). The RAPIDS Accelerator in most cases stores values in up to 64 bits and will support 128-bit values in the future. As such, the accelerator currently only supports a precision up to 18 digits. Note that decimals are disabled by default in the plugin because they are presently supported by a relatively small number of operations. This can result in a lot of data movement to and from the GPU, slowing down processing in some cases.
Result `Decimal` precision and scale follow the same rules as CPU mode in Apache Spark:
In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following precision / scale:
Operation | Result Precision | Result Scale |
---|---|---|
e1 + e2 | max(s1, s2) + max(p1-s1, p2-s2) + 1 | max(s1, s2) |
e1 - e2 | max(s1, s2) + max(p1-s1, p2-s2) + 1 | max(s1, s2) |
e1 * e2 | p1 + p2 + 1 | s1 + s2 |
e1 / e2 | p1 - s1 + s2 + max(6, s1 + p2 + 1) | max(6, s1 + p2 + 1) |
e1 % e2 | min(p1-s1, p2-s2) + max(s1, s2) | max(s1, s2) |
e1 union e2 | max(s1, s2) + max(p1-s1, p2-s2) | max(s1, s2) |
However, Spark inserts `PromotePrecision` to CAST both sides to the same type. GPU mode may fall back to CPU even if the result Decimal precision is within 18 digits. For example, `Decimal(8,2)` x `Decimal(6,3)`, resulting in `Decimal(15,5)`, runs on CPU because, due to `PromotePrecision`, GPU mode assumes the result is `Decimal(19,6)`. There are even extreme cases where Spark can temporarily return a Decimal value larger than what can be stored in 128 bits and then uses the `CheckOverflow` operator to round it to a desired precision and scale. This means that even when the accelerator supports 128-bit decimals, we might not be able to support all operations that Spark can support.
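The arithmetic behind the `Decimal(8,2)` x `Decimal(6,3)` example can be sketched in a few lines of Python. This is only an illustration of the rules in the table above, not the plugin's code. The function names are hypothetical, the 38-digit cap that Spark applies is ignored, and it assumes that `PromotePrecision` casts both operands to the "e1 union e2" type from the table.

```python
from typing import Tuple

DecimalType = Tuple[int, int]  # (precision, scale)

GPU_MAX_PRECISION = 18  # limit stated in the text above


def multiply(a: DecimalType, b: DecimalType) -> DecimalType:
    """Result type of e1 * e2: precision p1 + p2 + 1, scale s1 + s2."""
    (p1, s1), (p2, s2) = a, b
    return p1 + p2 + 1, s1 + s2


def union(a: DecimalType, b: DecimalType) -> DecimalType:
    """Result type of e1 union e2, used here as the assumed common cast type."""
    (p1, s1), (p2, s2) = a, b
    scale = max(s1, s2)
    return scale + max(p1 - s1, p2 - s2), scale


def fits_on_gpu(t: DecimalType) -> bool:
    """True when the precision is within the 18-digit limit."""
    return t[0] <= GPU_MAX_PRECISION


lhs, rhs = (8, 2), (6, 3)
direct = multiply(lhs, rhs)          # type computed from the original operands
common = union(lhs, rhs)             # assumed type after promotion
promoted = multiply(common, common)  # type computed from the promoted operands

print(direct, fits_on_gpu(direct))
print(promoted, fits_on_gpu(promoted))
```

Under these assumptions the directly computed type stays within 18 digits while the type computed from the promoted operands does not, which matches the fallback described above.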
Timestamp
Timestamps in Spark will all be converted to the local time zone before processing and are often converted to UTC before being stored, like in Parquet or ORC. The RAPIDS Accelerator only supports UTC as the time zone for timestamps.
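One way to picture why the time zone matters: the stored value is a single instant counted in microseconds, while the fields a query sees (hour, day, and so on) depend on the zone used to interpret it. The sketch below uses only the Python standard library; the helper names and the fixed +05:00 offset are hypothetical and have nothing to do with the plugin's implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts: datetime) -> int:
    """Microseconds since the Unix epoch for a timezone-aware datetime."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_micros(micros: int, tz: timezone = timezone.utc) -> datetime:
    """Interpret a stored microsecond count in the given time zone."""
    return (EPOCH + timedelta(microseconds=micros)).astimezone(tz)


utc_noon = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
plus_five = timezone(timedelta(hours=5))  # hypothetical local zone

stored = to_micros(utc_noon)
local_view = from_micros(stored, plus_five)

print(stored)                             # same instant regardless of zone
print(from_micros(stored).hour, local_view.hour)
```

The stored count round-trips unchanged, but the hour extracted from it differs between the two zones.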
CalendarInterval
In Spark, `CalendarInterval`s store three values: months, days, and microseconds. Support for this type is still very limited in the accelerator. In some cases only a subset of the type is supported; for example, window ranges currently only support days.
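As a rough mental model, the three-field layout can be mirrored with a small Python dataclass. Both the class and the `is_days_only` helper are hypothetical; the helper is just one reading of "only days are supported" and is not the check the accelerator actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalendarInterval:
    """Illustrative mirror of the three stored fields."""
    months: int
    days: int
    microseconds: int


def is_days_only(interval: CalendarInterval) -> bool:
    """True when the interval uses only its days field."""
    return interval.months == 0 and interval.microseconds == 0


three_days = CalendarInterval(months=0, days=3, microseconds=0)
month_and_day = CalendarInterval(months=1, days=1, microseconds=0)

print(is_days_only(three_days), is_days_only(month_and_day))
```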
Configuration
There are lots of different configuration values that can impact if an operation is supported or not. Some of these are a part of the RAPIDS Accelerator and cover the level of compatibility with Apache Spark. Those are covered here. Others are a part of Apache Spark itself and those are a bit harder to document. The work of updating this to cover that support is still ongoing.
In general, though, if you ever have any question about why an operation is not running on the GPU, you may set `spark.rapids.sql.explain` to ALL and it will try to give all of the reasons why this particular operator or expression is on the CPU or GPU.
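A minimal sketch of passing that setting on the command line, assuming the job is launched with `spark-submit` and its standard `--conf key=value` syntax. The helper function and the application name `my_query.py` are placeholders for illustration.

```python
from typing import Dict, List


def spark_submit_args(app: str, conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit argument list with one --conf per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# my_query.py is a hypothetical application script.
args = spark_submit_args("my_query.py", {"spark.rapids.sql.explain": "ALL"})
print(" ".join(args))
```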
Key
Types
Type Name | Type Description |
---|---|
BOOLEAN | Holds true or false values. |
BYTE | Signed 8-bit integer value. |
SHORT | Signed 16-bit integer value. |
INT | Signed 32-bit integer value. |
LONG | Signed 64-bit integer value. |
FLOAT | 32-bit floating point value. |
DOUBLE | 64-bit floating point value. |
DATE | A date with no time component. Stored as 32-bit integer with days since Jan 1, 1970. |
TIMESTAMP | A date and time. Stored as 64-bit integer with microseconds since Jan 1, 1970 in the current time zone. |
STRING | A text string. Stored as UTF-8 encoded bytes. |
DECIMAL | A fixed point decimal value with configurable precision and scale. |
NULL | Only stores null values and is typically only used when no other type can be determined from the SQL. |
BINARY | An array of non-nullable bytes. |
CALENDAR | Represents a period of time. Stored as months, days and microseconds. |
ARRAY | A sequence of elements. |
MAP | A set of key value pairs, the keys cannot be null. |
STRUCT | A series of named fields. |
UDT | User-defined types and Java objects. These are not standard SQL types. |
Support
Value | Description |
---|---|
S | (Supported) Both Apache Spark and the RAPIDS Accelerator support this type fully. |
| | (Not Applicable) Neither Spark nor the RAPIDS Accelerator supports this type in this situation. |
PS | (Partial Support) Apache Spark supports this type, but the RAPIDS Accelerator only partially supports it. An explanation for what is missing will be included with this. |
NS | (Not Supported) Apache Spark supports this type but the RAPIDS Accelerator does not. |
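When reading the support tables programmatically, each cell can be split into its support code and any notes that follow it. The sketch below is a hypothetical helper, not part of the plugin; it assumes notes are separated by semicolons, as they appear in the tables in this document.

```python
from typing import NamedTuple, Tuple

LEVELS = {
    "S": "Supported",
    "PS": "Partial Support",
    "NS": "Not Supported",
    "": "Not Applicable",
}


class Support(NamedTuple):
    code: str
    meaning: str
    notes: Tuple[str, ...]


def parse_cell(cell: str) -> Support:
    """Split a table cell such as 'PS note one; note two' into code and notes."""
    text = cell.strip()
    code, _, rest = text.partition(" ")
    if code not in LEVELS:
        raise ValueError(f"unknown support code: {code!r}")
    notes = tuple(n.strip() for n in rest.split(";") if n.strip())
    return Support(code, LEVELS[code], notes)


cell = "PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT"
parsed = parse_cell(cell)
print(parsed.meaning, parsed.notes)
```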
SparkPlan or Executor Nodes
Apache Spark uses a Directed Acyclic Graph (DAG) of processing to build a query. The nodes in this graph are instances of `SparkPlan` and represent various high-level operations like doing a filter or project. The operations that the RAPIDS Accelerator supports are described below.
Executor | Description | Notes | Param(s) | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CoalesceExec | The backend for the dataframe coalesce method | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
CollectLimitExec | Reduce to single partition and apply limit | This is disabled by default because Collect Limit replacement can be slower on the GPU; if there is a huge number of rows in a batch, it could help by limiting the number of rows transferred from GPU to CPU | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
ExpandExec | The backend for the expand operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
FileSourceScanExec | Reading data from files, often from Hive tables | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
FilterExec | The backend for most filter statements | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
GenerateExec | The backend for operations that generate more output rows than input rows like explode | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
GlobalLimitExec | Limiting of results across partitions | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
LocalLimitExec | Per-partition limiting of results | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
ProjectExec | The backend for most select, withColumn and dropColumn statements | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
RangeExec | The backend for range operator | None | Input/Output | | | | | S | | | | | | | | | | | | | |
SampleExec | The backend for the sample operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
SortExec | The backend for the sort operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
TakeOrderedAndProjectExec | Take the first limit elements as defined by the sortOrder, and do projection if needed | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
UnionExec | The backend for the union operator | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS unionByName will not optionally impute nulls for missing struct fields when the column is a struct and there are non-overlapping fields; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
CustomShuffleReaderExec | A wrapper of shuffle query stage | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
HashAggregateExec | The backend for hash based aggregations | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
ObjectHashAggregateExec | The backend for hash based aggregations supporting TypedImperativeAggregate functions | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS only allowed when aggregate buffers can be converted between CPU and GPU | NS | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
SortAggregateExec | The backend for sort based aggregations | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS only allowed when aggregate buffers can be converted between CPU and GPU | NS | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS not allowed for grouping expressions; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | PS not allowed for grouping expressions if containing Array or Map as child; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | NS |
DataWritingCommandExec | Writing data | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS 128bit decimal only supported for Orc | NS | NS | NS | PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT | PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT | PS Only supported for Parquet; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT | NS |
BatchScanExec | The backend for most file input | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, UDT | NS |
BroadcastExchangeExec | The backend for broadcast exchange of data | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | NS |
ShuffleExchangeExec | The backend for most data being exchanged between processes | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS Round-robin partitioning is not supported if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS Round-robin partitioning is not supported for nested structs if spark.sql.execution.sortBeforeRepartition is true; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
BroadcastHashJoinExec | Implementation of join using broadcast data | None | leftKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS |
| | | | rightKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS |
| | | | condition | S | | | | | | | | | | | | | | | | | |
| | | | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
BroadcastNestedLoopJoinExec | Implementation of join using brute force. Full outer joins and joins where the broadcast side matches the join side (e.g.: LeftOuter with left broadcast) are not supported | None | condition (A non-inner join is only supported if the condition expression can be converted to a GPU AST expression) | S | | | | | | | | | | | | | | | | | |
| | | | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
CartesianProductExec | Implementation of join using brute force | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
ShuffledHashJoinExec | Implementation of join using hashed shuffled data | None | leftKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS |
| | | | rightKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS |
| | | | condition | S | | | | | | | | | | | | | | | | | |
| | | | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
SortMergeJoinExec | Sort merge join, replacing with shuffled hash join | None | leftKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS |
| | | | rightKeys | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS |
| | | | condition | S | | | | | | | | | | | | | | | | | |
| | | | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
AggregateInPandasExec | The backend for an Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | NS | NS | NS | NS |
ArrowEvalPythonExec | The backend of the Scalar Pandas UDFs. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS |
FlatMapGroupsInPandasExec | The backend for Flat Map Groups Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | NS | NS | NS | NS |
MapInPandasExec | The backend for Map Pandas Iterator UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. | None | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS |
WindowInPandasExec | The backend for Window Aggregation Pandas UDF. Accelerates the data transfer between the Java process and the Python process. It also supports scheduling GPU resources for the Python process when enabled. For now it only supports row-based window frames. | This is disabled by default because it only supports row-based frames for now | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | NS | NS |
WindowExec | Window-operator backend | None | partitionSpec | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | NS | NS | NS |
| | | | Input/Output | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
Expression and SQL Functions
Inside each node in the DAG there can be one or more trees of expressions that describe various types of processing that happens in that part of the plan. These can be things like adding two numbers together or checking for null. These expressions can have multiple input parameters and one output value. These expressions also can happen in different contexts. Because of how the accelerator works different contexts have different levels of support.
The most common expression context is `project`. In this context values from a single input row go through the expression and the result will also be used to produce something in the same row. Be aware that even in the case of aggregation and window operations most of the processing is still done in the project context either before or after the other processing happens.
Aggregation operations like count or sum can take place in either the `aggregation`, `reduction`, or `window` context. `aggregation` is when the operation is done while grouping the data by one or more keys. `reduction` is when there is no group by and there is a single result for an entire column. `window` is for window operations.
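The difference between the three contexts can be illustrated with a plain-Python sum over a tiny keyed data set. This is a CPU-only sketch with hypothetical function names. The window variant shows just one possible frame (a running sum per key, in input order), chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (key, value)

rows: List[Row] = [("a", 1), ("a", 2), ("b", 10), ("b", 20), ("b", 30)]


def reduction_sum(data: List[Row]) -> int:
    """reduction: no grouping, a single result for the whole column."""
    return sum(value for _, value in data)


def aggregation_sum(data: List[Row]) -> Dict[str, int]:
    """aggregation: one result per group key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in data:
        totals[key] += value
    return dict(totals)


def window_sum(data: List[Row]) -> List[int]:
    """window: one result per input row (here a running sum within each key)."""
    running: Dict[str, int] = defaultdict(int)
    out = []
    for key, value in data:
        running[key] += value
        out.append(running[key])
    return out


print(reduction_sum(rows))
print(aggregation_sum(rows))
print(window_sum(rows))
```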
The final expression context is `AST`, or Abstract Syntax Tree. Before explaining AST we first need to explain in detail how project context operations work. Generally, for a project context operation, the plan Spark developed is read on the CPU and an appropriate set of GPU kernels is selected to do those operations. For example, `a >= b + 1` would result in calling a GPU kernel to add `1` to `b`, followed by another kernel that is called to compare `a` to that result. The interpretation is happening on the CPU, and the GPU is used to do the processing. For AST the interpretation for some reason cannot happen on the CPU and instead must be done in the GPU kernel itself. An example of this is conditional joins. If you want to join on `A.a >= B.b + 1`, where `A` and `B` are separate tables or data frames, the `+` and `>=` operations cannot run as separate independent kernels because they are done on a combination of rows in both `A` and `B`. Instead, part of the plan that Spark developed is turned into an abstract syntax tree and sent to the GPU where it can be interpreted. The number and types of operations supported in this are limited.
Expression | SQL Functions(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abs | `abs` | Absolute value | None | project | input | | S | S | S | S | S | S | | | | S | | | | | | | |
| | | | | | result | | S | S | S | S | S | S | | | | S | | | | | | | |
| | | | | AST | input | | NS | NS | S | S | S | S | | | | NS | | | | | | | |
| | | | | | result | | NS | NS | S | S | S | S | | | | NS | | | | | | | |
Acos | `acos` | Inverse cosine | None | project | input | | | | | | | S | | | | | | | | | | | |
| | | | | | result | | | | | | | S | | | | | | | | | | | |
| | | | | AST | input | | | | | | | S | | | | | | | | | | | |
| | | | | | result | | | | | | | S | | | | | | | | | | | |
Acosh | `acosh` | Inverse hyperbolic cosine | None | project | input | | | | | | | S | | | | | | | | | | | |
| | | | | | result | | | | | | | S | | | | | | | | | | | |
| | | | | AST | input | | | | | | | S | | | | | | | | | | | |
| | | | | | result | | | | | | | S | | | | | | | | | | | |
Add | `+` | Addition | None | project | lhs | | S | S | S | S | S | S | | | | S | | | NS | | | | |
| | | | | | rhs | | S | S | S | S | S | S | | | | S | | | NS | | | | |
| | | | | | result | | S | S | S | S | S | S | | | | S | | | NS | | | | |
| | | | | AST | lhs | | NS | NS | S | S | S | S | | | | NS | | | NS | | | | |
| | | | | | rhs | | NS | NS | S | S | S | S | | | | NS | | | NS | | | | |
| | | | | | result | | NS | NS | S | S | S | S | | | | NS | | | NS | | | | |
Alias | | Gives a column a name | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
| | | | | | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
| | | | | AST | input | S | S | S | S | S | S | S | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | NS |
| | | | | | result | S | S | S | S | S | S | S | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | NS |
And | `and` | Logical AND | None | project | lhs | S | | | | | | | | | | | | | | | | | |
| | | | | | rhs | S | | | | | | | | | | | | | | | | | |
| | | | | | result | S | | | | | | | | | | | | | | | | | |
| | | | | AST | lhs | S | | | | | | | | | | | | | | | | | |
| | | | | | rhs | S | | | | | | | | | | | | | | | | | |
| | | | | | result | S | | | | | | | | | | | | | | | | | |
ArrayContains | `array_contains` | Returns a boolean if the array contains the passed in key | None | project | array | | | | | | | | | | | | | | | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | | | |
| | | | | | key | S | S | S | S | S | PS NaN literals are not supported. Columnar input must not contain NaNs and spark.rapids.sql.hasNans must be false. | PS NaN literals are not supported. Columnar input must not contain NaNs and spark.rapids.sql.hasNans must be false. | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | NS | NS | NS | NS |
| | | | | | result | S | | | | | | | | | | | | | | | | | |
ArrayMax | `array_max` | Returns the maximum value in the array | None | project | input | | | | | | | | | | | | | | | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | | | |
| | | | | | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | | NS | NS |
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ArrayMin | `array_min` | Returns the minimum value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, STRUCT, UDT | |||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
ArrayTransform | `transform` | Transform elements in an array using the transform function. This is similar to a `map` in functional programming | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
Asin | `asin` | Inverse sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Asinh | `asinh` | Inverse hyperbolic sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
AtLeastNNonNulls | Checks if the number of non-null/non-NaN values is at least a given value | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Atan | `atan` | Inverse tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Atanh | `atanh` | Inverse hyperbolic tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
AttributeReference | References an input column | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
AST | result | S | S | S | S | S | S | S | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
BRound | `bround` | Round an expression to d decimal places using HALF_EVEN rounding mode | None | project | value | S | S | S | S | PS result may round slightly differently | PS result may round slightly differently | S | |||||||||||
scale | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
BitwiseAnd | `&` | Returns the bitwise AND of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | lhs | NS | NS | S | S | ||||||||||||||||||
rhs | NS | NS | S | S | |||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
BitwiseNot | `~` | Returns the bitwise NOT of the operand | None | project | input | S | S | S | S | ||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | input | NS | NS | S | S | ||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
BitwiseOr | `\|` | Returns the bitwise OR of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | lhs | NS | NS | S | S | ||||||||||||||||||
rhs | NS | NS | S | S | |||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
BitwiseXor | `^` | Returns the bitwise XOR of the operands | None | project | lhs | S | S | S | S | ||||||||||||||
rhs | S | S | S | S | |||||||||||||||||||
result | S | S | S | S | |||||||||||||||||||
AST | lhs | NS | NS | S | S | ||||||||||||||||||
rhs | NS | NS | S | S | |||||||||||||||||||
result | NS | NS | S | S | |||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CaseWhen | `when` | CASE WHEN expression | None | project | predicate | S | |||||||||||||||||
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
Cbrt | `cbrt` | Cube root | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Ceil | `ceiling`, `ceil` | Ceiling of a number | None | project | input | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
CheckOverflow | CheckOverflow after arithmetic operations between DecimalType data | None | project | input | S | ||||||||||||||||||
result | S | ||||||||||||||||||||||
Coalesce | `coalesce` | Returns the first non-null argument if exists. Otherwise, null | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |||||
Concat | `concat` | List/String concatenate | None | project | input | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | |||||||||||||||
result | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ConcatWs | `concat_ws` | Concatenates multiple input strings or array of strings into a single string using a given separator | None | project | input | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
Contains | Contains | None | project | src | S | ||||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Cos | `cos` | Cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Cosh | `cosh` | Hyperbolic cosine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Cot | `cot` | Cotangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CreateArray | `array` | Returns an array with the given elements | None | project | arg | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS |
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | ||||||||||||||||||||||
CreateMap | `map` | Create a map | None | project | key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | ||||
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | PS UTC is only supported TZ for child TIMESTAMP | ||||||||
CreateNamedStruct | `named_struct`, `struct` | Creates a struct with the given field names and values | None | project | name | S | |||||||||||||||||
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
CurrentRow$ | Special boundary for a window frame, indicating stopping at the current row | None | project | result | S | ||||||||||||||||||
DateAdd | `date_add` | Returns the date that is num_days after start_date | None | project | startDate | S | |||||||||||||||||
days | S | S | S | ||||||||||||||||||||
result | S | ||||||||||||||||||||||
DateAddInterval | Adds interval to date | None | project | start | S | ||||||||||||||||||
interval | PS month intervals are not supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
DateDiff | `datediff` | Returns the number of days from startDate to endDate | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
DateFormatClass | `date_format` | Converts timestamp to a value of string in the format specified by the date format | None | project | timestamp | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
strfmt | PS A limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
DateSub | `date_sub` | Returns the date that is num_days before start_date | None | project | startDate | S | |||||||||||||||||
days | S | S | S | ||||||||||||||||||||
result | S | ||||||||||||||||||||||
DayOfMonth | `dayofmonth`, `day` | Returns the day of the month from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
DayOfWeek | `dayofweek` | Returns the day of the week (1 = Sunday, ..., 7 = Saturday) | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
DayOfYear | `dayofyear` | Returns the day of the year from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
DenseRank | `dense_rank` | Window function that returns the dense rank value within the aggregation window | None | window | ordering | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS |
result | S | ||||||||||||||||||||||
Divide | `/` | Division | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | PS Because of Spark's inner workings the full range of decimal precision (even for 128-bit values) is not supported. | |||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ElementAt | `element_at` | Returns the element of the array at the given (1-based) index if the column is an array. Returns the value for the given key if the column is a map | None | project | array/map | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS If it's a map, only string is supported; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||
index/key | NS | NS | NS | PS ints are only supported as array indexes, not as map keys; Literal value only | NS | NS | NS | NS | NS | PS strings are only supported as map keys, not array indexes; Literal value only | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
EndsWith | Ends with | None | project | src | S | ||||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
EqualNullSafe | `<=>` | Check if the values are equal, including nulls | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
EqualTo | `=`, `==` | Check if the values are equal | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Exp | `exp` | Euler's number e raised to a power | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Explode | `explode`, `explode_outer` | Given an input array produces a sequence of rows for each value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
Expm1 | `expm1` | Euler's number e raised to a power minus 1 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Floor | `floor` | Floor of a number | None | project | input | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
FromUnixTime | `from_unixtime` | Get the string from a unix timestamp | None | project | sec | S | |||||||||||||||||
format | PS Only a limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
GetArrayItem | Gets the field at `ordinal` in the Array | None | project | array | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||
ordinal | PS Literal value only | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
GetJsonObject | `get_json_object` | Extracts a JSON object from the given path | None | project | json | S | |||||||||||||||||
path | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
GetMapValue | Gets Value from a Map based on a key | None | project | map | PS unsupported child types BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, DECIMAL, NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||
key | NS | NS | NS | NS | NS | NS | NS | NS | NS | PS Literal value only | NS | NS | NS | NS | NS | NS | NS | NS | |||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | S | NS | NS | NS | NS | NS | NS | NS | NS | |||||
GetStructField | Gets the named field of the struct | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
GetTimestamp | Gets timestamps from strings using given pattern. | None | project | timeExp | S | PS UTC is only supported TZ for TIMESTAMP | S | ||||||||||||||||
format | PS A limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | PS UTC is only supported TZ for TIMESTAMP | ||||||||||||||||||||||
GreaterThan | `>` | > operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
GreaterThanOrEqual | `>=` | >= operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Greatest | `greatest` | Returns the greatest value of all parameters, skipping null values | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
Hour | `hour` | Returns the hour component of the string/timestamp | None | project | input | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
result | S | ||||||||||||||||||||||
If | `if` | IF expression | None | project | predicate | S | |||||||||||||||||
trueValue | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
falseValue | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
In | `in` | IN operator | None | project | value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
list | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS Literal value only | PS UTC is only supported TZ for TIMESTAMP; Literal value only | PS Literal value only | PS Literal value only | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
InSet | INSET operator | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||
result | S | ||||||||||||||||||||||
InitCap | `initcap` | Returns str with the first letter of each word in uppercase. All other letters are in lowercase | This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
InputFileBlockLength | `input_file_block_length` | Returns the length of the block being read, or -1 if not available | None | project | result | S | |||||||||||||||||
InputFileBlockStart | `input_file_block_start` | Returns the start offset of the block being read, or -1 if not available | None | project | result | S | |||||||||||||||||
InputFileName | `input_file_name` | Returns the name of the file being read, or empty string if not available | None | project | result | S | |||||||||||||||||
IntegralDivide | `div` | Division with an integer result | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
IsNaN | `isnan` | Checks if a value is NaN | None | project | input | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
IsNotNull | `isnotnull` | Checks if a value is not null | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
result | S | ||||||||||||||||||||||
IsNull | `isnull` | Checks if a value is null | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
KnownFloatingPointNormalized | Tag to prevent redundant normalization | None | project | input | S | S | |||||||||||||||||
result | S | S | |||||||||||||||||||||
KnownNotNull | Tag an expression as known to not be null | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, UDT | NS | |||||
Lag | `lag` | Window function that returns N entries behind this one | None | window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS |
offset | S | ||||||||||||||||||||||
default | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |||||
LambdaFunction | Holds a higher order SQL function | None | project | function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
arguments | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
LastDay | `last_day` | Returns the last day of the month which the date belongs to | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Lead | `lead` | Window function that returns N entries ahead of this one | None | window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS |
offset | S | ||||||||||||||||||||||
default | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, MAP, UDT | NS | |||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Least | `least` | Returns the least value of all parameters, skipping null values | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
Length | `length`, `character_length`, `char_length` | String character length or binary byte length | None | project | input | S | NS | ||||||||||||||||
result | S | ||||||||||||||||||||||
LessThan | `<` | < operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
LessThanOrEqual | `<=` | <= operator | None | project | lhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
rhs | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | |||||
rhs | S | S | S | S | S | NS | NS | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | ||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Like | `like` | SQL LIKE pattern match | None | project | src | S | |||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Literal | Holds a static value from the query | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
AST | result | S | S | S | S | S | S | S | NS | PS UTC is only supported TZ for TIMESTAMP | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
Log | `ln` | Natural log | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Log10 | `log10` | Log base 10 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Log1p | `log1p` | Natural log of 1 + expr | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Log2 | `log2` | Log base 2 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Logarithm | `log` | Log variable base | None | project | value | S | |||||||||||||||||
base | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Lower | `lower`, `lcase` | String lowercase operator | This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
MakeDecimal | Create a Decimal from an unscaled long value for some aggregation optimizations | None | project | input | S | ||||||||||||||||||
result | PS max DECIMAL precision of 18 | ||||||||||||||||||||||
MapEntries | `map_entries` | Returns an unordered array of all entries in the given map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
MapKeys | `map_keys` | Returns an unordered array containing the keys of the map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
MapValues | `map_values` | Returns an unordered array containing the values of the map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
Md5 | `md5` | MD5 hash operator | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Minute | `minute` | Returns the minute component of the string/timestamp | None | project | input | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
result | S | ||||||||||||||||||||||
MonotonicallyIncreasingID | `monotonically_increasing_id` | Returns monotonically increasing 64-bit integers | None | project | result | S | |||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Month | `month` | Returns the month from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Multiply | `*` | Multiplication | None | project | lhs | S | S | S | S | S | S | S | |||||||||||
rhs | S | S | S | S | S | S | S | ||||||||||||||||
result | S | S | S | S | S | S | PS Because of Spark's inner workings the full range of decimal precision (even for 128-bit values) is not supported. | ||||||||||||||||
AST | lhs | NS | NS | S | S | S | S | NS | |||||||||||||||
rhs | NS | NS | S | S | S | S | NS | ||||||||||||||||
result | NS | NS | S | S | S | S | NS | ||||||||||||||||
Murmur3Hash | `hash` | Murmur3 hash operator | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | NS | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | NS |
result | S | ||||||||||||||||||||||
NaNvl | `nanvl` | Evaluates to `left` iff left is not NaN, `right` otherwise | None | project | lhs | S | S | ||||||||||||||||
rhs | S | S | |||||||||||||||||||||
result | S | S | |||||||||||||||||||||
NamedLambdaVariable | A parameter to a higher order SQL function | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
Not | `!`, `not` | Boolean not operator | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Or | `or` | Logical OR | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | |||||||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Pmod | `pmod` | Positive value of lhs mod rhs | None | project | lhs | S | S | S | S | S | S | NS | |||||||||||
rhs | S | S | S | S | S | S | NS | ||||||||||||||||
result | S | S | S | S | S | S | NS | ||||||||||||||||
PosExplode | `posexplode_outer`, `posexplode` | Given an input array produces a sequence of rows for each value in the array | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
Pow | `pow`, `power` | lhs ^ rhs | None | project | lhs | S | |||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
AST | lhs | S | |||||||||||||||||||||
rhs | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
PreciseTimestampConversion | Expression used internally to convert the TimestampType to Long and back without losing precision, i.e. in microseconds. Used in time windowing | None | project | input | S | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
result | S | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||||||
PromotePrecision | PromotePrecision before arithmetic operations between DecimalType data | None | project | input | S | ||||||||||||||||||
result | S | ||||||||||||||||||||||
PythonUDF | UDF run in an external python process. Does not actually run on the GPU, but the transfer of data to/from it can be accelerated | None | aggregation | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | |||||||
reduction | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | |||||||
window | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | |||||||
project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, CALENDAR, MAP, UDT | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types DECIMAL, NULL, BINARY, MAP | |||||||
Quarter | `quarter` | Returns the quarter of the year for date, in the range 1 to 4 | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
RLike | `rlike` | Regular expression match | This is disabled by default because the implementation is not 100% compatible. See the compatibility guide for more information. | project | str | S | |||||||||||||||||
regexp | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Rand | `random`, `rand` | Generate a random column with i.i.d. uniformly distributed values in [0, 1) | None | project | seed | S | S | ||||||||||||||||
result | S | ||||||||||||||||||||||
Rank | `rank` | Window function that returns the rank value within the aggregation window | None | window | ordering | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS |
result | S | ||||||||||||||||||||||
RegExpReplace | `regexp_replace` | RegExpReplace support for string literal input patterns | This is disabled by default because the implementation is not 100% compatible. See the compatibility guide for more information. | project | str | S | |||||||||||||||||
regex | PS Literal value only | ||||||||||||||||||||||
rep | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Remainder | `%`, `mod` | Remainder or modulo | None | project | lhs | S | S | S | S | S | S | NS | |||||||||||
rhs | S | S | S | S | S | S | NS | ||||||||||||||||
result | S | S | S | S | S | S | NS | ||||||||||||||||
Rint | `rint` | Rounds a double value to the nearest double equal to an integer | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Round | `round` | Round an expression to d decimal places using HALF_UP rounding mode | None | project | value | S | S | S | S | PS result may round slightly differently | PS result may round slightly differently | S | |||||||||||
scale | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | ||||||||||||||||
RowNumber | `row_number` | Window function that returns the index for the row within the aggregation window | None | window | result | S | |||||||||||||||||
ScalaUDF | User Defined Function, the UDF can choose to implement a RAPIDS accelerated interface to get better performance. | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |||||
Second | `second` | Returns the second component of the string/timestamp | None | project | input | PS UTC is only supported TZ for TIMESTAMP | |||||||||||||||||
result | S | ||||||||||||||||||||||
ShiftLeft | `shiftleft` | Bitwise shift left (<<) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
ShiftRight | `shiftright` | Bitwise shift right (>>) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
ShiftRightUnsigned | `shiftrightunsigned` | Bitwise unsigned shift right (>>>) | None | project | value | S | S | ||||||||||||||||
amount | S | ||||||||||||||||||||||
result | S | S | |||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Signum | `sign`, `signum` | Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
Sin | `sin` | Sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Sinh | `sinh` | Hyperbolic sine | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Size | `size`, `cardinality` | The size of an array or a map | None | project | input | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||
result | S | ||||||||||||||||||||||
SortArray | `sort_array` | Returns the input array sorted in ascending or descending order | None | project | array | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | |||||||||||||||||
ascendingOrder | S | ||||||||||||||||||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
SortOrder | Sort order | None | project | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | ||||||
SparkPartitionID | `spark_partition_id` | Returns the current partition id | None | project | result | S | |||||||||||||||||
SpecifiedWindowFrame | Specification of the width of the group (or "frame") of input rows around which a window function is evaluated | None | project | lower | S | S | S | S | NS | NS | NS | S | |||||||||||
upper | S | S | S | S | NS | NS | NS | S | |||||||||||||||
result | S | S | S | S | NS | NS | NS | S | |||||||||||||||
Sqrt | `sqrt` | Square root | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
StartsWith | Starts with | None | project | src | S | ||||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringLPad | `lpad` | Pad a string on the left | None | project | str | S | |||||||||||||||||
len | PS Literal value only | ||||||||||||||||||||||
pad | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
StringLocate | `position`, `locate` | Substring search operator | None | project | substr | PS Literal value only | |||||||||||||||||
str | S | ||||||||||||||||||||||
start | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringRPad | `rpad` | Pad a string on the right | None | project | str | S | |||||||||||||||||
len | PS Literal value only | ||||||||||||||||||||||
pad | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringRepeat | `repeat` | Repeats each input string the number of times given by repeatTimes | None | project | input | S | |||||||||||||||||
repeatTimes | S | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringReplace | `replace` | StringReplace operator | None | project | src | S | |||||||||||||||||
search | PS Literal value only | ||||||||||||||||||||||
replace | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
StringSplit | `split` | Splits `str` around occurrences that match `regex` | None | project | str | S | |||||||||||||||||
regexp | PS very limited subset of regex supported; Literal value only | ||||||||||||||||||||||
limit | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringTrim | `trim` | StringTrim operator | None | project | src | S | |||||||||||||||||
trimStr | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringTrimLeft | `ltrim` | StringTrimLeft operator | None | project | src | S | |||||||||||||||||
trimStr | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
StringTrimRight | `rtrim` | StringTrimRight operator | None | project | src | S | |||||||||||||||||
trimStr | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Substring | `substr`, `substring` | Substring operator | None | project | str | S | NS | ||||||||||||||||
pos | PS Literal value only | ||||||||||||||||||||||
len | PS Literal value only | ||||||||||||||||||||||
result | S | NS | |||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
SubstringIndex | `substring_index` | substring_index operator | None | project | str | S | |||||||||||||||||
delim | PS only a single character is allowed; Literal value only | ||||||||||||||||||||||
count | PS Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Subtract | `-` | Subtraction | None | project | lhs | S | S | S | S | S | S | S | NS | ||||||||||
rhs | S | S | S | S | S | S | S | NS | |||||||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | lhs | NS | NS | S | S | S | S | NS | NS | ||||||||||||||
rhs | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
result | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
Tan | `tan` | Tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Tanh | `tanh` | Hyperbolic tangent | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AST | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
TimeAdd | Adds interval to timestamp | None | project | start | PS UTC is only supported TZ for TIMESTAMP | ||||||||||||||||||
interval | PS month intervals are not supported; Literal value only | ||||||||||||||||||||||
result | PS UTC is only supported TZ for TIMESTAMP | ||||||||||||||||||||||
TimeSub | Subtracts interval from timestamp | None | project | start | PS UTC is only supported TZ for TIMESTAMP | ||||||||||||||||||
interval | PS month intervals are not supported; Literal value only | ||||||||||||||||||||||
result | PS UTC is only supported TZ for TIMESTAMP | ||||||||||||||||||||||
ToDegrees | `degrees` | Converts radians to degrees | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
ToRadians | `radians` | Converts degrees to radians | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
ToUnixTimestamp | `to_unix_timestamp` | Returns the UNIX timestamp of the given time | None | project | timeExp | S | PS UTC is only supported TZ for TIMESTAMP | S | |||||||||||||||
format | PS A limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
TransformKeys | `transform_keys` | Transform keys in a map using a transform function | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
TransformValues | `transform_values` | Transform values in a map using a transform function | None | project | argument | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | |||||||||||||||||
function | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
UnaryMinus | `negative` | Negate a numeric value | None | project | input | S | S | S | S | S | S | S | NS | ||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | input | NS | NS | S | S | S | S | NS | NS | ||||||||||||||
result | NS | NS | S | S | S | S | NS | NS | |||||||||||||||
UnaryPositive | `positive` | A numeric value with a + in front of it | None | project | input | S | S | S | S | S | S | S | NS | ||||||||||
result | S | S | S | S | S | S | S | NS | |||||||||||||||
AST | input | S | S | S | S | S | S | NS | NS | ||||||||||||||
result | S | S | S | S | S | S | NS | NS | |||||||||||||||
UnboundedFollowing$ | Special boundary for a window frame, indicating all rows following the current row | None | project | result | S | ||||||||||||||||||
UnboundedPreceding$ | Special boundary for a window frame, indicating all rows preceding the current row | None | project | result | S | ||||||||||||||||||
UnixTimestamp | `unix_timestamp` | Returns the UNIX timestamp of current or specified time | None | project | timeExp | S | PS UTC is only supported TZ for TIMESTAMP | S | |||||||||||||||
format | PS A limited number of formats are supported; Literal value only | ||||||||||||||||||||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
UnscaledValue | Convert a Decimal to an unscaled long value for some aggregation optimizations | None | project | input | PS max DECIMAL precision of 18 | ||||||||||||||||||
result | S | ||||||||||||||||||||||
Upper | `upper`, `ucase` | String uppercase operator | This is not 100% compatible with the Spark version because the Unicode version used by cuDF and the JVM may differ, resulting in some corner-case characters not changing case correctly. | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
WeekDay | `weekday` | Returns the day of the week (0 = Monday, ..., 6 = Sunday) | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
WindowExpression | Calculates a return value for every input row of a table based on a group (or "window") of rows | None | window | windowFunction | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
windowSpec | S | S | S | S | NS | NS | PS max DECIMAL precision of 18 | S | |||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
WindowSpecDefinition | Specification of a window function, indicating the partitioning-expression, the row ordering, and the width of the window | None | project | partition | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | |
value | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | |||||
Year | `year` | Returns the year from a date or timestamp | None | project | input | S | |||||||||||||||||
result | S | ||||||||||||||||||||||
AggregateExpression | Aggregate expression | None | aggregation | aggFunc | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |
filter | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
reduction | aggFunc | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
filter | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
window | aggFunc | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
filter | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
ApproximatePercentile | `percentile_approx`, `approx_percentile` | Approximate percentile | This is disabled by default because the GPU implementation of `approx_percentile` is not bit-for-bit compatible with Apache Spark. See the compatibility guide for more information. | reduction | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||
percentage | NS | NS | |||||||||||||||||||||
accuracy | NS | ||||||||||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||||||
aggregation | input | S | S | S | S | S | S | NS | NS | S | |||||||||||||
percentage | S | S | |||||||||||||||||||||
accuracy | S | ||||||||||||||||||||||
result | S | S | S | S | S | S | NS | NS | S | PS unsupported child types DATE, TIMESTAMP | |||||||||||||
window | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||||||
percentage | NS | NS | |||||||||||||||||||||
accuracy | NS | ||||||||||||||||||||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||||||||||
Average | `avg`, `mean` | Average aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | |||||||||||
result | S | S | |||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | |||||||||||||||||||||
window | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | |||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
CollectList | `collect_list` | Collect a list of non-unique elements, not supported in reduction | None | reduction | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS |
result | NS | ||||||||||||||||||||||
aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | ||||||||||||||||||||||
CollectSet | `collect_set` | Collect a set of unique elements, not supported in reduction | None | reduction | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS |
result | NS | ||||||||||||||||||||||
aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | ||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | NS | ||||
result | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | ||||||||||||||||||||||
Count | `count` | Count aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS |
result | S | ||||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | ||||
result | S | ||||||||||||||||||||||
window | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | ||||
result | S | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
First | `first_value`, `first` | first aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | |||||
window | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Last | `last`, `last_value` | last aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, UDT | NS | |||||
reduction | input | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | ||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | |||||
window | input | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | ||||
result | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||
Max | `max` | Max aggregate operator | None | aggregation | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
reduction | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
window | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
Min | `min` | Min aggregate operator | None | aggregation | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
reduction | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
window | input | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | ||||||
PivotFirst | PivotFirst operator | None | aggregation | pivotColumn | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | NS | NS | NS | |
valueColumn | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | NS | NS | |||||
reduction | pivotColumn | S | S | S | S | S | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | PS Input must not contain NaNs and spark.rapids.sql.hasNans must be false. | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | NS | NS | NS | ||||
valueColumn | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | NS | NS | NS | |||||
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types NULL, BINARY, CALENDAR, ARRAY, MAP, STRUCT, UDT | NS | NS | NS | |||||
StddevPop | `stddev_pop` | Aggregation computing population standard deviation | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
StddevSamp | `stddev_samp`, `std`, `stddev` | Aggregation computing sample standard deviation | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
Sum | `sum` | Sum aggregate operator | None | aggregation | input | S | S | S | S | S | S | S | |||||||||||
result | S | S | S | ||||||||||||||||||||
reduction | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
window | input | S | S | S | S | S | S | S | |||||||||||||||
result | S | S | S | ||||||||||||||||||||
VariancePop | `var_pop` | Aggregation computing population variance | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
Expression | SQL Function(s) | Description | Notes | Context | Param/Output | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
VarianceSamp | `var_samp`, `variance` | Aggregation computing sample variance | None | reduction | input | NS | |||||||||||||||||
result | NS | ||||||||||||||||||||||
aggregation | input | S | |||||||||||||||||||||
result | S | ||||||||||||||||||||||
window | input | NS | |||||||||||||||||||||
result | NS | ||||||||||||||||||||||
NormalizeNaNAndZero | Normalize NaN and zero | None | project | input | S | S | |||||||||||||||||
result | S | S | |||||||||||||||||||||
ScalarSubquery | Subquery that will return only one row and one column | None | project | result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | |||||||
HiveGenericUDF | Hive Generic UDF, support requires the UDF to implement a RAPIDS accelerated interface | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |||||
HiveSimpleUDF | Hive UDF, support requires the UDF to implement a RAPIDS accelerated interface | None | project | param | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS | |
result | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | S | S | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types UDT | NS |
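The `Max`, `Min`, and `PivotFirst` rows above mark FLOAT and DOUBLE inputs as partially supported: the input must not contain NaNs and `spark.rapids.sql.hasNans` must be false. The following is a minimal sketch of how such a setting could be passed on the command line, assuming the standard `spark-submit --conf key=value` flag. The helper `conf_flags` and the application name `my_job.py` are hypothetical and used only for illustration.

```python
def conf_flags(settings):
    """Turn a dict of Spark settings into spark-submit --conf arguments."""
    flags = []
    for key in sorted(settings):  # sorted so the command line is stable
        flags += ["--conf", f"{key}={settings[key]}"]
    return flags


# Declares that the data holds no NaNs. Only set this when that is
# actually true for the data being processed.
settings = {"spark.rapids.sql.hasNans": "false"}

# my_job.py is a placeholder application name (an assumption of this sketch).
command = ["spark-submit", *conf_flags(settings), "my_job.py"]
print(" ".join(command))
```

The same helper could carry other keys named in these tables, such as `spark.rapids.sql.castFloatToString.enabled`.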
Casting
The table above does not show what is and is not supported for casts. The tables below show the matrix of supported casts. Nested types such as MAP, STRUCT, and ARRAY can be cast only if their child types can be cast.
Some casts to and from STRING on the GPU do not produce exactly the same results as on the CPU and are disabled by default. See the configs for more details on these specific cases.
Note that even though Spark supports a cast from one type to another, that does not mean every such cast produces a usable result. For example, casting from a date to a boolean always produces a null. This is for Hive compatibility, and the accelerator produces the same result.
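The rule for nested types can be read as a recursive check: a nested cast is allowed only when every child cast is allowed. The sketch below illustrates that reading and is not the plugin's implementation. The tuple encoding of types, the `LEAF_CASTS` set (a few pairs assumed supported for illustration, not the full matrix), and the function `can_cast` are all hypothetical.

```python
# Leaf types are strings. Nested types are tuples:
#   ("ARRAY", child), ("MAP", key, value), ("STRUCT", field1, field2, ...)
# LEAF_CASTS holds a few (from, to) pairs assumed supported for this sketch.
LEAF_CASTS = {
    ("INT", "LONG"),
    ("INT", "STRING"),
    ("LONG", "DOUBLE"),
    ("STRING", "INT"),
}


def can_cast(src, dst):
    """Return True if src can be cast to dst under the child-type rule."""
    if isinstance(src, str) and isinstance(dst, str):
        return src == dst or (src, dst) in LEAF_CASTS
    if isinstance(src, str) or isinstance(dst, str):
        # Casts between a leaf and a nested type are out of scope here.
        return False
    # Nested to nested: same kind, same number of children,
    # and every child must itself be castable.
    if src[0] != dst[0] or len(src) != len(dst):
        return False
    return all(can_cast(s, d) for s, d in zip(src[1:], dst[1:]))


print(can_cast(("ARRAY", "INT"), ("ARRAY", "LONG")))
print(can_cast(("MAP", "STRING", "INT"), ("MAP", "INT", "DOUBLE")))
```

In this sketch the array cast succeeds because `INT` to `LONG` is in the assumed set, while the map cast fails because its value cast (`INT` to `DOUBLE`) is not, even though its key cast is.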
AnsiCast
TO | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT | ||
FROM | BOOLEAN | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | ||||||||
BYTE | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
SHORT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
INT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
LONG | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
FLOAT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. | S | |||||||||
DOUBLE | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. | S | |||||||||
DATE | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | ||||||||
TIMESTAMP | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | ||||||||
STRING | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | ||||||
DECIMAL | NS | S | S | S | S | S | S | NS | S | S | |||||||||
NULL | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | |
BINARY | NS | NS | |||||||||||||||||
CALENDAR | NS | NS | |||||||||||||||||
ARRAY | NS | PS The array's child type must also support being cast to the desired child type; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
MAP | NS | PS the map's key and value must also support being cast to the desired child types; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
STRUCT | PS the struct's children must also support being cast to string | PS the struct's children must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
UDT | NS | NS |
Cast
TO | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT | ||
FROM | BOOLEAN | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | ||||||||
BYTE | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
SHORT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
INT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
LONG | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | ||||||||
FLOAT | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. | S | |||||||||
DOUBLE | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | PS Conversion may produce different results and requires spark.rapids.sql.castFloatToString.enabled to be true. | S | |||||||||
DATE | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | ||||||||
TIMESTAMP | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | ||||||||
STRING | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | ||||||
DECIMAL | NS | S | S | S | S | S | S | NS | S | S | |||||||||
NULL | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | NS | NS | NS | |
BINARY | NS | NS | |||||||||||||||||
CALENDAR | NS | NS | |||||||||||||||||
ARRAY | NS | PS The array's child type must also support being cast to the desired child type; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
MAP | NS | PS the map's key and value must also support being cast to the desired child types; UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
STRUCT | PS the struct's children must also support being cast to string | PS the struct's children must also support being cast to the desired child type(s); UTC is only supported TZ for child TIMESTAMP; unsupported child types CALENDAR, UDT | |||||||||||||||||
UDT | NS | NS |
Partitioning
When data is transferred between tasks, it is partitioned in specific ways depending on the requirements in the plan. Be aware that the types included below apply only to the columns that determine where the data is partitioned. For example, if we are doing a join on the column `a`, the data would be hash partitioned on `a`, but all of the other columns in the same data frame as `a` do not show up in the table. They are controlled by the rules for `ShuffleExchangeExec`, which uses the `Partitioning`.
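To make the join example concrete, here is a minimal pure-Python sketch of hash partitioning on a single key column. It is not how Spark or the accelerator implements partitioning: `hash_partition` is a hypothetical helper, and `zlib.crc32` is a stand-in for Spark's own hash function. The point is only that the partition index depends on the key column and on nothing else in the row.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition using only the value in column `key`."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is a stand-in hash for this sketch, not Spark's hash.
        digest = zlib.crc32(repr(row[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(row)
    return partitions


rows = [
    {"a": 1, "payload": [1.5, 2.5]},
    {"a": 2, "payload": None},
    {"a": 1, "payload": {"nested": "value"}},
]

parts = hash_partition(rows, key="a", num_partitions=4)
for index, part in enumerate(parts):
    print(index, [row["a"] for row in part])
```

Both rows with `a == 1` land in the same partition regardless of what `payload` holds. The `payload` column is never inspected, which mirrors why only the key's type appears in the table.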
Partition | Description | Notes | Param | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HashPartitioning | Hash based partitioning | None | hash_key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | S | NS | NS | NS | NS | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, MAP, UDT | NS |
RangePartitioning | Range partitioning | None | order_key | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | S | NS | NS | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, CALENDAR, ARRAY, UDT | NS | |
RoundRobinPartitioning | Round robin partitioning | None | |||||||||||||||||||
SinglePartition$ | Single partitioning | None |
Input/Output
For input and output, it is not cleanly exposed which types are supported and which are not. This table tries to clarify that. Be aware that some types may be disabled for reads or writes in some cases because of processing limitations, such as rebasing dates or timestamps, or because of a lack of type coercion support.
Format | Direction | BOOLEAN | BYTE | SHORT | INT | LONG | FLOAT | DOUBLE | DATE | TIMESTAMP | STRING | DECIMAL | NULL | BINARY | CALENDAR | ARRAY | MAP | STRUCT | UDT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CSV | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | NS | NS | ||||||
Write | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS | |||||||
ORC | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | NS | ||
Write | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | S | NS | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT | PS UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, MAP, UDT | NS | |||
Parquet | Read | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | NS | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | NS | ||
Write | S | S | S | S | S | S | S | S | PS UTC is only supported TZ for TIMESTAMP | S | PS max DECIMAL precision of 18 | NS | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | PS max child DECIMAL precision of 18; UTC is only supported TZ for child TIMESTAMP; unsupported child types BINARY, UDT | NS |