[Spark] Learning Spark(1) - Spark Jobs, Stages, Tasks




  • Spark Jobs
  • Spark Stages
  • Spark Tasks

Spark Jobs

  • During interactive sessions with Spark shells, the driver converts your Spark application into one or more Spark jobs (see the sketch after this list).
  • It then transforms each job into a DAG.
  • This DAG is, in essence, Spark’s execution plan, where each node within the DAG can be a single Spark stage or multiple stages.
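
A minimal PySpark sketch of the idea (app name and numbers are arbitrary): transformations stay lazy, and each action submits its own job, which the driver expands into a DAG. The jobs show up separately in the Spark UI.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jobs-demo").getOrCreate()

df = spark.range(0, 1_000_000)            # transformation: lazy, no job yet
doubled = df.selectExpr("id * 2 AS id")   # still lazy

doubled.count()   # action -> job 1
doubled.show(5)   # action -> job 2 (each appears as a separate job in the Spark UI)

spark.stop()
```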


Spark Stages

  • As part of the DAG nodes, stages are created based on what operations can be performed serially or in parallel.
  • Not all Spark operations can happen in a single stage, so they may be divided into multiple stages.
  • Stages are often delineated at the operator’s computation boundaries, where they dictate data transfer among Spark executors (for example, at a shuffle); see the sketch after this list.
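
A minimal sketch of such a boundary (names are arbitrary): the narrow transformation below stays in the same stage as the scan, while the wide groupBy forces a shuffle, so the job is split into separate stages at that point.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stages-demo").getOrCreate()

df = spark.range(0, 1_000_000)

# Narrow transformation: no data movement, runs in the same stage as the scan.
with_bucket = df.withColumn("bucket", F.col("id") % 8)

# Wide transformation: rows must be redistributed across executors,
# which ends the current stage and starts a new one after the shuffle.
counts = with_bucket.groupBy("bucket").count()

counts.collect()   # one job, with a stage boundary at the shuffle (visible in the Spark UI)

spark.stop()
```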


Spark Tasks

  • Each stage comprises Spark tasks (a unit of execution), which are then federated across each Spark executor; each task maps to a single core and works on a single partition of data (see the sketch after this list).
  • As such, an executor with 16 cores can have 16 or more tasks working on 16 or more partitions in parallel, making the execution of Spark’s tasks exceedingly parallel.
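
A rough sketch (the partition count is arbitrary): the number of tasks a stage schedules matches the number of partitions it processes, so repartitioning the data changes how many tasks can run in parallel across the executor cores.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tasks-demo").getOrCreate()

# 16 partitions -> the stage that processes them schedules 16 tasks,
# each task running on a single core against a single partition.
df = spark.range(0, 1_000_000).repartition(16)
print(df.rdd.getNumPartitions())   # 16

df.count()   # in the Spark UI, the stage reading the 16 shuffled partitions typically shows 16 tasks

spark.stop()
```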


ref

  • Learning Spark, 2nd Edition, by Jules S. Damji, Brooke Wenig, Tathagata Das, and Denny Lee. Copyright 2020 Databricks, Inc., ISBN 978-1-492-05004-9.
