[BUG] Use a stub to store Spark StageInfo #1524

Closed
@amahussein

Describe the bug

The StageModel references a StageInfo field to get the details of the stage.
The problem with this design is that it keeps a deep chain of pointers to data that the core tools do not currently need.

```scala
@DeveloperApi
class StageInfo(
    val stageId: Int,
    private val attemptId: Int,
    val name: String,
    val numTasks: Int,
    val rddInfos: Seq[RDDInfo],
    val parentIds: Seq[Int],
    val details: String,
    val taskMetrics: TaskMetrics = null,
    private[spark] val taskLocalityPreferences: Seq[Seq[TaskLocation]] = Seq.empty,
    private[spark] val shuffleDepId: Option[Int] = None,
    val resourceProfileId: Int,
    private[spark] var isPushBasedShuffleEnabled: Boolean = false,
    private[spark] var shuffleMergerCount: Int = 0) {
  /** When this stage was submitted from the DAGScheduler to a TaskScheduler. */
  var submissionTime: Option[Long] = None
  /** Time when the stage completed or when the stage was cancelled. */
  var completionTime: Option[Long] = None
  /** If the stage failed, the reason why. */
  var failureReason: Option[String] = None

  /**
   * Terminal values of accumulables updated during this stage, including all the user-defined
   * accumulators.
   */
  val accumulables = HashMap[Long, AccumulableInfo]()
```

Ideally, we need a stub class that copies only the fields we need.
We did that before in #1206, but we had to roll it back in #1260 for compatibility with various Spark implementations.
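
To make the idea concrete, here is a minimal sketch of what such a stub could look like. It is only an illustration, not the code from #1206: `StageInfoStub`, `fromStageInfo`, and the stand-in `HeavyStageInfo` class are hypothetical names, and the set of fields copied is an assumption about what the core tools need. The stand-in replaces Spark's `StageInfo` so the snippet compiles without Spark on the classpath, and the sketch does not address the cross-implementation compatibility problem that led to the rollback.

```scala
// Stand-in for org.apache.spark.scheduler.StageInfo (hypothetical, for illustration only).
// It models a subset of the real fields so the sketch runs without a Spark dependency.
class HeavyStageInfo(
    val stageId: Int,
    val attemptId: Int,
    val name: String,
    val numTasks: Int,
    val details: String,
    val rddInfos: Seq[AnyRef] = Seq.empty) { // placeholder for Seq[RDDInfo]
  var submissionTime: Option[Long] = None
  var completionTime: Option[Long] = None
  var failureReason: Option[String] = None
}

// Hypothetical stub: an immutable copy of the scalar fields only.
// It keeps no reference to the source object, so the RDD infos, task metrics,
// and accumulables of the original can be garbage collected.
final case class StageInfoStub(
    stageId: Int,
    attemptId: Int,
    name: String,
    numTasks: Int,
    details: String,
    submissionTime: Option[Long],
    completionTime: Option[Long],
    failureReason: Option[String]) {

  /** Elapsed time in milliseconds, defined only when both timestamps are known. */
  def duration: Option[Long] =
    for {
      start <- submissionTime
      end <- completionTime
    } yield end - start
}

object StageInfoStub {
  /** Copies the assumed-needed fields out of the heavy object. */
  def fromStageInfo(info: HeavyStageInfo): StageInfoStub =
    StageInfoStub(
      stageId = info.stageId,
      attemptId = info.attemptId,
      name = info.name,
      numTasks = info.numTasks,
      details = info.details,
      submissionTime = info.submissionTime,
      completionTime = info.completionTime,
      failureReason = info.failureReason)
}
```

With something along these lines, StageModel would hold a `StageInfoStub` instead of the full `StageInfo`. Because the stub is a snapshot, it would have to be rebuilt whenever a later event (for example, stage completion) updates the mutable fields of the original.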

Labels

core_tools: Scope the core module (scala)
performance: performance and scalability of tools
