Why do we need Spark accumulators?
An accumulator is a variable shared across all nodes of a cluster. It is used to accumulate values, typically of a numeric type such as Long or Double.
You need an accumulator to implement a distributed counter or sum that multiple tasks can update in parallel.
Worker nodes cannot read the value of an accumulator; only the driver can read it.
Worker nodes can only add values to the accumulator.
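To make the write-only rule concrete, here is a minimal plain-Java sketch with no Spark dependency. It is a simplified model under stated assumptions, not Spark's implementation: the class ModelLongAccumulator and the method runTasks are hypothetical names invented for illustration. Each simulated task adds into its own zero-valued copy, and only the driver merges the copies and reads the total.

```java
import java.util.Arrays;
import java.util.List;

public class AccumulatorModel {

    /** Hypothetical stand-in for a long accumulator; this is not a Spark class. */
    static final class ModelLongAccumulator {
        private long sum = 0L;

        // Tasks only ever call add(); they never read the total.
        void add(long v) { sum += v; }

        // The driver folds a task's partial result into its own copy.
        void merge(ModelLongAccumulator other) { sum += other.sum; }

        // Fresh zero-valued copy handed to each task.
        ModelLongAccumulator copyAndReset() { return new ModelLongAccumulator(); }

        // Read on the driver only.
        long value() { return sum; }
    }

    /** Simulates one task per partition and returns the driver-side total. */
    static long runTasks(List<List<Integer>> partitions) {
        ModelLongAccumulator driverCopy = new ModelLongAccumulator();
        for (List<Integer> partition : partitions) {
            ModelLongAccumulator taskCopy = driverCopy.copyAndReset();
            for (int x : partition) {
                taskCopy.add(x);
            }
            driverCopy.merge(taskCopy);
        }
        return driverCopy.value();
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2),
                Arrays.asList(3, 4));
        System.out.println(runTasks(partitions)); // prints 10
    }
}
```

Because adding and merging are both plain sums, the order in which tasks finish does not change the result. That property is what makes a counter safe to update from many tasks at once.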
You will find this functionality in Spark's AccumulatorV2 class and its built-in subclasses, such as LongAccumulator and DoubleAccumulator. Keep in mind that we are using the AccumulatorV2 API; the older Accumulator API is deprecated as of Spark 2.0.
// jsc is an existing JavaSparkContext
LongAccumulator accum = jsc.sc().longAccumulator();
jsc.parallelize(Arrays.asList(1, 2, 3, 4)).foreach(x -> accum.add(x));
accum.value(); // returns 10
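Don’t forget to register your accumulator with the SparkContext if you create it yourself, for example with new LongAccumulator() instead of longAccumulator(). In that case, call jsc.sc().register(accum) before using it in a job.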
What did we learn?
In this short tutorial, you learned what Spark accumulators are for, what they do, and how to use them in Java.