I recently did some feature engineering on a few datasets with Spark.
I wanted to make the datasets available in our Hadoop cluster, so I used our normal dataset upload pattern, but ran into these nasty little errors:
```
ERROR SparkContext: Error initializing SparkContext.
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed:
Can only call getServletHandlers on a running MetricsSystem
```
and also
```
18/07/26 14:36:35 ERROR Utils: Uncaught exception in thread driver-revive-thread
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
```
The full exception stacks are longer, but these first lines are the important part.
So what do these errors mean and why do they occur?
- It often has something to do with the Spark session not being initialized properly
- It usually means the master URL for the session points to the wrong location
- Double-check the master address for the Spark session: by default it should use port 7077 (the standalone master port) and NOT 6066 (the REST submission port) — see the session sketch after this list
- Check that the Spark version running on the cluster is the same as the Spark version in the JAR of the job you want to submit
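As a sanity check, here is a minimal sketch of a session that connects to a standalone master on the standard port. The host name and app name are placeholders; swap in your own:

```scala
import org.apache.spark.sql.SparkSession

object UploadJob {
  def main(args: Array[String]): Unit = {
    // Point the session at the standalone master's cluster port 7077,
    // NOT the REST submission port 6066. "my-master-host" is a placeholder.
    val spark = SparkSession.builder()
      .appName("dataset-upload") // hypothetical app name
      .master("spark://my-master-host:7077")
      .getOrCreate()

    // Quick smoke test: if the context came up, this prints 10 rows.
    spark.range(10).show()

    spark.stop()
  }
}
```

If this minimal job already throws the MetricsSystem error, the master URL (or the version mismatch below) is almost certainly the culprit, not your feature-engineering code.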
How do I fix "Could not find CoarseGrainedScheduler" or "Can only call getServletHandlers on a running MetricsSystem"?
- Update the master URL (see the session sketch above)
- Update your dependencies (POM / Gradle / JAR) so they use the same Spark version as the cluster — a build sketch follows below
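For the dependency side, here is a sketch assuming an sbt build (the same idea applies to a Maven POM or a Gradle file). The version `2.3.1` is a placeholder: use whatever `spark-submit --version` reports on your cluster.

```scala
// build.sbt
name := "dataset-upload"

// Must match the Scala version the cluster's Spark was built with.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // "provided" keeps your assembly JAR from shipping its own Spark
  // classes, which would clash with the cluster's at runtime.
  "org.apache.spark" %% "spark-core" % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.3.1" % "provided"
)
```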
Now your errors should be fixed. Have fun with your Spark cluster!