
IntelliJ plugin for Microsoft Azure tutorial: deploying a web app

In this tutorial we will check out how to get the Microsoft Azure plugin and how to use it.

First of all, start your IDE, press Shift twice in quick succession and enter “plugin” to jump to the plugin installation menu.

Then just type “Azure” and install the first plugin suggested, which is developed by Microsoft.

After installing the plugin and creating an account on the Azure website, you can log in to your account through IntelliJ.

Select the Tools tab in the top toolbar and log in to Azure using interactive mode, typing in the credentials you just used to create your Azure account.

Preparing the resource groups

I was following this great tutorial from Microsoft, but I, and probably a lot of other people, encountered an error when trying to launch a web app right after creating a new account in Azure.

Before you can launch anything in Azure, you need a resource group. Even though the tutorial from Microsoft does not state it explicitly, you should really create a resource group before attempting this.

Here is how you create a resource group:

Log in to your Azure account on the Microsoft website and head to “My Account”.

Next, select “Create a resource”.

Then select “Web App”.

Enter a name for the app and for the resource group, choosing “Create New” for the group.

After creating the group, we are finally ready to deploy our app with IntelliJ!

Start a new project, select a Maven web app, and make sure you create the project from an archetype!

Then go to the root folder of your project and right-click it in IntelliJ. You should now see the Azure options, which let you deploy your web app to the cloud!

If you have not logged in to Azure yet, do it now.

Then you have the option to use an existing Web App or a new one. We want a new one, but we will use an existing resource group! For some reason, creating a resource group with the IntelliJ plugin seems to result in exceptions. The only way to avoid those so far is to create the group manually in Azure and then use that one for further deployment.

After hitting Run and waiting a few seconds, your console should update with a URL to your freshly deployed web app.

Thanks for reading and have fun in the cloud!

Java Spark Tips, Tricks and Basics 6 – How to broadcast a variable to a Spark cluster? Why do we need to broadcast variables?

Why do we need Spark broadcasters?

Spark is all about cluster computing. In a cluster of nodes, each node of course has its own private memory.

If we want all the nodes in the cluster to work towards a common goal, having shared variables seems necessary.

Let’s say we want to sum up all the rows in a CSV table with 1 million lines. It makes sense to let one node work on half a million rows and another node work on the other half a million. Both calculate their results, and then the driver program combines them.
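
The split-and-combine idea can be sketched in plain Java, without Spark and on a single machine. This is only an illustration: the class name SplitSum and the methods partialSum and driverSum are made up for this example, and the two "nodes" are simply two method calls over the two halves of an array.

```java
import java.util.stream.IntStream;

class SplitSum {

    // Stands in for the work of a single node: sum one slice of the rows.
    static long partialSum(int[] rows, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += rows[i];
        }
        return sum;
    }

    // Stands in for the driver: hand out the two halves, then combine the partial results.
    static long driverSum(int[] rows) {
        int mid = rows.length / 2;
        long firstHalf = partialSum(rows, 0, mid);
        long secondHalf = partialSum(rows, mid, rows.length);
        return firstHalf + secondHalf;
    }

    public static void main(String[] args) {
        // One million "rows" holding the values 1..1,000,000.
        int[] rows = IntStream.rangeClosed(1, 1_000_000).toArray();
        System.out.println(driverSum(rows));
    }
}
```

In a real Spark job the halves would live on different machines, which is exactly why any extra value both halves need has to be shipped to each node.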

Broadcasting allows us to create a read-only cached copy of a variable on every node in our cluster. The distribution of those variables is handled by efficient broadcast algorithms implemented by Spark under the hood. This also takes the burden of thinking about serialization and deserialization off our shoulders, since good old Spark takes care of that!

This great broadcasting functionality is provided by the SparkContext class (JavaSparkContext in the Java API). Alternatively, you can also work with the Broadcast class directly.

How to broadcast a variable in Spark Java
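
Here is a minimal sketch of broadcasting a small lookup table, assuming spark-core is on the classpath and the job runs with a local master. The class name BroadcastExample, the method resolveCountries and the country map are made up for illustration; only JavaSparkContext.broadcast and Broadcast.value come from Spark itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

class BroadcastExample {

    // Replaces each country code with its full name, using a broadcast lookup table.
    static List<String> resolveCountries(JavaSparkContext sc, List<String> codes) {
        // Hypothetical lookup data; a HashMap is serializable, so it can be broadcast.
        Map<String, String> lookup = new HashMap<>();
        lookup.put("DE", "Germany");
        lookup.put("FR", "France");

        // Ask Spark to ship one read-only copy of the map to every node.
        Broadcast<Map<String, String>> broadcastLookup = sc.broadcast(lookup);

        JavaRDD<String> rdd = sc.parallelize(codes);
        // Inside the task, read the node-local copy through value().
        return rdd
                .map(code -> broadcastLookup.value().getOrDefault(code, "unknown"))
                .collect();
    }

    public static void main(String[] args) {
        // local[*] is an assumption for trying this out without a cluster.
        SparkConf conf = new SparkConf().setAppName("BroadcastExample").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            System.out.println(resolveCountries(sc, Arrays.asList("DE", "FR", "XX")));
        }
    }
}
```

The tasks only ever read the map through value(). Treat the broadcast value as read-only, because changes made on one node are not propagated to the others.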

What did we learn?

In this short tutorial, you learned what Spark Broadcast is for, what Broadcast does and how to use it in Java.

Spark Error “Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem”

I recently did some feature engineering on a few datasets with Spark.

I wanted to make the datasets available in our Hadoop cluster, so I used our normal dataset upload pattern, but ran into two nasty little errors: “Could not find CoarseGrainedScheduler” and “Can only call getServletHandlers on a running MetricsSystem”.

The full exception stack looks like this:

 

 

So what do these errors mean and why do they occur?

  • It often has something to do with not initializing the Spark session properly
  • It usually means there is a wrong value for the master location
  • Double-check the master address for the Spark session; by default it should use port 7077 and NOT 6066
  • Check that the Spark version of the cluster matches the Spark version in the jar of the job you want to submit

How do I fix “Could not find CoarseGrainedScheduler” or “Can only call getServletHandlers on a running MetricsSystem”?

  • Update the master URL
  • Update the dependencies in your POM/Gradle build/jar so that you use the same Spark version as the cluster
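
As a quick sanity check for the first point, a small helper can flag a master URL that points at port 6066 before the job is even submitted. This is only a sketch in plain Java: the class MasterUrlCheck, its check method and the host name master-host are hypothetical, and it only looks at simple spark://host:port URLs (multi-master URLs and values such as local[*] or yarn are not handled).

```java
import java.net.URI;
import java.util.Optional;

class MasterUrlCheck {

    // Returns a warning for a suspicious standalone master URL, or empty if nothing stands out.
    static Optional<String> check(String masterUrl) {
        // Only spark:// URLs are inspected; anything else is skipped.
        if (!masterUrl.startsWith("spark://")) {
            return Optional.empty();
        }
        int port = URI.create(masterUrl).getPort();
        if (port == 6066) {
            return Optional.of("Port 6066 found; the master URL should normally use port 7077.");
        }
        if (port == -1) {
            return Optional.of("No port could be read; expected something like spark://host:7077.");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "master-host" is a placeholder host name.
        System.out.println(check("spark://master-host:6066").orElse("looks fine"));
        System.out.println(check("spark://master-host:7077").orElse("looks fine"));
    }
}
```

The version mismatch from the second point cannot be caught this way; for that, compare the Spark version in your build file with the one the cluster reports.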

Now your error should be fixed. Have fun with your Spark Cluster!