Category Archives: Deployment

Deploying stuff in the cloud, or even deploying clouds themselves!

Spark Kubernetes error Can only call getServletHandlers on a running MetricsSystem – How to fix it

Did you encounter an error like the following?

java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem

The fix is pretty straightforward.
The error is caused by running a different Spark version in the cluster than the one used for spark-submit.

Double-check the cluster's Spark version via the cluster UI or by checking the Spark master pod.
Then check the version of your spark-submit; it is usually printed when you submit a job.
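For example, assuming a typical Spark-on-Kubernetes setup (the namespace, pod name and Spark install path inside the pod are placeholders), the two versions can be compared roughly like this:

spark-submit --version
kubectl exec -n <namespace> <spark-master-pod> -- /opt/spark/bin/spark-submit --version

Both commands print the Spark version banner; if they differ, align your local Spark distribution with the one running in the cluster.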

This is the full stack trace:

java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:516)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at phabmacsJobs.KafkaJsonWriter$.main(KafkaJsonWriter.scala:32)
at phabmacsJobs.KafkaJsonWriter.main(KafkaJsonWriter.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:516)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at phabmacsJobs.KafkaJsonWriter$.main(KafkaJsonWriter.scala:32)
at phabmacsJobs.KafkaJsonWriter.main(KafkaJsonWriter.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Docker glossary

Terminology

  • Container Image
    • All the dependencies and the deployment and execution configuration needed to create a container, packed together. Usually, an image is derived from multiple base images that are stacked on top of each other as layers to form the container's filesystem. Images are immutable once created.
  • Dockerfile
    • A text file that defines how to build a Docker image. It is like a batch script, defining which base images to use, which programs to install and which files to copy to get the environment working as needed.
  • Build
    • The act of building a container image based on the Dockerfile and the other files in the build context, i.e. the folder from which the image is created (see the command sketch after this glossary).
  • Container
    • One instantiation of a Docker image. It represents a single process/application/service running on the image's host. It contains the Docker image, an execution environment and a standard set of instructions. For scaling to millions of users, you just deploy more containers and balance the work across them, so easy!
  • Volumes
    • Since images are immutable, a containerized application cannot write to its own image! That is why we need volumes. They are an extra layer managed by Docker which emulates a filesystem on the Docker host machine, to which the containerized application writes. The containerized application does not notice the difference and acts as usual when working with volumes.
  • Tag
    • A label or identifier which can be applied to an image, so that multiple different versions of the same image can be identified.
  • Multi-stage Build
    • Use a large base image for compiling and publishing the application, then copy the published output into a small runtime-only base image to produce a much smaller final image.
  • Repository
    • A collection of Docker images
  • Registry
    • A service to access a repository
  • Multi-arch image
    • A feature that lets the same image reference automatically resolve to the proper variant for the target architecture or operating system (e.g. Windows, Linux) when the image is pulled or used in a Dockerfile.
  • Docker-Hub
    • A public registry to upload images and work with them. It offers build triggers, webhooks, and integration with GitHub and Bitbucket.
  • Azure Container Registry
    • A registry for Docker images and their components in Azure
  • Docker Trusted Registry
    • A registry server for Docker that can be hosted on your own private server for private images
  • Docker Community Edition (CE)
    • The free, community-supported edition of the Docker tools, aimed at individual developers and small teams
  • Docker Enterprise Edition (EE)
    • Enterprise-scale version of Docker tools for Linux and Windows development
  • Compose
    • Command-line tool and YAML file format for defining and running multi-container applications. You can define a single application based on multiple images with one or more YAML files and then deploy all of the containers/images with just one command.
    • Imagine the first set of images sets up Hadoop, the next sets up Kafka, the next Spark, and the last one some Java web server.
  • Cluster
    • A collection of Docker hosts exposed as if they were a single virtual Docker host, so basically many instances of the same container are put behind the same IP to handle huge numbers of users, i.e. to handle scaling.
    • Docker clusters can be created with Kubernetes, Azure Service Fabric, Docker Swarm and Mesosphere DC/OS
  • Orchestrator
    • Tool to simplify management of clusters and Docker hosts. Orchestrators enable you to manage their images, containers and hosts through a CLI or a GUI. You can manage container networking, configurations, load balancing, service discovery, high availability, Docker host configuration and much more. An orchestrator is responsible for running, distributing, scaling and healing workloads across a collection of nodes. Typically, orchestration is provided by the same products that provide cluster infrastructure, like Kubernetes and Azure Service Fabric.
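To make the Build, Tag, Container and Registry terms above concrete, here is a minimal, hypothetical command sequence (the image name, tag and registry host are placeholders, assuming the Docker CLI is installed):

docker build -t myapp:1.0 .
docker tag myapp:1.0 myregistry.example.com/myapp:1.0
docker push myregistry.example.com/myapp:1.0
docker run -d --name myapp myregistry.example.com/myapp:1.0
docker-compose up -d

The first command builds an image from the Dockerfile in the current folder, the next two tag and push it to a registry, docker run starts a container from it, and docker-compose up starts a whole multi-container application described in a docker-compose.yml in the current folder.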

IntelliJ plugin for Microsoft Azure tutorial – deploying a web app

In this tutorial we will check out how to get the Microsoft Azure plugin and how to use it.

First of all, start your IDE, hit Shift two times in quick succession and enter “plugin” to get quickly to the plugin install menu.

Then just type Azure and install the first plugin suggested, which is developed by Microsoft.

After installing the plugin and creating an account on the Azure website, you can log in to your account through IntelliJ.

Select the Tools tab in the top toolbar and log in to Azure using interactive mode; just type in the credentials you used when creating your Azure account.

Preparing the resource groups

I was following this great tutorial from Microsoft, but I, and probably a lot of other people, encountered an error when trying to launch a web app right after having created a new account in Azure.

Before you can launch anything in Azure, you need resource groups. Even though the tutorial from Microsoft does not state it explicitly, you should really create a resource group before attempting this.

Here is how you create a resource group:

Log in to your Azure account on the Microsoft website and head to “My Account”

Next select “Create a resource”

And then select “Web App”

Enter a name for the app and the resource group, then click Create New

After creating the group, we are finally ready to deploy our app with IntelliJ!

Start a new project, select a web app in Maven, and make sure you are creating the project from an archetype!
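If you prefer the command line over the IDE wizard, a roughly equivalent command (group and artifact IDs are placeholders) uses the standard webapp archetype:

mvn archetype:generate -DgroupId=com.example -DartifactId=my-azure-webapp -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false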

Then just go to the root folder of your project and right-click it in IntelliJ. You should now see the Azure options, which let you deploy your web app to the cloud!

If you did not log in to Azure before, do it now.

Then you have the option to use an existing web app or a new one. We want a new one, but we will use an existing resource group! For some reason, creating a resource group with the IntelliJ plugin seems to result in exceptions. The only way to avoid those so far is to create the group manually in Azure and then use that one for further deployment.

After hitting run and waiting a few seconds, your console should update with a URL to your freshly deployed web app.

Thanks for reading and have fun in the cloud!

Java Spark Tips, Tricks and Basics 7 – How to accumulate a variable in a Spark cluster? Why do we need to accumulate variables?

Why do we need Spark accumulators

An accumulator is a variable shared across all the nodes, used to accumulate values of a given type (Long or Double).

It is necessary to use an accumulator to implement a distributed counting variable which can be updated by multiple processes.

Nodes may not read the value of an accumulator, but the driver has full access to it.

Nodes can only accumulate values into the accumulator.

You will find the functionality for this in Spark's accumulator classes. Keep in mind that we are using AccumulatorV2; the older accumulator API is deprecated since Spark 2.0.
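Here is a minimal sketch in Java of how this could look, assuming a local SparkSession; the sample data and the accumulator name are only illustrative:

import java.util.Arrays;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.util.LongAccumulator;

public class AccumulatorExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("AccumulatorExample")
                .master("local[*]")
                .getOrCreate();
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());

        // Ask the SparkContext for a named Long accumulator (it is registered automatically)
        LongAccumulator errorCount = jsc.sc().longAccumulator("errorCount");

        // Executors may only add to the accumulator ...
        jsc.parallelize(Arrays.asList("ok", "error", "ok", "error"))
           .foreach(line -> {
               if (line.contains("error")) {
                   errorCount.add(1L);
               }
           });

        // ... while only the driver reads the final value (2 for this sample data)
        System.out.println("Errors seen: " + errorCount.value());
        spark.stop();
    }
}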

 

Don’t forget to register your accumulator with the SparkContext if you create it separately.
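If you create the accumulator object yourself instead of asking the SparkContext for one, the registration step could look roughly like this (variable names are illustrative, reusing the spark session from the sketch above):

// A manually constructed accumulator must be registered before it is used in a job
LongAccumulator manualCounter = new LongAccumulator();
spark.sparkContext().register(manualCounter, "manualCounter");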

 

What did we learn?

In this short tutorial, you learned what Spark accumulators are for, what they do and how to use them in Java.

How to deploy tests with Maven on a Virtual Machine via Git

How to deploy tests with Maven on a Virtual machine.

In this tutorial you will learn how to deploy your Java project onto any VM and how to build and run it via Maven.

We will go over the following points.

1. Add the project to GitHub

We first need to have a GitHub project.
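If the project is not on GitHub yet, a rough command sequence looks like this (the repository URL is a placeholder for your own repo):

git init
git add .
git commit -m "Initial commit"
git remote add origin git@github.com:<your-user>/<your-repo>.git
git push -u origin master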

2. Add SSH keys on the VM (optional)

Add an SSH key on your VM and then add it to your GitHub account. Follow this tutorial if you are not sure how to do this: https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/

3. Clone Project to VM

Get the URL of your repo and clone it to your VM
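For example (placeholder URL again, assuming the SSH key from step 2 is in place):

git clone git@github.com:<your-user>/<your-repo>.git
cd <your-repo>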

4. Compile project

mvn clean test-compile

5. Run Tests

mvn test

This will execute all your compiled tests within the target/test-classes folder.

6. Run a specific test function

mvn -Dtest=testClass#testMethod test

This command will run only the method “testMethod” inside of the class “testClass”.

Recap – What have we learned?

Congratulations, you have learned how to elegantly deploy your Java project on a remote virtual machine using Git and Maven.