Tag Archives: devops

Spark Kubernetes error Can only call getServletHandlers on a running MetricsSystem – How to fix it

Did you encounter an error like this one?

java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem

The fix is pretty straightforward.
The error is caused by running a different Spark version in the cluster than the one used for spark-submit.

Double-check the cluster's Spark version via the cluster UI or by inspecting the Spark master pod.
Then check the version of your spark-submit; it is usually printed when you submit a job.
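
A quick way to compare the two versions from the command line (the pod name and namespace below are placeholders for your own deployment):

# Version used by your local spark-submit
./bin/spark-submit --version

# Version running in the cluster: inspect the image of the Spark master pod
kubectl get pod spark-master-0 -n spark -o jsonpath='{.spec.containers[0].image}'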

This is the full stack trace:

java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:516)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at phabmacsJobs.KafkaJsonWriter$.main(KafkaJsonWriter.scala:32)
at phabmacsJobs.KafkaJsonWriter.main(KafkaJsonWriter.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Kubernetes Helm Install Error: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Error: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request

If you ever run into this error message when installing a chart with Helm into Kubernetes, try closing your Kubernetes connection and opening it again!
In Azure AKS that would be:
az aks browse --resource-group bachelor-ckl --name aks-ckl

This resolved the Helm install error for me.
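
If reconnecting alone does not help, it is also worth checking whether the metrics API service behind the message is actually healthy, for example:

# Is the aggregated metrics API registered and available?
kubectl get apiservice v1beta1.metrics.k8s.io

# Is the metrics-server pod running?
kubectl get pods -n kube-system | grep metrics-server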

Happy DevOps-ing!

Creating a crontab with just 1 command

In this tutorial you will learn how to add a cron job to your crontab file with just one command.
This makes deploying virtual machines and automatically configuring their cron jobs easy!
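
A minimal sketch of such a one-liner (the schedule and the script path are placeholders, adjust them to your job):

# append a job that runs every 5 minutes to the current user's crontab
(crontab -l 2>/dev/null; echo "*/5 * * * * /home/ubuntu/myscript.sh") | crontab -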

That is all you need! The new job is appended to your existing crontab file.

 

It is pretty handy for setting up cronjobs on fresh Ubuntu VMs without requiring root.

Happy Deploying!

Creating Cronjobs

What is a cronjob or a crontab file?

Crontab (cron table) is a text file that specifies the schedule of cron jobs. There are two types of crontab files: system-wide crontab files and individual user crontab files.

User crontab files are stored under the user's name and their location varies by operating system. On Red Hat based systems such as CentOS, crontab files are stored in the /var/spool/cron directory, while on Debian and Ubuntu they are stored in the /var/spool/cron/crontabs directory.

Although you can edit the user crontab files manually, it is recommended to use the crontab command.

/etc/crontab and the files inside the /etc/cron.d directory are system-wide crontab files which can be edited only by the system administrators.
In most Linux distributions you can also put scripts inside the /etc/cron.{hourly,daily,weekly,monthly} directories and the scripts will be executed every hour/day/week/month.

Linux Crontab Command

The crontab command allows you to install or open a crontab file for editing. You can use the crontab command to view, add, remove or modify cron jobs using the following options (an example crontab entry is shown after the list):

  • crontab -e – Edit crontab file, or create one if it doesn’t already exist.
  • crontab -l – Display crontab file contents.
  • crontab -r – Remove your current crontab file.
  • crontab -i – Remove your current crontab file with a prompt before removal.
  • crontab -u – Edit another user's crontab file. Requires system administrator privileges.
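
Inside a crontab file every job is a single line: five schedule fields (minute, hour, day of month, month, day of week) followed by the command. A small example (the script paths are placeholders):

# run backup.sh every day at 03:00
0 3 * * * /home/ubuntu/backup.sh
# run report.sh every Monday at 07:30
30 7 * * 1 /home/ubuntu/report.sh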

Docker glossary

Terminology

  • Container Image
    • All the dependencies, deployment and execution configuration needed to create a container, packed together. Usually an image is derived from multiple base images that are stacked on top of each other in layers to form the container's filesystem. Images are immutable once created.
  • Dockerfile
    • A text file that defines how to build a Docker image. Like a batch script, it defines which base images to use, which programs to install and which files to copy to get the environment working as needed.
  • Build
    • The act of building a container image based on the Dockerfile and the other files in the build context (the folder the image is built from).
  • Container
    • One instantiation of a Docker image. It represents a single process/application/service running on the image's host. It contains the Docker image, an execution environment and a standard set of instructions. To scale to millions of users, you just deploy more containers and balance the work across them, so easy!
  • Volumes
    • Since images are immutable, a containerized application cannot write to its own image! That is why we need volumes. They are an extra layer managed by Docker, mapped to a filesystem on the Docker host machine, to which the containerized application writes. The containerized application does not notice the difference and acts as usual when working with volumes.
  • Tag
    • A label or identifier which can be applied to an image, so multiple versions of the same image can be distinguished.
  • Multi-stage Build
    • Use a large base image for compiling and publishing the application and then use the publishing folder with a small runtime-only base image, to produce a much smaller final image.
  • Repository
    • A collection of Docker images
  • Registry
    • A service to access a repository
  • Multi-arch image
    • For multiple architectures (e.g. Windows and Linux), a feature that automatically requests the proper image variant for the host when the image is pulled or run.
  • Docker-Hub
    • Public registry to upload images and work with them. Offers build triggers, web hooks, and integration with GitHub and Bitbucket.
  • Azure Container Registry
    • A registry for Docker images and its components in Azure
  • Docker Trusted Registry
    • A registry server for Docker that can be hosted on your own private server for private images
  • Docker Community Edition (CE)
    • Free, community-supported edition of the Docker tools for local development and testing
  • Docker Enterprise Edition (EE)
    • Enterprise-scale version of Docker tools for Linux and Windows development
  • Compose
    • Command-line tool and YAML file format for defining and running multi-container applications. You can define a single application based on multiple images with one or more YAML files and then deploy all of the containers/images with just one command.
    • Imagine the first set of images sets up Hadoop, the next sets up Kafka, the next Spark and the last one a Java web server (see the sketch after this list).
  • Cluster
    • A collection of Docker hosts exposed as if they were a single virtual Docker host; basically many instances of the same container sit behind the same IP to handle huge numbers of users, i.e. to handle scaling.
    • Docker clusters can be created with Kubernetes, Azure Service Fabric, Docker Swarm and Mesosphere DC/OS
  • Orchestrator
    • Tool to simplify management of clusters and Docker hosts. Orchestrators enable you to manage their images, containers and hosts through a CLI or a GUI. You can manage container networking, configurations, load balancing, service discovery, high availability, Docker host configuration and much more. An orchestrator is responsible for running, distributing, scaling and healing workloads across a collection of nodes. Typically orchestration is provided by the same products that provide cluster infrastructure, like Kubernetes and Azure Service Fabric.
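
To make the Compose idea concrete, here is a minimal, hypothetical docker-compose.yml sketch (the images and ports are just examples, not part of any specific stack):

version: "3"
services:
  web:
    image: nginx:alpine   # example web server image
    ports:
      - "8080:80"
    depends_on:
      - cache
  cache:
    image: redis:alpine   # example in-memory cache image

Running docker-compose up then starts both containers with a single command; a real stack would list its Hadoop, Kafka, Spark and web server images in the same way.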

Azure Security – Security methods overview

There are many ways to make your cloud system more secure; here is a little overview of the most common and useful techniques to achieve a safe cloud infrastructure.

Account Shared Access Signature

The account SAS is a signature that enables the client to access resources in one or more of the storage services. Everything you can do with a service SAS you can do with an account SAS as well. So basically the account SAS is used for delegating access to a group of services.
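
As a sketch, such an account SAS can be generated with the Azure CLI roughly like this (the account name, services, permissions and expiry date are placeholders you would adapt):

# generate an account SAS for the blob and queue services,
# allowing read and list access on all resource types until the given expiry
az storage account generate-sas \
  --account-name mystorageaccount \
  --services bq \
  --resource-types sco \
  --permissions rl \
  --expiry 2025-12-31T23:59Z \
  --https-only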

Service Shared Access Signature

The service SAS is a signature which is used to delegate access to exactly one resource.

Stored Access Policy

A stored access policy gives you more fine-tuned control over service SAS on the server side. The stored access policy (SAP) can be used to group shared access signatures and to provide additional restrictions for signatures that are bound by that policy. You can use a SAP on Blob containers, File shares, Queues, and Tables.

Role Based Access Control (RBAC)

RBAC lets you distribute resource access in a much more fine-grained way than the other methods.

Things I wish I knew before working with Azure – everything you should know before starting with Microsoft Azure!

  • What are resource groups in Azure?
    • What is a resource?
      • Any manageable item that you can rent through Azure is considered a resource. For example virtual machines, storage accounts, web apps, databases, functions and more, basically anything you create and manage in Azure.
    • What is a resource provider?
      • Resource providers are the services that supply Azure with resources on demand. For example, Microsoft.Compute provides virtual machines and Microsoft.Storage provides storage, as the name implies. The provider gives access to operations on the resources it provides.
    • What is a resource manager template?
      • The resource manager template defines which resources to deploy to a resource group. With templates, you can define how resources are made available consistently, and also how and which resources to release when the system is in a critical predefined state.
    • What is a resource group?
      • Resource groups describe a collection of all the building blocks you have defined for your app. If you want to share data between apps or functions, it often makes sense to put them in the same group, as it also makes exchanging data between them easier.
    • What does deploying a web app mean in the Azure context?
      • When we deploy a web app in Azure, all we do is just tell Microsoft to rent out a few computer parts for us to run our server! We can define our web app locally and then just upload it to the cloud servers, which will serve our content worldwide!
  • What are Azure Functions?
    • Serverless functions in Azure can be defined very simply and connected to any app with minimal effort! The code for the function is stored on Azure's servers and only invoked when it is triggered by one of the many trigger mechanisms. They consist of a trigger, input bindings and output bindings, which we will explain in detail later on.
  • What are Azure Logic Apps?
    • Logic apps enable you to automate and orchestrate tasks. They are one of the main tools to automate processes and save you precious time! Logic apps even let you combine and concatenate multiple different apps into one! Connect everything with everyone is the motto of this set of features.
  • What is a storage account and why do I need one in Azure?
    • A storage account is a reference to all the data objects stored for your account, like blobs, files, queues, tables, disks and so on.
  • Redis Cache
    • Instead of renting normal data storage or distributed Hadoop storage, you can also rent super fast Redis Cache, which is basically just RAM and highly cacheable data storage. Depending on your use case, this can be very valuable for time- and efficiency-critical operations.
  • Power Shell / Bash Shell
    • Microsoft provides a great CLI interface to manage your cloud infrastructure
  • What are containers?
    • A container is basically virtualized software. Instead of having to care about the hardware and the operating system, you just ask for a container and your software project will run inside it. The great thing about containers is that they are hardware- and OS-independent, so you can just share your app container with someone and they can run your app without any issues, saving a huge amount of time when deploying software! Using such a container-based design yields more efficient architectures. Containers also let your team work faster, deploy more efficiently and operate at a much larger scale. Using a container also means you do not have to set up a whole VM; it contains just what you need to run the app, which makes containers much more lightweight than VMs. Your software is decoupled from the hardware and OS, which leaves many developers with much less headache! It also makes for a clean split between infrastructure management and software logic management.
  • What are Azure function triggers?
    • Since functions in Azure are serverless, we need to define a trigger which tells Azure when to call our function. There are many possible triggers we could use; the most common ones are triggered by changes to Cosmos DB, blob storage or queue storage, and by a timer.
  • What are Azure function bindings?
    • Azure function bindings basically define the input and output arguments of any function in Azure.
  • What does serverless mean? Serverless function?
    • In the context of Azure, there are serverless functions and serverless logic apps. But they still run on a server, so how are they serverless? The real meaning behind serverless is that developers do not worry about the servers; it all happens automagically in the backend, implemented by Microsoft's engineers.
  • BONUS : What is the difference between a VM and a container?
    • You can imagine the VM as virtualizing the hardware, while a container virtualizes the software.

Things I wish I knew about Azure Functions, before working with Azure

Every function in Azure consists of a trigger, input and output bindings, and of course the code defining the function!

What are Triggers?

Triggers are mechanisms that start the execution of your function. You can set up triggers for an HTTP request, a database update or almost anything.

What are bindings?

The bindings define which resources our function will have access to. Each binding is provided as a parameter to the function.

How to configure bindings and triggers?

Every function is accompanied by a function.json, which defines the bindings, their directions and the triggers. For compiled languages, so any non-scripting language, we do not have to create the function.json file ourselves, since it can be automatically generated from the function code. But for scripting languages, we must define the function.json ourselves.
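
For illustration, a minimal function.json for an HTTP-triggered function could look roughly like this (the binding names are placeholders):

{
  "bindings": [
    {
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "authLevel": "function",
      "methods": [ "get", "post" ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "res"
    }
  ]
}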

What are Durable Functions?

Durable Functions extend Azure's classical functions with functions that can have state AND still run in a serverless environment! Durable Functions are also necessary if you want to create an orchestrator function. A Durable Function is made up of different classical Azure Functions.

What are some Durable Function patterns?

One common pattern for Durable Functions is function chaining: you chain together a bunch of normal functions, and the output of each one is piped to the next. There is also the fan-out/fan-in pattern, which runs a bunch of functions in parallel and waits for all of them to finish before returning the final result. Then there are async HTTP API calls, which, like the name implies, enable us to make API calls that are not synchronous. There is also a pattern to program a human into the loop, called human interaction. You can check the official docs for more patterns.

IntelliJ plugin for Microsoft Azure tutorial: deploying a web app

In this tutorial we will check out how to get the Microsoft Azure plugin and how to use it.

First of all, start your IDE, hit Shift twice in quick succession and enter "plugin" to get to the plugin install menu quickly.

Then just type Azure and install the first plugin suggested, which is developed by Microsoft.

After installing the plugin and having created an account on the Azure website, you can log in to your account through IntelliJ.

Select the Tools tab in the top toolbar and log in to Azure using interactive mode; just type in the credentials you used to create your Azure account.

Preparing the resource groups

I was following this great tutorial from Microsoft, but I, and probably a lot of other people, encountered an error when trying to launch a web app right after having created a new account in Azure.

Before you can launch anything in Azure, you need resource groups. Even though the tutorial from Microsoft does not state it explicitly, you should really create a resource group before attempting this.

Here is how you create a resource group :

Login to your Azure account on the Microsoft website and head to “My Account”

Next select “Create a resource”

And then select “Web App”

Enter a name for the app and the resource group, then click Create New.

After having created the group, we are finally ready to deploy our app with IntelliJ!

Start a new project, select a web app in Maven and make sure you are creating the project from an archetype!
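
If you prefer the command line, the same Maven web app skeleton can be generated roughly like this (the group and artifact IDs are placeholders):

mvn archetype:generate \
  -DgroupId=com.example.azurewebapp \
  -DartifactId=azure-webapp-demo \
  -DarchetypeArtifactId=maven-archetype-webapp \
  -DinteractiveMode=false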

Then just go to the root folder of your project and right-click it in IntelliJ. You should now see the Azure options, which let you deploy your web app to the cloud!

If you did not login to Azure before, do it now.

Then you have the option to use an existing web app or a new one. We want a new one, but we will use an existing resource group! For some reason, creating a resource group with the IntelliJ plugin seems to result in exceptions. The only way to avoid those so far is to create the group manually in Azure and then use that one for further deployment.
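
As an alternative to the portal, the group can also be created with the Azure CLI, for example (the group name and location are placeholders):

az group create --name my-resource-group --location westeurope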

After hitting run and waiting a few seconds, your console should update with a URL to your freshly deployed web app.

Thanks for reading and have fun in the cloud!

Deployment cycle with Spark and Hadoop in Java

This article will show you one of many possible cycles to deploy your code as quickly and efficiently as possible. We will also talk a little about what Hadoop and Spark actually are and how we can use them to make awesome distributed computations!

What is Hadoop and Spark for?

You use your Spark cluster to do very computationally expensive tasks in a distributed fashion.

Hadoop provides the data in a distributed fashion, making it available from multiple nodes and thereby increasing the rate at which every node in the cluster network gets its data.

We will write our code in Java and define cluster computations using the open source Apache Spark library.

After defining the code, we will use Maven to create a fat jar from it, which will contain all the dependencies.

We will make the jar available from multiple sources, so that multiple computation nodes of our Spark cluster can download it at the same time. This is achieved by making the data available in a distributed way through Hadoop.

What does a deployment cycle with Spark and Hadoop look like in Java?

A typical cycle could look like this:

  1. Write code in Java
  2. Compile code into a fat Jar
  3. Make jar available in Hadoop cloud
  4. Launch the Spark driver, which can allocate a dynamic number of nodes to take care of the computations defined within the jar.

1. Write code in Java

You will have to define a main method within a main class. This is the code the cluster runs first, so everything starts from this method.
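
A minimal sketch of such a main class (the package, class and application names are placeholders; the fully qualified class name is what you later pass to spark-submit via --class):

package com.package.name;

import org.apache.spark.sql.SparkSession;

public class MainClass {
    public static void main(String[] args) {
        // Everything starts here: create (or reuse) the SparkSession
        SparkSession spark = SparkSession.builder()
                .appName("example-distributed-job")
                .getOrCreate();

        // Tiny example computation, distributed across the executors
        long count = spark.range(1, 1001).count();
        System.out.println("Counted " + count + " elements");

        spark.stop();
    }
}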

2. Compile code into fat Jar

mvn clean compile assembly:single
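
This assumes the maven-assembly-plugin is configured in your pom.xml; a minimal sketch could look like this (the main class is a placeholder that must match your own code):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <!-- entry point of the fat jar -->
        <mainClass>com.package.name.MainClass</mainClass>
      </manifest>
    </archive>
    <descriptorRefs>
      <!-- bundle all dependencies into one jar -->
      <descriptorRef>jar-with-dependencies</descriptorRef>
    </descriptorRefs>
  </configuration>
</plugin>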

3. Make jar available from Hadoop cloud

Go into your Hadoop web interface and browse the file system

3.1 Create a folder in the cloud and upload the jar

After uploading your jar into your Hadoop cloud, it will be available to any computer that can talk to the Hadoop cluster. It is now available on all the Hadoop nodes in a distributed fashion and is ready for highly efficient and fast data exchange with any cluster; in our example we use a Spark cluster.
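
The upload can also be done from the command line; a short sketch assuming the target folder /jars and a Maven-built jar at target/example.jar:

# create the folder in HDFS and upload the fat jar
hdfs dfs -mkdir -p /jars
hdfs dfs -put target/example.jar /jars/example.jar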

If your hadoop node is called hadoop_fs and port is 9000, your jar is available to any node under the following URL:

hdfs://hadoop_fs:9000/jars/example.jar

4. Launch distributed Spark Computation

To launch the driver, you need the spark-submit script. The most straightforward way to get it is to just download the Spark distribution and unzip it.

wget http://apache.lauf-forum.at/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz

4.1 Launch Spark driver from command line.

Go to the directory where you unzipped your Spark distribution; for me it would be

loan@Y510P:~/Libraries/Apache/Spark/spark-2.3.0-bin-hadoop2.7$

The script ./bin/spark-submit has all the functionality we will require.

4.2 Gathering the Parameters

You need the following parameters to launch your jar in the cluster:

  • Spark Master URL
  • Hadoop Jar Url
  • Name of your main Class
  • Define --deploy-mode as cluster to run the computation in cluster mode

4.3 Final step: put the parameters together and launch the jar in the cluster

./bin/spark-submit --class com.package.name.mainClass --master spark://10.0.1.10:6066 --deploy-mode cluster hdfs://hadoop_fs:9000/jars/example.jar

This tells the Spark cluster where the jar we want to run is. It will launch a user-defined (or appropriate) number of executors and finish the computation in a distributed fashion.

Your task should now show up in your Spark web interface.

What have you learned:

  • How to turn your Java code into a fat jar
  • How to deploy your fat jar into the Hadoop cloud
  • How to run your code distributed in Spark, using Hadoop as the data source