
Docker glossary


  • Container Image
    • All the dependencies and the deployment and execution configuration needed to create a container, packed together. Usually an image is derived from one or more base images that are stacked as layers on top of each other to form the container's filesystem. Images are immutable once created.
  • Dockerfile
    • A text file that defines how to build a Docker image. Like a batch script, it specifies which base image to use, which programs to install and which files to copy to get the environment working as needed.
  • Build
    • The act of building a container image based on the instructions in the Dockerfile and the other files in its build context (the folder the image is built from).
  • Container
    • One instantiation of a Docker image. It represents a single process, application or service running on the Docker host. It consists of the Docker image, an execution environment and a standard set of instructions. To scale to millions of users, you just deploy more containers and balance the work across them, so easy!
  • Volumes
    • Since images are immutable, a containerized application cannot write to its own image! That is why we need volumes. They are an extra layer managed by Docker that maps a writable filesystem on the Docker host machine into the container. The containerized application does not notice the difference and acts as usual when working with volumes.
  • Tag
    • A label or identifier that can be applied to an image, so multiple different versions of the same image can be told apart.
  • Multi-stage Build
    • Use a large base image for compiling and publishing the application, then copy the publish output into a small runtime-only base image to produce a much smaller final image.
  • Repository
    • A collection of related Docker images, usually different versions of the same image.
  • Registry
    • A service that provides access to repositories.
  • Multi-arch image
    • A feature for images that support multiple architectures or operating systems (e.g. Windows, Linux): Docker automatically requests the proper variant for the host when the image is pulled or run.
  • Docker-Hub
    • Public registry to upload images and work with them. Build triggers, web hooks, integration to GitHub and Bitbucket
  • Azure Container Registry
    • A registry service for Docker images and related artifacts, hosted in Azure
  • Docker Trusted Registry
    • A registry server for Docker that can be hosted on your own private server for private images
  • Docker Community Edition (CE)
    • Community version of the Docker tools for developers and small teams on Linux and Windows
  • Docker Enterprise Edition (EE)
    • Enterprise-scale version of Docker tools for Linux and Windows development
  • Compose
    • Command-line tool and YAML file format for defining and running multi-container applications. You can define a single application based on multiple images with one or more YAML files and then deploy all of the containers/images with just one command.
    • Imagine the first set of images sets up Hadoop, the next sets up Kafka, the next Spark, and the last a Java web server.
  • Cluster
    • A collection of Docker hosts exposed as if they were a single virtual Docker host: many instances of the same container run behind the same IP to handle huge amounts of users, i.e. to handle scaling.
    • Docker clusters can be created with Kubernetes, Azure Service Fabric, Docker Swarm and Mesosphere DC/OS.
  • Orchestrator
    • Tool that simplifies management of clusters and Docker hosts. Orchestrators let you manage images, containers and hosts through a command-line interface or a GUI. You can manage container networking, configuration, load balancing, service discovery, high availability, Docker host configuration and much more. An orchestrator is responsible for running, distributing, scaling and healing workloads across a collection of nodes. Orchestration is typically provided by the same products that provide the cluster infrastructure, like Kubernetes and Azure Service Fabric.
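Several of these terms (Dockerfile, Build, Tag, Multi-stage Build) come together in practice. Below is a minimal multi-stage Dockerfile sketch for a hypothetical Maven-built Java application; the image names and paths are illustrative assumptions, not prescriptions:

```dockerfile
# Build stage: a large base image with the full JDK and Maven.
FROM maven:3-jdk-8 AS build
WORKDIR /app
COPY . .
RUN mvn -q package

# Runtime stage: a small JRE-only base image; only the built jar
# is copied over from the build stage, keeping the final image small.
FROM openjdk:8-jre-alpine
WORKDIR /app
COPY --from=build /app/target/app.jar .
CMD ["java", "-jar", "app.jar"]
```

Building and tagging it with, e.g., `docker build -t myapp:1.0 .` produces an image whose tag (myapp:1.0) lets you keep multiple versions apart.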

How to deploy tests with Maven on a Virtual Machine via Git

In this tutorial you will learn how to deploy your Java project onto any VM via Git and how to compile and run its tests with Maven.

We will go over the following points.

1. Add project to GitHub

We first need a GitHub repository to deploy from.

2. Add SSH keys on VM (optional)

Generate an SSH key on your VM and then add it to your GitHub account. Follow this tutorial if you are not sure how to do this: https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/

3. Clone Project to VM

Get the URL of your repo and clone it to your VM

4. Compile project

mvn clean test-compile

5. Run Tests

mvn test

This will execute all your compiled tests within the /target/test-classes folder.

6. Run a specific test function

mvn -Dtest=testClass#testMethod test

This command will run the method “testMethod” inside the class “testClass”

Recap – What have we learned?

Congratulations, you have learned how to elegantly deploy your Java project on a remote virtual machine using Git and Maven.

Deployment Cycle with Spark and Hadoop in Java

This article will show you one of many possible cycles to deploy your code as quickly and efficiently as possible. We will also talk a little about what Hadoop and Spark actually are and how we can use them to make awesome distributed computations!

What are Hadoop and Spark for?

You use your Spark cluster to do very computationally expensive tasks in a distributed fashion.

Hadoop provides the data in a distributed fashion, making it available from multiple nodes and thereby increasing the rate at which every node in the cluster network gets its data.

We will write our code in Java and define cluster computations using the open source Apache Spark library.

After defining the code, we will use Maven to create a fat jar from it, which will contain all the dependencies.

We will make the jar available from multiple sources, so that multiple computation nodes of our Spark cluster can download it at the same time. This is achieved by storing the jar in Hadoop, which makes it available in a distributed fashion.

What does a deployment cycle with Spark and Hadoop look like in Java?

A typical cycle could look like this:

  1. Write code in Java
  2. Compile code into a fat Jar
  3. Make jar available in Hadoop cloud
  4. Launch the Spark driver, which can allocate a dynamic number of nodes to take care of the computations defined within the jar.

1. Write code in Java

You will have to define a main class with a main function. This is the code that the cluster runs first, so everything starts from this function.
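As a sketch, such an entry point might look like the following; the class, package and HDFS path are placeholders, and the Spark-specific lines are shown as comments because they require the spark-core dependency on the classpath:

```java
// Hypothetical driver entry point; all names are placeholders.
public class MainClass {
    public static void main(String[] args) {
        // With the spark-core dependency available, the driver would
        // typically start like this:
        // SparkConf conf = new SparkConf().setAppName("ExampleJob");
        // JavaSparkContext sc = new JavaSparkContext(conf);
        // long lines = sc.textFile("hdfs://hadoop_fs:9000/input.txt").count();
        System.out.println("Driver main() reached");
    }
}
```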

2. Compile code into fat Jar

mvn clean compile assembly:single
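This goal assumes the maven-assembly-plugin is configured in your pom.xml; a minimal sketch might look like this (the main class name is a placeholder for your own entry point from step 1):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <!-- bundles all dependencies into one "fat" jar -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <!-- placeholder: your main class -->
            <mainClass>com.package.name.mainClass</mainClass>
          </manifest>
        </archive>
      </configuration>
    </plugin>
  </plugins>
</build>
```

With this in place, the command above produces a jar named like target/yourapp-1.0-jar-with-dependencies.jar.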

3. Make jar available from Hadoop cloud

Go into your Hadoop web interface and browse the file system

3.1 Create a folder in the cloud and upload the jar

After uploading your jar into your Hadoop cloud, it is available to any computer that can talk to the Hadoop cloud. The jar is now distributed across all the Hadoop nodes and ready for highly efficient and fast data exchange with any cluster; in our example, a Spark cluster.

If your Hadoop node is called hadoop_fs and its port is 9000, your jar is available to any node under the following URL:

hdfs://hadoop_fs:9000/jars/example.jar

4. Launch distributed Spark Computation

To launch the driver, you need the spark-submit script that ships with the Spark distribution. The most straightforward way to get it is to just download the Spark library and unpack it.

wget http://apache.lauf-forum.at/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz

tar -xzf spark-2.3.2-bin-hadoop2.7.tgz

4.1 Launch Spark driver from command line.

Go to the directory where you unpacked your Spark library; with the download above, that is spark-2.3.2-bin-hadoop2.7.

The ./bin/spark-submit script in that directory has all the functionality we will require.

4.2 Gathering the Parameters

You need the following parameters to launch your jar in the cluster:

  • Spark master URL
  • Hadoop jar URL
  • Name of your main class
  • --deploy-mode set to cluster, to run the computation in cluster mode
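These parameters can be gathered into shell variables and assembled into the final command; the values below are placeholders for illustration, and the echo makes this a dry run rather than a real submission:

```shell
# Placeholder values; substitute your own cluster addresses.
SPARK_MASTER="spark://master-node:7077"
JAR_URL="hdfs://hadoop_fs:9000/jars/example.jar"
MAIN_CLASS="com.package.name.mainClass"

# Assemble the full spark-submit invocation (printed, not executed):
CMD="./bin/spark-submit --class $MAIN_CLASS --master $SPARK_MASTER --deploy-mode cluster $JAR_URL"
echo "$CMD"
```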

4.3 Final step: Put the parameters together and launch the jar in the cluster

./bin/spark-submit --class com.package.name.mainClass --master spark://<master-host>:<port> --deploy-mode cluster hdfs://hadoop_fs:9000/jars/example.jar

This tells the Spark cluster where the jar we want to run is located. The cluster will launch a user-defined (or appropriate) number of executors and finish the computation in a distributed fashion.

Your task should now show up in your Spark web interface.

What have you learned?

  • How to turn your Java code into a fat jar
  • How to deploy your fat jar into the Hadoop cloud
  • How to run your code distributed on Spark, using Hadoop as the data source