Sunday 9 November 2014

Strengthen your test suite - Mutation Testing

Hey guys, I came across this interesting technique some time back and was wondering how anybody could not be using it.

We must all have encountered the following situations while unit testing our code:
  • Hey, there are some more scenarios against which we have to test the method!!!
  • Hey, I did not take this scenario into account while unit testing, and it resulted in a production issue.
  • Hey, I had 100% code coverage, how can a bug still appear?
The myth here is: 100% code coverage does not mean 100% of the test scenarios are covered.

The answer to all of the above, or the way to avoid such situations, is to use Mutation Testing, a cool testing technique to validate and enhance your test suite.
There are several mutation testing frameworks around right now; the simplest and most active one, named PIT, is a good choice.
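To see why coverage alone is not enough, here is a minimal, hypothetical example (class, method and test names are made up): the test below reaches 100% line coverage, yet a PIT-style mutant that changes >= to > survives, which tells you the boundary scenario was never tested.

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class AgeCheckTest {

    // Production code under test (kept inline here for brevity).
    static class AgeCheck {
        static boolean isAdult(int age) {
            return age >= 18;
        }
    }

    @Test
    public void coversEveryLineButNotEveryScenario() {
        assertTrue(AgeCheck.isAdult(30));  // clearly an adult
        assertFalse(AgeCheck.isAdult(5));  // clearly a child
        // 100% line coverage, yet if a mutation tool flips ">=" to ">",
        // both assertions still pass -> the mutant survives, exposing the
        // missing test for the boundary case isAdult(18).
    }
}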

Following are some other frameworks for mutation testing:
  • Jester: a source-based mutation testing tool for Java
  • Judy: a mutation testing tool for Java
  • Jumble: a bytecode-based mutation testing tool for Java
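If your project is on Maven, PIT can be wired in via its pitest-maven plugin. Below is a minimal sketch of the configuration; the version number and the com.example package roots are placeholders, so check the PIT website for the current release and use your own packages:

<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <!-- placeholder version: use the latest PIT release -->
    <version>1.1.0</version>
    <configuration>
        <targetClasses>
            <param>com.example.*</param>
        </targetClasses>
        <targetTests>
            <param>com.example.*</param>
        </targetTests>
    </configuration>
</plugin>

With this in place, running mvn org.pitest:pitest-maven:mutationCoverage produces a report showing which mutants were killed and which survived.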


Tuesday 4 November 2014

Hadoop Single Node Setup & Map-Reduce

This time I would like to share something big, and here I come with "Big Data". Though I won't talk much about Big Data, I will explain the single node setup for Hadoop and a map-reduce Java program running on it.

Following are the prerequisites for running a single node Hadoop:
  • Java 6 or higher.
  • Apache Hadoop 2.5.1 (Download Link)
  • SSH server.
  • Passwordless SSH login for localhost.
  • Make/edit the ~/.bashrc file.
The first two steps are pretty simple and can be accomplished with ease. At this point I assume that you have downloaded/installed Java and Hadoop.

Note: Hadoop installation is merely a copy-and-paste task, so after downloading the tar file for Hadoop, unpack it and copy its contents to the following directory: /usr/local/hadoop
Command: sudo cp -rf ~/Downloads/hadoop-2.5.1/* /usr/local/hadoop
Add your user as the owner of the Hadoop directory, i.e. /usr/local/hadoop, as below:
Command: sudo chown -R <username> /usr/local/hadoop

SSH Server:
Install the SSH Server using the below command:
Command: sudo apt-get install openssh-server openssh-client

Passwordless SSH login for localhost:
To avoid being prompted for a password when logging in to localhost, we have to execute the following commands:

1. Delete the existing SSH directory (caution: this removes any existing SSH keys):
rm -rf ~/.ssh
2. Generate a new SSH key:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
3. Register the generated key as authorized:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
4. Log in to localhost (it should not ask for a password):
ssh localhost

Make/Edit ~/.bashrc file:

This is one of the important parts of this setup. Insert the below entries into your .bashrc file if you already have one; otherwise create the .bashrc file yourself and insert the entries:

JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
PATH=$PATH:$JAVA_HOME/bin
JRE_HOME=/usr/lib/jvm/java-7-openjdk-amd64
PATH=$PATH:$JRE_HOME/bin
HADOOP_INSTALL=/usr/local/hadoop/
PATH=$PATH:$HADOOP_INSTALL/bin
PATH=$PATH:/sbin
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
export JAVA_HOME
export JRE_HOME
export PATH

The above entries set the important environment variables. Once you are done with the .bashrc creation/modification, close your terminal window and start a new one (or run source ~/.bashrc), as the changes may not be reflected in the same terminal window.

Note: As we can see in the above entries there is an environment variable HADOOP_INSTALL; don't be confused by it, it serves the same purpose as HADOOP_HOME, which is deprecated now, so use HADOOP_INSTALL in place of HADOOP_HOME.


Okay!!! Now we are done with the prerequisites, so let's make the configuration changes necessary for the Hadoop single node setup to work:

1. Move to the directory which has all the configuration files in the hadoop installation directory i.e. /usr/local/hadoop/etc/hadoop

2. Set the JAVA_HOME variable inside file hadoop-env.sh:

export JAVA_HOME=${JAVA_HOME}
OR
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

3. Add following lines in the core-site.xml:
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
</property>
As we can see here, the value for hadoop.tmp.dir is /usr/local/hadoop/tmp, so we have to create this tmp directory (mkdir -p /usr/local/hadoop/tmp) before running the Hadoop scripts.

4. Add following lines in the hdfs-site.xml:
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/var/hadoop/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/var/hadoop/datanode</value>
</property>
Here we are defining the name node and data node directories; these should also be created before we run/start Hadoop, and we have to make our user the owner of them. Below are the commands for the same:

sudo mkdir -p /var/hadoop/namenode
sudo mkdir -p /var/hadoop/datanode
sudo chown -R <username> /var/hadoop

5. There will be a mapred-site.xml.template file inside /usr/local/hadoop/etc/hadoop, so we have to rename it to mapred-site.xml (run from /usr/local/hadoop):
sudo mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

Then add below entries in the same:
<property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
</property>
Now we are done with the configuration files.


HDFS Steps:


Now we have to format the name node (we are at /usr/local/hadoop):
hadoop namenode -format

To check whether the name node formatting was successful, look for the below message:
/var/hadoop/namenode has been successfully formatted. (in our case)

After this, start all the necessary services using the below command:

sbin/start-all.sh
<username>@mysystem-desktop /usr/local/hadoop $ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-<username>-namenode-xxx-desktop.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-<username>-datanode-xxx-desktop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-<username>-secondarynamenode-xxx-desktop.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-<username>-resourcemanager-xxx-desktop.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-<username>-nodemanager-xxx-desktop.out

You can check the running processes using the jps command as below:

<username>@mysystem-desktop /usr/local/hadoop $ jps
3316 SecondaryNameNode
3506 ResourceManager
3160 DataNode
3932 Jps
3607 NodeManager
3060 NameNode

This tells us about the running services and components. All the components listed above should be running; if any of them is not, there is some issue.

Once all the processes are started you can view the name node web UI at the below URL: NameNode - http://localhost:50070/. Here you can get all the information about the directories, the datanode, etc.


Setup for Map-Reduce:

To run the Map-Reduce example we first have to create the input directory where we will store the input files on which the map-reduce program will run.

1. Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir -p /user/<username>/input
2. Copy files into the distributed filesystem (these are not exactly the input files, but the configuration files):
$ bin/hdfs dfs -put etc/hadoop input
3. Map-Reduce Dictionary example: we are using the Map-Reduce example mentioned before, where the whole procedure is divided into the below steps:

1. Download the dictionary files for Italian, French and Spanish:
2. Merge all three files into a single file with the below commands:
cat French.txt >> fulldictionary.txt
cat Italian.txt >> fulldictionary.txt
cat Spanish.txt >> fulldictionary.txt
3. Copy this file, i.e. fulldictionary.txt, to the Hadoop file system input directory with the below command (assuming you are at /usr/local/hadoop):
bin/hdfs dfs -copyFromLocal ~/<your_path>/fulldictionary.txt input
4. In this Map-Reduce example there is one Dictionary.java file with the map-reduce logic in it, which you have to compile, package into a jar, and run using the below commands:
Compilation:
javac -classpath $HADOOP_INSTALL/share/hadoop/common/hadoop-common-2.5.1.jar:$HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.5.1.jar:$HADOOP_INSTALL/share/hadoop/common/lib/commons-cli-1.2.jar Dictionary.java
JAR Creation:
jar -cvf dc.jar Dict*.class
JAR Execution:
hadoop jar dc.jar Dictionary input output
Note: Before compiling the Dictionary.java file, make the below changes in it:
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
Instead of hard-coding the input and output paths, take them from the command line arguments as shown above.
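For reference, below is a minimal sketch of what such a driver's main() looks like with the Hadoop 2.x API; the mapper/reducer class names in the comments are illustrative and should be replaced with the ones from the Dictionary example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Dictionary {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "dictionary");
        job.setJarByClass(Dictionary.class);

        // Plug in the mapper/reducer classes from the Dictionary example here:
        // job.setMapperClass(WordMapper.class);
        // job.setReducerClass(AllTranslationsReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Input and output paths come from the command line,
        // e.g. "hadoop jar dc.jar Dictionary input output".
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}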
Once you are done with the JAR execution, check the output directory in HDFS for the result. Please note: do not create the output directory yourself; it will be created by the map-reduce job itself, otherwise you will get an error saying the output directory already exists.

Voila!!! The End. Hope this tutorial will be helpful for beginners.

Wednesday 29 October 2014

Google Protobuf - Get it working on Linux


Note: This article is posted keeping in mind that the reader has already gone through the proto file syntax, i.e. what variables and data types are used, and, last but not least, knows what protobuf is. If the answer to this is NO, please visit: https://developers.google.com/protocol-buffers/

Getting Google Protobuf working is quite a hassle, so to make it easy I thought it would be worth sharing a quick read:

Steps to configure a sample Maven Java project in Eclipse using protobuf on Linux:
1. Install the protobuf compiler as below:
sudo apt-get install protobuf-compiler

2. Maven dependency that needs to be inserted into the pom.xml:
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.6.0</version>
</dependency>

3. First of all we have to create a .proto file and then compile it using the protobuf compiler, which generates the corresponding .java class:
Syntax:  
protoc --java_out=<path where you want your respective java files to be created> <the actual .proto file's path>
Example: 
protoc --java_out=src/com/piyush/demo src/com/piyush/demo/addressbook.proto
protoc --java_out=src/ src/com/piyush/demo/my_object.proto
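Once the Java class has been generated, you work with it through the builder API. Below is a minimal sketch assuming the proto declares java_package = "com.piyush.demo", java_outer_classname = "AddressBookProtos" and a message Person with fields string name = 1 and int32 id = 2 (as in Google's tutorial proto); adjust the import and the setters to whatever your own proto defines:

import com.piyush.demo.AddressBookProtos.Person;

public class ProtoDemo {
    public static void main(String[] args) throws Exception {
        // Build an immutable message via the generated builder.
        Person person = Person.newBuilder()
                .setId(1)
                .setName("Piyush")
                .build();

        // Serialize to the compact protobuf wire format...
        byte[] bytes = person.toByteArray();

        // ...and parse it back into an object.
        Person parsed = Person.parseFrom(bytes);
        System.out.println(parsed.getName());
    }
}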

4. The earlier step can be automated using the "maven-antrun-plugin"; however, getting this running is quite a hassle, so please refer to the ready-to-use configuration in the Tips & Tricks section below.
We can use maven-antrun-plugin in the following ways:
  • For Single proto file compilation:
        <echo>Generate</echo>
        <exec executable="protoc">
            <arg value="--java_out=src/" />
            <arg value="src/com/piyush/demo/addressbook.proto" />
        </exec>
  • For Multiple proto files compilation:
        <tasks>
            <echo>Generate</echo>
            <path id="proto.path">
                <fileset dir="src/com/piyush/demo/proto">
                    <include name="**/*.proto" />
                </fileset>
            </path>
            <pathconvert pathsep=" " property="proto.files" refid="proto.path" />
            <exec executable="protoc" failonerror="true">
                <arg value="--java_out=src/" />
                <arg value="-I${project.basedir}/src/com/piyush/demo/proto" />
                <arg line="${proto.files}" />
            </exec>
        </tasks>


Tips & Tricks:
1. Message name should be in camel case.

2. Beware that the name of the message and the java_outer_classname should not be the same in the proto file. For example, the below proto file won't compile:
option java_package = "com.xyz.zrtb.simulator.protos";
option java_outer_classname = "A";
message a {
   optional string id = 1;
   optional string name = 2;
   repeated string cat = 4;
   optional string domain = 3;
}
Below will compile fine:
option java_package = "com.xyz.zrtb.simulator.protos";
option java_outer_classname = "OuterA";
message A {
   optional string id = 1;
   optional string name = 2;
   repeated string cat = 4;
   optional string domain = 3;
}


3. Importing one proto file into another is quite tricky, so it is good to get it right the very first time:
Import resolution relies on the '--proto_path' parameter, so if you have a package com.abc.pqr.model and have defined '--proto_path' as below:
--proto_path=${project.basedir}/src/main/java/com/abc/pqr/model
THEN you can import just by writing xxx.proto. However, if you've defined '--proto_path' as below:
--proto_path=${project.basedir}/
THEN you have to import by writing src/main/java/com/abc/pqr/model/xxx.proto, and similarly for other combinations.
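As a quick illustration (the file names here are hypothetical), suppose model.proto imports common.proto and both sit under src/main/java/com/abc/pqr/model; the import line must match the chosen --proto_path:

// With --proto_path=${project.basedir}/src/main/java/com/abc/pqr/model
import "common.proto";

// With --proto_path=${project.basedir}/
import "src/main/java/com/abc/pqr/model/common.proto";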

4. The maven-antrun-plugin is problematic and can be a blocker, so below is a ready-to-use pom.xml configuration:
<build>
     <pluginManagement>
         <plugins>
            <!-- force a dependency on JDK 1.7 -->
            <plugin>
               <groupId>org.apache.maven.plugins</groupId>
               <artifactId>maven-compiler-plugin</artifactId>
               <configuration>
                  <source>1.7</source>
                  <target>1.7</target>
               </configuration>
            </plugin>
            <!--This plugin's configuration is used to store Eclipse m2e settings only. It has no influence on the Maven build
          itself. -->
            <plugin>
               <groupId>org.eclipse.m2e</groupId>
               <artifactId>lifecycle-mapping</artifactId>
               <version>1.0.0</version>
               <configuration>
                  <lifecycleMappingMetadata>
                     <pluginExecutions>
                        <pluginExecution>
                           <pluginExecutionFilter>
                              <groupId>org.apache.maven.plugins</groupId>
                              <artifactId>maven-antrun-plugin</artifactId>
                              <versionRange>[1.7,)</versionRange>
                              <goals>
                                 <goal>run</goal>
                              </goals>
                           </pluginExecutionFilter>
                           <action>
                              <ignore />
                           </action>
                        </pluginExecution>
                     </pluginExecutions>
                  </lifecycleMappingMetadata>
               </configuration>
            </plugin>
         </plugins>
     </pluginManagement>
      <plugins>
         <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-antrun-plugin</artifactId>
            <version>1.7</version>
            <executions>
               <execution>
                  <id>generate-sources</id>
                  <phase>generate-sources</phase>
                  <configuration>
                     <target>
                        <echo>Generate</echo>
                        <path id="proto.path">
                           <fileset dir="src/main/java/com/piyush/zrtb/simulator/proto">
                              <include name="**/*.proto" />
                           </fileset>
                        </path>
                        <pathconvert pathsep=" " property="proto.files" refid="proto.path" />
                        <exec executable="protoc" failonerror="true">
                           <arg value="--java_out=src/main/java" />
                           <arg value="-I${project.basedir}/src/main/java/" />
                           <arg line="${proto.files}" />
                        </exec>
                     </target>
                     <sourceRoot>src/main/java</sourceRoot>
                  </configuration>
                  <goals>
                     <goal>run</goal>
                  </goals>
               </execution>
            </executions>
         </plugin>
      </plugins>
      <extensions />
   </build>

Links to go through:
  • https://code.google.com/p/protoclipse/
  • http://www.siafoo.net/user/stou/blog/2010/01/29/Protocol-Buffers-and-Eclipse
  • http://www.masterzen.fr/2011/12/25/protobuf-maven-m2e-and-eclipse-are-on-a-boat/
  • http://techtraits.com/noproto/
  • http://xmeblog.blogspot.in/2013/12/sending-protobuf-serialized-data-using.html
  • http://sleeplessinslc.blogspot.in/2010/03/restful-representation-with-google.html
  • http://tutorials.jenkov.com/maven/maven-tutorial.html
  • https://developers.google.com/protocol-buffers/docs/proto#generating

Friday 21 February 2014

To start off with my first words here, I would like to share a quote which I have always found significant and worth following in life:

"There is nothing noble about being superior to some other person; true nobility lies in being superior to your former self".