Learn geek technologies like Big Data, Hadoop, Hive, Pig, Sqoop, Flume, Cassandra, HBase, Ruby on Rails, Python, Java and many more.

Friday 23 June 2017


CODE:

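#include <iostream>
using namespace std;

// Returns the sum of its two integer arguments.
int addition (int a, int b)
{
  int r;
  r = a + b;
  return r;
}

int main ()
{
  int z;
  z = addition (5, 3);   // call the function; z receives the returned value (8)
  cout << "The result is " << z << endl;
}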


Output:

The result is 8

CODE:

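#include <iostream>
using namespace std;

// A void function performs an action but returns no value.
void printmessage ()
{
  cout << "I'm a function!" << endl;
}

int main ()
{
  printmessage ();   // call the function; nothing is returned
}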

Output:

I'm a function!

Tuesday 25 October 2016

What is Apache Hive?
Apache Hive is a data warehouse system built to work on Hadoop. It is used for querying and managing large datasets residing in distributed storage. Hive originated at Facebook before becoming an open-source Apache project. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL.

What is HQL?
Hive defines a simple SQL-like query language, called HiveQL (HQL), for querying and managing large datasets. It is easy to use if you are familiar with SQL. HiveQL also allows programmers who are familiar with the MapReduce framework to plug in their own custom mappers and reducers to perform more sophisticated analysis.

Uses of Hive:

1. Apache Hive supports querying and managing large datasets residing in distributed storage.
2. Hive provides tools to enable easy data extract/transform/load (ETL).
3. It imposes structure on a variety of data formats.

Data Definition Language (DDL)

DDL statements are used to build and modify tables and other objects in the database.
Examples: CREATE, DROP, TRUNCATE, ALTER, SHOW and DESCRIBE statements.

Data Manipulation Language (DML)

DML statements are used to retrieve, store, modify, delete, insert and update data in the database.
Examples: LOAD and INSERT statements.

Saturday 22 October 2016

Steps to install Apache Pig

1. Download the Apache Pig tar file.
2. Extract the tar file:

$ tar xvzf pig-0.15.0.tar.gz

3. Set the Pig path in your .bashrc file.
4. Open the .bashrc file:

$ gedit ~/.bashrc

5. Paste these export lines at the bottom of the .bashrc file:
export PIG_HOME=/home/ratul/pig-0.15.0
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
6. Open a new terminal (or run source ~/.bashrc) so the new variables take effect, then run Pig.
For local mode:

$ pig -x local

For MapReduce mode:

$ pig -x mapreduce

Thursday 20 October 2016

Steps to install Apache Hive


1. Download the Hive tar file.

2. Extract the Hive tar file:
$ tar xvzf apache-hive-1.2.1-bin.tar.gz

3. Open your .bashrc file:
  $ gedit ~/.bashrc

Paste the lines below into the .bashrc file:

# Set HIVE_HOME
export HIVE_HOME=/home/ratul/apache-hive-1.2.1-bin
export PATH=$PATH:$HIVE_HOME/bin
4. Go to the bin folder of Hive:
$ cd /home/ratul/apache-hive-1.2.1-bin/bin
5. Edit the hive-config.sh file.

In hive-config.sh, add:
export HADOOP_HOME=/home/ratul/hadoop-2.6.0

6. Before running Hive, start the Hadoop daemons first. Then, in a new terminal (or after running source ~/.bashrc), run Hive:
$ hive

Wednesday 19 October 2016

Steps to install Hadoop on Ubuntu


1. Install JDK 1.6 or greater.
     To install OpenJDK 8, run:
 $ sudo apt-get install openjdk-8-jdk

2. Download the required Hadoop release.

3. Extract it by running tar xvzf hadoop-2.6.0.tar.gz

4. Update JAVA_HOME inside the hadoop-env.sh file (found in hadoop-2.6.0/etc/hadoop).
Write:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-i386
(On a 64-bit system the directory is usually java-8-openjdk-amd64.)

5. Update your .bashrc file:

    $ gedit ~/.bashrc

Paste these export lines at the end of the file:

export HADOOP_HOME=/home/ratul/hadoop-2.6.0
export HADOOP_CONF_DIR=/home/ratul/hadoop-2.6.0/etc/hadoop
export HADOOP_MAPRED_HOME=/home/ratul/hadoop-2.6.0
export HADOOP_COMMON_HOME=/home/ratul/hadoop-2.6.0
export HADOOP_HDFS_HOME=/home/ratul/hadoop-2.6.0
export YARN_HOME=/home/ratul/hadoop-2.6.0

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-i386
export PATH=$PATH:/home/ratul/hadoop-2.6.0/bin

export HADOOP_USER_CLASSPATH_FIRST=true

6. Modify your core-site.xml, hdfs-site.xml and mapred-site.xml files (in hadoop-2.6.0/etc/hadoop).

7. Install SSH on your system using sudo apt-get install ssh.

8. Check that ssh localhost logs you in without asking for a passphrase.

9. If it does not, run the commands below to create a passwordless key and authorize it:
       $ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
       $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
       $ chmod 0600 ~/.ssh/authorized_keys

10. Your system is now set up with Hadoop. Format the NameNode:

     $ hdfs namenode -format

11. Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode) and the YARN daemons (ResourceManager, NodeManager):

     $ cd /home/ratul/hadoop-2.6.0/sbin
     $ ./start-all.sh

12. You can view the NameNode web UI at http://localhost:50070

13. You can view the cluster at http://localhost:8088

14. You can interact with hdfs using hadoop fs -ls /

Apache Hadoop


Apache Hadoop is an open-source software framework, written in Java by Doug Cutting and Michael J. Cafarella, that supports data-intensive distributed applications and is licensed under the Apache v2 license. It supports running applications on large clusters of commodity hardware. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.

The name "Hadoop" was given by Doug Cutting, who named it after his son's toy elephant. Doug used the name for his open-source project because it was easy to pronounce and to Google.

The Hadoop framework transparently provides both reliability and data motion to applications. Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. It also provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that node failures are automatically handled by the framework. This enables applications to work with thousands of computation-independent computers and petabytes of data. The entire Apache Hadoop platform is commonly considered to consist of the Hadoop kernel, MapReduce and the Hadoop Distributed File System (HDFS), plus a number of related projects including Apache Hive, Apache HBase, Apache Pig, ZooKeeper, etc.

Before you proceed with Hadoop, you should have prior exposure to Core Java, database concepts, and at least one Linux operating system flavor.