Learn geek technologies like Big Data, Hadoop, Hive, Pig, Sqoop, Flume, Cassandra, HBase, Ruby on Rails, Python, Java and many more.

Tuesday 25 October 2016

What is Apache Hive?
Apache Hive is a data warehouse system built to work on Hadoop. It is used for querying and managing large datasets residing in distributed storage. Hive originated at Facebook before becoming an open-source Apache project. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL.

What is HQL?
Hive defines a simple SQL-like query language for querying and managing large datasets, called HiveQL (HQL). It is easy to use if you are familiar with SQL. HiveQL also allows programmers who are familiar with MapReduce to plug in their custom mappers and reducers to perform more sophisticated analysis.

Uses of Hive:

1. Access to files stored in distributed storage, either directly in HDFS or in other data storage systems such as Apache HBase.
2. Hive provides tools to enable easy data extract/transform/load (ETL).
3. It provides a mechanism to impose structure on a variety of data formats.

Data Definition Language (DDL)

DDL statements are used to build and modify the tables and other objects in the database.
Examples: CREATE, DROP, TRUNCATE, ALTER, SHOW, DESCRIBE statements.

Data Manipulation Language (DML)

DML statements are used to retrieve, store, modify, delete, insert and update data in the database.
Examples: LOAD, INSERT statements.
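
The HiveQL statements above can be run from the hive shell or over JDBC. Below is a minimal Java sketch using the Hive JDBC driver; it assumes HiveServer2 is running on its default port (10000), the hive-jdbc jar is on the classpath, and the table name and file path are placeholders for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQlDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();

        // DDL: create a table
        stmt.execute("CREATE TABLE IF NOT EXISTS employee (id INT, name STRING) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

        // DML: load data into the table (placeholder path)
        stmt.execute("LOAD DATA LOCAL INPATH '/tmp/employee.csv' "
                + "OVERWRITE INTO TABLE employee");

        // Query the data back with HiveQL
        ResultSet rs = stmt.executeQuery("SELECT id, name FROM employee");
        while (rs.next()) {
            System.out.println(rs.getInt(1) + "\t" + rs.getString(2));
        }
        con.close();
    }
}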

Saturday 22 October 2016

Steps to install Apache Pig

1. Download the Apache Pig tar file.
2. Extract the tar file:

$ tar xvzf pig-0.15.0.tar.gz

3. Set the path of Pig in the .bashrc file.
4. Open the .bashrc file:

$ sudo gedit .bashrc

5. Paste these export lines at the bottom of the .bashrc file:
export PIG_HOME=/home/ratul/pig-0.15.0
export PATH=$PATH:/home/ratul/pig-0.15.0/bin
export PIG_CLASSPATH=$HADOOP_HOME/conf
6. Reload the file with $ source ~/.bashrc, then run Pig from the terminal.
For local mode:

$ pig -x local

For MapReduce mode:

$ pig -x mapreduce
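
Once Pig runs from the terminal, it can also be embedded in a Java program through Pig's PigServer API. Below is a minimal sketch, assuming the pig jar is on the classpath; the input file /tmp/input.txt (one word per line) is a placeholder for illustration.

import java.util.Iterator;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigEmbedDemo {
    public static void main(String[] args) throws Exception {
        // Run Pig in local mode, the equivalent of "pig -x local".
        PigServer pig = new PigServer("local");
        pig.registerQuery("lines = LOAD '/tmp/input.txt' AS (word:chararray);");
        pig.registerQuery("grouped = GROUP lines BY word;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(lines);");
        // Iterate over the result tuples, like a DUMP of the alias.
        Iterator<Tuple> it = pig.openIterator("counts");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}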

Thursday 20 October 2016

Steps to install Apache Hive


1. Download the tar file of Hive.

2. Extract the Hive tar file:
$ tar xvzf apache-hive-1.2.1-bin.tar.gz

3. Open the .bashrc file:
  $ gedit .bashrc

Paste the lines below into the .bashrc file:

# Set HIVE_HOME
export HIVE_HOME=/home/ratul/apache-hive-1.2.1-bin
export PATH=$PATH:$HIVE_HOME/bin
4. Go to the bin folder of Hive:
$ cd /home/ratul/apache-hive-1.2.1-bin/bin
5. Edit the hive-config.sh file.

In hive-config.sh, add:
export HADOOP_HOME=/home/ratul/hadoop-2.6.0

6. Before running Hive, start the Hadoop daemons first, then launch Hive:
$ hive

Wednesday 19 October 2016

Steps to install Hadoop on Ubuntu


1. Install JDK 1.6 or greater.
     To install OpenJDK 8:
 $ sudo apt-get install openjdk-8-jdk

2. Download the required Hadoop release (hadoop-2.6.0 is used here).

3. Extract it:
$ tar xvzf hadoop-2.6.0.tar.gz

4. Update JAVA_HOME inside the hadoop-env.sh file:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-i386

5. Update your .bashrc file:

    $ sudo gedit .bashrc

Paste these export lines at the end of the file:

export HADOOP_HOME=/home/ratul/hadoop-2.6.0
export HADOOP_CONF_DIR=/home/ratul/hadoop-2.6.0/etc/hadoop
export HADOOP_MAPRED_HOME=/home/ratul/hadoop-2.6.0
export HADOOP_COMMON_HOME=/home/ratul/hadoop-2.6.0
export HADOOP_HDFS_HOME=/home/ratul/hadoop-2.6.0
export YARN_HOME=/home/ratul/hadoop-2.6.0

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-i386
export PATH=$PATH:/home/ratul/hadoop-2.6.0/bin

export HADOOP_USER_CLASSPATH_FIRST=true

6. Modify your core-site.xml, hdfs-site.xml and mapred-site.xml (all under etc/hadoop). A minimal single-node example is shown below.
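
As a sketch, these are the stock single-node values from the Hadoop documentation; adjust paths and ports to your environment.

core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>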

7. Install ssh on your system using sudo apt-get install ssh.

8. ssh localhost should log you in.

9. Run the two commands below to generate and authorize the SSH keys:
       $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
       $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

10. Your system is now set up with Hadoop installed. Format your NameNode:

     $ hadoop namenode -format

11. Start the NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager daemons:

     $ cd /home/ratul/hadoop-2.6.0/sbin
     $ ./start-all.sh

12. You can view the NameNode web UI at http://localhost:50070

13. You can view the cluster (YARN ResourceManager) at http://localhost:8088

14. You can interact with HDFS using:

     $ hadoop fs -ls /
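
Beyond the shell, you can also talk to HDFS from Java through the FileSystem API. Below is a minimal sketch, assuming the Hadoop client jars are on the classpath and fs.defaultFS points at the single-node setup above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsListDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Matches the fs.defaultFS value from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        // The Java equivalent of "hadoop fs -ls /".
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}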

Apache Hadoop


Apache Hadoop is an open-source software framework, written in Java by Doug Cutting and Michael J. Cafarella, that supports data-intensive distributed applications and is licensed under the Apache v2 license. It supports the running of applications on large clusters of commodity hardware. Hadoop was derived from Google's MapReduce and Google File System (GFS) papers.

The name "Hadoop" was given by Doug Cutting's, he named it after his son's toy elephant. Doug used the name for his open source project because it was easy to pronounce and to Google.The Hadoop framework transparently provides both reliability and data motion to applications. Hadoop implements a computational paradigm named MapReduce, where the is divided into many small of work, each of which may be executed or re-executed on any node in the cluster. It provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both reduce and the distributed file system are designed so that node failures are automatically handled by the framework. It enables applications to work with thousands of computation-independent computers and petabytes of data. The entire Apache Hadoop platform is commonly considered to consist of the Hadoop kernel, MapReduce and Hadoop Distributed File System (HDFS), and number of related projects including Apache Hive, Apache HBase, Apache Pig, Zookeeper etc.

Before proceeding with Hadoop, you should have prior exposure to Core Java, database concepts, and any flavor of the Linux operating system.

Monday 17 October 2016

BIG DATA

Big data analytics is the process of examining large data sets containing a variety of data types -- i.e., big data -- to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. The analytical findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations and other business benefits.

It is often estimated that 90% of the world’s data was generated in the last few years.

What is Big Data?

Big data means really big data: it is a collection of large datasets that cannot be processed using traditional computing techniques. Big data is not merely data; rather, it has become a complete subject, which involves various tools, techniques and frameworks.


What Comes Under Big Data?

Big data involves the data produced by different devices and applications. Given below are some of the fields that come under the umbrella of Big Data.
  • Black Box Data : It is a component of helicopters, airplanes, jets, etc. It captures the voices of the flight crew, recordings of microphones and earphones, and the performance information of the aircraft.
  • Social Media Data : Social media such as Facebook and Twitter hold information and the views posted by millions of people across the globe.
  • Stock Exchange Data : The stock exchange data holds information about the ‘buy’ and ‘sell’ decisions made by customers on shares of different companies.
  • Power Grid Data : The power grid data holds information about the power consumed by a particular node with respect to a base station.
  • Transport Data : Transport data includes model, capacity, distance and availability of a vehicle.
  • Search Engine Data : Search engines retrieve lots of data from different databases.

Benefits of Big Data

  • Using the information kept in the social network like Facebook, the marketing agencies are learning about the response for their campaigns, promotions, and other advertising mediums.
  • Using the information in the social media like preferences and product perception of their consumers, product companies and retail organizations are planning their production.
  • Using the data regarding the previous medical history of patients, hospitals are providing better and quick service.

Big Data Challenges

The major challenges associated with big data are as follows:
  • Capturing data
  • Curation
  • Storage
  • Searching
  • Sharing
  • Transfer
  • Analysis
  • Presentation

Saturday 8 October 2016


final:

final is a keyword. A variable declared as final can be
initialized only once and cannot be changed. Java classes
declared as final cannot be extended. Methods declared as final
cannot be overridden.
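
A small sketch showing all three uses of final:

final class Constants {                  // final class: cannot be extended
    static final double PI = 3.14159;    // final variable: assigned only once

    final int square(int x) {            // final method: cannot be overridden
        return x * x;
    }
}

// class MoreConstants extends Constants {}  // compile error: Constants is final
// Constants.PI = 3.0;                       // compile error: PI is final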


finally:

finally is a block. The finally block always executes when the
try block exits. This ensures that the finally block is executed
even if an unexpected exception occurs. But finally is useful for
more than just exception handling - it allows the programmer to
avoid having cleanup code accidentally bypassed by a return,
continue, or break. Putting cleanup code in a finally block is
always a good practice, even when no exceptions are anticipated.
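
A minimal sketch: the reader is closed even though the method returns from inside the try block. The file path in main is a placeholder.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class FinallyDemo {
    static String readFirstLine(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            return reader.readLine();   // even though we return here...
        } finally {
            reader.close();             // ...this still runs before the method exits
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFirstLine("/etc/hostname"));  // placeholder path
    }
}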


finalize:

finalize is a method. Before an object is garbage collected, the
runtime system calls its finalize() method. You can write code that
releases system resources in the finalize() method, so it runs before
the object is garbage collected.
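
A minimal sketch of overriding finalize(); note that the runtime gives no guarantee about when, or whether, it runs:

public class FinalizeDemo {
    @Override
    protected void finalize() throws Throwable {
        try {
            // release system resources here before garbage collection
            System.out.println("finalize() called");
        } finally {
            super.finalize();
        }
    }

    public static void main(String[] args) {
        new FinalizeDemo();   // create an object and let it become unreachable
        System.gc();          // request garbage collection; finalize() may run
    }
}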