hadoop2.3单机搭建

   没事整理了之前搭建hadoop的过程,这里使用了最新的hadoop版本,想在单机上做一些测试,顺手也就整理了一下这个文档。

一、准备环境

1Hadoop是用Java开发的,必须要安装JDK1.6或更高版本

apt-get install openjdk-6-jdk

2Hadoop是通过SSH来启动slave主机中的守护进程,必须安装OpenSSH

apt-get install openssh-server

3Hadoop更新比较快,我们采用最新版hadoop2.3来安装

4.配置对应Hosts记录,关闭iptablesselinux(过程略)

5.创建相同用户及配置无密码认证

.配置SSH无密码登录

helight:data1$ ssh-keygen -t rsa #一直回车生成

helight:data1$ cd

helight:~$ ls

helight:~$ls -a

. .. .bash_logout .bashrc .profile .ssh

helight:~$ cd .ssh/

helight:.ssh$ ls

id_rsa id_rsa.pub

helight:.ssh$ cat id_rsa.pub >> authorized_keys

helight:.ssh$ ls

authorized_keys id_rsa id_rsa.pub

helight:.ssh$ chmod 600 authorized_keys

hhelight:.ssh$ chmod 700 ../.ssh #目录权限必须设置700

root@debian:/data1# vim /etc/ssh/sshd_config #开启RSA认证

RSAAuthentication yes

PubkeyAuthentication yes

AuthorizedKeysFile %h/.ssh/authorized_keys

 

 

三、Hadoop的安装与配置

1.下载

http://hadoop.apache.org/下载hadoop Release 版本,这里我下载了2.3.0的版本,目前是最新版本。解压到本地文件系统中,我下载到了/data1/目录中。

$tar xzf hadoop-2.3.0.tar.gz

2.用户设置环境变量
编辑
~/.bashrc,添加JAVA_HOME,PATH

JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64

export JAVA_HOME
export HADOOP_HOME=/data1/hadoop/

export HADOOP_HOME

PATH=$PATH:/data1/hadoop/bin:/data1/hadoop-2.3.0/sbin

export PATH

编辑完成之后,重新打开一个shell终端,这个大家应该都知道是为了上面配置的环境变量在新终端中生效,或者执行source /.bashrc

再来看看配置结果:看一下hadoop可不可以执行。

helight:data1$ hadoop version

Hadoop 2.3.0

Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123

Compiled by jenkins on 2014-02-11T13:40Z

Compiled with protoc 2.5.0

From source with checksum dfe46336fbc6a044bc124392ec06b85

This command was run using /data1/hadoop-2.3.0/share/hadoop/common/hadoop-common-2.3.0.jar

helight:data1$

 

3.hadoop环境变量配置

cd hadoop-2.3.0/etc/hadoop

vim hadoop-env.sh

# The java implementation to use.

JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64

export JAVA_HOME

export JAVA_HOME=${JAVA_HOME}

export HADOOP_HOME=/data1/hadoop-2.3.0/

export HADOOP_HOME

PATH=$PATH:/data1/hadoop-2.3.0/bin/:/data1/hadoop-2.3.0/sbin

export PATH

4.slaves设置从节点

只有一个所以是localhost,要是多个需要设置hosts和这里添加域名,并且设置ssh登录无密码。

helight:hadoop$ vi slaves

localhost

 

5hdfs核心配置文件:core-site.xml

<property>

<name>fs.defaultFS</name>

<value>hdfs://localhost:9000</value>

</property>

<property>

<name>io.file.buffer.size</name>

<value>131072</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>file:/home/hadoop/tmp</value>

</property>

6.hdfs存储配置文件:hdfs-site.xml

<configuration>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/home/hadoop/dfs/name</value>

</property>

<property>

<name>dfs.namenode.data.dir</name>

<value>file:/home/hadoop/dfs/data</value>

</property>

<property>

<name>dfs.replication</name> #数据副本数量,默认3,设置1

<value>1</value>

</property>

</configuration>

 

7mapreaduce管理器配置文件:yarn-site.xml

<configuration>

<property>

<name>yarn.resourcemanager.address</name>

<value>localhost:8032</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>localhost:8030</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>localhost:8031</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>localhost:8033</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>localhost:8088</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

   <value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

</configuration>

8.mapreaduce框架mapred-site.xml

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<property>

<name>mapreduce.jobhistory.address</name>

<value>localhost:10020</value>

</property>

<property>

<name>mapreduce.jobhistory.webapp.address</name>

<value>localhost:19888</value>

</property>

</configuration>

四、格式化文件系统并启动

1.格式化新的分布式文件系统(hdfs namenode -format

初始化hdfs目录结构和数据。

helight:sbin$ hdfs namenode -format

2.启动hadoop

这里使用了start-all.sh ,实际上这个脚本是启动了start-dfs.shstart-yarn.sh ,所以也可以分阶段启动,start-dfs.sh启动了hdfsstart-yarn.sh 启动mapreduce调度框架。

helight:sbin$ ./start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

14/03/23 22:38:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

Starting namenodes on [localhost]

localhost: starting namenode, logging to /data1/hadoop-2.3.0/logs/hadoop-helight-namenode-debian.out

localhost: starting datanode, logging to /data1/hadoop-2.3.0/logs/hadoop-helight-datanode-debian.out

Starting secondary namenodes [0.0.0.0]

0.0.0.0: starting secondarynamenode, logging to /data1/hadoop-2.3.0/logs/hadoop-helight-secondarynamenode-debian.out

14/03/23 22:39:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

starting yarn daemons

starting resourcemanager, logging to /data1/hadoop-2.3.0/logs/yarn-helight-resourcemanager-debian.out

localhost: starting nodemanager, logging to /data1/hadoop-2.3.0/logs/yarn-helight-nodemanager-debian.out

3使用jps检查守护进程是否启动

helight:sbin$ jps

14506 NodeManager

14205 SecondaryNameNode

14720 Jps

14072 DataNode

13981 NameNode

14411 ResourceManager

4.查看集群状态

hdfs dfsadmin -report

14/03/23 22:43:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

Configured Capacity: 83225169920 (77.51 GB)

Present Capacity: 78803722240 (73.39 GB)

DFS Remaining: 78803697664 (73.39 GB)

DFS Used: 24576 (24 KB)

DFS Used%: 0.00%

Under replicated blocks: 0

Blocks with corrupt replicas: 0

Missing blocks: 0

 

————————————————-

Datanodes available: 1 (1 total, 0 dead)

 

Live datanodes:

Name: 127.0.0.1:50010 (localhost)

Hostname: debian.xu

Decommission Status : Normal

Configured Capacity: 83225169920 (77.51 GB)

DFS Used: 24576 (24 KB)

Non DFS Used: 4421447680 (4.12 GB)

DFS Remaining: 78803697664 (73.39 GB)

DFS Used%: 0.00%

DFS Remaining%: 94.69%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Last contact: Sun Mar 23 22:43:48 HKT 2014

 

5.通过web查看资源(http://localhost:8088

主要看mr进程管理

6、查看HDFS状态(http://localhost:50070

可以看到hdfs的运营情况,是hdfs的一个简单oss

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *