Wednesday, February 20, 2013

Installing Hadoop on Ubuntu 12.10


0. Installation prerequisites

Hadoop runs on Linux, so we install Ubuntu 12.10 in VMware on Windows and carry out the Hadoop installation there.

 - Install the openSSH server: Software Center => install "Secure shell client and server" (or use apt-get, as sketched right after this list)
 - Download the JDK: http://www.oracle.com/technetwork/java/javase/downloads/jdk6downloads-1902814.html
 - Download Hadoop: http://hadoop.apache.org/releases.html
     Of the versions available, hadoop-1.0.4.tar.gz (stable) was used for this test.
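
If you prefer the terminal to Software Center, the openSSH server can also be installed with apt-get (a minimal sketch; this is the package name on Ubuntu 12.10):

citylock@ubuntuA:~$ sudo apt-get install openssh-server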
   

Now that everything we need is in place, let's install each piece one at a time.

1. Add a Hadoop user account

citylock@ubuntuA:~/Downloads$ sudo adduser hadoop
[sudo] password for citylock:
Adding user `hadoop' ...
Adding new group `hadoop' (1001) ...
Adding new user `hadoop' (1001) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
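
** Later steps run sudo as this hadoop account; on a stock Ubuntu install a freshly added user is not in the sudo group, so if those commands are refused you may need to add it first (an aside based on Ubuntu defaults, not part of the original walkthrough):

citylock@ubuntuA:~$ sudo adduser hadoop sudo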


2. Create a temp directory for Hadoop

citylock@ubuntuA:/home/hadoop$ sudo mkdir temp
citylock@ubuntuA:/home/hadoop$ sudo chown hadoop:hadoop temp/
citylock@ubuntuA:/home/hadoop$ ll
total 36
drwxr-xr-x 3 hadoop hadoop 4096 Feb 20 06:14 ./
drwxr-xr-x 4 root   root   4096 Feb 20 06:10 ../
-rw-r--r-- 1 hadoop hadoop  220 Feb 20 06:10 .bash_logout
-rw-r--r-- 1 hadoop hadoop 3486 Feb 20 06:10 .bashrc
-rw-r--r-- 1 hadoop hadoop 8445 Feb 20 06:10 examples.desktop
-rw-r--r-- 1 hadoop hadoop  675 Feb 20 06:10 .profile
drwxr-xr-x 2 hadoop hadoop 4096 Feb 20 06:14 temp/

** This temp directory is used during the map and reduce phases while Hadoop jobs run.


3. Generate an SSH key and register it as authorized_keys (so you can connect without typing a password)

hadoop@ubuntuA:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
4e:34:7d:6d:81:a3:f7:77:03:61:73:8d:d4:9b:cb:78 hadoop@ubuntuA
The key's randomart image is:
+--[ RSA 2048]----+
|             ooo.|
|         .  o=.oo|
|        o ..o.* o|
|       . ....o o |
|        S  . .+ .|
|       o     ..Eo|
|        .     ..o|
|                 |
|                 |
+-----------------+

hadoop@ubuntuA:~/.ssh$ cp id_rsa.pub authorized_keys
hadoop@ubuntuA:~/.ssh$ cat authorized_keys 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7J1YY10OHCcH3C+uHyOpjFs2BkgPcS4Mn+gAd3yb05jgwqd1ff3vfzxwma3dsWeWehUNhnZOrvQmphJYQ+JNBsFPGaRLQxA2d8/YiLyrq9d3gj/NS+Es1JDO1hsPnbzHnLxacLvl0xayab9uYsFDX11tNJyTYQX+sc5GqIOaYyZwNYXh+04tSBPcW2ksIXCISvmpXwSV2Rp9UAFTdf+nHIKPuu8nDsJzHCHIRghFjRkS0awP7LsUMPY4oerAQtKa0dd16cMN1J4LW0Nfci4k+WT5IomlJtv3BWuod1wTNxVHgzy8qsdGO2x6tLhEE0GwNHLkU2++g98oEdJtCGPkZ hadoop@ubuntuA


Copy the generated authorized_keys file into the ~/.ssh/ directory on each slave server.
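
scp is one way to do that, assuming the hadoop account and its ~/.ssh directory already exist on each slave (slave01 and slave02 are the hostnames used in the slaves file below):

hadoop@ubuntuA:~/.ssh$ scp authorized_keys hadoop@slave01:~/.ssh/
hadoop@ubuntuA:~/.ssh$ scp authorized_keys hadoop@slave02:~/.ssh/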


4. Extract the downloaded Hadoop package
hadoop@ubuntuA:~/Downloads$ tar xvfz hadoop-1.0.4.tar.gz

hadoop@ubuntuA:~/Downloads$ mkdir ../bin
hadoop@ubuntuA:~/Downloads$ mv hadoop-1.0.4 ../bin/
hadoop@ubuntuA:~/Downloads$

hadoop@ubuntuA:~/bin$ ln -s hadoop-1.0.4 hadoop
hadoop@ubuntuA:~/bin$ ll
total 12
drwxrwxr-x  3 hadoop hadoop 4096 Feb 20 06:38 ./
drwxr-xr-x 23 hadoop hadoop 4096 Feb 20 06:37 ../
lrwxrwxrwx  1 hadoop hadoop   12 Feb 20 06:38 hadoop -> hadoop-1.0.4/
drwxr-xr-x 14 hadoop hadoop 4096 Oct  2 22:17 hadoop-1.0.4/


5. Install the Java JDK

Run the self-extracting JDK installer (if the .bin file is not executable, chmod +x it first):
hadoop@ubuntuA:~/Downloads$ ./jdk-6u41-linux-i586.bin

hadoop@ubuntuA:~/Downloads$ sudo mv jdk1.6.0_41 /usr/local/
hadoop@ubuntuA:~/Downloads$ cd /usr/local/

drwxr-xr-x  8 hadoop hadoop 4096 Feb 20 07:25 jdk1.6.0_41/

hadoop@ubuntuA:/usr/local$ sudo chown -R root:root /usr/local/jdk1.6.0_41/
hadoop@ubuntuA:/usr/local$ sudo ln -s jdk1.6.0_41 java-6-sun
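
A quick sanity check that the JDK runs from the new symlinked path:

hadoop@ubuntuA:/usr/local$ /usr/local/java-6-sun/bin/java -version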


6. Set the JAVA and HADOOP environment variables

Add the following environment variables to the .bashrc file:


export JAVA_HOME=/usr/local/java-6-sun
export HADOOP_HOME=/home/hadoop/bin/hadoop

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
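
For the new variables to take effect in the current shell, re-read .bashrc (or log out and back in):

hadoop@ubuntuA:~$ source ~/.bashrc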


7. Hadoop configuration file

Edit the JAVA_HOME line in /home/hadoop/bin/hadoop/conf/hadoop-env.sh:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/local/java-6-sun


8. Edit the remaining Hadoop conf files


hadoop@ubuntuA:~/bin/hadoop/conf$ cat core-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:10001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/temp</value>
  </property>
</configuration>
hadoop@ubuntuA:~/bin/hadoop/conf$ cat hdfs-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
hadoop@ubuntuA:~/bin/hadoop/conf$ cat mapred-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://master:10002</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/hdfs/mapreduce/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/hdfs/mapreduce/local</value>
  </property>
</configuration>
hadoop@ubuntuA:~/bin/hadoop/conf$ cat masters 

master
hadoop@ubuntuA:~/bin/hadoop/conf$ cat slaves 
slave01
slave02
master
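
** The hostnames master, slave01, and slave02 used in these files must resolve on every node, and the same conf files are typically copied to all nodes as well. If you are not running DNS, /etc/hosts entries along these lines are one way to handle it (the IP addresses here are placeholders, not from the original post):

192.168.0.101   master
192.168.0.102   slave01
192.168.0.103   slave02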


9. Format the Hadoop filesystem

hadoop@ubuntuA:~$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.

13/02/21 02:38:59 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntuA/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
13/02/21 02:38:59 INFO util.GSet: VM type       = 32-bit
13/02/21 02:38:59 INFO util.GSet: 2% max memory = 19.33375 MB
13/02/21 02:38:59 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/02/21 02:38:59 INFO util.GSet: recommended=4194304, actual=4194304
13/02/21 02:38:59 INFO namenode.FSNamesystem: fsOwner=hadoop
13/02/21 02:38:59 INFO namenode.FSNamesystem: supergroup=supergroup
13/02/21 02:38:59 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/02/21 02:38:59 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/02/21 02:38:59 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/02/21 02:38:59 INFO namenode.NameNode: Caching file names occuring more than 10 times 
13/02/21 02:38:59 INFO common.Storage: Image file of size 112 saved in 0 seconds.
13/02/21 02:38:59 INFO common.Storage: Storage directory /hdfs/name has been successfully formatted.
13/02/21 02:38:59 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntuA/127.0.1.1
************************************************************/

** Run the Hadoop filesystem format on the master first and then on every slave as well.


** If the account lacks the permissions needed to create the directory, an error like this can occur:
hadoop@ubuntuA:~$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.

13/02/21 02:35:10 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntuA/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
************************************************************/
13/02/21 02:35:10 INFO util.GSet: VM type       = 32-bit
13/02/21 02:35:10 INFO util.GSet: 2% max memory = 19.33375 MB
13/02/21 02:35:10 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/02/21 02:35:10 INFO util.GSet: recommended=4194304, actual=4194304
13/02/21 02:35:10 INFO namenode.FSNamesystem: fsOwner=hadoop
13/02/21 02:35:10 INFO namenode.FSNamesystem: supergroup=supergroup
13/02/21 02:35:10 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/02/21 02:35:10 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/02/21 02:35:10 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/02/21 02:35:10 INFO namenode.NameNode: Caching file names occuring more than 10 times 
13/02/21 02:35:11 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /hdfs/name/current
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:297)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1320)
at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1339)


Create /hdfs (the dfs.name.dir, dfs.data.dir, and mapred.*.dir paths configured above all live under it) and give the hadoop account ownership of it:

     sudo mkdir /hdfs
     sudo chown hadoop:hadoop /hdfs


10. Start and verify the Hadoop processes

All Hadoop processes are controlled from the master node: running the start-all.sh script on the master starts not only its own Hadoop processes but also connects to each slave over ssh and starts the related processes there.
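
Since $HADOOP_HOME/bin is on the PATH from step 6, starting the whole cluster is a single command on the master (if a daemon fails to come up, its log under $HADOOP_HOME/logs is the place to look):

hadoop@ubuntuA:~$ start-all.sh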

Check the processes on the master server (it runs both master and slave roles):

hadoop@ubuntuA:~/bin/hadoop-1.0.4/bin$ /usr/local/java-6-sun/bin/jps
7532 Jps
6932 DataNode
6708 NameNode
7153 SecondaryNameNode
7456 TaskTracker
7234 JobTracker

Check the processes on a slave server:

hadoop@ubuntuA:~/bin/hadoop/bin$ /usr/local/java-6-sun/bin/jps
10696 Jps
10439 DataNode
10627 TaskTracker
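
** Besides jps, Hadoop 1.x serves status pages you can check from a browser; with this configuration they should be at the default ports (an assumption, since the post does not override them):

http://master:50070/  - NameNode (HDFS) status
http://master:50030/  - JobTracker (MapReduce) status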



References
http://coolkim.tistory.com/352
http://blog.softwaregeeks.org/archives/138
http://hadoop.apache.org/docs/r1.0.4/cluster_setup.html#Configuration
