Big Data

Installing Apache Hadoop 2.7.5 and Spark 2.1.0 in Fully Distributed Mode

kogun82 2018. 4. 3. 13:20

1). Virtual server layout (installed as the hadoop user)

 Hostname  IP Address       Role
 nn01      192.168.130.219  NameNode, SecondaryNameNode, ResourceManager, Spark Master
 dn01      192.168.130.187  NodeManager, DataNode, Spark Worker
 dn02      192.168.130.249  NodeManager, DataNode, Spark Worker
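
Each node must be able to resolve the other hostnames before anything is installed. A minimal sketch of the /etc/hosts entries implied by the table above (append as root on nn01, dn01 and dn02; assumes no DNS is available):

# /etc/hosts (same entries on every node)
192.168.130.219  nn01
192.168.130.187  dn01
192.168.130.249  dn02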

 

2). Installing Java 1.8 (nn01, dn01, dn02)

tar -xvzpf jdk-8u131-linux-x64.tar.gz

mkdir -p /opt/jdk/1.8.0_131

mv jdk1.8.0_131/* /opt/jdk/1.8.0_131/

ln -s /opt/jdk/1.8.0_131 /opt/jdk/current

Register Java with alternatives:

alternatives --install /usr/bin/java java /opt/jdk/1.8.0_131/bin/java 2

alternatives --config java


# There is 1 program that provides 'java'.

# Selection    Command
# -----------------------------------------------
# *+ 1           /opt/jdk/1.8.0_131/bin/java

# Enter to keep the current selection[+], or type selection number:

# At this point Java 8 has been installed on the system.
# It is also recommended to register the javac and jar commands with alternatives:

alternatives --install /usr/bin/jar jar /opt/jdk/1.8.0_131/bin/jar 2

alternatives --install /usr/bin/javac javac /opt/jdk/1.8.0_131/bin/javac 2

alternatives --set jar /opt/jdk/1.8.0_131/bin/jar

alternatives --set javac /opt/jdk/1.8.0_131/bin/javac
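
To confirm that the alternatives registration took effect, a quick check (exact output varies slightly by build):

java -version
# java version "1.8.0_131"
javac -version
# javac 1.8.0_131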

 

3). Installing Hadoop 2.7.5 on the NameNode (nn01)

wget http://apache.mirror.cdnetworks.com/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz

tar -xvzf hadoop-2.7.5.tar.gz

mkdir -p /home/hadoop/program/hadoop/2.7.5

mv hadoop-2.7.5/* /home/hadoop/program/hadoop/2.7.5/

ln -s /home/hadoop/program/hadoop/2.7.5 /home/hadoop/program/hadoop/current

 

4). Adding the Java and Hadoop environment variables (nn01, dn01, dn02)

vi ~/.bash_profile

#hadoop 2.7.5#
export HADOOP_HOME=/home/hadoop/program/hadoop/current
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=/home/hadoop/program/hive/current/lib/*.jar:/home/hadoop/program/hadoop/current/lib/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/common/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/hdfs/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/mapreduce/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/yarn/*.jar:/opt/jdk/current/lib/*.jar:/home/hadoop/program/sqoop/current/lib/*.jar:/home/hadoop/program/sqoop/current/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/tools/lib/*.jar

#java 1.8.0#
export JAVA_HOME=/opt/jdk/current
export PATH=$PATH:$JAVA_HOME/bin

source ~/.bash_profile
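
A quick sanity check that the new variables are picked up (run on nn01, where Hadoop is already unpacked):

echo $JAVA_HOME    # /opt/jdk/current
hadoop version     # should report Hadoop 2.7.5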

 

5). Generating an SSH key pair with ssh-keygen and setting up passwordless login

ssh-keygen -t rsa 
ssh-copy-id dn01
ssh-copy-id dn02
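
start-all.sh also opens an SSH session from nn01 to itself (to start the NameNode and SecondaryNameNode), so the key should be authorized on nn01 as well. A minimal check that passwordless login works:

ssh-copy-id nn01
ssh dn01 hostname   # should print "dn01" without asking for a password
ssh dn02 hostname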

 

6). Hadoop configuration (nn01)

[core-site.xml]
 <configuration> 
 	<property> 
    	<name>fs.defaultFS</name> 
        <value>hdfs://nn01:9000</value> 
	</property> 
</configuration> 

[hdfs-site.xml] 
<configuration> 
	<property> 
    	<name>dfs.replication</name> 
        <value>1</value> 
	</property> 
    <property> 
    	<name>dfs.namenode.http-address</name> 
        <value>nn01:50070</value> 
    </property> 
	<property> 
    	<name>dfs.namenode.secondary.http-address</name> 
        <value>nn01:50090</value> 
    </property> 
    <property> 
    	<name>dfs.namenode.name.dir</name> 
        <value>file:/home/hadoop/hadoop_data/hdfs/namenode</value> 
    </property> 
    <property> 
    	<name>dfs.datanode.data.dir</name> 
        <value>file:/home/hadoop/hadoop_data/hdfs/datanode</value> 
    </property> 
    <property> 
    	<name>dfs.namenode.checkpoint.dir</name> 
        <value>file:/home/hadoop/hadoop_data/hdfs/namesecondary</value> 
    </property> 
    <property> 
    	<name>dfs.webhdfs.enabled</name> 
        <value>true</value> 
    </property> 
</configuration> 

[yarn-site.xml]
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>nn01:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>nn01:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>nn01:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>nn01</value>
    </property>
</configuration>
    
[mapred-site.xml] 
<configuration> 
	<property> 
    	<name>mapreduce.framework.name</name> 
        <value>yarn</value> 
    </property> 
    <property> 
    	<name>mapreduce.jobtracker.hosts.exclude.filename</name> 
        <value>$HADOOP_HOME/etc/hadoop/exclude</value> 
    </property> 
    <property> 
    	<name>mapreduce.jobtracker.hosts.filename</name> 
        <value>$HADOOP_HOME/etc/hadoop/include</value> 
    </property> 
</configuration> 
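
If the host include/exclude properties above are kept, note that shell-style variables such as $HADOOP_HOME are not expanded inside XML values, so absolute paths are safer, and the referenced files must exist (they may be empty). A minimal sketch:

touch /home/hadoop/program/hadoop/current/etc/hadoop/include
touch /home/hadoop/program/hadoop/current/etc/hadoop/exclude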

vi /home/hadoop/program/hadoop/current/etc/hadoop/masters 
nn01 

vi /home/hadoop/program/hadoop/current/etc/hadoop/slaves 
dn01 
dn02 

vi /home/hadoop/program/hadoop/current/etc/hadoop/hadoop-env.sh 
# The java implementation to use. 
export JAVA_HOME=/opt/jdk/current 

vi /home/hadoop/program/hadoop/current/etc/hadoop/yarn-env.sh 
# some Java parameters 
export JAVA_HOME=/opt/jdk/current

 

7). Copying the Hadoop installation to the DataNodes (dn01, dn02)

(1). nn01
scp -r /home/hadoop/program/hadoop dn01:/home/hadoop/program/
scp -r /home/hadoop/program/hadoop dn02:/home/hadoop/program/

(2). dn01, dn02
ln -s /home/hadoop/program/hadoop/2.7.5 /home/hadoop/program/hadoop/current

 

8). Creating the NameNode directories (nn01: NameNode)

mkdir -p ~/hadoop_data/hdfs/namenode
mkdir -p ~/hadoop_data/hdfs/namesecondary

 

9). Creating the DataNode directory (dn01, dn02: DataNode)

mkdir -p ~/hadoop_data/hdfs/datanode

 

10). Formatting the NameNode (nn01)

hadoop namenode -format

 

11). Starting the daemons (nn01)

start-all.sh
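
A quick way to confirm the daemons came up is jps on each node (process IDs will differ):

jps   # on nn01: NameNode, SecondaryNameNode, ResourceManager
jps   # on dn01/dn02: DataNode, NodeManager

# The NameNode web UI should also be reachable at http://nn01:50070
# and the ResourceManager UI at its default port, http://nn01:8088.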

 

12). Installing Spark 2.1.0 on the master node (nn01)

wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
tar -xvzf spark-2.1.0-bin-hadoop2.7.tgz
mkdir -p /home/hadoop/program/spark/2.1.0
mv spark-2.1.0-bin-hadoop2.7/* /home/hadoop/program/spark/2.1.0/
ln -s /home/hadoop/program/spark/2.1.0 /home/hadoop/program/spark/current
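
Before configuring anything, the unpacked build can be sanity-checked (uses the JAVA_HOME set in step 4):

/home/hadoop/program/spark/current/bin/spark-submit --version
# should report version 2.1.0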


13). Spark configuration (nn01)

(1). Config file settings

[spark-defaults.conf]

spark.master spark://nn01:7077

[spark-env.sh]

export JAVA_HOME=/opt/jdk/current
export HADOOP_CONF_DIR=/home/hadoop/program/hadoop/current/etc/hadoop
SPARK_MASTER_IP=nn01
SPARK_MASTER_PORT=7077
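
In Spark 2.x the documented variable for the standalone master's address is SPARK_MASTER_HOST; SPARK_MASTER_IP still works but logs a deprecation warning. An equivalent setting would be:

SPARK_MASTER_HOST=nn01
SPARK_MASTER_PORT=7077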

[slaves]
vi /home/hadoop/program/spark/current/conf/slaves
dn01
dn02

[log4j.properties]
# Set everything to be logged to the console
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR


14). Copying the Spark installation to the workers (dn01, dn02)

(1). nn01
scp -r /home/hadoop/program/spark dn01:/home/hadoop/program/
scp -r /home/hadoop/program/spark dn02:/home/hadoop/program/

(2). dn01, dn02
ln -s /home/hadoop/program/spark/2.1.0 /home/hadoop/program/spark/current
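
With the configuration copied, the standalone cluster can be started from nn01 using the scripts shipped in Spark's sbin directory. A minimal sketch for starting and verifying it (paths assume the layout above):

/home/hadoop/program/spark/current/sbin/start-master.sh
/home/hadoop/program/spark/current/sbin/start-slaves.sh   # starts a Worker on each host listed in conf/slaves

# The master UI at http://nn01:8080 should list dn01 and dn02 as workers,
# and a shell can be attached to the cluster:
/home/hadoop/program/spark/current/bin/spark-shell --master spark://nn01:7077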