Installing Apache Hadoop-2.7.5 and Spark-2.1.0 in Fully Distributed Mode
kogun82
2018. 4. 3. 13:20
1). Virtual server layout (installed under the hadoop account)
Hostname | IP Address | Role |
nn01 | 192.168.130.219 | NameNode, SecondaryNameNode, ResourceManager, Master |
dn01 | 192.168.130.187 | NodeManager, DataNode, Worker |
dn02 | 192.168.130.249 | NodeManager, DataNode, Worker |
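For the hostnames above to resolve on every machine, each node needs matching entries in /etc/hosts (edited as root), assuming no DNS is available:
# /etc/hosts on nn01, dn01, and dn02
192.168.130.219 nn01
192.168.130.187 dn01
192.168.130.249 dn02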
2). Install Java 1.8 (nn01, dn01, dn02)
tar -xvzpf jdk-8u131-linux-x64.tar.gz
mkdir -p /opt/jdk/1.8.0_131
mv jdk1.8.0_131/* /opt/jdk/1.8.0_131/
ln -s /opt/jdk/1.8.0_131 /opt/jdk/current
# Register java with alternatives
alternatives --install /usr/bin/java java /opt/jdk/1.8.0_131/bin/java 2
alternatives --config java
# There is 1 program that provides 'java'.
# Selection Command
# -----------------------------------------------
# *+ 1           /opt/jdk/1.8.0_131/bin/java
# Enter to keep the current selection[+], or type selection number:
# At this point, Java 8 has been installed successfully.
# Setting up the javac and jar command paths with alternatives is also recommended:
alternatives --install /usr/bin/jar jar /opt/jdk/1.8.0_131/bin/jar 2
alternatives --install /usr/bin/javac javac /opt/jdk/1.8.0_131/bin/javac 2
alternatives --set jar /opt/jdk/1.8.0_131/bin/jar
alternatives --set javac /opt/jdk/1.8.0_131/bin/javac
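To confirm the alternatives registration took effect, a quick version check should report 1.8.0_131:
# Verify the active JDK
java -version
javac -version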
3). Install Hadoop-2.7.5 on the NameNode (nn01)
wget http://apache.mirror.cdnetworks.com/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
tar -xvzf hadoop-2.7.5.tar.gz
mkdir -p /home/hadoop/program/hadoop/2.7.5
mv hadoop-2.7.5/* /home/hadoop/program/hadoop/2.7.5/
ln -s /home/hadoop/program/hadoop/2.7.5 /home/hadoop/program/hadoop/current
4). Add Java and Hadoop environment variables (nn01, dn01, dn02)
vi ~/.bash_profile
#hadoop 2.7.5#
export HADOOP_HOME=/home/hadoop/program/hadoop/current
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=/home/hadoop/program/hive/current/lib/*.jar:/home/hadoop/program/hadoop/current/lib/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/common/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/hdfs/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/mapreduce/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/yarn/*.jar:/opt/jdk/current/lib/*.jar:/home/hadoop/program/sqoop/current/lib/*.jar:/home/hadoop/program/sqoop/current/*.jar:/home/hadoop/program/hadoop/current/share/hadoop/tools/lib/*.jar
#java 1.8.0#
export JAVA_HOME=/opt/jdk/current
export PATH=$PATH:$JAVA_HOME/bin
source ~/.bash_profile
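A quick sanity check confirms the variables resolved and both toolchains are on the PATH:
# Both commands should succeed in a fresh shell
echo $JAVA_HOME $HADOOP_HOME
hadoop version   # should report Hadoop 2.7.5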
5). Generate SSH public keys with ssh-keygen and set up passwordless login
ssh-keygen -t rsa
ssh-copy-id dn01
ssh-copy-id dn02
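Passwordless login can then be verified from nn01; each command should print the remote hostname without asking for a password:
ssh dn01 hostname
ssh dn02 hostname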
6). Configure Hadoop (nn01)
[core-site.xml]
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://nn01:9000</value>
</property>
</configuration>
[hdfs-site.xml]
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>nn01:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>nn01:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop_data/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:/home/hadoop/hadoop_data/hdfs/namesecondary</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
[yarn-site.xml]
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>nn01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>nn01:8031</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>nn01:8032</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>nn01</value>
</property>
</configuration>
[mapred-site.xml]
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.hosts.exclude.filename</name>
<value>/home/hadoop/program/hadoop/current/etc/hadoop/exclude</value>
<!-- shell variables such as $HADOOP_HOME are not expanded in Hadoop XML values, so absolute paths are used -->
</property>
<property>
<name>mapreduce.jobtracker.hosts.filename</name>
<value>/home/hadoop/program/hadoop/current/etc/hadoop/include</value>
</property>
</configuration>
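Note that the Hadoop distribution ships only mapred-site.xml.template, so that file has to be created before it can be edited:
cd /home/hadoop/program/hadoop/current/etc/hadoop
cp mapred-site.xml.template mapred-site.xml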
vi /home/hadoop/program/hadoop/current/etc/hadoop/masters
nn01
vi /home/hadoop/program/hadoop/current/etc/hadoop/slaves
dn01
dn02
vi /home/hadoop/program/hadoop/current/etc/hadoop/hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/opt/jdk/current
vi /home/hadoop/program/hadoop/current/etc/hadoop/yarn-env.sh
# some Java parameters
export JAVA_HOME=/opt/jdk/current
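As a sanity check that the XML files parse and are being read, hdfs getconf can echo a configured key back:
# Should print hdfs://nn01:9000 from core-site.xml
hdfs getconf -confKey fs.defaultFS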
7). Copy the Hadoop configuration (dn01, dn02)
(1). nn01
scp -r /home/hadoop/program/hadoop dn01:/home/hadoop/program/
scp -r /home/hadoop/program/hadoop dn02:/home/hadoop/program/
(2). dn01, dn02
ln -s /home/hadoop/program/hadoop/2.7.5 /home/hadoop/program/hadoop/current
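Alternatively, since passwordless SSH is already in place from step 5, both the copy and the symlink can be driven from nn01 in one loop; a minimal sketch that mirrors steps (1) and (2):
# Run on nn01: mirror the install and recreate the symlink on each worker
for h in dn01 dn02; do
  scp -r /home/hadoop/program/hadoop $h:/home/hadoop/program/
  ssh $h ln -s /home/hadoop/program/hadoop/2.7.5 /home/hadoop/program/hadoop/current
done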
8). Create the Hadoop NameNode directories (nn01 : NameNode)
mkdir -p ~/hadoop_data/hdfs/namenode
mkdir -p ~/hadoop_data/hdfs/namesecondary
9). Create the Hadoop DataNode directory (dn01, dn02 : DataNode)
mkdir -p ~/hadoop_data/hdfs/datanode
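Rather than logging into each worker separately, both directories can be created from nn01 over SSH:
for h in dn01 dn02; do
  ssh $h mkdir -p /home/hadoop/hadoop_data/hdfs/datanode
done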
10). Format the NameNode (nn01)
hdfs namenode -format   # 'hadoop namenode -format' still works in 2.x but is deprecated
11). Start the daemons (nn01)
start-all.sh   # deprecated in 2.x; start-dfs.sh followed by start-yarn.sh is the preferred form
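Once the scripts finish, jps should show the daemons from the table in step 1: NameNode, SecondaryNameNode, and ResourceManager on nn01, and DataNode plus NodeManager on each worker. The full jps path is used remotely because ~/.bash_profile is not sourced for non-interactive SSH commands:
jps                                  # on nn01
ssh dn01 /opt/jdk/current/bin/jps    # expect DataNode, NodeManager
ssh dn02 /opt/jdk/current/bin/jps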
12). Install Spark-2.1.0 on the master node (nn01)
wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
tar -xvzf spark-2.1.0-bin-hadoop2.7.tgz
mkdir -p /home/hadoop/program/spark/2.1.0
mv spark-2.1.0-bin-hadoop2.7/* /home/hadoop/program/spark/2.1.0/
ln -s /home/hadoop/program/spark/2.1.0 /home/hadoop/program/spark/current
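Optionally (this is not part of the profile from step 4), SPARK_HOME can be added to ~/.bash_profile on each node so the Spark scripts are on the PATH; a minimal sketch:
#spark 2.1.0#
export SPARK_HOME=/home/hadoop/program/spark/current
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin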
13). Configure Spark (nn01)
(1). Config files
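Spark ships each of the files below only as a .template, so copy them into place first:
cd /home/hadoop/program/spark/current/conf
cp spark-defaults.conf.template spark-defaults.conf
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
cp log4j.properties.template log4j.properties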
[spark-defaults.conf]
spark.master spark://nn01:7077
[spark-env.sh]
export JAVA_HOME=/opt/jdk/current
export HADOOP_CONF_DIR=/home/hadoop/program/hadoop/current/etc/hadoop
export SPARK_MASTER_HOST=nn01   # SPARK_MASTER_IP was deprecated in Spark 2.x
export SPARK_MASTER_PORT=7077
[slaves]
vi /home/hadoop/program/spark/current/conf/slaves
dn01
dn02
[log4j.properties]
# Set everything to be logged to the console
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
14). Copy the Spark configuration (dn01, dn02)
(1). nn01
scp -r /home/hadoop/program/spark dn01:/home/hadoop/program/
scp -r /home/hadoop/program/spark dn02:/home/hadoop/program/
(2). dn01, dn02
ln -s /home/hadoop/program/spark/2.1.0 /home/hadoop/program/spark/current
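With the installation mirrored to both workers, the standalone cluster can be started from nn01 and verified; a minimal check, assuming the directory layout above:
# Starts the master on nn01 and one worker per host listed in conf/slaves
/home/hadoop/program/spark/current/sbin/start-all.sh
# The master Web UI at http://nn01:8080 should list dn01 and dn02 as alive workers
/home/hadoop/program/spark/current/bin/spark-shell --master spark://nn01:7077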