Hadoop: YARN Deployment and Usage

Posted by Jackson on 2017-08-14

1. YARN Pseudo-Distributed Single-Node Deployment (Master/Slave Architecture)

Single-node YARN deployment guide: https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/SingleCluster.html

1.1 Configure parameters as follows

etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
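
Before starting YARN, HDFS should already be up; the jps output below shows the NameNode, DataNode, and SecondaryNameNode running. A minimal sketch, assuming the Hadoop sbin scripts are on the PATH as in this setup:

# start HDFS first if it is not already running
start-dfs.sh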

1.2 Start the ResourceManager daemon and NodeManager daemon (start YARN)

[hadoop@bigdata01 ~]$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-resourcemanager-bigdata01.out
bigdata01: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/yarn-hadoop-nodemanager-bigdata01.out
[hadoop@bigdata01 ~]$ jps
11857 SecondaryNameNode
11570 NameNode
11698 DataNode
12002 ResourceManager
12391 Jps
12105 NodeManager

1.3 Check the ResourceManager process

[hadoop@bigdata01 ~]$ netstat -nlp |grep 12002
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp6 0 0 :::8088 :::* LISTEN 12002/java
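
Port 8088 is the ResourceManager web UI. As an extra sanity check (a sketch, assuming the default port), the ResourceManager REST API can be queried directly:

# returns basic cluster state as JSON
curl -s http://bigdata01:8088/ws/v1/cluster/info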

2. Running an Example Job to Test YARN

2.1 Locate the example jar

[root@bigdata01 ~]# find / -name '*example*.jar'
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce1/hadoop-examples-2.6.0-mr1-cdh5.16.2.jar
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-test-sources.jar
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/sources/hadoop-mapreduce-examples-2.6.0-cdh5.16.2-sources.jar
/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
The last one is the example jar we need.
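
Scanning the whole filesystem as root works but is slow; the search can be scoped to the Hadoop install instead (a sketch using the path above):

# limit the search to the share directory and skip the sources jars
find ~/app/hadoop-2.6.0-cdh5.16.2/share -name 'hadoop-mapreduce-examples-*.jar' ! -path '*sources*'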

2.2 Check the command help to confirm how to run a jar

[hadoop@bigdata01 ~]$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
fs run a generic filesystem user client
version print the version
jar <jar> run a jar file
...

Most commands print help when invoked w/o parameters.

2.3 Run the jar:

[hadoop@bigdata01 ~]$ hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@bigdata01 ~]$

2.4 The jar expects a program name as its first argument; we want the wordcount example, so run again:

[hadoop@bigdata01 ~]$ hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount
Usage: wordcount <in> [<in>...] <out>

It now prompts for input and output paths.

2.5 Create the input file
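
The upload itself is not shown in the original transcript; it might have been done like this (a sketch matching the path and content verified below):

# create the sample input locally and stage it in HDFS
printf 'hadoop hadoop hadoop spark flume spark flink hive hue\nflink hbase kafka kafka spark hadoop hive\n' > test1.txt
hadoop fs -mkdir -p /wordcount/test
hadoop fs -put test1.txt /wordcount/test/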

[hadoop@bigdata01 ~]$ hadoop fs -cat /wordcount/test/test1.txt
19/12/02 21:45:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop hadoop hadoop spark flume spark flink hive hue
flink hbase kafka kafka spark hadoop hive

2.6 Run again, now with input and output paths

[hadoop@bigdata01 ~]$ hadoop jar /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.16.2.jar wordcount /wordcount/test /wordcount/output
19/12/02 21:46:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/12/02 21:46:45 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/12/02 21:46:46 INFO input.FileInputFormat: Total input paths to process : 1
19/12/02 21:46:46 INFO mapreduce.JobSubmitter: number of splits:1
19/12/02 21:46:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1575293526101_0001
19/12/02 21:46:46 INFO impl.YarnClientImpl: Submitted application application_1575293526101_0001
19/12/02 21:46:47 INFO mapreduce.Job: The url to track the job: http://bigdata01:8088/proxy/application_1575293526101_0001/
19/12/02 21:46:47 INFO mapreduce.Job: Running job: job_1575293526101_0001
19/12/02 21:46:57 INFO mapreduce.Job: Job job_1575293526101_0001 running in uber mode : false
19/12/02 21:46:57 INFO mapreduce.Job: map 0% reduce 0%
19/12/02 21:47:03 INFO mapreduce.Job: map 100% reduce 0%
19/12/02 21:47:10 INFO mapreduce.Job: map 100% reduce 100%
19/12/02 21:47:10 INFO mapreduce.Job: Job job_1575293526101_0001 completed successfully
19/12/02 21:47:10 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=100
FILE: Number of bytes written=286249
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=209
HDFS: Number of bytes written=62
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4554
Total time spent by all reduces in occupied slots (ms)=2929
Total time spent by all map tasks (ms)=4554
Total time spent by all reduce tasks (ms)=2929
Total vcore-milliseconds taken by all map tasks=4554
Total vcore-milliseconds taken by all reduce tasks=2929
Total megabyte-milliseconds taken by all map tasks=4663296
Total megabyte-milliseconds taken by all reduce tasks=2999296
Map-Reduce Framework
Map input records=2
Map output records=16
Map output bytes=160
Map output materialized bytes=100
Input split bytes=111
Combine input records=16
Combine output records=8
Reduce input groups=8
Reduce shuffle bytes=100
Reduce input records=8
Reduce output records=8
Spilled Records=16
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=108
CPU time spent (ms)=1520
Physical memory (bytes) snapshot=329445376
Virtual memory (bytes) snapshot=5455265792
Total committed heap usage (bytes)=226627584
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=98
File Output Format Counters
Bytes Written=6
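
Note that MapReduce refuses to write into an output directory that already exists, so re-running the same command first requires removing the old output (a sketch):

# clean up the previous output before re-running the job
hadoop fs -rm -r /wordcount/output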

2.7 Check the results

[hadoop@bigdata01 ~]$ hadoop fs -ls /wordcount
19/12/02 21:48:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2019-12-02 21:47 /wordcount/output
drwxr-xr-x - hadoop supergroup 0 2019-12-02 21:45 /wordcount/test
[hadoop@bigdata01 ~]$ hadoop -ls /wordcount/output
Error: No command named `-ls' was found. Perhaps you meant `hadoop ls'
[hadoop@bigdata01 ~]$ hadoop fs -ls /wordcount/output
19/12/02 21:48:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2019-12-02 21:47 /wordcount/output/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 62 2019-12-02 21:47 /wordcount/output/part-r-00000
[hadoop@bigdata01 ~]$ hadoop fs -cat /wordcount/output/part-r-00000
19/12/02 21:49:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
flink 2
flume 1
hadoop 4
hbase 1
hive 2
hue 1
kafka 2
spark 3
[hadoop@bigdata01 ~]$
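
The finished application can also be confirmed from the command line (assuming the yarn CLI is on the PATH, as in this setup):

# list applications that finished on this cluster
yarn application -list -appStates FINISHED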

3. Changing the Hostname

3.1 Check the current hostname

[hadoop@bigdata01 ~]$ hostnamectl
Static hostname: bigdata01
Icon name: computer-vm
Chassis: vm
Machine ID: 928fc74e61be492eb9a51cc408995739
Boot ID: 32e41529ec49471dba619ba744be31b1
Virtualization: vmware
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-957.el7.x86_64
Architecture: x86-64
[hadoop@bigdata01 ~]$

3.2 Check the command help:

[hadoop@bigdata01 ~]$ hostnamectl --help
hostnamectl [OPTIONS...] COMMAND ...

Query or change system hostname.

-h --help Show this help
--version Show package version
--no-ask-password Do not prompt for password
-H --host=[USER@]HOST Operate on remote host
-M --machine=CONTAINER Operate on local container
--transient Only set transient hostname
--static Only set static hostname
--pretty Only set pretty hostname

Commands:
status Show current hostname settings
set-hostname NAME Set system hostname
set-icon-name NAME Set icon name for host
set-chassis NAME Set chassis type for host
set-deployment NAME Set deployment environment for host
set-location NAME Set location for host

3.3 The command to use is as follows:

set-hostname NAME      Set system hostname
[hadoop@bigdata01 ~]$ hostnamectl set-hostname bigdata01

[hadoop@bigdata01 ~]$ cat /etc/hostname
bigdata01

After changing the hostname, update the IP-to-hostname mapping in the hosts file accordingly.
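
A minimal sketch of the mapping, assuming a hypothetical IP of 192.168.1.101 (substitute the machine's real address):

# append the mapping to /etc/hosts (run as root)
echo "192.168.1.101 bigdata01" >> /etc/hosts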

4. Using jps Correctly

4.1 Where jps lives

[root@bigdata01 ~]# which jps
/usr/java/jdk1.8.0_121/bin/jps
[root@bigdata01 ~]# jps -l
11857 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
11570 org.apache.hadoop.hdfs.server.namenode.NameNode
11698 org.apache.hadoop.hdfs.server.datanode.DataNode
12002 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
14471 sun.tools.jps.Jps
12105 org.apache.hadoop.yarn.server.nodemanager.NodeManager

4.2 The corresponding identification files

[hadoop@bigdata01 tmp]$ cd /tmp/
[hadoop@bigdata01 tmp]$ ll
total 28
-rw-rw-r-- 1 root root 52 Nov 28 16:38 apache.log
drwxrwxr-x 4 hadoop hadoop 37 Dec 2 21:32 hadoop-hadoop
-rw-rw-r-- 1 hadoop hadoop 6 Dec 2 21:31 hadoop-hadoop-datanode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Dec 2 21:31 hadoop-hadoop-namenode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Dec 2 21:31 hadoop-hadoop-secondarynamenode.pid
drwxr-xr-x 2 hadoop hadoop 71 Dec 2 21:55 hsperfdata_hadoop

The last entry, hsperfdata_hadoop, is the directory of identification files: each file inside is named after a PID reported by jps. In essence, jps itself works by reading these files.

[hadoop@bigdata01 tmp]$ cd hsperfdata_hadoop
[hadoop@bigdata01 hsperfdata_hadoop]$ ll
total 160
-rw------- 1 hadoop hadoop 32768 Dec 2 21:59 11570
-rw------- 1 hadoop hadoop 32768 Dec 2 21:59 11698
-rw------- 1 hadoop hadoop 32768 Dec 2 21:59 11857
-rw------- 1 hadoop hadoop 32768 Dec 2 21:59 12002
-rw------- 1 hadoop hadoop 32768 Dec 2 21:59 12105
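
Because jps only reads these files, moving one aside makes its process disappear from jps output even though the process keeps running. A demonstration sketch, for a test machine only:

# hide the NameNode (PID 11570) from jps without touching the process
mv /tmp/hsperfdata_hadoop/11570 /tmp/11570.bak
jps                                              # 11570 NameNode is no longer listed
mv /tmp/11570.bak /tmp/hsperfdata_hadoop/11570   # restore it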

4.3 What jps does

It maps a PID to its Java process name. To see the full command line behind a PID:

[root@bigdata01 tmp]# ps -ef |grep 11570
hadoop 11570 1 0 21:31 ? 00:00:09 /usr/java/jdk1.8.0_121/bin/java -Dproc_namenode -Xmx1000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs -Dhadoop.log.file=hadoop-hadoop-namenode-bigdata01.log -Dhadoop.home.dir=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/home/hadoop/app/hadoop-2.6.0-cdh5.16.2/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
root 14969 14657 0 22:03 pts/2 00:00:00 grep --color=auto 11570

This confirms that PID 11570 belongs to the NameNode process.

4.4 process information unavailable

As the hadoop user:

[hadoop@bigdata01 ~]$ jps
15781 NameNode
15914 DataNode
16187 Jps
16078 SecondaryNameNode
[hadoop@bigdata01 ~]$

As the mysqladmin user (only its own Java processes are visible):

[mysqladmin@bigdata01 ~]$ jps
16266 Jps
[mysqladmin@bigdata01 ~]$

The root user can see all Java processes, but entries owned by other users may be displayed as "process information unavailable".

4.5 Checking whether a process is really alive:

Use ps -ef | grep xxx | grep -v grep | wc -l to check.

Since jps essentially just reads files, its output neither starts nor stops anything; a process can be alive even when jps fails to show it.
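
The same check wrapped into a small script, using the NameNode as an example (a sketch; adjust the process name as needed):

# trust ps rather than jps: count real NameNode processes, excluding the grep itself
if [ "$(ps -ef | grep -w NameNode | grep -v grep | wc -l)" -ge 1 ]; then
    echo "NameNode is really running"
else
    echo "NameNode is not running"
fi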

4.6 The Linux OOM-killer mechanism

Explanation: when a process consumes too much memory, the kernel kills the process using the most memory in order to protect the machine from hanging.
Use free -m to check the machine's memory status.

When a process is killed by the OOM killer, its own log will not contain any error message.

Troubleshooting a dead process:
Check its log file first. If there is an error, analyze it; if there is none, suspect an OOM kill and inspect memory with free -m.

Then check the system logs, namely /var/log/messages and /var/log/secure:
cat /var/log/messages | grep oom
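
The kernel ring buffer records OOM kills as well, which helps when /var/log/messages has already rotated (a sketch):

# the kernel logs a "Killed process <pid>" line for every OOM kill
dmesg | grep -i "killed process"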

4.7 The /tmp cleanup mechanism

Linux cleans /tmp automatically: by default, files that fall outside the retention rules (roughly one month old) are removed.

In production, keep the Hadoop and YARN PID files in a directory you manage yourself instead of /tmp:

[root@bigdata01 hadoop]$ vi hadoop-env.sh
export HADOOP_PID_DIR=/home/ruoze/tmp

[root@bigdata01 hadoop]$ cat yarn-env.sh
export YARN_PID_DIR=/home/ruoze/tmp
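
The chosen directory must exist and be writable by the user that runs the daemons, and the daemons must be restarted for the change to take effect (a sketch using the path from the snippet above):

# create the PID directory and hand it to the service user (run as root)
mkdir -p /home/ruoze/tmp
chown -R hadoop:hadoop /home/ruoze/tmp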

4.8 Inspect the PID files under /tmp

[mysqladmin@bigdata01 tmp]$ ll
total 20
-rw-rw-r-- 1 root root 52 Nov 28 16:38 apache.log
drwxrwxr-x 4 hadoop hadoop 37 Dec 2 21:32 hadoop-hadoop
-rw-rw-r-- 1 hadoop hadoop 6 Dec 2 22:08 hadoop-hadoop-datanode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Dec 2 22:08 hadoop-hadoop-namenode.pid
-rw-rw-r-- 1 hadoop hadoop 6 Dec 2 22:08 hadoop-hadoop-secondarynamenode.pid
drwxr-xr-x 2 hadoop hadoop 45 Dec 2 22:08 hsperfdata_hadoop

Bonus:

1. What is the naming convention for YARN jobs?
application_ + ResourceManager startup timestamp + sequential job number, e.g. application_1575293526101_0001 from the run above.