MapReduce docs
This continues from the previous session, with file01 and file02 already uploaded to the test folder. (See the 210715 post.)
1. Save the wc.jar file under the test folder
wc.jar is built by downloading the source code from the docs above and exporting it as a jar file.
(Save the Java source code in Eclipse, export it as a jar, put the exported jar in the shared folder,
and then, from the hadoop account, save it under the test folder.)
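For reference, the core of what wc.jar computes can be sketched in plain Java, without the Hadoop API. This is a local simulation only, and it assumes file01/file02 hold the standard tutorial sentences ("Hello World, Bye World!" / "Hello Hadoop, Goodbye to hadoop."), which is what the part-r-00000 listing below implies:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // Count whitespace-separated tokens, mirroring what the
    // TokenizerMapper (emit (word, 1)) + IntSumReducer (sum) pair produces.
    public static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, like the reducer output
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed file01 / file02 contents from the previous session
        System.out.println(count("Hello World, Bye World!",
                                 "Hello Hadoop, Goodbye to hadoop."));
        // → {Bye=1, Goodbye=1, Hadoop,=1, Hello=2, World!=1, World,=1, hadoop.=1, to=1}
    }
}
```

Note that punctuation sticks to the tokens ("Hadoop," vs "hadoop."), which is exactly why pattern.txt is introduced in step 4 below.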
2. Run the job on the input directory
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop jar wc.jar /user/joe/wordcount/input /user/joe/wordcount/output
Output:
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop jar wc.jar /user/joe/wordcount/input /user/joe/wordcount/output
21/07/16 20:24:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/07/16 20:24:33 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.56.116:8040
21/07/16 20:24:37 INFO input.FileInputFormat: Total input files to process : 2
21/07/16 20:24:37 INFO mapreduce.JobSubmitter: number of splits:2
21/07/16 20:24:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1626432687117_0001
21/07/16 20:24:38 INFO conf.Configuration: resource-types.xml not found
21/07/16 20:24:38 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/07/16 20:24:38 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/07/16 20:24:38 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/07/16 20:24:39 INFO impl.YarnClientImpl: Submitted application application_1626432687117_0001
21/07/16 20:24:39 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1626432687117_0001/
21/07/16 20:24:39 INFO mapreduce.Job: Running job: job_1626432687117_0001
21/07/16 20:24:59 INFO mapreduce.Job: Job job_1626432687117_0001 running in uber mode : false
21/07/16 20:24:59 INFO mapreduce.Job: map 0% reduce 0%
21/07/16 20:25:15 INFO mapreduce.Job: map 100% reduce 0%
21/07/16 20:25:30 INFO mapreduce.Job: map 100% reduce 100%
21/07/16 20:25:30 INFO mapreduce.Job: Job job_1626432687117_0001 completed successfully
21/07/16 20:25:30 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=117
FILE: Number of bytes written=625427
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=292
HDFS: Number of bytes written=67
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=27787
Total time spent by all reduces in occupied slots (ms)=10943
Total time spent by all map tasks (ms)=27787
Total time spent by all reduce tasks (ms)=10943
Total vcore-milliseconds taken by all map tasks=27787
Total vcore-milliseconds taken by all reduce tasks=10943
Total megabyte-milliseconds taken by all map tasks=28453888
Total megabyte-milliseconds taken by all reduce tasks=11205632
Map-Reduce Framework
Map input records=3
Map output records=9
Map output bytes=93
Map output materialized bytes=123
Input split bytes=234
Combine input records=9
Combine output records=9
Reduce input groups=8
Reduce shuffle bytes=123
Reduce input records=9
Reduce output records=8
Spilled Records=18
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=570
CPU time spent (ms)=2360
Physical memory (bytes) snapshot=546463744
Virtual memory (bytes) snapshot=6203805696
Total committed heap usage (bytes)=301146112
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
com.test.WordCount2$TokenizerMapper$CountersEnum
INPUT_WORDS=9
File Input Format Counters
Bytes Read=58
File Output Format Counters
Bytes Written=67
3. Read the output file produced by the MR job
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop fs -cat /user/joe/wordcount/output/part-r-00000
Result:
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop fs -cat /user/joe/wordcount/output/part-r-00000
21/07/16 20:26:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Bye 1
Goodbye 1
Hadoop, 1
Hello 2
World! 1
World, 1
hadoop. 1
to 1
4. Create pattern.txt listing the unwanted tokens ( , . ! to ) to remove from the output above
1) Create /home/hadoop/test/pattern.txt in a text editor, enter the following, and save
\.
\,
\!
to
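Each line of pattern.txt is treated as a regular expression that is stripped from the input line before tokenizing. A minimal plain-Java sketch of that replaceAll loop, using the four patterns above and the assumed tutorial sentences as input:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class SkipPatternSketch {
    // Remove every skip pattern (one regex per line of pattern.txt) from the
    // line, then tokenize and count the cleaned text.
    public static Map<String, Integer> count(List<String> patterns, String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String p : patterns) {
                line = line.replaceAll(p, ""); // strip the pattern everywhere
            }
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The four lines of pattern.txt
        List<String> patterns = Arrays.asList("\\.", "\\,", "\\!", "to");
        System.out.println(count(patterns,
                                 "Hello World, Bye World!",
                                 "Hello Hadoop, Goodbye to hadoop."));
        // → {Bye=1, Goodbye=1, Hadoop=1, Hello=2, World=2, hadoop=1}
    }
}
```

One caveat: the bare `to` pattern removes `to` anywhere it appears, even inside longer words (e.g. `stop` would become `sp`); the sample sentences just happen not to contain such a word.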
5. Upload pattern.txt to the HDFS directory (if the put does not go through, upload it directly via the web UI on port 50070)
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop fs -put pattern.txt /user/joe/wordcount/input
6. Read the uploaded pattern.txt from HDFS
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop fs -cat /user/joe/wordcount/input/pattern.txt
Output:
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop fs -cat /user/joe/wordcount/input/pattern.txt
21/07/16 20:31:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
\.
\,
\!
to
7. After running the MR job on the sentences in the file01 and file02 txt files,
print the words first distinguishing upper/lower case, then again with everything converted to lowercase.
(uses the -Dwordcount.case.sensitive option)
1) Distinguish upper/lower case = true
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop jar wc.jar -Dwordcount.case.sensitive=true /user/joe/wordcount/input /user/joe/wordcount/output1 -skip /user/joe/wordcount/pattern.txt
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop fs -cat /user/joe/wordcount/output1/part-r-00000
Output:
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop jar wc.jar -Dwordcount.case.sensitive=true /user/joe/wordcount/input /user/joe/wordcount/output1 -skip /user/joe/wordcount/pattern.txt
21/07/16 20:43:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/07/16 20:43:19 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.56.116:8040
21/07/16 20:43:22 INFO input.FileInputFormat: Total input files to process : 2
21/07/16 20:43:22 INFO mapreduce.JobSubmitter: number of splits:2
21/07/16 20:43:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1626432687117_0004
21/07/16 20:43:23 INFO conf.Configuration: resource-types.xml not found
21/07/16 20:43:23 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/07/16 20:43:23 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/07/16 20:43:23 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/07/16 20:43:24 INFO impl.YarnClientImpl: Submitted application application_1626432687117_0004
21/07/16 20:43:24 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1626432687117_0004/
21/07/16 20:43:24 INFO mapreduce.Job: Running job: job_1626432687117_0004
21/07/16 20:43:38 INFO mapreduce.Job: Job job_1626432687117_0004 running in uber mode : false
21/07/16 20:43:38 INFO mapreduce.Job: map 0% reduce 0%
21/07/16 20:43:59 INFO mapreduce.Job: map 100% reduce 0%
21/07/16 20:44:11 INFO mapreduce.Job: map 100% reduce 100%
21/07/16 20:44:12 INFO mapreduce.Job: Job job_1626432687117_0004 completed successfully
21/07/16 20:44:12 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=92
FILE: Number of bytes written=629259
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=292
HDFS: Number of bytes written=50
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=36721
Total time spent by all reduces in occupied slots (ms)=8984
Total time spent by all map tasks (ms)=36721
Total time spent by all reduce tasks (ms)=8984
Total vcore-milliseconds taken by all map tasks=36721
Total vcore-milliseconds taken by all reduce tasks=8984
Total megabyte-milliseconds taken by all map tasks=37602304
Total megabyte-milliseconds taken by all reduce tasks=9199616
Map-Reduce Framework
Map input records=3
Map output records=8
Map output bytes=82
Map output materialized bytes=98
Input split bytes=234
Combine input records=8
Combine output records=7
Reduce input groups=6
Reduce shuffle bytes=98
Reduce input records=7
Reduce output records=6
Spilled Records=14
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=790
CPU time spent (ms)=4120
Physical memory (bytes) snapshot=598175744
Virtual memory (bytes) snapshot=6201077760
Total committed heap usage (bytes)=301146112
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
com.test.WordCount2$TokenizerMapper$CountersEnum
INPUT_WORDS=8
File Input Format Counters
Bytes Read=58
File Output Format Counters
Bytes Written=50
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop fs -cat /user/joe/wordcount/output1/part-r-00000
21/07/16 20:49:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Bye 1
Goodbye 1
Hadoop 1
Hello 2
World 2
hadoop 1
2) Convert everything to lowercase = false
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop jar wc.jar -Dwordcount.case.sensitive=false /user/joe/wordcount/input /user/joe/wordcount/output2 -skip /user/joe/wordcount/pattern.txt
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop fs -cat /user/joe/wordcount/output2/part-r-00000
Output:
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop jar wc.jar -Dwordcount.case.sensitive=false /user/joe/wordcount/input /user/joe/wordcount/output2 -skip /user/joe/wordcount/pattern.txt
21/07/16 20:51:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/07/16 20:51:21 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.56.116:8040
21/07/16 20:51:25 INFO input.FileInputFormat: Total input files to process : 2
21/07/16 20:51:25 INFO mapreduce.JobSubmitter: number of splits:2
21/07/16 20:51:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1626432687117_0005
21/07/16 20:51:26 INFO conf.Configuration: resource-types.xml not found
21/07/16 20:51:26 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
21/07/16 20:51:26 INFO resource.ResourceUtils: Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
21/07/16 20:51:26 INFO resource.ResourceUtils: Adding resource type - name = vcores, units = , type = COUNTABLE
21/07/16 20:51:26 INFO impl.YarnClientImpl: Submitted application application_1626432687117_0005
21/07/16 20:51:26 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1626432687117_0005/
21/07/16 20:51:26 INFO mapreduce.Job: Running job: job_1626432687117_0005
21/07/16 20:51:40 INFO mapreduce.Job: Job job_1626432687117_0005 running in uber mode : false
21/07/16 20:51:40 INFO mapreduce.Job: map 0% reduce 0%
21/07/16 20:51:56 INFO mapreduce.Job: map 100% reduce 0%
21/07/16 20:52:07 INFO mapreduce.Job: map 100% reduce 100%
21/07/16 20:52:07 INFO mapreduce.Job: Job job_1626432687117_0005 completed successfully
21/07/16 20:52:07 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=79
FILE: Number of bytes written=629236
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=292
HDFS: Number of bytes written=41
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=25850
Total time spent by all reduces in occupied slots (ms)=8221
Total time spent by all map tasks (ms)=25850
Total time spent by all reduce tasks (ms)=8221
Total vcore-milliseconds taken by all map tasks=25850
Total vcore-milliseconds taken by all reduce tasks=8221
Total megabyte-milliseconds taken by all map tasks=26470400
Total megabyte-milliseconds taken by all reduce tasks=8418304
Map-Reduce Framework
Map input records=3
Map output records=8
Map output bytes=82
Map output materialized bytes=85
Input split bytes=234
Combine input records=8
Combine output records=6
Reduce input groups=5
Reduce shuffle bytes=85
Reduce input records=6
Reduce output records=5
Spilled Records=12
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=633
CPU time spent (ms)=2500
Physical memory (bytes) snapshot=571346944
Virtual memory (bytes) snapshot=6201217024
Total committed heap usage (bytes)=301146112
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
com.test.WordCount2$TokenizerMapper$CountersEnum
INPUT_WORDS=8
File Input Format Counters
Bytes Read=58
File Output Format Counters
Bytes Written=41
[hadoop@hadoop01 test]$ $HADOOP_HOME/bin/hadoop fs -cat /user/joe/wordcount/output2/part-r-00000
21/07/16 20:52:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
bye 1
goodbye 1
hadoop 2
hello 2
world 2
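The two runs above differ only in whether the mapper lowercases each line before tokenizing: with wordcount.case.sensitive=false, "Hadoop" and "hadoop" collapse into one key. A plain-Java sketch of that toggle, assuming the input has already been cleaned by the skip patterns:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class CaseSensitivitySketch {
    // When caseSensitive is false, lowercase the whole line first
    // (the effect of -Dwordcount.case.sensitive=false), so that
    // case variants of a word merge into a single count.
    public static Map<String, Integer> count(boolean caseSensitive, String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (!caseSensitive) {
                line = line.toLowerCase();
            }
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed input after the skip patterns removed . , ! and "to"
        String[] cleaned = {"Hello World Bye World", "Hello Hadoop Goodbye hadoop"};
        System.out.println(count(true, cleaned));   // Hadoop=1 and hadoop=1 stay distinct
        System.out.println(count(false, cleaned));  // merged: hadoop=2
    }
}
```

Compare the two part-r-00000 listings above: output1 keeps Hadoop/hadoop apart (6 keys), output2 merges them (5 keys).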
Verify in the HDFS web UI via Browse Directory.