Hadoop questions

Posted 2020-07-25 10:43

Question:

I want to verify the answers to the following sample questions.

Question 1

You use the hadoop fs -put command to add sales.txt to HDFS. This file is small enough that it fits into a single block, which is replicated to three nodes within your cluster. When and how will the cluster handle replication following the failure of one of these nodes?

A. The cluster will make no attempt to re-replicate this block.
B. This block will be immediately re-replicated and all other HDFS operations on the cluster will halt while this is in progress.
C. The block will remain under-replicated until the administrator manually deletes and recreates the file.
D. The file will be re-replicated automatically after the NameNode determines it is under-replicated based on the block reports it receives from the DataNodes.

I believe the answer is D

Question 2

You need to write code to perform a complex calculation that takes several steps. You have decided to chain these jobs together and develop a custom composite class for the key that stores the results of intermediate calculations. Which interface must this key implement?

A. Writable
B. Transferable
C. CompositeSortable
D. WritableComparable

I believe the answer is D

Question 3

You are developing an application that uses a year for the key. Which Hadoop-supplied data type would be most appropriate for a key that represents a year?

A. Text
B. IntWritable
C. NullWritable
D. BytesWritable
E. None of these would be appropriate. You would need to implement a custom key.

I believe the answer is B

Answer 1:

1 - Correct. You can find this in any literature that describes the fault tolerance of HDFS. Chapter 3 of Hadoop: The Definitive Guide walks through, step by step, what happens when a client writes data to HDFS and how failures during that process are handled.

2 - Correct. Keys must implement WritableComparable because the framework sorts keys during the shuffle phase. Writable alone is enough for values, but a key also needs compareTo() so that keys can be totally ordered.

3 - Correct. A year is a numeric value, so of all these options the most appropriate is IntWritable.
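For point 2, a composite key like the one the question describes might be sketched as below, assuming Hadoop's `org.apache.hadoop.io` classes are on the classpath. The class and field names (`IntermediateKey`, `stepResult`, `runningTotal`) are hypothetical placeholders for whatever intermediate results the chained jobs produce:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical composite key holding two intermediate results.
public class IntermediateKey implements WritableComparable<IntermediateKey> {
    private long stepResult;    // placeholder intermediate value
    private long runningTotal;  // placeholder accumulated value

    // No-arg constructor is required so the framework can deserialize the key.
    public IntermediateKey() { }

    public IntermediateKey(long stepResult, long runningTotal) {
        this.stepResult = stepResult;
        this.runningTotal = runningTotal;
    }

    @Override
    public void write(DataOutput out) throws IOException {  // serialization
        out.writeLong(stepResult);
        out.writeLong(runningTotal);
    }

    @Override
    public void readFields(DataInput in) throws IOException {  // deserialization
        stepResult = in.readLong();
        runningTotal = in.readLong();
    }

    @Override
    public int compareTo(IntermediateKey other) {
        // Order by stepResult first, then runningTotal, giving the shuffle
        // a total order over keys.
        int cmp = Long.compare(stepResult, other.stepResult);
        return cmp != 0 ? cmp : Long.compare(runningTotal, other.runningTotal);
    }

    @Override
    public int hashCode() {  // used by the default HashPartitioner
        return 31 * Long.hashCode(stepResult) + Long.hashCode(runningTotal);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof IntermediateKey)) return false;
        IntermediateKey k = (IntermediateKey) o;
        return stepResult == k.stepResult && runningTotal == k.runningTotal;
    }
}
```

Note the two halves of the interface: write()/readFields() come from Writable (which is all a value type needs), while compareTo() comes from Comparable and is what makes the type usable as a key.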



Answer 2:

For Q1 and Q2, answer D is correct, but for Q3 I think it is D. I might be wrong.



Answer 3:

Q3 can be tricky, but the answer is most likely B.

The most compact choice would be ShortWritable, which takes 2 bytes and covers roughly -32,768 to +32,767, so a year fits easily; alternatively, you could use BytesWritable and assign only 2 bytes. IntWritable takes 4 bytes and covers roughly -2 billion to +2 billion, which wastes an extra 2 bytes for a year.

Even Text is close in size: a 4-character year serializes to 4 UTF-8 bytes plus a 1-byte length prefix, 5 bytes in total, versus 4 bytes for IntWritable. But if you are doing any integer operations on the key, IntWritable is the better fit.

The main reason I still think the answer is B is that Java developers almost always use int regardless of the number range and rarely reach for short. So the real answer "depends": if I were not doing any integer operations on the key I would use bytes, otherwise IntWritable. If I must pick one answer, it is B.
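The size comparison above can be checked with plain java.io, with no Hadoop dependency, since each Writable's on-the-wire format is simple: IntWritable writes a raw 4-byte int, ShortWritable a raw 2-byte short, and Text a variable-length-int prefix (1 byte for strings shorter than 128 bytes) followed by the UTF-8 bytes. The class name KeySizes is just for this sketch:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class KeySizes {
    // Same bytes as IntWritable.write(): one big-endian 4-byte int.
    static int intSize(int year) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeInt(year);
        return buf.size();
    }

    // Same bytes as ShortWritable.write(): one big-endian 2-byte short.
    static int shortSize(short year) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeShort(year);
        return buf.size();
    }

    // Text.write() emits a variable-length int (1 byte for short strings)
    // followed by the UTF-8 encoding of the string.
    static int textSize(String year) {
        byte[] utf8 = year.getBytes(StandardCharsets.UTF_8);
        return 1 + utf8.length;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("IntWritable:     " + intSize(2020) + " bytes");        // 4
        System.out.println("ShortWritable:   " + shortSize((short) 2020) + " bytes"); // 2
        System.out.println("Text \"2020\":     " + textSize("2020") + " bytes");   // 5
    }
}
```

So for a year, the serialized sizes come out to 2 bytes (ShortWritable), 4 bytes (IntWritable), and 5 bytes (Text), which is the ordering the answer above relies on.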



Tags: hadoop