Making Hadoop Run Faster
One of the challenges in processing data is that the speed at which we can input data is quite often much faster than the speed at which we can process it. This problem becomes even more pronounced in the context of Big Data, where the volume of data keeps on growing, along with a corresponding need for more insights, and thus the need for more complex processing also increases.
Batch Processing to the Rescue
Hadoop was designed to deal with this challenge in the following ways:
1. Use a distributed file system: This enables us to spread the load and grow our system as needed.
2. Optimize for write speed: To enable fast writes the Hadoop architecture was designed so that writes are first logged, and then processed. This enables fairly fast write speeds.
3. Use batch processing (Map/Reduce) to balance the speed for the data feeds with the processing speed.
Batch Processing Challenges