Hadoop was one of the first popular open source big data technologies. It is a scalable, fault-tolerant system
for processing large datasets across a cluster of commodity servers. It provides a simple programming
framework for large-scale data processing using the resources of that cluster.
Hadoop was inspired by systems that Google built to create the inverted index for its search product. Google
engineers published two papers describing these systems. The first, "MapReduce: Simplified Data Processing
on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat (2004), is available at
research.google.com/archive/mapreduce.html. The second, "The Google File System" (2003), is available at
research.google.com/archive/gfs.html. Inspired by these papers, Doug Cutting and Mike Cafarella
developed an open source implementation, which later became Hadoop.
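To give a concrete sense of the programming model that Hadoop adopted from the MapReduce paper, the following is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. The WordCount class name and the input and output paths taken from the command line are illustrative, not part of any particular application.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The map phase runs in parallel across the cluster; each call
  // receives one line of input and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The reduce phase receives all counts emitted for a given word
  // and sums them to produce the final (word, total) pair.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // The driver configures and submits the job; the framework handles
  // splitting the input, scheduling tasks, and recovering from failures.
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The map function turns each input line into intermediate (word, 1) pairs, the framework groups those pairs by key across the cluster, and the reduce function sums the counts for each word. Handling more data is mostly a matter of adding servers rather than changing the code.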
Many organizations have replaced expensive proprietary products with Hadoop for processing large
datasets. The first reason is cost. Hadoop is open source and runs on clusters of commodity hardware, so
you can scale it easily by adding inexpensive servers. Because Hadoop itself provides high availability and
fault tolerance, you don't need to buy expensive hardware to get them. The second reason is that Hadoop is
better suited for certain types of data processing tasks, such as batch processing and ETL (extract,
transform, load) of large-scale data.
Hadoop is built on a few important ideas. First, it is cheaper to use a cluster of commodity servers for
both storing and processing large amounts of data than to use a few high-end, powerful servers. In other
words, Hadoop uses a scale-out architecture instead of a scale-up architecture.