
What’s the Hullabaloo Over Hadoop?


At Cloudera, we’re committed to helping the government take advantage of all of its data. We want to help agencies make sense of Big Data and make it work for them. Ninety percent of the world’s data was created in the last two years, yet agencies are working with processes, procedures, and governance that are (sometimes) decades old. The main barrier we come up against in selling to the government is not a competitor, but the inertia of doing nothing. With all this said, agencies have to be ready for the change that will accelerate with the emergence of the “Internet of Things.”

Apache Hadoop™ – with its rich ecosystem and diverse applications, and as the core of data management architectures like the enterprise data hub (EDH) – is a great solution within the government. My colleague, Annette Baldenegro, recently gave a talk on the Motivation for Hadoop at the Tidewater Big Data Event. This talk gave a great overview of how Hadoop works and what it is enabling today.

As she explained, Hadoop, at the core of an EDH, provides a different approach to data. At first, agencies and industry just built bigger machines to deal with growing data. This approach quickly ran into limitations, so data became distributed. These next systems could scale, but they still faced bandwidth and other resource constraints that limited how much they could do, because storage and compute, while powerful, were separate elements that had to communicate over the network. And at scale, the network is the scarce resource.

Hadoop flips this data management equation and distributes not only the data, but also the processing. In effect, both storage and compute are fully distributed across the network, and agency workloads can take advantage of the scale and parallel processing of both, easily and economically. As agencies work within new and tighter budgets, Hadoop is an excellent solution for short- and long-term data because it provides inexpensive storage on industry-standard servers without sacrificing performance and compute.
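The “distribute the processing along with the data” idea is easiest to see in Hadoop’s original MapReduce model: a map function runs in parallel on each node against the data blocks stored locally, and a reduce function aggregates the grouped results. The sketch below is a minimal local simulation of the classic word-count job in Python, not an actual Hadoop job; on a real cluster, tools like Hadoop Streaming would run the same map and reduce logic in parallel across the nodes.

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in an input line.
    # On a cluster, each node runs this over its local data blocks.
    for word in line.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # Reducer: sum all counts for one key after the shuffle.
    return word, sum(counts)

def word_count(lines):
    # Shuffle: group mapper output by key so each reducer sees one word.
    groups = defaultdict(list)
    for line in lines:
        for word, count in map_phase(line):
            groups[word].append(count)
    return dict(reduce_phase(w, c) for w, c in groups.items())

print(word_count(["big data", "big clusters"]))
# → {'big': 2, 'data': 1, 'clusters': 1}
```

Because the map step needs no coordination between nodes, the only data that crosses the network is the much smaller intermediate output, which is what lets Hadoop sidestep the network bottleneck described above.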

With Hadoop, data is immediately accessible to a wide range of workloads for the business and the mission, such as data exploration and collaboration. IT teams can optimize their entire data management portfolio by examining which processes and data are ill-suited to existing systems and might be a better fit for Hadoop. It is a great tool for maximizing infrastructure skills and data return on investment across the board: it breaks down the barriers posed by traditional approaches to storage and analytics – such as valuable data and insight locked up in a particular application or view – and it also connects to the many tools and applications already in place within the agency, like SQL-based BI tools and search applications. This is timely, since agencies need to visualize more types and volumes of data in more cost-effective ways while continuing to tap the wealth of knowledge grounded in existing tools.

Once you are running a Hadoop-based EDH, you can really start talking about workloads like data fusion – the integration of all structured and unstructured data combined with external sources such as Twitter feeds and satellite imagery. Hadoop brings the data into one repository, an enterprise data hub, and makes it accessible in an accurate, consistent, yet flexible and dynamic manner. It is a 360-degree view of data and analytics. One example is assisting our warfighters by fusing vast amounts of data about weather conditions, enemy troop movements, terrain, and social media feeds to quickly determine how best to direct our troops.

For more on Cloudera and Hadoop, check out this Q&A in Executive Biz.
