Hadoop Big Data Analytics with Hunk
Have you read the news that Yahoo uses Hunk to explore, analyze and visualize data from its Hadoop environment? According to that news, Hunk gives Yahoo deep visibility into its massive Hadoop data store, which holds more than 600 petabytes of data (read: 600,000,000 gigabytes).
Apache Hadoop is one of the most famous Big Data products in the world, and Yahoo has one of the largest Hadoop implementations, with more than 35,000 datanodes. Basically, Hadoop technology consists of two major parts: the Hadoop Distributed File System (HDFS) as the storage part, and MapReduce as the data processing part. While storing data in HDFS is simple, analyzing and visualizing it in a timely fashion is far from easy.
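To make the MapReduce half of that picture a little more concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases in plain Python. It is not Hadoop code and it does not touch HDFS; the log lines and every function name are made up for illustration only.

```python
from collections import defaultdict

# Hypothetical log lines standing in for records stored in HDFS.
LINES = [
    "GET /home 200",
    "GET /cart 500",
    "POST /cart 200",
    "GET /home 200",
]

def map_phase(line):
    """Emit one (status_code, 1) pair per log line."""
    _method, _path, status = line.split()
    yield status, 1

def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for a single key."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(run_job(LINES))
```

On a real cluster the map and reduce steps run in parallel across many datanodes, which is where the preparation effort comes from.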
If you already have data in Hadoop, let’s see how Hunk can make your life easier.
How does Hunk simplify Analytics in Hadoop?
First things first: if we have stored our data in HDFS, we need to be able to browse it and see its content before we can analyze it. What if we stored a very complex table with so many columns that we can’t even remember their names? What if we stored log files in RAW format, containing custom fields that we haven’t defined yet?
Browse and Find the data we need
Thanks to Hunk, we can now browse our data in HDFS easily and in a timely fashion. Unstructured data can also get field value statistics through the Hunk field extractor, also known as Hunk schema-on-the-fly, so custom fields can be added later, at the time we need them. Hunk lets us set an event sampling rate to show instant results instead of waiting for the entire job to finish. We can also pause a search to look at the early results, refine the query, and resume.
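The two ideas in this section, extracting fields at read time and sampling events for a quick preview, can be sketched in a few lines of plain Python. This is only a conceptual illustration, not how Hunk is implemented: the sample events, the key=value pattern and the function names are all assumptions.

```python
import re
from collections import Counter

# Hypothetical raw events; no schema was declared when they were stored.
RAW_EVENTS = [
    "2016-01-01 10:00:01 user=alice action=login status=ok",
    "2016-01-01 10:00:02 user=bob action=login status=fail",
    "2016-01-01 10:00:03 user=alice action=view status=ok",
    "2016-01-01 10:00:04 user=carol action=login status=ok",
    "2016-01-01 10:00:05 user=bob action=view status=ok",
    "2016-01-01 10:00:06 user=alice action=logout status=ok",
]

# The field pattern is chosen at read time, not at load time.
FIELD_PATTERN = re.compile(r"(\w+)=(\S+)")

def extract_fields(event):
    """Turn the key=value pairs found in one raw event into a dict."""
    return dict(FIELD_PATTERN.findall(event))

def sample(events, ratio):
    """Keep every ratio-th event (1-in-N sampling) for a quick preview."""
    return events[::ratio]

def field_stats(events, field):
    """Count the values of one extracted field across the given events."""
    return Counter(extract_fields(e).get(field) for e in events)

if __name__ == "__main__":
    print("preview:", field_stats(sample(RAW_EVENTS, 2), "user"))
    print("full:   ", field_stats(RAW_EVENTS, "user"))
```

The preview reads only half of the events, so it comes back sooner but can miss rare values; the full pass is what confirms the numbers.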
Process and Visualize the Data
Hunk has built-in charts that can be used to visualize almost any aggregated data. This feature can significantly shorten the development cycle for analytical dashboards and charts.
By default, we can visualize a panel with Hunk’s out-of-the-box charts: Table, Column Chart, Bar Chart, Pie Chart, Line Chart, Area Chart, Scatter Chart, Bubble Chart, Radial Gauge, Filler Gauge, Marker Gauge, Cluster Map, and Choropleth Map. Custom visualizations can also be added individually. Some of the most popular are Sunburst Charts (pie charts with grouping) and the famous Sankey Diagram (interconnectivity between categories of events).
Save and Share your Dashboard
After putting our big data analytics skills to work, it’s time to multiply the value by sharing the results with others. With Hunk we can build and share interactive dashboards that let non-technical users play around with clickable filters such as radio buttons, combo boxes, time pickers, zoom in/out (for maps), and many others. The result can also be exported to PDF, in case we need to share it with someone who doesn’t have Hunk access.
Data scientists can also enjoy the power of Hunk dashboards by inspecting the RAW data behind a dashboard panel. From there they can change the query, add or remove data fields, create calculated fields, add data processing commands, and choose their preferred visualization.
Big Data is still new in Indonesia, and Hunk would be the perfect tool to accelerate Hadoop adoption: it is simple, it has strong support and a vibrant community with thousands of questions and answers available on the internet, and it is part of the Splunk ecosystem. That’s why people say that Hunk is the Splunk Analytics for Hadoop.
Hunk itself is enterprise-grade Big Data software that provides:
- Browse and visualize in a timely fashion: Hunk can do “event sampling” and pause a search job to provide instant results, so that data scientists can immediately review whether their chart shows the data correctly. That’s why Hunk has a very fast time to value: it shortens development time, so business executives get the data sooner.
- Schema-on-the-fly: Hunk can add the “unknown fields” later on, thanks to its field extractor. Therefore, Hunk is very good at analyzing unstructured data in Hadoop.
- Data privacy with Role Based Access Control: Hunk can enforce a row-based search filter on a role, so that sensitive information is not displayed to people who don’t need it. Customer data stays protected even if such a person tries to search for the sensitive data with a custom free-text query.
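The row-based filter in the last point can be pictured with a small plain-Python sketch: the role’s filter is applied first, so the user’s own free-text terms can only narrow the result further. The roles, rows and the search function below are hypothetical and are not Hunk’s API or configuration.

```python
# Hypothetical roles: each one carries a predicate that every returned row must satisfy.
ROLE_FILTERS = {
    "analyst_id": lambda row: row["country"] == "ID",  # may only see Indonesian rows
    "admin": lambda row: True,                          # may see everything
}

# Hypothetical customer rows.
ROWS = [
    {"customer": "A", "country": "ID", "note": "vip"},
    {"customer": "B", "country": "SG", "note": "vip"},
    {"customer": "C", "country": "ID", "note": "new"},
]

def search(role, text):
    """Free-text search; the role filter always runs before the user's terms."""
    allowed = ROLE_FILTERS[role]
    return [
        row for row in ROWS
        if allowed(row) and any(text in str(value) for value in row.values())
    ]

if __name__ == "__main__":
    print(search("analyst_id", "vip"))  # the SG row stays hidden for this role
    print(search("admin", "vip"))
```

Because the restriction is attached to the role rather than to the query, rewording the search does not widen what the user can see.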
Hunk supports these Hadoop distributions:
- Apache Hadoop
- Cloudera Distribution
- Hortonworks Data Platform
- Amazon Elastic MapReduce
- IBM InfoSphere BigInsights
- Pivotal HD
At the end of the day, it’s all about business benefits, and the business doesn’t really care about databases, data warehouses, applications, solutions, technologies, and tools. However, every single business cares about its customers: what is important to them, what the trends are over time, and what it needs to do next. That’s why a business needs data analytics. Unfortunately, a Hadoop project tends to keep us busy preparing the framework, architecture, platform, servers, storage, Hadoop cluster management, MapReduce, and other nitty-gritty details of a big data project. Hunk gives us more time to analyze the data itself.
Please feel free to contact us should you need any further information.