Optimizing the Hadoop Infrastructure that Runs Your Big Data Initiatives

By John Seifert III, Product Manager, Capacity Optimization, BMC

Hadoop, the technical foundation for big data analytics, is now 10 years old and quickly becoming mainstream. Investment in Hadoop infrastructure continues to grow at a CAGR of 21.7% (IDC), and spending on Hadoop software and services is rising at a similar rate.
 
The investment in developing the technology layer for big data is split among hardware, software and services, and of course budgets are finite. By planning and managing the utilization of your Hadoop infrastructure, you can effect change in all three of these investment areas.
 
The cost of infrastructure
 
There has been a historical trend of overspending on infrastructure as new technologies have been introduced. Servers were cheap compared to the mainframe, so organizations did not feel the need to manage server utilization. And then we repeated that same behavior with virtualization. What starts off as a small, seemingly controllable investment quickly gets out of control and turns into “sprawl.” Hadoop infrastructure is no different. Hadoop resources are seen as relatively cheap, and infrastructure purchases have gone unchecked in most organizations.
 
There is also the choice between on-premises infrastructure and public cloud infrastructure as a service (IaaS). Both have a place in the data center and in the Hadoop ecosystem, and both must be planned and managed. Gartner has estimated that the average utilization of a physical server is 12%. Servers are a capital expense: once purchased, there is no immediate cost incurred for inefficiency. With public cloud services, however, the same inefficiency means you are paying for unused infrastructure – CPU, memory, storage, network and more – as an operating expense, and it becomes part of your run rate.
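As a back-of-the-envelope illustration of that run-rate effect, consider a minimal sketch in which the utilization figure comes from the Gartner estimate above and the instance count and hourly price are purely hypothetical placeholders:

```python
# Rough cost-of-idle-capacity estimate for public cloud IaaS.
# The 12% utilization figure is the Gartner estimate cited above;
# the instance count and hourly rate are hypothetical placeholders.

AVG_UTILIZATION = 0.12        # estimated average server utilization
INSTANCE_COUNT = 50           # hypothetical number of always-on cloud instances
HOURLY_RATE = 0.50            # hypothetical cost per instance-hour (USD)
HOURS_PER_MONTH = 730

monthly_spend = INSTANCE_COUNT * HOURLY_RATE * HOURS_PER_MONTH
idle_spend = monthly_spend * (1 - AVG_UTILIZATION)

print(f"Monthly IaaS spend:        ${monthly_spend:,.0f}")
print(f"Paid for idle capacity:    ${idle_spend:,.0f}")
```

With those assumed numbers, roughly $16,000 of an $18,250 monthly bill would be paying for capacity that sits idle, every month, as part of the run rate.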
 
Making data-driven decisions
 
Optimizing infrastructure cost and usage – whether on-premises physical, virtual or public cloud – requires a capacity management practice. Such a practice should include an analytics solution that provides a holistic view of your Hadoop infrastructure.
 
A good capacity management practice serves all stakeholders – IT and the business. It requires strong relationships with IT teams and the ability to collect a range of data about workloads, users, network, CPU, memory, storage and more – all needed for proper ongoing analysis.
 
With a good capacity management analytics solution, you can automate the routine data collection, analysis and modeling that provide saturation rates, forecast upcoming needs, report costs and deliver the views and reports that stakeholders want and should expect.
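To make the data-collection step concrete, here is a minimal sketch of what automated polling of cluster saturation can look like. It assumes a Hadoop cluster whose YARN ResourceManager REST API is reachable at a hypothetical host, rm.example.com:8088; exact metric field names can vary by Hadoop version, and a production solution would collect far more than this.

```python
# Minimal sketch: poll the YARN ResourceManager cluster-metrics endpoint and
# report memory/vcore saturation. The host name is a hypothetical placeholder,
# and field names may differ slightly across Hadoop versions.
import requests

RM_URL = "http://rm.example.com:8088/ws/v1/cluster/metrics"

def cluster_saturation():
    metrics = requests.get(RM_URL, timeout=10).json()["clusterMetrics"]

    # Saturation = allocated capacity as a fraction of total capacity.
    mem_saturation = metrics["allocatedMB"] / max(metrics["totalMB"], 1)
    vcore_saturation = metrics["allocatedVirtualCores"] / max(metrics["totalVirtualCores"], 1)

    return {
        "memory_saturation": mem_saturation,
        "vcore_saturation": vcore_saturation,
        "apps_pending": metrics["appsPending"],
        "active_nodes": metrics["activeNodes"],
    }

if __name__ == "__main__":
    for name, value in cluster_saturation().items():
        if isinstance(value, float):
            print(f"{name}: {value:.1%}")
        else:
            print(f"{name}: {value}")
```

Run on a schedule and stored over time, even simple measurements like these become the raw material for trend analysis, forecasting and cost reporting.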
 
Big investments for big data
 
According to the Wall Street Journal, IT executives plan to focus on big data even as many say their budgets are constrained. A good capacity management practice and an automated analytics solution can stretch those budget dollars by decreasing infrastructure costs by an average of 30%, leaving more of your big data budget available for software, services and future growth.
 
With average annual spend on big data initiatives at $7.4 million overall, and $13.8 million for large enterprises, investing in a capacity management practice can deliver a big payback.
 
To learn more about capacity management, go to www.bmc.com/hadoopcapacity or contact John at John_Seifert@bmc.com.
 
BMC is a global leader in innovative software solutions that enable businesses to transform into digital enterprises for the ultimate competitive advantage. Visit us today at http://www.bmc.com/it-solutions/big-data.html
