Hadoop is the new black, no doubt about it. The Hadoop ecosystem is taking its place as a standard component of enterprise architectures, side by side with ERPs and other standard components. Hadoop enables a modern data architecture and the ability to build analytical applications on top of enterprise data.
It is no secret: Hadoop is not easy! The good thing is that as it matures, we are seeing more and more automated, out-of-the-box solutions for data transfer, analytics, administration, operations – and deployment.
Deploying and configuring a Hadoop cluster can be a very complex task if it is to be done and optimized correctly. Fortunately, there are now several tools and platforms that make this process significantly easier and reduce both the risks and the number of components you have to maintain yourself. I will introduce just a few options that I have tried out myself.
Azure Marketplace and Hortonworks Data Platform (HDP)
I was already very impressed by Cloudbreak’s capabilities when I tried it out last year, but Azure surprised me even more – even though the idea and the functionality are not exactly the same. Launching a standardized Hadoop environment for test or pilot purposes has never been this easy – at least for me! In theory it is just as easy for production use, but you do need to plan your architecture a bit more to match your use cases, even though this is a highly automated IaaS/PaaS type of cloud environment.
“The Hortonworks and Microsoft relationship has enabled a seamless implementation of Apache Hadoop on Azure.”
All I really needed was an Azure account and an ssh-rsa key, which I got in 5 minutes. New Azure users even get a 30-day free trial with a funny-money budget, which is enough to deploy a 5-node HDP cluster in Azure with 8 TB of disk per node. The deployment process itself is fully automated. Obviously, you will want to choose the subscription, the number and size of nodes, and so on yourself when doing anything more than evaluating the product and the process.
Psst. There are also options other than Hortonworks HDP available in Azure Marketplace – but my choice is HDP, as it is THE Hadoop platform and the only one committed to being 100% open.
Cloudbreak
Hortonworks acquired SequenceIQ in 2015. The acquisition gave Hortonworks technology that automates the deployment of Hadoop clusters to public or private cloud environments. Cloudbreak has nice features such as policy-based auto-scaling on the major cloud platforms, including Microsoft Azure, Amazon Web Services, Google Cloud Platform, and OpenStack, as well as on platforms that support Docker containers.
I had the chance to see Cloudbreak live in action at Hadoop Summit in June 2015. To be honest, it looked too good to be true. My turn to try Cloudbreak hands-on came at a Hortonworks Masterclass in Stockholm later in 2015: it is even easier than it sounds! The main requirements are simply that you have a Cloudbreak installation, pick a blueprint, choose a cloud, and deploy! Now this baby is a permanent part of our own on-premises solutions. I warmly recommend that everyone try it out!
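If you have never seen a blueprint, here is a rough idea of what "pick a blueprint" involves. The sketch below builds a minimal Ambari-style blueprint as JSON, assuming the blueprint format Ambari uses for cluster definitions. The function name, the blueprint name, the stack version, and the host-group layout are all made up for illustration, and the component list is deliberately incomplete, so treat it as a picture of the structure rather than something to deploy as-is.

```python
import json


def build_blueprint(name, stack_version, worker_count):
    """Return a minimal Ambari-style blueprint as a plain dict.

    Hypothetical helper for illustration only: a real blueprint needs
    more components (and usually configuration sections) than shown here.
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                # One master node carrying the HDFS and YARN masters.
                "name": "master",
                "cardinality": "1",
                "components": [
                    {"name": "NAMENODE"},
                    {"name": "RESOURCEMANAGER"},
                    {"name": "ZOOKEEPER_SERVER"},
                ],
            },
            {
                # Worker nodes store data and run the compute containers.
                "name": "worker",
                "cardinality": str(worker_count),
                "components": [
                    {"name": "DATANODE"},
                    {"name": "NODEMANAGER"},
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Assumed example values: one master plus four workers, as in a 5-node cluster.
    print(json.dumps(build_blueprint("hdp-small-sketch", "2.3", 4), indent=2))
```

The point is simply that the cluster layout lives in one small declarative document, which is what lets the same blueprint be reused across different clouds.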
If you want to hear more about HDP Hadoop, Modern Data Architecture, and Azure Marketplace, you may like these blog posts:
Mikko Mattila: Hadoop – IKEA of the IT ecosystem: Part 1
Mikko Mattila: Hadoop – IKEA of the IT ecosystem: Part 2
Mikko Mattila: Hadoop – IKEA of the IT ecosystem: Part 3
Mikko Mattila: Hadoop – IKEA of the IT ecosystem: Part 4
