“What, Where and How do I start?” is the question most often asked by people trying to catch up with information technology industry trends and buzzwords. There are numerous conferences, seminars, webinars and forums on Big Data and Cloud Computing, and both terms seem overused in day-to-day conversation. There is still some ambiguity about what constitutes Big Data: is it just the sheer volume, is it a mix of volume, variety and velocity regardless of the size of the data, or is it the voluminous unstructured data coming from social media and machine logs? The definition has evolved from 3 Vs to 5 Vs: volume, velocity, variety, verification and value. I believe the first three (volume, variety and velocity) are attributes of the data, while the last two (verification and value) are part of the process and outcome.
A few people have asked me whether Big Data = unstructured data. I believe the simple test for Big Data is the three basic criteria of the original 3 Vs definition. Organizations in Retail, Financial, Healthcare and Hi-Tech (eBay, credit card companies…) that handle massive amounts of structured data from a variety of sources already deal with velocity, volume and, to some extent, variety, because incoming data arrives in different formats from different sources. For example, the VISA Data Warehouse system built on IBM DB2 9 has 400 terabytes of primary data, close to 2,000 tables and thousands of users, and involves very complex processing. In one of my previous blogs, I mentioned that Big Data is complementary to the Enterprise Data Warehouse (EDW) and is not a replacement for it. Processing the information that is now available as Big Data adds huge value to the interpretation of data and brings in new insight that was not tapped previously. Information Management is a journey, with EDW being the first Union Station and Big Data the next Grand Junction, and more stops to come in the next few decades. Artificial Intelligence is still in its infancy in day-to-day business operations, and it will use EDW and Big Data as its foundation before it matures and is embedded in mainstream business applications.
Organizations are now accumulating terabytes and petabytes of data coming from various sources (machines, mobile devices, users, weblogs and cookies, social media, etc.), but the challenge is not in storing this information; it is in finding uses for this data that bring competitive advantage. Organizations are rushing to store this wealth of information for fear of missing opportunities. This takes us back to the topic of this blog: What, Where and How do I start? I believe we have addressed the ‘What’ part of the question.
Let’s tackle the ‘Where’ part of the question now. In my previous blog, a 5-part use case series, I addressed the business ‘where’: the business cases in which an organization can start Big Data initiatives and prioritize them based on ROI, capital investment and competitive advantage. The technical ‘where’ can be answered here. Organizations can now build a Big Data platform using Cloudera or IBM, or by leveraging advancements from the open source community, such as Apache Hadoop, and from technology vendors, including cloud computing providers. Commodity hardware components and new techniques for assembling and analyzing large data sets make it possible for companies that have hesitated before to experiment. In my lab, it took less than 2 business days to stand up cloud-based infrastructure using Amazon EC2, RightScale, IBM BigInsights and Hadoop. There are many choices available. Organizations can now hit the ground running with a POC that requires very little investment of time and effort, thanks to the Cloud offerings: PaaS, IaaS and SaaS!
Lastly, the ‘How’ part of the topic. While part of the ‘How’ is addressed in the paragraph above through technology, here we will dive deeper into process and methodology. As mentioned earlier, organizations are rushing into Big Data POCs and storing all possible data coming in from a variety of sources, for fear of missing opportunities or overlooking intelligence that could be tapped from the data. The key to winning the race to competitive advantage is not storing as much data as possible, but deriving value and insight from it and tying that insight to a business plan that can drive business outcomes, ROI and profitability. Here are the high-level steps that I recommend you begin with in your Big Data journey:
1. Identify a business use case tied to business outcomes and metrics, and define a Big Data roadmap
2. Identify Big Data champions – Business and Technical (IT)
3. Select the infrastructure, tools and architecture for the Big Data POC / implementation
4. Staff the project with Big Data skills or partner with a strategic Big Data implementation partner
5. Run the project/POC in sprints or short projects with tangible and measurable outcomes
6. Build upon small successes and integrate with the EDW and applications, including web portals
In my next blog and upcoming white paper, I will discuss the Reference Architecture and Framework for Big Data implementation, getting into the nuts and bolts of the engine. This will guide you through the process of implementing a scalable and flexible architecture. Stay tuned, and thanks for following my blog.
Sushil Pramanick is a BI industry thought leader and a Big Data champion. You can reach him at 949 391 8520 and follow him on Twitter at @Pramanicks. His LinkedIn profile is at http://www.linkedin.com/in/pramanicks.
Currently, Sushil serves as Vice President – Analytics and Information Management (AIM Practice) with Encore Software Services. To learn more about Encore’s Big Data offerings and capabilities, visit us at http://www.EncoreSS.com or email us at firstname.lastname@example.org. Encore Software is a leader in Big Data implementations and consulting services.