Author Archives: Sushil Pramanick


Hortonworks Certifies Spark On YARN, Hadoop



Big Data definition by The Big Data Institute (TBDI)

Big Data is a term applied to voluminous data objects that are varied in nature – structured, semi-structured or unstructured – including sources internal or external to an organization, generated at high velocity and with uncertain patterns, that do not fit neatly into traditional, structured, relational data stores and require a sophisticated information ecosystem with a high-performance computing platform and analytical capabilities to capture, process, transform, discover and derive business insights and value within a reasonable elapsed time.

TBDI – Premier source for Big Data and Advanced Analytics

2012 in review

The stats helper monkeys prepared a 2012 annual report for this blog.

Here’s an excerpt:

600 people reached the top of Mt. Everest in 2012. This blog got about 5,400 views in 2012. If every person who reached the top of Mt. Everest viewed this blog, it would have taken 9 years to get that many views.

Click here to see the complete report.

Each new day is…

Each new day is a blank page in the diary of your life. The secret of success is in turning that diary into the best story you possibly can. This New Year, may you be blessed with hope for a better tomorrow, love to fill up your heart, warmth in your hearth, and the happy smiles of your family. I wish you a Happy New Year and a diary full of the best stories ever written in your life.

Big Data Implementation Best Practices

Big Data is still relatively new to many organizations, and its significance in business processes and outcomes is changing every day. Encore Software Services has mastered the art of implementing Big Data and analytical solutions. Here are some of the key best practices that an implementation team needs to increase the chances of success.

1. “Implementing big data is a business decision, not an IT one.” This is a wonderful quote that wraps up the most important best practice for implementing Big Data. Analytics solutions are most successful when approached from the business perspective and not from the IT/engineering end. IT needs to move away from the model of ‘build it and they will come’ toward ‘custom-ordered solutions to business needs’.


2. Gather business requirements before gathering data: Begin big data implementations by first gathering, analyzing and understanding the business requirements. Understanding a business’s data requirements is the first and essential step in the big data analytics process. Align big data with specific business goals.


3. Use an agile and iterative approach to implementation: Typically, big data projects start with a specific use case and a specific large data set. Over the course of implementations, we have observed that organizations’ needs evolve as they understand the data – once they touch and feel it and start harnessing its potential value. Use agile and iterative implementation techniques that deliver quick solutions based on current needs instead of a big-bang application development. When it comes to the practicalities of big data analytics, best practice is to start small by identifying specific, high-value opportunities, while not losing sight of the big picture. We achieve these objectives with our big data framework: Think Big, Act Small.


4. Evaluate data requirements: Whether a business is ready for big data analytics or not, carrying out a full evaluation of the data coming into the business, and of how it can best be used to the business’s advantage, is advised. This process usually requires input from your business stakeholders. Together we analyze what data needs to be retained, managed and made accessible, and what data can be discarded.


5. Ease the skills shortage with standards and governance: Since big data has so much potential, there is a growing shortage of professionals who can manage and mine information. Short of offering huge signing bonuses, the best way to overcome potential skills issues is to standardize big data efforts within an IT governance program.


6. Optimize knowledge transfer with a center of excellence: Establishing a Center of Excellence (CoE) to share solution knowledge, plan artifacts and ensure oversight for projects can help minimize mistakes. Whether Big Data is a new or expanding investment, the soft and hard costs can be shared across the enterprise. Another benefit of the CoE approach is that it will continue to drive Big Data and overall information architecture maturity in a more structured and systematic way.


7. Embrace and plan your sandbox for prototyping and performance: Allow data scientists to construct their data experiments and prototypes using their preferred languages and programming environments. Then, after proof of concept, systematically reprogram and/or reconfigure these implementations with an “IT turn-over team.” Sometimes, it may be difficult to even know what you are looking for. Management and IT need to support this ‘lack of direction’ or ‘lack of clear requirements.’


8. Align with the cloud operating model: “Analytical sandboxes should be created on demand, and resource management needs control of the entire data flow, from pre-processing, integration and in-database summarization to post-processing and analytical modeling. A well-planned private and public cloud provisioning and security strategy plays an integral role in supporting these changing requirements.” The advantage of a public cloud is that it can be provisioned and scaled up instantly. Examples include Amazon EMR and Google BigQuery. In those cases where the sensitivity of the data allows quick in-and-out prototyping, this can be very effective.


9. Associate Big Data with enterprise data: To unleash the value of Big Data, it needs to be associated with enterprise application data. Enterprises should constantly establish new capabilities and leverage their prior investments in infrastructure, platform, Business Intelligence and Data Warehouse, rather than throwing them away. Investing in integration capabilities can enable knowledge workers to correlate different types and sources of data, to make associations, and to make meaningful discoveries.


10. Embed analytics and intelligence-driven decision making into the operational workflow/routine: For analytics to be a competitive advantage, an organization needs to make analytics the way it does business – a ‘corporate culture’. Nowadays, the competitive advantage of being a data-driven organization is no longer just a good ally, but a must-have and a must-do. With the range of analytical capabilities emerging alongside big data, modeling and forecasting the business is becoming common practice. Analytics need not be left to siloed teams but should be made part of the day-to-day operational functions of front-end staff.

 EAM Model


Newly Emerging Best Practices for Big Data – by the Kimball Group

Big Data Age – perpetrator for more privacy intrusion?

There have been hundreds of seminars, millions of tweets, a few blogs and tens of LinkedIn groups on the Big Data topic. The topics range from use cases to implementation details and product features. Large multinational vendors like IBM, Oracle, EMC and HP Vertica, and many specialized vendors like Cloudera, Hortonworks, etc., are betting on the race to get attention from customers and prospects. There are new products and releases at every major summit and conference. With Big Data getting media attention from customers to boardrooms and investors on Wall Street, it is predicted that by 2015 most mid-size to large-cap companies in specific sectors like Utilities, Communications, Retail, Banking, Healthcare, Hi-Tech and Aviation will launch Big Data projects. As vendors lead the race, companies are willing to hear them out and bet big dollars on implementing Big Data programs. With the evolving technology, big data has already made executives rethink their marketing and operations strategy.

Big data has made a puncture in corporate strategy for many companies. As a result, most CMOs, COOs and the rest of the C-suite are engaging strategy consulting firms to guide them through this change. Companies are gathering more data than ever, and the amusing part is that many don’t know whether they need that data or not. In a recent Big Data assessment workshop with one of my clients dealing with big data, an executive from the customer specifically asked whether they should be storing everything that they had been for the past year. Surprisingly, and unknowingly, consumers are being tracked at every event and every interaction. Facebook is one of several companies that secretly track you across millions of websites. Recently I installed the DNT (Do Not Track) add-on on Firefox and realized CNN has around 11 tracking cookies.

Facebook and many third-party companies are tracking every click that you make on the web, not only on their own sites but also on other commercial and social sites. This is stored mostly as anonymous data, but these cookies can capture significant information about an individual: lifestyle, age, income level, interests and likes, profession or occupation, and buying patterns.
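To make that concrete, here is a minimal sketch (the sites, categories and click events are all invented for illustration) of how individually innocuous click events aggregate into an interest profile:

```python
from collections import Counter

# Hypothetical click-stream: (site, inferred interest category) pairs –
# the kind of "anonymous" data a tracking cookie can accumulate.
clicks = [
    ("news-site.example", "politics"),
    ("shopping.example", "baby products"),
    ("shopping.example", "baby products"),
    ("finance.example", "mortgages"),
    ("news-site.example", "politics"),
]

def build_profile(events):
    """Aggregate raw click events into a ranked interest profile."""
    interests = Counter(category for _site, category in events)
    # Even a handful of clicks reveals a pattern once ranked by frequency.
    return interests.most_common()

profile = build_profile(clicks)
```

Just five clicks already suggest a lifestyle and a buying pattern; at the scale of millions of tracked pages, the profile becomes far richer.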

Now imagine your call records, at work and personal. Companies are now recording CDRs (Call Detail Records) to analyze patterns, including speech recognition, for better customer service, competitive pricing and next-best offers. They can also be used to analyze training issues and performance. Your personal mobile device transmits location information back to carriers, service providers and vendors, which can track your location and how long you stay there. New vehicles are being researched, and will be launched, with advanced telematics that can provide you with real-time coupons and shopping deals based on your preferences and social profile. Intelligence will be built over a period of time that uses artificial intelligence to provide you with personalized offers based on previous buying habits and patterns.

Recently, a Wall Street Journal investigation published in Saturday’s newspaper examined a fast-spreading technology that police and private companies alike are using to build databases of Americans’ movements by car. The Journal obtained a sizeable sample of one of these databases – collected by the sheriff of Riverside County, Calif. – through public-records requests there. Riverside is one of California’s largest counties, and the Journal’s two-year sample contains data on about 2 million unique license plates. Riverside County has about 1.6 million registered vehicles.

With the sophistication of technology now, and more to come, every second of human life can be tracked, captured and analyzed. New business models are evolving and coming up as a service. Encore is building and exploring Big Data As A Service (BDAAS) that would capture all social media data – FB, Twitter, Foursquare, Google+, LinkedIn, YouTube, Tumblr, WordPress, Instagram, Blogger, Flickr, Klout…and provide social profiles of individuals to organizations ready to buy that data for audience targeting. So someone will be making profits from selling your data to someone else. Now mash that data with credit scores, criminal records, driving history, and the internal emails, calls and notes that organizations capture in the workplace, and this makes a 360° profile of a customer. Privacy is out the window, and the next time you think about using the web or a gadget, think about someone tracking your every move. Welcome to the Big Data age!





About Encore Software Services:

Encore Software Services is a leading solution provider and systems integrator in the Business Intelligence and Analytics domain. Our Analytics and Information Management services help customers accelerate business decisions by offering tailor-made, comprehensive analytical solutions and frameworks. They are designed to enhance business performance by leveraging all forms of informational assets. Our solutions are customer-specific, to provide that added edge that our clients require for their business scenarios.

Encore provides an extensive suite of Analytics services that enable customers to mine this data and make analytical assessments. These services enable improvement in many areas such as customer support services, fraud detection in financial transactions, predicting the success of new product offerings, context-aware placement of advertisements, etc. Using industry-standard tool sets combined with proprietary algorithms, these analytics services use Big Data and enterprise clusters to provide efficient processing of large data sets. You can visit us @ or reach us at

Mind the Big Data Analytics Gap!

Interestingly, different forms of ‘data’, ‘reporting’, ‘intelligence’, ‘analytics’ and now ‘big data’ buzzwords have continued to hook industry attention for the past few decades. Business Intelligence is high on the radar at most companies. Gartner says that businesses rank BI as their highest priority in 2012, and IDC is forecasting that the BI market will reach $39.9 billion in 2012. Gartner analysts said the data warehouse is set to remain a key component of the IT infrastructure and believe that, as the demand for business intelligence (BI) and the wider category of business analytics increases, optimization, flexible designs and alternative strategies will become more important. With decades of experience, one would imagine that vendors and system integrators would have mastered the art of implementing information management solutions that generate positive ROI. However, the reality and the experience reported by industry leaders and executives have most often been otherwise. The Data Warehouse Network will tell you that, based on its survey, 70% of data warehouses are failures. A few years back, Gartner had predicted that number to be 50%. One leading BI research firm, BeyeNetwork, states that the high failure rate of BI projects and programs has been well documented throughout the years. The reasons why they fail have also been well documented, which leads to the obvious question: “why do they continue to fail at such a high rate?”

If we look at the current trend, Big Data and Analytics is in the infancy stage of its adoption life cycle across most industry verticals, with some leading others. Generally speaking, Finance, Telco and Retail have been paving the way, with Healthcare, Utilities and Logistics following the trail. Many citizens around the world regard this collection of information with deep suspicion, seeing the data flood as nothing more than an intrusion into their privacy. But there is strong evidence that big data can play a significant economic role, to the benefit not only of private commerce but also of national economies and their citizens. As per McKinsey, there are many ways that big data can be used to create value across sectors of the global economy. Indeed, research at McKinsey suggests that we are on the cusp of a tremendous wave of innovation, productivity, and growth, as well as new modes of competition and value capture – all driven by big data as consumers, companies, and economic sectors exploit its potential. But why should this be the case now? Hasn’t data always been part of the impact of information and communication technology? Yes, but research suggests that the scale and scope of the changes that big data is bringing about are at an inflection point, set to expand greatly, as a series of technology trends accelerate and converge.

Most IT players, from product vendors and OEMs to system integrators, have rushed to create Big Data solutions that they can market to customers. The focus from most vendors, if not all, has been on getting the data from RFID, machine logs, social media, gadgets, sensors, etc. into large data repositories. The ‘gap’ that is widening with time is this: companies already have terabytes of data and are now adding petabytes of big data, but how do they mine these data, what statistical techniques should they apply, and how can the resulting intelligence be made easily available to line managers, middle management, and executives like the CMO, CFO, CIO, COO and CEO to make business decisions? Didn’t data warehouses make similar promises a decade back? James Kobielus, a senior BI analyst with Forrester Research, notes that vendors like to push what he calls “the lottery value of BI” – that a single great decision can transform a company. But in practice, he says, most decisions supported by BI are routine and operational – and unlikely to provide great incremental value. Most companies are still hesitant and in listening mode, trying to understand the big data applications and implementation details that would generate ROI and make them successful. There obviously is value in every new data point you can provide to run a business, but the benefits need to outweigh the pain and cost of implementation. And this value can be generated by Data Scientists and Data Analysts who are skilled at mining these data and applying statistical techniques.
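As a toy illustration of the kind of statistical technique such an analyst might apply (the daily transaction counts below are invented for the example), here is a simple z-score check that flags a day whose volume deviates sharply from the norm:

```python
from statistics import mean, stdev

# Hypothetical daily transaction counts; the last day spikes.
daily_counts = [1020, 998, 1005, 1012, 990, 1001, 1850]

def z_scores(values):
    """Standard score of each value against the sample mean and stdev."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

# Flag days more than 2 standard deviations from the mean.
anomalies = [i for i, z in enumerate(z_scores(daily_counts)) if abs(z) > 2]
```

The point is not the arithmetic but the workflow: raw data only becomes a business decision (investigate day 7, alert a line manager) once someone applies a technique like this and routes the result to the right person.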

What’s often lost in the big data discussion is that data and analysis tools generate no value until they lead companies or individuals to make different (and better) decisions than they would have made otherwise. While we do occasionally read about big data success stories (for example, Target’s marketing to newly pregnant women), most companies still aren’t clear on how they could actually use big data to impact their business. For data to become valuable, companies need a direct path from that data and analysis to tangible action. Big data is not just yet another IT implementation and push towards the latest cutting-edge technology. Business people need to partner equally with IT in harnessing the value in data. The right questions need to be asked at the right time, in the right place and of the right individual. The Data Scientist is the key to unlocking the potential. If the right skills are not staffed and supported by companies, it will be another data integration implementation project where millions of dollars are spent building and maintaining it.




Follow my new Blog on Business Intelligence & Analytics

Follow my new blog on Business Intelligence & Analytics topics @ This new blog will focus on BI & Analytics techniques, standards, best practices, algorithms, metrics and KPIs. There are thousands of blogs and forums on the BI & Analytics topic, hence my plan is to gear this blog more towards the business application of BI and Analytics, instead of merely sharing updates on product vendors, tools and technology. I hope you all will like it the same way you have enjoyed reading the Big Data Blog. If you have a specific topic of interest, drop me a note or leave a comment.

Happy reading!


Big Data – What, Where and How do I start…

“What, where and how do I start?” is the question most often asked by many trying to play catch-up with information technology industry trends and buzzwords. There are numerous conferences, seminars, webinars and forums on the topic of Big Data and Cloud Computing, and the term seems overused in day-to-day conversation. There is still some ambiguity about what comprises Big Data – is it just the sheer volume, or a mix of volume, variety and velocity regardless of the size of the data, or is it the voluminous unstructured data coming from social media and machine logs? The definition has evolved from 3 Vs to 5 Vs – volume, velocity, variety, verification and value. I believe the first three (volume, variety and velocity) are attributes and characteristics of the data, while the last two (verification and value) are part of the process and its outcome.

I have been asked by a few whether Big Data = unstructured data. I believe the simple test for Big Data is the basic three criteria defined by the 3 Vs in the original definition. Organizations in Retail, Financial, Healthcare and Hi-Tech (eBay, credit cards…) that deal with massive amounts of structured data coming from a variety of sources already deal with velocity, volume and, in some respects, variety, due to the formats of incoming data from various sources. For example, the VISA data warehouse system built on IBM DB2 9 has 400 terabytes of primary data, close to 2,000 tables, thousands of users, and very complex processing. In one of my previous blogs, I mentioned that Big Data is complementary to the Enterprise Data Warehouse (EDW) and is not a replacement for it. Processing the information that’s now available as Big Data adds huge value to the interpretation of data and brings in new insight that was not tapped previously. Information Management is a journey – EDW being the first Union Station and Big Data the next Grand Junction, with more to come in the next few decades. Artificial Intelligence is still in its infancy in day-to-day business operations, and it will use EDW and Big Data as its foundation before it matures and is embedded in mainstream business applications.

Organizations are now accumulating terabytes and petabytes of data coming from various sources – machines, mobile devices, users, weblogs and cookies, social media, etc. – but the challenge is not in storing this information; it is in finding uses for this data that bring in competitive advantage. Organizations are rushing to store this wealth of information, fearing missed opportunities. This takes us back to the topic of this blog – what, where and how do I start? I believe we have addressed the ‘what’ part of the question, or challenge.

Let’s tackle the ‘where’ part of the question now. In my previous blog – a 5-part use-case series – I addressed the business ‘where’: cases in which an organization can start Big Data initiatives and prioritize them based on ROI, capital investment and competitive advantage. The technical ‘where’ can be answered here. Organizations can now build a big data platform using Cloudera or IBM, or by leveraging advancements from the open source community, such as Apache Hadoop, and from technology vendors, including cloud computing providers. Commodity hardware components and new techniques for assembling and analyzing large data sets make it possible for companies that hesitated before to experiment. In my lab, it took less than 2 business days to stand up a cloud-based infrastructure using Amazon EC2, RightScale, IBM BigInsights and Hadoop. There are many choices available. Organizations can now hit the ground running with a POC with very little investment in time and effort. Thanks to the cloud offerings – PaaS, IaaS and SaaS!

Lastly, the ‘how’ part of the topic. While part of the ‘how’ is addressed in the paragraph above through technology, we will attempt to dive deeper into this topic with process and methodology. As mentioned earlier, organizations are rushing into Big Data POCs and storing all possible data coming in from a variety of sources, fearing missed opportunities or ignorance of intelligence that may be tapped from the data. The key to winning the race to competitive advantage is not storing all or most of the data, but deriving value and insight from it, so that it can be tied to a business plan that drives business outcomes, ROI and profitability. Here are the high-level steps that I recommend you begin with on your Big Data journey:

1. Identify business use-case tied with business outcome and metrics, Big Data Roadmap

2. Identify Big data champion – Business and Technical (IT)

3. Select Infrastructure, Tools and Architecture for Big Data POC / Implementation

4. Staff the project with big data skills or partner with strategic big data implementation partner

5. Run project/POC in sprints or short projects with tangible and measurable outcomes

6. Build upon small successes and integrate with the EDW/applications, including web portals
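For step 5, the first sprint can be as small as a word count – the canonical first Hadoop job. The sketch below imitates the Hadoop Streaming mapper/reducer pattern in plain Python against an in-memory sample (no cluster involved; the sample lines are invented), just to show the shape of a minimal POC:

```python
from collections import defaultdict

def mapper(line):
    """Emit (word, 1) pairs, as a Hadoop Streaming mapper would to stdout."""
    for word in line.lower().split():
        yield word, 1

def reducer(pairs):
    """Sum counts per key, as the reduce phase does after the shuffle."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Stand-in for an HDFS input split: a tiny in-memory sample.
sample = ["big data big value", "data drives value"]
word_counts = reducer(pair for line in sample for pair in mapper(line))
```

Once a trivial job like this runs end to end on the chosen infrastructure (step 3), the same mapper/reducer skeleton can be pointed at a real data set with a measurable outcome attached.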

In my next blog and an upcoming white paper, I will discuss the Reference Architecture and Framework for Big Data implementation, getting into the nuts and bolts of the engine. This will guide you through the process of implementing a scalable and flexible architecture. Stay tuned, and thanks for following my blog.

About Author:


Sushil Pramanick is a BI industry thought leader and a Big Data champion. You can also reach him at 949 391 8520 and follow him on Twitter at @Pramanicks. His LinkedIn profile is @

Currently, Sushil serves as Vice President – Analytics and Information Management (AIM Practice) with Encore Software Services. To learn more about Encore’s Big Data offerings and capabilities, visit us at or email at Encore Software is a leader in Big Data implementations and consulting services.
