Big Data Challenges to the Data Warehouse
Jan 19, 2016

For decades, the enterprise data warehouse (EDW) has been the aspirational analytic system for just about every organization. It has taken many forms throughout the enterprise, but all share the same core concepts: integrating and consolidating data from disparate sources, governing that data to provide reliability and trust, and enabling reporting and analytics. A successful EDW implementation can drastically reduce IT staff bottlenecks and resource requirements, while empowering and streamlining data access for both technical and nontechnical users.

The last few years, however, have been very disruptive to the data management landscape. What we refer to as the “big data” era has introduced new technologies and techniques that provide alternatives to the traditional EDW approach and, in many cases, exceed its capabilities. Many claim we are now in a post-EDW era and that the concept itself is legacy. We position the EDW as a sound concept, but one that needs to evolve.

Challenges With the Traditional EDW

An EDW implementation can be a fairly difficult undertaking with a high risk of failure.
- Big data platforms typically use a distributed file system to store and process huge volumes of data across many nodes, a capability the traditional data warehouse lacks. From a business point of view, analytics over these much larger data sets can yield richer, more meaningful results that help the organization make better decisions.
- However, one of the reasons big data is so underutilized is that big data and its technologies also present many challenges. One survey found that 55% of big data projects are never completed, and a second survey echoed this finding, reporting that the majority of on-premises big data projects are not successful.
Generally accepted survey data puts the failure rate somewhere around 70%, and of the 30% deemed nonfailures, a great number never achieve ROI or successful user acceptance. To a great extent, this has been caused by legacy interpretations of EDW design and the traditional waterfall SDLC. More modern, agile techniques for design and implementation have proven more successful and offer a higher ROI, because they allow EDW implementations to grow organically and remain malleable as the underlying data and business requirements change.

The fundamental issue is that the traditional EDW does not solve all problems, yet in many organizations it has been seen as the only solution for every data analytics problem.
Calculating the operational cost of a data warehouse and its big data platform is a complex task: it includes the initial infrastructure acquisition costs, the labor costs of implementing the architecture, and the ongoing infrastructure and labor costs of maintenance, including any external help commissioned from consultants and experts, as illustrated in the sketch below.
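As a rough illustration only, the sketch below simply sums these cost components over a planning horizon; the function name, parameters, and example figures are placeholders invented for this example, not benchmarks or a standard costing method.

```python
# Rough, illustrative total-cost-of-ownership sketch.
# All names and figures are placeholders, not real benchmarks.

def total_cost_of_ownership(
    infrastructure_acquisition: float,   # one-off hardware/software purchase
    implementation_labor: float,         # one-off build/architecture labor
    annual_infrastructure: float,        # recurring hosting, licences, power
    annual_maintenance_labor: float,     # recurring internal staff effort
    annual_consulting: float,            # recurring external experts/consultants
    years: int,
) -> float:
    """Sum one-off and recurring cost components over a planning horizon."""
    one_off = infrastructure_acquisition + implementation_labor
    recurring = (annual_infrastructure + annual_maintenance_labor + annual_consulting) * years
    return one_off + recurring


if __name__ == "__main__":
    # Placeholder figures purely to show how the components combine.
    cost = total_cost_of_ownership(
        infrastructure_acquisition=500_000,
        implementation_labor=300_000,
        annual_infrastructure=120_000,
        annual_maintenance_labor=200_000,
        annual_consulting=50_000,
        years=3,
    )
    print(f"Estimated 3-year cost: {cost:,.0f}")
```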
Data consumers have been conditioned to believe that if they want analytics support for a problem, their only choice is to integrate the data and business processes into the EDW program. At times, this has been a “cart before the horse” situation, with extreme amounts of effort put into modeling new use cases into a rigid, governed system before the true requirements and value of the data are known.

In other cases, the underlying design and technology of the EDW simply does not fit the problem. Semi-structured data analysis, real-time streaming analytics, network analysis, search, and discovery are all ill-served by the traditional EDW backed by relational database technology. Use cases such as these have become more common in the era of big data.
In the “old days,” most data came from rigid, premises-based systems backed by relational database technology. Although these systems still exist, many have moved to the cloud as SaaS offerings. In addition, many no longer run on relational platforms, and our method of interacting with them is often an API returning JSON or XML responses, as in the sketch below. There are also entirely new data sources, such as social, sensor, and machine data, logs, and even video and audio.
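For illustration, here is a minimal sketch of that style of interaction, pulling JSON from a hypothetical SaaS endpoint; the URL, token, and field names are invented for this example, not a real API.

```python
# Minimal sketch: fetching data from a hypothetical SaaS API that returns JSON.
# The endpoint, token, and field names are invented placeholders.
import json
import urllib.request

API_URL = "https://api.example-saas.invalid/v1/orders?updated_since=2016-01-01"

request = urllib.request.Request(
    API_URL,
    headers={"Authorization": "Bearer <api-token>", "Accept": "application/json"},
)

with urllib.request.urlopen(request) as response:
    payload = json.load(response)  # semi-structured JSON, not rows in an RDBMS

# Each record arrives as a nested document rather than a flat relational row.
for order in payload.get("orders", []):
    print(order.get("id"), order.get("customer", {}).get("name"))
```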
Not only do these new sources produce data at overwhelming rates and with an inherent mismatch to the relational model, but there is often no internal ownership of the data, making it difficult to govern and conform to a rigid structure.

The Big Data Revolution

In response, there has been an amazing disruption in the tools and techniques used to store and process data. This innovation was born in large tech companies such as Twitter and Facebook and continues to evolve rapidly as organizations of all kinds encounter similar challenges with their own data. Today, the excitement of the big data era is not just about having lots of data. What’s truly interesting is that organizations of all sizes now approach data problems in different, tailored ways. It’s no longer a one-size-fits-all shoehorn into traditional systems; organizations now objectively design and build systems based on business and data requirements, not on preconceived design approaches.
When it comes to data warehouses, the collected data can arrive in a number of different forms. To analyse and use the data effectively, organisations often need to integrate the information to make it compatible, which takes a lot of manual work and, in turn, valuable time. Data automation tools can be used to streamline this process, and here we examine why they’re so valuable for organisations.

Because data warehouses receive data in a variety of different forms, data enterprises and controllers can struggle to analyse the data they hold effectively. Integration is the process of translating all of that data into one unified form. It involves correcting discrepancies in naming and units of measurement, which enables organisations to analyse the data more efficiently and draw results they can use effectively. A minimal illustration of this kind of normalisation is sketched below.
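As an illustration only, the following sketch shows one way such normalisation might look in code: mapping source-specific field names onto a common schema and converting units. The field names, schema, and sources are invented for this example.

```python
# Illustrative sketch: harmonising field names and units from two hypothetical
# source systems into one canonical schema (all names and values are placeholders).

# Source A reports weight in pounds, Source B in kilograms, with different field names.
FIELD_MAP = {
    "source_a": {"cust": "customer_name", "weight_lb": "weight_kg"},
    "source_b": {"customer": "customer_name", "weight": "weight_kg"},
}
LB_TO_KG = 0.453592

def normalise(record: dict, source: str) -> dict:
    """Rename fields to the canonical schema and convert units where needed."""
    out = {}
    for field, value in record.items():
        canonical = FIELD_MAP[source].get(field, field)
        if source == "source_a" and field == "weight_lb":
            value = round(value * LB_TO_KG, 2)   # pounds -> kilograms
        out[canonical] = value
    return out

print(normalise({"cust": "ACME Ltd", "weight_lb": 220.0}, "source_a"))
print(normalise({"customer": "ACME Ltd", "weight": 99.8}, "source_b"))
```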
Data integration done manually can often go awry for a number of reasons, including:

- Incorrect information entered into the source system
- Inadequate knowledge of interdependencies among data sources
- Differing timelines amongst data sources
- A data warehouse system that is overly complex
- Approximations used in the data
- Different encoding formats
- An overall lack of policy and planning within the data enterprise

Data warehouse automation goes some way towards addressing these issues. But what exactly is automation, and how can it help? Data warehouse automation is described by The Data Warehousing Institute (TDWI) as ‘using technology to gain efficiencies and improve effectiveness in data warehousing processes’.
They argue that ‘data warehouse automation is much more than simply automating the development process’; in their eyes, ‘it encompasses all of the core processes of data warehousing including design, development, testing, deployment, operations, impact analysis, and change management’.

In short, data automation lets you do far more in much less time, making it an invaluable tool for all organisations that have data warehouses. The benefits include:
- Quality – Automated processes allow you to control and maintain integration quality across the board, essentially eliminating the human error that leads to the discrepancies listed above.
- Agility – Business requirements can change at a rapid pace. While manual integration might not be able to keep up, automation can.
- Cost – Automation can reduce an organisation’s costs. Reducing the amount of manual work involved means that IT specialists and data scientists can get on with other pressing jobs.

Understand more about how your business can utilise data warehouse automation techniques by attending Big Data LDN on 15-16 November 2017; registration is free!