From the book The Data Warehouse Lifecycle Toolkit, 2nd Edition Author: Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy, & Bob Becker ISBN-13: 978-0470149775
1. The authors include an eight-step approach to developing an architecture plan. Discuss elements of these eight steps.
2. The authors include information on selecting products for the system's architecture. These can be products that are custom developed to meet your organization's needs or products purchased off the shelf. Off-the-shelf products can be used as-is out of the box or, in some instances, can be customized. Discuss elements of selecting the products to support your architecture.
3. The authors include the topic of metadata for the architecture. Metadata is the data about the architecture's data. Discuss elements of the architecture's metadata.
Q1) Eight-step approach to developing an architecture plan
Data warehouse teams approach the technical architecture design process from opposite ends of the spectrum. Some teams simply don’t understand the benefits of an architecture and feel that the topic and tasks are too nebulous. They’re so focused on data warehouse delivery that the architecture feels like a distraction and an impediment to progress, so they opt to bypass architecture design. Instead, they piece together the technical components required for the first iteration with baling twine and chewing gum, but the integration and interfaces get taxed as we add more data, more users, or more functionality. Eventually, these teams often end up rebuilding because the nonarchitected structure couldn’t withstand the stresses. At the other extreme, some teams want to invest two years designing the architecture while forgetting that the primary purpose of a data warehouse is to solve business problems, not to address any plausible (and not so plausible) technical challenge.
Q2) Selecting products for the system's architecture.
In many ways the architecture plan is similar to a shopping list. We then select products that fit into the plan’s framework to deliver the necessary functionality. We’ll describe the tasks associated with product selection at a rather rapid pace because many of these evaluation concepts are applicable to any technology selection. The tasks include:
Understand the corporate purchasing process. The first step before selecting new products is to understand the internal hardware and software purchase approval processes, whether we like them or not. Perhaps expenditures need to be approved by the capital appropriations committee (which just met last week and won’t reconvene for 2 months).
Develop a product evaluation matrix. Using the architecture plan as a starting point, we develop a spreadsheet-based evaluation matrix that identifies the evaluation criteria, along with weighting factors to indicate importance. The more specific the criteria, the better. If the criteria are too vague or generic, every vendor will say it can satisfy our needs. Common criteria might include functionality, technical architecture, software design characteristics, infrastructure impact, and vendor viability.
Conduct market research. We must be informed buyers when selecting products, which means more extensive market research to better understand the players and their offerings. Potential research sources include the Internet, industry publications, colleagues, conferences, vendors, and analysts (although be aware that analyst opinions may not be as objective as we’re led to believe). A request for information or request for proposal (RFP) is a classic product-evaluation tool. While some organizations have no choice about their use, we avoid this technique if possible. Constructing the instrument and evaluating responses are tremendously time-consuming for the team. Likewise, responding to the request is very time-consuming for the vendor. Besides, vendors are motivated to respond to the questions in the most positive light, so the response evaluation is often more of a beauty contest. In the end, the value of the expenditure may not warrant the effort.
Narrow options to a short list and perform detailed evaluations. Despite the plethora of products available in the market, usually only a small number of vendors can meet both our functionality and technical requirements. By comparing preliminary scores from the evaluation matrix, we should focus on a narrow list of vendors about whom we are serious and disqualify the rest. Once we’re dealing with a limited number of vendors, we can begin the detailed evaluations. Business representatives should be involved in this process if we’re evaluating data access tools. As evaluators, we should drive the process rather than allow the vendors to do the driving (which inevitably will include a drive-by picture of their headquarters building). We share relevant information from the architecture plan so that the sessions focus on our needs rather than on product bells and whistles. Be sure to talk with vendor references, both those provided formally and those elicited from your informal network. If possible, the references should represent similarly sized installations.
Conduct prototype, if necessary. After performing the detailed evaluations, sometimes a clear winner bubbles to the top, often based on the team’s prior experience or relationships. In other cases, the leader emerges due to existing corporate commitments. In either case, when a sole candidate emerges as the winner, we can bypass the prototype step (and the associated investment in both time and money). If no vendor is the apparent winner, we conduct a prototype with no more than two products. Again, take charge of the process by developing a limited yet realistic business case study. Ask the vendors to demonstrate their solution using a small sample set of data provided in a flat file format. Watch over their shoulders as they’re building the solution so that you understand what it takes. As we advised earlier with proofs of concept, be sure to manage organizational expectations appropriately.
Select product, install on trial, and negotiate. It is time to select a product. Rather than immediately signing on the dotted line, preserve your negotiating power by making a private, not public, commitment to a single vendor. In other words, make your choice but don’t let the vendor know that you’re completely sold. Instead, embark on a trial period where you have the opportunity to put the product to real use in your environment. It takes significant energy to install a product, get trained, and begin using it, so you should walk down this path only with the vendor from whom you fully intend to buy; a trial should not be pursued as another tire-kicking exercise. As the trial draws to a close, you have the opportunity to negotiate a purchase that’s beneficial to all parties involved.
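The evaluation matrix and short-list tasks above can be illustrated with a small Python sketch. The book suggests a spreadsheet; this is only an equivalent illustration. The criteria names follow the common criteria listed above, but the weights, the 1-to-5 scoring scale, the vendor names and scores, and the helper functions weighted_score and short_list are all hypothetical assumptions, not taken from the book.

```python
# Hypothetical weighting factors for the evaluation criteria (they sum to 1.0).
WEIGHTS = {
    "functionality": 0.35,
    "technical_architecture": 0.25,
    "software_design": 0.15,
    "infrastructure_impact": 0.10,
    "vendor_viability": 0.15,
}

# Made-up scores (1 = poor, 5 = excellent) for three fictional vendors.
SCORES = {
    "Vendor A": {"functionality": 4, "technical_architecture": 3,
                 "software_design": 4, "infrastructure_impact": 2,
                 "vendor_viability": 5},
    "Vendor B": {"functionality": 3, "technical_architecture": 5,
                 "software_design": 3, "infrastructure_impact": 4,
                 "vendor_viability": 3},
    "Vendor C": {"functionality": 2, "technical_architecture": 2,
                 "software_design": 3, "infrastructure_impact": 5,
                 "vendor_viability": 2},
}


def weighted_score(scores, weights):
    """Return the weighted sum of one vendor's scores across all criteria."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for criteria: {sorted(missing)}")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def short_list(all_scores, weights, keep=2):
    """Rank vendors by weighted score (highest first) and keep the top few."""
    ranked = sorted(
        all_scores,
        key=lambda vendor: weighted_score(all_scores[vendor], weights),
        reverse=True,
    )
    return ranked[:keep]


if __name__ == "__main__":
    for vendor, scores in SCORES.items():
        print(f"{vendor}: {weighted_score(scores, WEIGHTS):.2f}")
    print("Short list:", short_list(SCORES, WEIGHTS))
```

The default of keep=2 mirrors the advice to prototype with no more than two products. In practice the preliminary scores would only narrow the field; the detailed evaluations and reference calls still decide the outcome.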
Q3) Metadata for architecture
Metadata is all the information in the data warehouse environment that is not the actual data itself. Metadata is akin to an encyclopedia for the data warehouse. Data warehouse teams often spend an enormous amount of time talking about, worrying about, and feeling guilty about metadata. Since most developers have a natural aversion to the development and orderly filing of documentation, metadata often gets cut from the project plan despite everyone’s acknowledgment that it is important.
Metadata comes in a variety of shapes and forms to support the disparate needs of the data warehouse’s technical, administrative, and business user groups. We have operational source system metadata including source schemas and copybooks that facilitate the extraction process. Once data is in the staging area, we encounter staging metadata to guide the transformation and loading processes, including staging file and target table layouts, transformation and cleansing rules, conformed dimension and fact definitions, aggregation definitions, and ETL transmission schedules and run-log results. Even the custom programming code we write in the data staging area is metadata. Metadata surrounding the warehouse DBMS accounts for such items as the system tables, partition settings, indexes, view definitions, and DBMS-level security privileges and grants. Finally, the data access tool metadata identifies business names and definitions for the presentation area’s tables and columns as well as constraint filters, application template specifications, access and usage statistics, and other user documentation. And of course, if we haven’t included it already, don’t forget all the security settings, beginning with source transactional data and extending all the way to the user’s desktop.
The ultimate goal is to corral, catalog, integrate, and then leverage these disparate varieties of metadata, much like the resources of a library.
Suddenly, the effort to build dimensional models appears to pale in comparison. However, just because the task looms large, we can’t simply ignore the development of a metadata framework for the data warehouse. We need to develop an overall metadata plan while prioritizing short-term deliverables, including the purchase or construction of a repository for keeping track of all the metadata.
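As a rough illustration of what "a repository for keeping track of all the metadata" might look like if constructed rather than purchased, here is a minimal in-memory Python sketch. It groups entries under the four varieties described above (source system, staging, DBMS, and data access tool). The class names, category labels, and sample entries are hypothetical assumptions made for this example; the book does not prescribe this design, and a real repository would need persistence, security, and integration with the ETL and access tools.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical category labels for the four metadata varieties in the text.
CATEGORIES = ("source", "staging", "dbms", "access")


@dataclass(frozen=True)
class MetadataEntry:
    category: str      # one of CATEGORIES
    name: str          # e.g. a table, rule, or schedule name
    description: str   # free-text documentation


class MetadataRepository:
    """A toy catalog that files metadata entries by category."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def register(self, entry):
        """File an entry under its category, rejecting unknown categories."""
        if entry.category not in CATEGORIES:
            raise ValueError(f"unknown metadata category: {entry.category!r}")
        self._by_category[entry.category].append(entry)

    def find(self, category):
        """Return a copy of all entries filed under one category."""
        return list(self._by_category.get(category, []))

    def search(self, term):
        """Case-insensitive substring search over names and descriptions."""
        needle = term.lower()
        return [
            entry
            for category in CATEGORIES
            for entry in self._by_category.get(category, [])
            if needle in entry.name.lower() or needle in entry.description.lower()
        ]


if __name__ == "__main__":
    repo = MetadataRepository()
    # Sample entries are invented for illustration only.
    repo.register(MetadataEntry("source", "orders_copybook", "Source record layout"))
    repo.register(MetadataEntry("staging", "clean_customer", "Cleansing rule for customer names"))
    repo.register(MetadataEntry("access", "Customer Name", "Business name for the customer column"))
    for hit in repo.search("customer"):
        print(hit.category, "-", hit.name)
```

Even a catalog this simple shows the point of integration: a single search for "customer" surfaces both the staging rule and the business definition, which otherwise live in separate tools.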