In more detail, a DW can be seen as a process composed by three steps:
Data extraction
: it is a batch operation carried out periodically (every day/week/month); it produces summary data by scanning operational data. This step is usually automatized by using tools that are able to select useful information and map them into the DW framework (calculating possibly derived data).Data storage and reorganization
: Data are stored within a strongly denormalized and indexed database. In comparison with a conventional Information System, a DW entails limited transactional issues. In fact, only read-only accesses are allowed to the users; the primary target in DW applications is to provide results in a really short time.Information evalutation:
: different classes of tools can be used. In particular, OLAP tools are widely recognized as the most common way to access DW information. Newertheless, some kinds of analysis require an enourmous amount of data that cannot be analyzed by a human user. To overcome this problem, DATA MINING tools have been introduced. These tools are able to automatically analyze the data stored in a DW searching for correlations and clusters.
The database comunity is devoting increasing attention to the research themes concerning DW; in fact,
the development of decision-support systems will probably be one of the leading issues for the next years.
The enterprises, after having invested a lot of time and resources to build huge and complex information
systems, ask for support in obtaining quickly summary information wich may help managers to acquire and
integrate information from different sources and to query efficiently very large databases.
Due to the importance of speeding up information retrieval DW-related research mostly concern physical design.
The lack of interest in issues concerning conceptual design can be explained by examining the DW history.
In fact, the early trials in organising the enterprise data so that they could be easily accessed are due
to the practical requirements of firms; this kind of forum is obviously interested in short period results
and is not worried about the medium/long-period problems which will arise due to an inadequate conceptual design.
The word DW is really young and not all the issues and the potentialities related to it have been discovered.
We claim that the problem of design will assume main relevance in the next years. Experience shows that complex systems
can survive only if the design phase has been carried out accurately.
Our considerations are confirmed by the results coming from United States
where significant investments have been made in DW Systems: leading figures of this reality agree on
the necessity of adding a DW system to 'conventional' information Systems in order to better exploit data.
Nevertheless, the high number of failures in terms of results/investments results/time rates poses many questions.
Statistics show that the main cause of DW project failures can be found in the design phase.
In the following we will point out the factor thats we consider fundamental for the success of a DW project.
Choosing the design metodology:
Early DWs were designed using the TOP-DOWN metodology in order to obtain systems caracterized
by high consistance and completeness.
The entire information system was involved at the same time. This approach requires strong investments
and cannot produce results within reasonable time (6 months or 1 year).
These drawback causes lack of interest and mistrust in the project, that eventually is often abandoned.
A more careful approach follows the BOTTOM-UP metodology: a first data-mart is developed and then
an incremental approach is adopted. Also this solution presents many risks, in fact it could determine
partial views of the reality hiding inconsistencies and incompatibilities between modules.
Choosing the first data mart is fondamental in order to achieve a compromise between technical
and political intersts:
System design: During design, the specific aims of the DW must be taken into account. These are totally different with respect to a conventional information system and thus they require an ad hoc approach. In particular let us consider the following table:
| 90% Extemporary queries | 90% Predefined queries |
| Read only accesses | Read/Write accesses |
| Denormalized | Normalized |
| Maintains historical versions of data | Not always maintains historical versions of data |
| Optimized for accesses that involve a great part of the data | Optimized for accesses that involve a little part of the data |
| Contains mostly numerical data | Contains both numerical and alphanumerical data |
| Based on synthesis data | Based on elementary data |
These fundamental differencies affect strongly the designer choices,
the sequence of steps and the relevant information.
Tool selection:
A data warehouse is a process and not only a collection of software. This means that, in order to built a complete system, several tools are necessary. Usually, they are produced
by different companies and it not always easy to integrate them.
Choosing a component means to adapt the system to the company characteristics;
this could be a very hard work since scalabity requirements must also be taken into account.
In fact, as resulting from some statistics, the DW workload usually becomes, within a few years,
4-5 times the one extimated during design.