Bill Inmon first coined the term "data warehouse" in 1990. According to him, a data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data. A data warehouse is kept separate from the organization's operational database, and it can be said that a data warehouse is a more extensive form of the data held in a DBMS.
- There is no requirement for frequent updating of data in the warehouse.
- This data helps analysts make informed decisions in an organization.
- Data warehouses provide generalized and combined data in a multidimensional view.
- A data warehouse also provides online analytical processing (OLAP) tools, which help in interactive and effective analysis.
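The multidimensional view can be made concrete with a small sketch of a roll-up, one of the typical OLAP operations. The sales rows, the dimension names, and the `roll_up` helper below are hypothetical and exist only for illustration; a real warehouse would run this kind of aggregation inside an OLAP engine.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, sales amount).
SALES = [
    ("North", "Q1", "Laptop", 1200),
    ("North", "Q1", "Phone", 800),
    ("North", "Q2", "Laptop", 1500),
    ("South", "Q1", "Laptop", 700),
    ("South", "Q2", "Phone", 900),
]

# Names of the dimensions, in the same order as the tuple fields above.
DIMENSIONS = ("region", "quarter", "product")


def roll_up(rows, keep):
    """Sum the sales amount over every dimension that is not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)


by_region = roll_up(SALES, ["region"])
by_region_quarter = roll_up(SALES, ["region", "quarter"])
```

Keeping fewer dimensions gives a more generalized summary; keeping more dimensions drills back down toward the detailed rows.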
Characteristics of Data warehouse
- Time variant
- KDD stands for Knowledge discovery from data.
- KDD refers to the overall process of discovering useful knowledge from data.
Data cleaning is defined as the removal of noisy and irrelevant data from the collection.
- Cleaning missing values
- Cleaning noisy data
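As a rough illustration of the two cleaning tasks above, the sketch below fills missing values with the mean of the observed values and smooths noisy values by bin means. These are only two of several possible strategies, and the function names and sample data are assumptions made for this example.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the values that are present."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # raises StatisticsError if nothing was observed
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins, and replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


ages = [25, None, 35, 30]            # one missing value
filled = fill_missing(ages)

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]   # noisy measurements
smoothed = smooth_by_bin_means(prices, bin_size=3)
```

Here the missing age is replaced by 30, and each group of three sorted prices collapses to its mean (9, 22, and 29).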
Data integration is defined as combining heterogeneous data from multiple sources into a common source.
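A minimal sketch of integration, assuming two hypothetical sources that describe the same customers under different field names. The source layouts, the `integrate` helper, and the unified schema are all invented for illustration.

```python
# Hypothetical source 1: a CRM export keyed by "cust_id".
crm_rows = [
    {"cust_id": 1, "name": "Asha"},
    {"cust_id": 2, "name": "Ben"},
]

# Hypothetical source 2: a billing system keyed by "customer".
billing_rows = [
    {"customer": 1, "total_due": 120.0},
    {"customer": 3, "total_due": 45.5},
]


def integrate(crm, billing):
    """Combine both sources into one list of records with a common schema."""
    combined = {}
    for row in crm:
        combined[row["cust_id"]] = {
            "customer_id": row["cust_id"],
            "name": row["name"],
            "total_due": None,  # unknown until billing data is merged in
        }
    for row in billing:
        record = combined.setdefault(
            row["customer"],
            {"customer_id": row["customer"], "name": None, "total_due": None},
        )
        record["total_due"] = row["total_due"]
    return [combined[key] for key in sorted(combined)]


unified = integrate(crm_rows, billing_rows)
```

Customers that appear in only one source are kept, with the unknown fields left as `None`, so gaps stay visible for the cleaning step instead of being silently dropped.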
Data selection is defined as the process where data relevant to the analysis is identified and retrieved from the data warehouse.
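Selection can be pictured as filtering rows and keeping only the attributes the analysis needs. The rows, field names, and `select_relevant` helper below are hypothetical; in practice this step would usually be a query against the warehouse.

```python
# Hypothetical warehouse rows; the field names are illustrative only.
warehouse_rows = [
    {"customer_id": 1, "region": "North", "year": 2023, "amount": 250.0, "sales_rep": "R. Iyer"},
    {"customer_id": 2, "region": "South", "year": 2023, "amount": 90.0, "sales_rep": "L. Chen"},
    {"customer_id": 3, "region": "North", "year": 2022, "amount": 410.0, "sales_rep": "R. Iyer"},
]


def select_relevant(rows, attributes, condition):
    """Keep the rows that satisfy `condition`, reduced to the given attributes."""
    return [
        {name: row[name] for name in attributes}
        for row in rows
        if condition(row)
    ]


north_2023 = select_relevant(
    warehouse_rows,
    attributes=["customer_id", "amount"],
    condition=lambda row: row["region"] == "North" and row["year"] == 2023,
)
```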
Transformation is defined as the process of converting data into the form required by the mining procedure.
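One common transformation is min-max normalization, which rescales a numeric attribute into a fixed range. The text does not name a specific technique, so this sketch is offered only as an example; the `min_max_normalize` helper and the income values are assumptions.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [new_min for _ in values]
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min
        for v in values
    ]


incomes = [12000, 73600, 98000]
normalized = min_max_normalize(incomes)
```

After normalization the smallest income maps to 0.0, the largest to 1.0, and 73600 lands at roughly 0.716, so attributes with very different scales become comparable.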
Data objects and attribute types
- Data sets are made up of data objects.
- A data object represents an entity.
Data objects are described by attributes in the database.
Attribute: a data field that represents a characteristic or feature of a data object.
Types of attributes
- Nominal attributes:
  - Relating to names.
  - Each value represents some kind of category, code, or state.
  - Also referred to as categorical attributes.
- Binary attributes:
  - Nominal attributes with only two categories or states (0 and 1).
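The sketch below shows data objects as plain records with one nominal and one binary attribute. The customer records, attribute names, and helper functions are hypothetical and serve only to illustrate the definitions above.

```python
from collections import Counter

# Each dict is one data object (an entity); each key is an attribute.
customers = [
    {"hair_color": "black", "smoker": 1},
    {"hair_color": "brown", "smoker": 0},
    {"hair_color": "black", "smoker": 0},
]


def most_common_category(objects, attribute):
    """Return the most frequent value of a nominal attribute (its mode)."""
    counts = Counter(obj[attribute] for obj in objects)
    return counts.most_common(1)[0][0]


def is_binary(objects, attribute):
    """Check that the attribute only ever takes the values 0 and 1."""
    return {obj[attribute] for obj in objects} <= {0, 1}


common_hair = most_common_category(customers, "hair_color")
smoker_is_binary = is_binary(customers, "smoker")
hair_is_binary = is_binary(customers, "hair_color")
```

Because nominal values are category labels with no order, counting them (for example, finding the most frequent one) is meaningful, while averaging them is not.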
Data Warehousing MCQ
Identify the correct option which defines a data mart.
State whether True or False: Data warehouse is generally updated in real-time.
Identify the options below that a data warehouse can include.
Identify the kind of system for which data warehousing is mostly used.
Where is data warehousing used?
The small logical units in which data warehouses hold large amounts of data are known as _____
Choose the incorrect property of the data warehouse.
Identify the operation which can be performed in the data warehouse.
On what is data warehouse based?
Identify the term used to define the multidimensional model of the data warehouse.
DSS in data warehouse stands for _____________.
What is the time horizon in the data warehouse?
Where can the data be updated?
What is the source of all data warehouse data known as?
Identify the most common source of change data in refreshing a data warehouse.
Who is responsible for running queries and reports against data warehouse tables?
How many approaches are there in data warehousing to integrate heterogeneous databases?
From where are classification rules extracted?
Identify whether True or False: Data in operational systems are typically fragmented and inconsistent.
Among the following which is the specialized data warehouse database?
Choose the option on which database architecture is based.
What is the percent of data redundancy between environments?
Which of the following is not a clustering method?
ETL stands for ____________
Which of the following is included in an active data warehouse architecture?
Identify the type of relationship between fact and dimension table in a star schema.
What does a null value indicate?
What is the use of data cleaning?
Which of the following technologies is not well suited for data mining?
What does OLTP stand for?
______ is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management decisions.
The data is stored, retrieved and updated in ____________.
What is known as the heart of the warehouse?
Under which of the following does the pattern evaluation issue fall?
Which of the following maps the core warehouse metadata to business concepts that are familiar and useful to end users?
What is a query tool made for?
What does DMQL stand for?
What does data mining system classification consist of?
Which of the following is a method of incremental conceptual clustering?
What is a multidimensional database also known as?
From where does the source data from the warehouses come?
What is the technology area associated with CRM known as?
MDDB stands for ________________
Identify the correct definition of Reconciled data.