1) What is Data warehouse && why Data warehouse?
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis
Workload from transaction workload and enables an organization to consolidate data from several sources.
Data warehouse is a subject oriented, integrated, time-variant and non-volatile data.
Subject oriented: Data that gives information about a particular subject instead of about a company's ongoing operations. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?"
Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-variant: All data in the data warehouse is identified with a particular time period.
Non-volatile: Data is stable in a data warehouse. More data is added but data is
never removed. This enables management to gain a consistent picture of the
business.
2) Difference between OLTP and OLAP?
|
OLTP |
OLAP |
|
1) Online
transactional processing 2) Contains only current data 3) Insert, update and delete operations
can
Perform 4) Data is volatile in nature 5) Many joins will be there 6) The data is in normalized form 7) The aggregation data is very rare |
1) Online
Analytical processing 2) Contains current + historical data 3) Select operations can perform here.
4) Data is non-volatile in nature 5) Few joins will be there 6) The data is in de normalized form. 7) The aggregation data is common here. |
3)
What is data mart?
Data mart is the subset of Data warehouse. A data warehouse that is designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, the data can be derived from an enterprise-wide data warehouse. In an independent data mart, data can be collected directly from sources.
4) What is data mining?
Data Mining is used for the estimation of future. For example if we take a company/business organization by using the concept of Data Mining we can predict the future of business in terms of Revenue (or) Employees (or) Customers (or) Orders etc.
Generally data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue cuts costs or both.
5) What are dimension and fact tables??
Dimension table:
A dimension table can provide additional and descriptive information (dimension) of the field of a fact table.
If I want to know the number of resources used for a task, my fact table will store the actual measure (of resources) while my Dimension table will store the task and resource details.
Some of examples of dimension tables -
Customer dimension --- Which will contain all the details of customer.
Product dimension --- Which will contain all the details of product.
Store dimension --- Which will contain all the details of store.
Time dimension --- Which will contain all the details of time.
What was the cost of no consumable goods
sold in the northeast region in 1999?
What was the cost of kitchen products in New York City in the third quarter of 1999?
Fact
Table:
Fact table will consist of facts or measures. All the primary keys of the dimension tables are used as foreign keys in fact table. Fact table consists of two sections key and measures.
We have 3 types of facts (measures)
a)
Additive measures: Additive facts are the facts that participate in arithmetic
Calculations.
b)
Non Additive measures: Non additive facts are the facts that do not participate
in
Arithmetic calculations.
c)
Semi Additive facts: Semi additive facts are the facts that will participate in
Arithmetic calculations of some columns.
6)
What are Junk and DEGENERATE dimensions?
Junk dimension:
Junk dimensions are dimensions that contain miscellaneous data (like flags and indicators) that do not fit in the base dimension table.
Degenerate dimension:
A degenerate dimension is data that is dimensional in nature but stored in a fact table.
Best example is "Invoice Number" "bill number" "PO Number" these all are degenerated dimensions in the Transaction Tables they can be maintained in the fact table itself instead of crating separate dimensions for this.
7) What is fact less fact table? Different types of facts??
Fact table without measures is called fact less fact table.
Fact less fact table useful connecting two fact tables.Factless facts can be used as a bridge tables or to store measures that are used directly without aggregation(for e.g. max() min() attributes)
8) What is Normalisation? Different types of Normalization?
Normalization is the process of
splitting the data into two or more tables from single table.
First normal form: A table is said to be in first normal form if all the columns data is
Atomic or Unique.
Second normal form: A table is in second normal form if all its non_key fields fully
dependent on the whole key.
This means that each field in a table must
depend on the entire key.
Those that do not depend upon the combination key, are moved to another table on whose key they depend on.
Third normal form: A table is said
to be in third normal form if all the non key fields of the table are
independent of all other non key fields of the same table.
9)
What are slowly changing dimensions (SCD) and types of SCD’s?
SCD are of 3 Types.
SCD1: Contains only current data.
SCD2: Contains current + full historical data.
SCD3: Contains current + some time history data.
10) What is star and snow flake schemas?
A fact table surrounded by N number of dimension tables and the structure looks like star, so we will call this type of arrangement as Star schema.
Difference between snow flake and star schemas
Star--facts and dimension tables are demoralized in nature.
Snow--dimensions are normalised in nature and facts are demoralized in
nature.
Advantages of star schema:
No need to use complicated joins.
Queries results fastly
Advantages of snow flake schema:
We can easily do modification directly in the tables.
Dis advantages of snow flake schema:
We have to use complicated joins since we have more tables.
There will be some delay in processing the Query.
11)
What is ODS?
ODS stands for Operational Data Store.
It
is the final integration point in the ETL process before loading the data into
the Data Warehouse. It can support DSS
processing.
12) What is Granularity?
In simple terms, level of granularity defines the extent of detail. As an example, let us look at geographical level of granularity. We may analyze data at the levels of COUNTRY, REGION, TERRITORY, CITY and STREET.
In this case, we say the highest level of
granularity is STREET.
13)
What is surrogate key in DWH??
Surrogate key is a substitution for the natural primary key in Data Warehousing.
It is just a unique identifier or number
for each row that can be used for the primary key to the table.
If we have many sources, it is better to create new surrogate key instead of using the natural primary keys of different sources.
No comments:
Post a Comment