Data reduction strategies are applied to huge data sets. Complex data analysis and mining on huge amounts of data can take a long time, making such analysis impractical or infeasible.
Data reduction techniques can be applied to obtain a reduced representation of the data set. Mining on the reduced data should be more efficient yet produce the same (or almost the same) analytical results.
Strategies for data reduction include the following:
1 Data cube aggregation, where aggregation operations are applied to the data in the construction of a data cube.
2 Attribute subset selection, where irrelevant, weakly relevant, or redundant attributes or dimensions may be detected and removed.
3 Dimensionality reduction, where encoding mechanisms are used to reduce the data set size.
4 Numerosity reduction, where the data are replaced or estimated by alternative, smaller data representations such as parametric models or nonparametric methods such as clustering, sampling, and the use of histograms.
5 Discretization and concept hierarchy generation, where raw data values for attributes are replaced by ranges or higher conceptual levels. Data discretization is a form of numerosity reduction that is very useful for the automatic generation of concept hierarchies. Discretization and concept hierarchy generation are powerful tools for data mining, in that they allow the mining of data at multiple levels of abstraction.
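Three of the strategies above can be illustrated with a short Python sketch: aggregation (strategy 1), numerosity reduction by sampling and by histogram (strategy 4), and discretization into concept labels (strategy 5). This is only a minimal illustration using the standard library. The sales figures, the ages, the function names, and the age cut points (20 and 50) are all assumptions made up for the example and do not come from any particular data set or tool.

```python
import random
from collections import Counter, defaultdict

# Hypothetical quarterly sales records: (year, quarter, amount).
sales = [
    (2008, "Q1", 224), (2008, "Q2", 408), (2008, "Q3", 350), (2008, "Q4", 586),
    (2009, "Q1", 310), (2009, "Q2", 402), (2009, "Q3", 380), (2009, "Q4", 612),
]

# Hypothetical raw attribute values.
ages = [10, 13, 15, 19, 22, 25, 30, 33, 35, 40, 45, 52, 70]


def aggregate_by_year(records):
    """Aggregation: roll quarterly amounts up to one total per year."""
    totals = defaultdict(int)
    for year, _quarter, amount in records:
        totals[year] += amount
    return dict(totals)


def simple_random_sample(values, n, seed=0):
    """Numerosity reduction: keep n values chosen without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(values, n)


def equal_width_histogram(values, num_bins):
    """Numerosity reduction: keep only (low, high, count) per bucket."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: a single bucket holds everything.
        return [(lo, hi, len(values))]
    width = (hi - lo) / num_bins
    counts = Counter()
    for v in values:
        # Clamp the index so the maximum value lands in the last bucket.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i])
            for i in range(num_bins)]


def discretize_age(age):
    """Discretization: replace a raw age by a higher-level concept label.

    The cut points 20 and 50 are assumed for illustration only.
    """
    if age < 20:
        return "youth"
    if age < 50:
        return "middle_aged"
    return "senior"


if __name__ == "__main__":
    print(aggregate_by_year(sales))
    print(simple_random_sample(ages, 5))
    print(equal_width_histogram(ages, 3))
    print([discretize_age(a) for a in ages])
```

With the sample data above, the eight quarterly records reduce to two yearly totals (1568 for 2008 and 1704 for 2009), and the thirteen ages reduce to three buckets with counts 6, 5, and 2. In each case the reduced form is smaller than the raw data, at the cost of losing the detail below the chosen level of abstraction.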