Data mining, Data warehouse, Data Warehouse Overview, Data mart, Online analytical processing (OLAP), Online transaction processing (OLTP), Predictive analysis, Machine learning, Clustering, Bayesian networks, Genetic algorithms
Data mining
Data mining is an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating. Data mining is the analysis step of the "knowledge discovery in databases" (KDD) process.
Data warehouse
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of a business intelligence environment. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used to create analytical reports for knowledge workers throughout the enterprise. Reports can range from annual and quarterly comparisons and trends to detailed daily sales analysis. The data stored in the warehouse is uploaded from the operational systems (such as marketing, sales, etc., shown in the figure to the right). The data may pass through an operational data store for additional operations before it is used in the DW for reporting.
Data Warehouse Overview
Data mart
A data mart is a simple form of a data warehouse that is focused on a single subject (or functional area); hence, it draws data from a limited number of sources, such as sales, finance, or marketing. Data marts are often built and controlled by a single department within an organization. The sources could be internal operational systems, a central data warehouse, or external data. Denormalization is the norm for data modeling techniques in this system. Given that data marts generally cover only a subset of the data contained in a data warehouse, they are often easier and faster to implement.
Online analytical processing (OLAP)
OLAP is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the measure of effectiveness. OLAP applications are widely used by data mining techniques. OLAP databases store aggregated, historical data in multi-dimensional schemas (usually star schemas). OLAP systems typically have a data latency of a few hours, as opposed to data marts, where latency is expected to be closer to one day. The OLAP approach is used to analyze multidimensional data from multiple sources and perspectives. The three basic operations in OLAP are roll-up (consolidation), drill-down, and slicing and dicing.
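The three operations can be illustrated on a tiny in-memory fact table. This is only a sketch: the rows, the dimension names, and the helpers `aggregate` and `slice_rows` are invented for illustration and do not correspond to any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
FACTS = [
    (2023, "Q1", "North", "Widget", 100),
    (2023, "Q1", "South", "Widget", 80),
    (2023, "Q2", "North", "Gadget", 120),
    (2023, "Q2", "South", "Widget", 60),
    (2024, "Q1", "North", "Gadget", 150),
    (2024, "Q1", "South", "Gadget", 90),
]
DIMS = ("year", "quarter", "region", "product")


def aggregate(rows, by):
    """Sum the sales measure, grouped by the named dimensions."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Keep only rows whose dimensions equal the fixed values."""
    return [r for r in rows
            if all(r[DIMS.index(d)] == v for d, v in fixed.items())]


# Roll-up: consolidate everything to the year level.
rollup = aggregate(FACTS, by=["year"])
# Drill-down: break the yearly totals into quarters.
drilldown = aggregate(FACTS, by=["year", "quarter"])
# Slice and dice: fix two dimensions, then aggregate what remains.
north_2023 = slice_rows(FACTS, year=2023, region="North")
north_2023_by_product = aggregate(north_2023, by=["product"])
```

Each query scans many rows and returns a few aggregated values, which is the access pattern the paragraph above describes.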
Online transaction processing (OLTP)
OLTP is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). OLTP systems emphasize very fast query processing and maintaining data integrity in multi-access environments. For OLTP systems, effectiveness is measured by the number of transactions per second. OLTP databases contain detailed and current data. The schema used to store transactional databases is the entity model (usually 3NF). Normalization is the norm for data modeling techniques in this system.
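A short transaction that preserves data integrity can be sketched with Python's built-in `sqlite3` module. The `account` table, its `CHECK` constraint, and the `transfer` helper are assumptions made for this example, not part of any specific OLTP system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the integrity rule: no negative balances.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between two accounts atomically; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)         # succeeds
rejected = transfer(conn, 1, 2, 500)  # violates the CHECK, rolled back
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

The second transfer leaves both rows unchanged, which is the all-or-nothing behavior OLTP workloads depend on.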
Predictive analysis
Predictive analysis is about finding and quantifying hidden patterns in the data using complex mathematical models that can be used to predict future outcomes. Predictive analysis is different from OLAP in that OLAP focuses on historical data analysis and is reactive in nature, while predictive analysis focuses on the future. These systems are also used for CRM (customer relationship management).
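As a deliberately simple stand-in for the "complex mathematical models" mentioned above, the following sketch fits a straight line to past observations by ordinary least squares and extrapolates one step ahead. The monthly sales figures and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: sales observed in months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept  # prediction for an unseen month
```

The model is learned from historical data, but its output concerns a month that has not happened yet, which is the distinction from OLAP drawn in the text.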
Machine learning
Machine learning is a subfield of computer science (more particularly soft computing) that evolved from the study of pattern recognition and computational learning theory in artificial intelligence. In 1959, Arthur Samuel defined machine learning as a "Field of study that gives computers the ability to learn without being explicitly programmed". Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions.
Machine learning tasks are typically classified into three broad categories, depending on the nature of the learning "signal" or "feedback" available to a learning system. These are:
- Supervised learning: The computer is presented with example inputs and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.
- Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).
- Reinforcement learning: A computer program interacts with a dynamic environment in which it must perform a certain goal (such as driving a vehicle), without a teacher explicitly telling it whether it has come close to its goal. Another example is learning to play a game by playing against an opponent.
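To make the first category concrete, here is a minimal supervised learner: a 1-nearest-neighbor classifier. The labeled points play the role of the "teacher"; the data and the `nearest_neighbor_predict` helper are invented for illustration, and this is only one of many supervised methods.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training example closest to query."""
    _, label = min(train, key=lambda example: math.dist(example[0], query))
    return label


# Example inputs paired with their desired outputs (hypothetical data).
train = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 7.5), "large"),
]

prediction_a = nearest_neighbor_predict(train, (2.0, 1.5))
prediction_b = nearest_neighbor_predict(train, (7.0, 9.0))
```

Removing the labels from `train` would turn the same data into an unsupervised problem, where the algorithm has to find the two groups on its own.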
Between supervised and unsupervised learning is semi-supervised learning, where the teacher gives an incomplete training signal: a training set with some (often many) of the target outputs missing. Transduction is a special case of this principle where the entire set of problem instances is known at learning time, except that part of the targets are missing.
Clustering
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that observations within the same cluster are similar according to some predesignated criterion or criteria, while observations drawn from different clusters are dissimilar. Different clustering techniques make different assumptions on the structure of the data, often defined by some similarity metric and evaluated for example by internal compactness (similarity between members of the same cluster) and separation between different clusters. Other methods are based on estimated density and graph connectivity. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
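One widely used partitioning technique is k-means, which uses Euclidean distance as the similarity criterion. The sketch below is a bare-bones version under simplifying assumptions: the initial centroids are passed in by the caller, the iteration count is fixed, and the sample points are made up.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster ended up empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

No labels are supplied anywhere; the two groups emerge from the distances alone, which is why clustering counts as unsupervised learning.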
Bayesian networks
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.
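The disease and symptom example can be worked through on a three-node network in which a single disease node is the parent of two symptom nodes. All probabilities below are made-up numbers, and inference is done by brute-force enumeration, which is only practical for networks this small.

```python
# Hypothetical network: disease -> fever, disease -> cough.
P_DISEASE = 0.01
# P(symptom present | disease state), indexed by the disease state.
P_SYMPTOM_GIVEN = {
    "fever": {True: 0.9, False: 0.1},
    "cough": {True: 0.8, False: 0.2},
}


def joint(disease, observed):
    """P(disease state, observed symptoms) via the chain rule on the DAG."""
    p = P_DISEASE if disease else 1.0 - P_DISEASE
    for symptom, present in observed.items():
        p_present = P_SYMPTOM_GIVEN[symptom][disease]
        p *= p_present if present else 1.0 - p_present
    return p


def posterior_disease(observed):
    """P(disease | observed symptoms), normalizing over both disease states."""
    with_disease = joint(True, observed)
    without_disease = joint(False, observed)
    return with_disease / (with_disease + without_disease)


p_both = posterior_disease({"fever": True, "cough": True})
p_prior = posterior_disease({})  # no evidence: falls back to the prior
```

With these numbers, observing both symptoms raises the probability of disease from 0.01 to roughly 0.27.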
Genetic algorithms
A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection, and uses methods such as mutation and crossover to generate new genotypes in the hope of finding good solutions to a given problem. In machine learning, genetic algorithms found some uses in the 1980s and 1990s. Conversely, machine learning techniques have been used to improve the performance of genetic and evolutionary algorithms.
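A toy genetic algorithm shows selection, crossover, and mutation working together. The sketch below targets the "OneMax" problem (maximize the number of 1 bits in a bit string); the problem choice, the parameter values, tournament selection, and elitism are all assumptions made for this example.

```python
import random


def fitness(bits):
    """OneMax fitness: the number of 1 bits in the genotype."""
    return sum(bits)


def tournament(pop, rng):
    """Selection: pick two random individuals and keep the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(length=20, pop_size=30, generations=60,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best genotype over unchanged.
        next_pop = [max(pop, key=fitness)[:]]
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            # One-point crossover combines a prefix of p1 with a suffix of p2.
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Mutation flips each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


best = evolve()
```

Because the best individual is always copied forward, the best fitness never decreases from one generation to the next; how close the result gets to all ones depends on the parameters and the seed.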
trending keywords on this topic / related keywords / trending hashtags
Data mining
data mining concepts and techniques
data mining techniques
data mining tools
data mining and data warehousing
data mining and knowledge discovery
data mining and business intelligence
data mining and machine learning
data mining in healthcare
data mining in medical field
data mining in dbms
data mining in data warehouse
data mining vs data warehousing
data mining vs data analytics
data mining vs machine learning
data mining vs big data
data mining is used to aid in
data mining is also known as
Introduction to Data Mining
A definition for Data Mining
Applications of data mining
data mining-supervised vs. unsupervised learning
data mining strategies
unsupervised clustering using nearest neighbor algorithm
data mining stages
data pre-processing
Introduction to multidimensional data bases
data warehousing
OLAP
Basic Data Mining techniques
Decision tree building algorithm using information gain concepts
multilayer perceptrons for regression and classification
Association rule learning
genetic learning
choosing the best model for a problem
analysis using confusion matrix
cross validation
classification of major clustering methods
Partition algorithms
Hierarchical methods, Density based methods, Grid based methods
Statistical techniques in data mining
Chi-square analysis
regression techniques
principal component analysis
Naïve Bayes classifier
Support Vector Machines
Lazy classifiers
Rough set concepts
Time series analysis
Case studies in data mining using these classifiers
Advanced data mining techniques
Text mining
Web mining
spatial mining
temporal mining
Ensemble techniques
case studies using statistical packages
case studies using WEKA software package
Data mining vs Data warehouse
Data Warehouse Overview
Data mart
Online analytical processing (OLAP)
Online transaction processing (OLTP)
Predictive analysis
Machine learning
Clustering
Bayesian networks
Genetic algorithms