Date of Award

2008

Degree Type

Dissertation

Department

Nance College of Business Administration

First Advisor

Lin, Chien-Hua (Mike)

Subject Headings

Data mining, Computer networks, Data mining, Knowledge discovery, Data imputation, Neural networks, Transfer functions, Sigmoid

Abstract

The purpose of this research is to investigate the impact of Data Imputation Methodologies that are employed when a specific Data Mining algorithm is utilized within a KDD (Knowledge Discovery in Databases) process. This study will employ certain Knowledge Discovery processes that are widely accepted in both the academic and commercial worlds. Several Knowledge Discovery models will be developed utilizing secondary data containing known correct values. Tests will be conducted on the secondary data both before and after storing data instances with known results and then identifying imprecise data values. One of the integral stages in the accomplishment of successful Knowledge Discovery is the Data Mining phase. The actual Data Mining process deals significantly with prediction, estimation, classification, pattern recognition and the development of association rules. Neural Networks are the most commonly selected tools for Data Mining classification and prediction. Neural Networks employ various types of Transfer Functions when outputting data. The most commonly employed Transfer Function is the s-Sigmoid Function. Various Knowledge Discovery Models from various research and business disciplines were tested using this framework. However, missing and inconsistent data has been pervasive problems in the history of data analysis since the origin of data collection. Due to advancements in the capacities of data storage and the proliferation of computer software, more historical data is being collected and analyzed today than ever before. The issue of missing data must be addressed, since ignoring this problem can introduce bias into the models being evaluated and lead to inaccurate data mining conclusions. The objective of this research is to address the impact of Missing Data and Data Imputation on the Data Mining phase of Knowledge Discovery when Neural Networks are utilized when employing an s-Sigmoid Transfer function, and are confronted with Missing Data and Data Imputation methodologies

Included in

Business Commons

COinS