Learning to Detect
Data Abnormalities in Databases
2005 Research Experience for Undergraduates (REU)
The University of Kansas, Department of Computer Science and Electrical Engineering
Mentors: Drs. P. Gogineni and C. Tsatsoulis, and Miss D. Lee
Software engineers at the University of Kansas have developed SmartXAutofill, an intelligent data entry assistant that predicts and automates inputs for eXtensible Markup Language (XML) and other text forms based on the contents of historical documents in the same domain. SmartXAutofill uses an ensemble classifier: a collection of internal classifiers, each of which predicts the best value for a particular data field. As the system operates, the ensemble classifier learns which internal classifiers work best for a particular domain and adapts to that domain without the need to develop special classifiers. The ensemble classifier has been shown to perform at least as well as the best individual internal classifier. It combines the internal predictions through a voting and weighting system to choose the value entered into a particular data field.
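The voting and weighting idea can be illustrated with a minimal sketch. This is not the SmartXAutofill implementation: the class name `WeightedVoteEnsemble`, the two toy internal classifiers, and the multiplicative reward and penalty factors are all assumptions chosen for illustration. Each internal classifier votes for a value, votes are weighted, and weights grow for classifiers that turn out to be right.

```python
from collections import defaultdict


class WeightedVoteEnsemble:
    """Hypothetical sketch of an ensemble that combines internal
    classifiers by weighted vote and adapts the weights over time."""

    def __init__(self, classifiers, reward=1.1, penalty=0.9):
        # Each classifier is a callable: history -> predicted value (or None).
        self.classifiers = list(classifiers)
        self.weights = [1.0] * len(self.classifiers)
        self.reward = reward    # assumed factor applied after a correct vote
        self.penalty = penalty  # assumed factor applied after a wrong vote

    def predict(self, history):
        # Sum the weights of all classifiers voting for the same value.
        votes = defaultdict(float)
        for clf, weight in zip(self.classifiers, self.weights):
            value = clf(history)
            if value is not None:
                votes[value] += weight
        return max(votes, key=votes.get) if votes else None

    def learn(self, history, actual):
        # Strengthen classifiers that predicted the actual value, weaken the rest.
        for i, clf in enumerate(self.classifiers):
            factor = self.reward if clf(history) == actual else self.penalty
            self.weights[i] *= factor


# Two toy internal classifiers (assumptions, not the real ones).
def most_recent(history):
    return history[-1] if history else None


def most_frequent(history):
    return max(set(history), key=history.count) if history else None
```

For example, after `ensemble.learn(["a", "a", "b"], "a")`, the `most_frequent` classifier carries more weight than `most_recent`, so the ensemble's next prediction for that history follows it. This is how an ensemble of this kind can adapt to a domain without a purpose-built classifier.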
Because the existing technology can predict, suggest, and automate data field entries, the investigator tested whether the same technology can be used to identify incorrect data. Given existing data transmitted by sensors and other instruments, the investigator studied whether the ensemble technology can identify abnormalities and verify correctness in future sensor data transmissions. The solution would be applied in Polar Radar for Ice Sheet Measurements (PRISM), a project funded by the National Science Foundation that uses innovative sensors to measure the thickness and characteristics of the ice sheets in Greenland and Antarctica, with the goal of understanding how global climate change is affecting them.
PRISM sensors continuously send information that is collected and catalogued. The ensemble classifier will check the data for correctness by predicting which values should be there; if the actual values differ, it will flag the data as possibly corrupted so that an operator can later study it and determine whether it is correct. This technology will allow the PRISM intelligent systems to automatically assess the correctness of sensor and other data, and it contributes to the PRISM project by adding a level of intelligence and prediction to the sensor suite.
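The predict-then-compare check can be sketched as follows. This is only an illustration under stated assumptions: a moving average of recent trusted readings stands in for the ensemble classifier's prediction, and the function name `flag_abnormal`, the `window` size, and the `tolerance` threshold are hypothetical, not taken from the PRISM system.

```python
from statistics import mean


def flag_abnormal(readings, window=3, tolerance=5.0):
    """Return the indices of readings that differ from the predicted
    value by more than `tolerance`.

    Assumptions: readings are numeric, the first `window` readings are
    trusted, and the predictor is a simple moving average (a stand-in
    for the ensemble classifier described in the text).
    """
    trusted = list(readings[:window])  # history used for prediction
    flagged = []
    for i in range(window, len(readings)):
        predicted = mean(trusted[-window:])
        if abs(readings[i] - predicted) > tolerance:
            # Possibly corrupted: leave it for an operator to review and
            # keep it out of the history used for later predictions.
            flagged.append(i)
        else:
            trusted.append(readings[i])
    return flagged


# Hypothetical readings; index 4 deviates sharply from its neighbours.
print(flag_abnormal([100, 101, 99, 100, 250, 101, 100]))  # prints [4]
```

Flagged readings are excluded from the prediction history so that a single corrupted value does not distort the predictions for the readings that follow it.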