Datzilla

Error Reporting and Tracking for NOAA Data

Overview

Datzilla is a web-based system used to report and track errors in NOAA datasets and Data-Products. It is an adaptation of Bugzilla, the software bug tracking system developed in 1998 by mozilla.org to track bugs reported against its web browser software. Bugzilla has remained an actively developed product and is used by hundreds of software development projects to track software bugs.

For a number of years, NOAA has needed a method for tracking reported data errors in its archived data sets. With the inception of new on-line Data-Product systems, errors in data and Data-Products are being exposed to a wider audience, and a need has arisen for systems that can track reported errors from beginning to end. With these needs in mind, a conceptual model for tracking errors in data systems was devised based on the principles incorporated in Bugzilla. Modifications to the Bugzilla database schema were incorporated into this conceptual model and implemented as the data error reporting and tracking system, Datzilla.

Datzilla Features

Datzilla allows a data user to submit error reports against a set of Source Systems and Data-Products defined in a back-end MySQL relational database. A Source System corresponds to a climate data delivery system and/or data archive, while Data-Products are individual datasets or products derived from climate datasets. Errors entered into the system are further classified by Problem Areas defined for each Source System / Data-Product pair. Data Managers are assigned to a reported error based on its pairing of Source System and Data-Product. Additional information required when submitting an error report includes a short Summary of the error and a longer, more detailed error Description. The system also allows the user to assign a Severity and Priority to the report and to attach supporting documentation that describes the error or suggests 'fixes' to the data problem. In the case of missing data, the user can attach data files covering extended data periods so that these data can easily be included in permanent archives.
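The routing rule described above can be illustrated with a minimal Python sketch. This is not Datzilla's actual schema or code (Datzilla itself is a set of Perl scripts backed by MySQL). The class names, field names, default Severity and Priority values, sample systems, and e-mail address are all hypothetical, chosen only to show how a Source System / Data-Product pair might determine both the valid Problem Areas and the assigned Data Manager.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pairing:
    """A Source System / Data-Product pair (hypothetical structure)."""
    source_system: str
    data_product: str


@dataclass
class PairingConfig:
    """Per-pair settings: the Data Manager and the allowed Problem Areas."""
    data_manager: str
    problem_areas: tuple


@dataclass
class ErrorReport:
    pairing: Pairing
    problem_area: str
    summary: str
    description: str
    severity: str = "normal"   # placeholder value, not Datzilla's real list
    priority: str = "P3"       # placeholder value, not Datzilla's real list
    attachments: list = field(default_factory=list)
    assigned_to: str = ""


def submit(report, registry):
    """Validate a report against its pair and assign the Data Manager."""
    config = registry.get(report.pairing)
    if config is None:
        raise ValueError("unknown Source System / Data-Product pair")
    if report.problem_area not in config.problem_areas:
        raise ValueError("Problem Area not defined for this pair")
    report.assigned_to = config.data_manager
    return report


# Hypothetical registry; in Datzilla this information lives in the database.
registry = {
    Pairing("Example Archive", "Daily Temperature"): PairingConfig(
        data_manager="manager@example.org",
        problem_areas=("Missing Data", "Suspect Value"),
    ),
}

report = submit(
    ErrorReport(
        pairing=Pairing("Example Archive", "Daily Temperature"),
        problem_area="Missing Data",
        summary="Gap in daily record",
        description="No observations present for an extended period.",
    ),
    registry,
)
print(report.assigned_to)
```

Keying the registry on the pair, rather than on the Source System or Data-Product alone, mirrors the text: both the Problem Areas and the Data Manager depend on the combination.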

Each error report submitted to the system is assigned to a Data Manager. The assignment is delivered via an embedded email system tied to the system database, with messages created from the information submitted with the error report. The Data Manager uses Datzilla to document actions taken on the error report, with each action visible to the error report submitter and to the wider Datzilla user community. The error submitter can choose to be notified of each action via email or can turn off email notification of actions. Any user can monitor the Status of actions taken on error reports using a form-driven query interface to all error reports. Options for displaying reports include simple or extended query result listings, tabular reports of summarized queries, and graphical reports of summarized queries. These latter reporting capabilities are useful for documenting report activity and effort.
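As a rough illustration of what a tabular report of a summarized query involves, the following Python sketch counts reports for each combination of two fields and prints the result as a small table. The field names, Status values, and sample records are hypothetical, and the sketch says nothing about how Datzilla itself builds its reports.

```python
from collections import Counter


def tabulate(reports, row_field, col_field):
    """Count reports for each (row, column) combination of two fields."""
    counts = Counter((r[row_field], r[col_field]) for r in reports)
    rows = sorted({key[0] for key in counts})
    cols = sorted({key[1] for key in counts})
    return rows, cols, counts


def render(rows, cols, counts):
    """Format the counts as fixed-width text, one line per row label."""
    width = max(len(label) for label in rows + cols) + 2
    lines = ["".ljust(width) + "".join(c.ljust(width) for c in cols)]
    for r in rows:
        # Counter returns 0 for combinations that never occurred.
        cells = "".join(str(counts[(r, c)]).ljust(width) for c in cols)
        lines.append(r.ljust(width) + cells)
    return "\n".join(lines)


# Hypothetical query result: one dict per error report.
sample_reports = [
    {"data_product": "Daily Temperature", "status": "NEW"},
    {"data_product": "Daily Temperature", "status": "NEW"},
    {"data_product": "Daily Temperature", "status": "RESOLVED"},
    {"data_product": "Hourly Precipitation", "status": "RESOLVED"},
]

rows, cols, counts = tabulate(sample_reports, "data_product", "status")
print(render(rows, cols, counts))
```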

On-line Documentation

The documentation required to use the Datzilla system is maintained on-line and is accessible to error reporters and Data Managers. The documentation is available from on-line links within the Datzilla system. General information is available at the bottom of each page in the footer section while page-specific information is presented as 'context' links tied to specific selection menus.

Configuration Management

Datzilla is highly configurable. Using administrative tools, accessible via a web interface, it is possible to alter the parameters of the program to meet the business rules of an organization.

Archive Capability

Datzilla uses a MySQL relational database system to manage error reports, user accounts, and much of its site configuration. This database is easily exported to provide an archive capability for all errors entered into the system. This capability allows for recovery of system information in the event of system corruption or for migration of the system to another platform. Database backup can be performed as a scheduled event.
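A scheduled export of this kind could be sketched in Python as follows. The sketch builds a command line for the standard `mysqldump` client and is only an illustration: the database name `datzilla`, the output directory, and the helper names are assumptions, and connection credentials are assumed to come from a MySQL option file rather than from the command line.

```python
import datetime
import subprocess


def build_dump_command(database, output_dir, today=None):
    """Return the mysqldump argument list for a dated SQL export."""
    today = today or datetime.date.today()
    output_file = f"{output_dir}/{database}-{today.isoformat()}.sql"
    return [
        "mysqldump",
        # Consistent snapshot without locking; effective for InnoDB tables.
        "--single-transaction",
        f"--result-file={output_file}",
        database,
    ]


def run_backup(database, output_dir):
    """Run the export; raises CalledProcessError if mysqldump fails."""
    subprocess.run(build_dump_command(database, output_dir), check=True)


# Hypothetical database name and backup directory, shown without running it.
command = build_dump_command(
    "datzilla", "/var/backups", datetime.date(2024, 1, 31)
)
print(" ".join(command))
```

Calling `run_backup` from a scheduler such as cron would make the export a recurring event; separating the command construction from its execution keeps the former easy to inspect and test.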

Storing information in a relational database management system also makes it possible to 'offload' error reports to a separate instance of a Datzilla database. Using this capability, error reports marked as 'closed' can be moved to an archive of historical reports, which may improve system performance if the number of error reports becomes unusually large.
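The offloading idea can be sketched as a copy-then-delete operation between two databases. The example below uses Python's built-in `sqlite3` module with in-memory databases purely as a stand-in for two MySQL instances, and the single `reports` table with its three columns is a hypothetical simplification, not the Datzilla schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified reports table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports "
        "(id INTEGER PRIMARY KEY, summary TEXT, status TEXT)"
    )
    return conn


def offload_closed(live, archive):
    """Copy closed reports to the archive, then delete them from live."""
    rows = live.execute(
        "SELECT id, summary, status FROM reports WHERE status = 'closed'"
    ).fetchall()
    with archive:  # commit the copies before anything is deleted
        archive.executemany(
            "INSERT INTO reports (id, summary, status) VALUES (?, ?, ?)",
            rows,
        )
    with live:  # delete only the rows that were actually copied
        live.executemany(
            "DELETE FROM reports WHERE id = ?", [(r[0],) for r in rows]
        )
    return len(rows)


live_db, archive_db = make_db(), make_db()
with live_db:
    live_db.executemany(
        "INSERT INTO reports (summary, status) VALUES (?, ?)",
        [("Gap in record", "closed"),
         ("Suspect value", "open"),
         ("Duplicate day", "closed")],
    )

moved = offload_closed(live_db, archive_db)
print(moved)
```

Committing the archive copy before deleting from the live database means a failure midway leaves a report duplicated rather than lost.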

The Datzilla programs are a collection of Perl scripts and configuration files that can be easily copied and maintained in an archive. This archive likewise allows for recovery of the system in the event of system corruption or for migration of the system to another platform.

Open Source Development

Datzilla development is performed in an Open Source environment, and the software is freely available, without charge, for public and private use. Source code is distributed under Open Source licenses, and modifications are permitted under these licenses. Datzilla relies on many other software packages that are also available from the Open Source community, including Perl, MySQL, a web server, and the Sendmail MTA.

Platform Independence

The use of popular Open Source components and of an interpreted language (Perl) to control database transactions and dynamic web page displays allows Datzilla to be deployed on a wide array of hardware and operating system platforms. Without modification, the database tables and Datzilla code base can be transferred to any other platform that supports Perl, MySQL, a web server, and the Sendmail MTA. This includes any hardware platform that supports Linux, UNIX, Mac OS, Windows, IRIX, Solaris, HP/UX, AIX, and others.

Note: The Datzilla instance at http://datzilla.srcc.lsu.edu/datzilla runs on a workstation-class computer with an AMD 64-bit Athlon 3500+ processor, 1 GByte RAM, a gigabit Ethernet controller, a 260 GByte SATA hard drive, and a double-sided DVD read-write drive. These system resources are also shared for use as a personal workstation.

An identical instance of Datzilla is also running on a server at the SRCC that uses dual 3.0 GHz, 32-bit Xeon processors. No modification of the Datzilla code was needed to install this system on the backup machine.