SBN Data Reviewers: Basic Battle Plan

This page presents a basic plan of attack for reviewing a data set. We assume you have already downloaded and unpacked the entire data set rather than just a few selected files, although these comments may also suggest ways to work through a data set online without downloading the entire collection.

As always, if you have any questions or problems with the data, please give us a call ASAP.

Here are the basic steps:

  1. Understand the directory structure.
  2. Read the catalog files.
  3. Note the documentation.
  4. Check for additional browsing help.
  5. Rummage through the data.
  6. Inspect any remaining directories.
  7. Present your results.

Understanding the directory structure

Each data set is organized as a directory tree. Most of the directory names are dictated by the PDS Standards Requirements. The directories you will most likely encounter are:


Read the catalog files

Start in the catalog/ directory with the catinfo.txt file. This file tells you what is in each of the other files in the directory, so you can find the instrument catalog file, for example, by looking up its name in catinfo.txt. Carefully review the data set catalog file (almost always called dataset.cat in SBN data sets), which should provide a good, high-level overview of the data set. Next, read the instrument catalog file, which should provide an overview of the instrument and its operations. Since these files are intended to provide high-level information, it is common for them to refer to other documents that contain more detailed descriptions, either in the literature or included with the data in the document/ directory. You should report any omissions or inadequacies you find in these files.

Beyond these, the directory holds a number of other files that contain either additional high-level information, like the mission and instrument host files, or very low-level detail, like the full citations for references mentioned in the various label and catalog files. These are less critical to the data set review, but we would appreciate any comments you can supply on them as well.

Note for Windows Users

The files of interest here have the extension .cat and are plain text files. If you are a Windows user, you may have to force Windows to open them with a specific editor, because Windows reserves the .cat extension for security catalog files and gets nervous when ordinary users start messing with .cat files. Use the "Open with" right-click option in your Windows Explorer window to select an editor for viewing the files. Notepad and WordPad usually work well.

If you're trying to look at a .cat file in Internet Explorer, you will be stymied by the browser. You will have to download the file (right-click and "Save target as") to your hard disk and then open it with any text editor. Other browsers may be happy to display these files directly and without complaint (Google Chrome is known to do so, for example).


Note the documentation

The docinfo.txt file provides a quick guide to the contents of the document/ directory. This is the place to look for things like detailed instrument descriptions, calibration instructions, observers' logs, or anything that serves to describe or explain the data and isn't an observational record. Not all the files will be relevant to the data set you are reviewing. We appreciate your comments on any and all documents you have the opportunity to read or reference during your review.

In general, our reviewers tend to focus on the data/ directory and consult the document/ directory as questions arise, rather than attempting to read every document file in advance or trying to determine which document files are relevant to the data set in question.


Check for additional browsing help

Very large data sets frequently have a top-level directory called browse/ which can provide a useful entry point for exploring the data. This directory may contain thumbnail representations of the data files, HTML files that can be opened in a browser for searching through the data, or specialized browsing indices (for use in Web browsers or spreadsheets, for example). When it exists, the contents of this directory can provide a useful overview of the data set and help focus your examination of the data.


Rummage through the data

At this point you should have enough information to dive into the data and know where to look if you have questions about what you find in the data files and PDS labels. The data/ directory itself may be subdivided in many ways, depending on the size of the data set. Small data sets may have no subdirectories; large mission data sets may have subdirectories for date or mission phase; and so on.
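If you would like a quick picture of how the data/ directory is subdivided before opening individual files, a short script can help. The following is a minimal Python sketch, not an SBN tool: the function names summarize_tree and print_summary are made up for illustration, and the script simply counts files by extension in each directory under whatever path you give it.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Map each directory under root (relative path) to a Counter of file extensions."""
    summary = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Lower-case the extension so .LBL and .lbl are counted together.
        summary[rel] = Counter(
            os.path.splitext(name)[1].lower() or "(none)" for name in filenames
        )
    return summary


def print_summary(summary):
    """Print one line per directory, listing how many files of each type it holds."""
    for rel in sorted(summary):
        parts = ", ".join(f"{ext}: {n}" for ext, n in sorted(summary[rel].items()))
        print(f"{rel}/  {parts or '(no files)'}")
```

Assuming you run it from the top-level directory of the unpacked data set, calling print_summary(summarize_tree("data")) lists each subdirectory with its file counts, which makes it easy to spot an empty directory or a directory with data files but no labels.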

In the data directories you will likely see files with a .lbl extension. These are detached PDS label files; they are plain text and can be displayed in any text editor. If you don't see any .lbl files, the PDS labels will appear at the top of the data files themselves. The labels, attached or detached, will have internal pointers to the data, a structural description of the observational data (usually at the bottom), and a series of keyword = value statements that should provide all the information essential to understanding and working with the data. Definitions for the keywords used in PDS labels are either in the Planetary Science Data Dictionary database or in a local data dictionary provided by the mission, which should be included in the document/ directory of the data set.
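Because the labels are plain text built from keyword = value statements, it is easy to pull the keywords into a script for a quick look. The sketch below is a deliberately simple illustration, not a full label parser. It assumes a PDS3-style layout in which each statement sits on one line, comments are written as /* ... */, and a line containing only END closes the label. Values that span several lines (long quoted descriptions, for example) are only partially captured, and nested objects are flattened. The function name read_label_keywords is hypothetical.

```python
import re

# A keyword, optionally prefixed with ^ (a pointer) or containing a namespace colon,
# followed by "=" and the rest of the line.
_STATEMENT = re.compile(r"^\s*(\^?[A-Za-z][\w:]*)\s*=\s*(.*?)\s*$")


def read_label_keywords(text):
    """Return (keyword, value) pairs for single-line statements, in file order."""
    pairs = []
    for line in text.splitlines():
        line = re.sub(r"/\*.*?\*/", "", line)  # drop comments that open and close on this line
        if line.strip() == "END":  # assumed end-of-label marker
            break
        match = _STATEMENT.match(line)
        if match:
            keyword, value = match.groups()
            pairs.append((keyword, value.strip('"')))
    return pairs
```

The function returns a list rather than a dictionary because the same keyword (NAME, for example) can legitimately appear more than once in a label, once per described object. For an attached label, read the file in binary and decode it leniently before passing the text in, since the data that follow the label are usually not text.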

In examining the data files, you should check that all the keywords and the data themselves make sense. Try to use the data files to answer some observational question or scientific query. Apply any sanity checks that you can think of to the values you find in the label and data. The goal in reviewing the data files is to try to use them as an outside scientist would to do research. Verify that you have all the information needed to answer a scientific question based on the data and, if practical, work through to that answer.
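As one example of a simple sanity check, you might confirm that an observation's start time is not later than its stop time. The sketch below assumes you have already collected the label keywords into a Python dictionary and that the label uses START_TIME and STOP_TIME keywords with ISO-style date/time values. Both the function name check_time_order and the choice of keywords are illustrative assumptions, so substitute whatever keywords matter for the data set in front of you.

```python
from datetime import datetime


def check_time_order(keywords, start_key="START_TIME", stop_key="STOP_TIME"):
    """Return a list of problems found; an empty list means nothing suspicious."""
    problems = []
    times = {}
    for key in (start_key, stop_key):
        raw = keywords.get(key)
        if raw is None:
            problems.append(f"{key} is missing")
            continue
        try:
            # A trailing "Z" is removed because older Python versions reject it.
            times[key] = datetime.fromisoformat(raw.rstrip("Z"))
        except ValueError:
            problems.append(f"{key} is not a parseable date/time: {raw!r}")
    if len(times) == 2 and times[start_key] > times[stop_key]:
        problems.append(
            f"{start_key} ({keywords[start_key]}) is later than "
            f"{stop_key} ({keywords[stop_key]})"
        )
    return problems
```

A value reported as unparseable is not necessarily an error in the label: it may be a placeholder such as "N/A", or a time format that your Python version does not accept. Treat the output as a list of things to look at by eye rather than a verdict.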

If the data set is very large, say thousands of images, we realize you won't be able to examine every image; but you should be able to carefully examine a few select files from various parts of the data set. Then you should spot-check a variety of other data files, e.g., by running a small script to display them in succession or to perform some calculation for which you have some idea what the answer should be.
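One way to choose files for spot-checking is to draw a small random sample from every directory, so that each part of the data set gets at least a glance. The following Python sketch does only the selection step. The function name pick_spot_checks, the default .lbl extension, and the default of two files per directory are all assumptions made for illustration.

```python
import os
import random


def pick_spot_checks(data_root, extension=".lbl", per_directory=2, seed=0):
    """Choose up to per_directory files with the given extension from each directory."""
    rng = random.Random(seed)  # fixed seed so the same files can be revisited later
    chosen = []
    for dirpath, dirnames, filenames in os.walk(data_root):
        dirnames.sort()  # visit subdirectories in a stable order
        matches = sorted(n for n in filenames if n.lower().endswith(extension))
        for name in rng.sample(matches, min(per_directory, len(matches))):
            chosen.append(os.path.join(dirpath, name))
    return chosen
```

You can then loop over the returned paths and, for each one, display the file or run whatever calculation you have in mind. Recording the seed alongside your findings lets someone else reproduce exactly the same selection.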


Inspect any remaining directories

Occasionally there are other directories included with the data that provide additional supporting information. For example:

Comments on the usefulness, completeness, etc., of the contents of these additional directories are encouraged.


Present your results

Many of our reviewers prepare a PowerPoint summary of their findings to present to the review team, and we find these summaries very helpful. In any case, we need some sort of written report of your findings, preferably before you leave the review. Major findings will be discussed in the review, while minor points (editorial comments, typos, etc.) can be submitted in writing without discussion.