Report and Recommendations of the
D. G. Sibeck and T. Kucera (Co-Chairs)
J. Byrnes, B. Fortner, S. Fung, B. Giles, J. Gurman, T. Herder,
G. Le, B. Labonte, W.D. Pesnell, and A. Szabo
a. Introduction and Objectives
b. Current Status of Data Management within the SEC Discipline
c. Principles Guiding the Development of an LWS Science Data System
d. Responsibilities within the LWS Data System
e. Items for Immediate Action
In response to a request by LWS Senior Project Scientists Drs. Richard Fisher (GSFC) and Larry Zanetti (JHU/APL), the LWS Data System Planning Team held a series of biweekly meetings throughout 2001 to discuss the nature of the forthcoming LWS Science Data System. This report (1) reviews current data management practices within the Sun-Earth Connection (SEC) discipline, (2) lists principles that should guide the development of an LWS data system, (3) outlines the distribution of responsibilities within an LWS data system, and (4) recommends immediate and near-term actions facilitating the formation of an LWS data system.
The LWS Science Data System Planning Team hopes that this summary may serve as a starting point for future workshops directed towards community consensus.
Cost constraints and the need to provide services based on user needs dictate that any future LWS Data System will grow from existing services. The LWS Data System Planning Team therefore began by surveying data management practices and resources in the various Sun-Earth Connection subdisciplines. It then considered a set of common problems faced by each subdiscipline.
Solar Physics. Numerous domestic agencies (e.g., NASA, NOAA, NSF, and DoD) and foreign agencies (e.g., ESA and ISAS) support projects that provide solar observations. Several WWW sites maintain comprehensive links to data sets held on-line by principal investigators and designated archives. Projects within the solar community generally provide research quality observations to the scientific community and general public within minutes to hours after the observations have been made. The solar physics community benefits from the widespread use of a single software analysis package (SolarSoft) to examine data exchanged in a single format (FITS). The community maintains a software tree and sponsors the development of new tools within it, including tools that enable the importation of observations in other formats.
Heliospheric Physics. Both NASA- and foreign-sponsored missions have provided and will provide important heliospheric observations. No WWW site maintains comprehensive links to heliospheric data sets held within the research community and at designated archives. Although the NSSDC’s COHOWeb and OMNIWeb provide hourly-averaged plasma and magnetic field measurements from a variety of heliospheric spacecraft, many valuable data sets are held off line. Lag times for new heliospheric observations to be processed, validated, and placed on line range from days to months. Some data sets are never made available on line. Heliospheric physicists use a number of analysis tools to inspect observations in a wide variety of formats, often proprietary. Although the Solar & Heliospheric SR&T program has imposed a requirement that new proposals may only use publicly available data, the means of enforcing (or funding) and enabling this initiative have not been specified.
Magnetospheric Physics. Several federal agencies (e.g., NASA, NOAA, DoE, and DoD) and foreign space agencies (e.g., ESA, ISAS, and RSA) sponsor missions that provide magnetospheric observations. No WWW site maintains comprehensive links to magnetospheric data sets held within the research community and at designated archives. Although the SPDF’s CDAWeb provides key parameter observations from a variety of spacecraft, numerous valuable data sets are held off line within the community or in archives. Many are in danger of being lost permanently following the termination of the ISTP project. Lag times for new magnetospheric observations to be placed on line range from minutes to months. Magnetospheric physicists use a number of analysis tools to inspect observations in a wide variety of formats, often proprietary. The NSSDC’s SSCWeb provides information such as the location of the spacecraft and conjugate points on the ground. Satellite ephemeris data are needed for this service.
Ionospheric and Thermospheric Physics. A variety of federal agencies (e.g., DoD, NSF, NOAA, and NASA), as well as numerous foreign governments, support projects that provide ionospheric observations. No WWW site maintains comprehensive links to ionospheric data sets held within the research community and at designated archives. However, NSF maintains a WWW site with extensive links to all the projects that it sponsors, and NOAA provides an archive for magnetometer data, solar indices, and other data sets related to solar-terrestrial connection research. Nevertheless, the ability to integrate these observations into comprehensive views of the ionosphere remains absent. Lag times for new observations to be placed on line range from minutes to months. As in the case of heliospheric and magnetospheric physics, ionospheric physicists use a number of self-written analysis tools to inspect observations stored in a wide variety of data formats, often proprietary.
Possible Approach to Problems Common to Each Subdiscipline.
As discussed above, there is no single entry point on the WWW that comprehensively catalogues the various data sets currently available for LWS-type studies or provides the tools needed to conduct such studies. Consequently, researchers must often search the WWW for required data sets, translate formats, and prepare both graphical and analytical software to achieve their research goals. As many researchers have similar objectives, there is considerable duplication of effort.
Thanks to their use of a single software tree and a single data format, data management practices are most advanced within the Solar Physics community. Although this will greatly facilitate the solar community’s transition to LWS-type studies, researchers within other subdisciplines will not be able to make use of SolarSoft without the preparation of further introductory material and simple web-based tools, e.g., a tool for inter-comparison of images from disparate instruments.
Instead, most researchers within the heliospheric, magnetospheric, and ionospheric communities will desire software tools specifically adapted to their own research interests and more rapid access to validated data sets than has been the case to date. This will require a paradigm shift towards rapid data availability, the use of a limited number of data exchange formats (or more common use of format converters), and the development of standard software analysis and display tools. Among the tools currently available to these scientific disciplines, COHOWeb (for hourly-averaged heliospheric observations), OMNIWeb (for hourly-averaged near-Earth heliospheric and geomagnetic observations), CDAWeb (for higher time resolution magnetospheric and some ionospheric observations) and SSCWeb (for ephemeris) provide potential foundations for an incipient LWS data system. These have the advantage of utilizing and building upon existing data management infrastructure that is already serving the space physics community by providing ISTP data sets relevant to LWS. While some correlative tools may be developed at the request of the LWS data system, most should originate from and be developed by individual PIs, who understand their own observations best, and by other members of the scientific community. The LWS data system can play an important role in disseminating these tools.
Simulations and empirical models will play important roles in the LWS program. Simulations test the degree to which we understand the underlying physics linking the Sun to the Earth, whereas data intensive empirical models help specify the space environment. Both can help fill in gaps resulting from incomplete observational coverage. Continuing advances in simulation techniques and data assimilation have brought the possibility of accurate space weather forecasts within sight. For the models to be further improved, extensive comparisons with observations will be necessary. With the notable exception of the CCMC, current data systems do not facilitate such comparisons. The construction of accurate empirical specification models (such as the NASA AE/P-8 trapped radiation models and the International Reference Ionosphere [IRI] model) requires intensive data processing. The LWS data system must facilitate both model-data comparisons and the development of empirical models from its inception.
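As an illustration of the kind of model-data comparison a data system might support, the following minimal sketch matches model output to observations at common times and computes two simple error measures. The function name, the dictionary-based input, and the choice of mean bias and root-mean-square error are assumptions made for this example; they are not drawn from any existing LWS or CCMC service.

```python
import math


def compare_model_to_observations(observed, modeled):
    """Compare two time series at the timestamps they share.

    observed, modeled: dicts mapping a timestamp key to a float value
    (hypothetical input layout chosen for this sketch).
    Returns (n, bias, rmse), where bias is the mean of (model - observation).
    """
    # Only timestamps present in both series can be compared.
    common = sorted(set(observed) & set(modeled))
    if not common:
        raise ValueError("no overlapping timestamps")

    errors = [modeled[t] - observed[t] for t in common]
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return n, bias, rmse
```

In practice the two series would first need to be brought to a common cadence and format, which is one reason the report stresses a limited number of exchange formats.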
In summary, there is considerable room for improvement in data management policies, developing and disseminating correlative analysis tools, and cataloging and providing access to existing and future LWS-relevant data sets and simulation results. Advocating a paradigm shift towards more effective data management will be an important first step in the development of an LWS data system.
The development of any data system begins with lessons learned from past experiences. The LWS Data System Planning Team noted that (1) the imposition of overly ambitious comprehensive data systems can result in costly systems that do not address basic needs; (2) valuable data sets are currently in danger of being lost because their delivery to designated archives was neither required nor funded; (3) a combined effort of designated archives and dedicated PIs will be needed to extract the full scientific return from publicly available data sets; and (4) periodic competitions encourage innovations and help control costs. Consequently, the team adopted the following guiding principles:
a. The most useful and feasible LWS science data system would be a meta-system tying together many heterogeneous sets of data distributed among different institutions. Such a data system should identify and allow for access to essential data sets and model output from other NASA and non-NASA projects, sponsored by the US and other countries. The resulting LWS data system will very likely be distributed and virtual.
b. The LWS data system design must not solidify too early or be imposed from outside the scientific community, but should initially be based on existing services and then evolve in response to clearly-identified user needs and project guidelines.
c. The LWS data system must provide for end-to-end management of all research-quality data sets returned by LWS missions and models.
d. Both PI teams and designated archives should manage and maintain the usability of the data system. Whereas the former provide expertise to ensure proper processing of individual full resolution data sets, the latter support the project by establishing data archiving and accessing protocols and developing and providing the services needed to locate and retrieve multiple data sets for correlative analysis.
e. Peer-reviewed proposals in response to directed AOs provide the most cost-effective means for initiating and improving the LWS data system.
f. The time to begin developing the metadata standards and access methods for an LWS data system is now, because this will afford an opportunity to identify the data sets and services needed, familiarize potential users with available tools and conventions, support the ongoing LWS TMDA program, and take advantage of technology developed with the support of NASA’s OSS AISRP.
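The meta-system described in principle (a) could, under simple assumptions, be sketched as a catalogue of metadata records that point to data sets held elsewhere. The record fields, the function name, and every catalogue entry below are hypothetical and serve only to illustrate the idea; they do not describe an agreed LWS metadata standard or any actual holding.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata describing one data set; the data stay with the holder."""
    name: str
    discipline: str
    data_format: str
    holder: str
    start: datetime
    end: datetime


def find_datasets(catalogue, discipline, start, end):
    """Return records in a discipline whose coverage overlaps [start, end]."""
    return [
        r for r in catalogue
        if r.discipline == discipline and r.start <= end and r.end >= start
    ]


# Hypothetical entries, invented for illustration only.
EXAMPLE_CATALOGUE = [
    DatasetRecord("example-solar-images", "solar", "FITS",
                  "hypothetical PI site",
                  datetime(2001, 1, 1), datetime(2001, 12, 31)),
    DatasetRecord("example-magnetometer", "magnetospheric", "ASCII",
                  "hypothetical archive",
                  datetime(2000, 6, 1), datetime(2001, 3, 1)),
    DatasetRecord("example-solar-indices", "solar", "ASCII",
                  "hypothetical archive",
                  datetime(1999, 1, 1), datetime(1999, 12, 31)),
]
```

A query such as `find_datasets(EXAMPLE_CATALOGUE, "solar", datetime(2001, 2, 1), datetime(2001, 2, 28))` returns only the records whose time coverage overlaps the requested interval, leaving retrieval to the institution named in `holder`.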
Responsibilities within the LWS Data System
Because the final set of spacecraft and instruments remains to be determined, it might be thought premature to begin designing an LWS data system. Solutions adopted now may become outdated by the time the LWS missions are launched. On the other hand, the LWS science data system is more than a traditional data system designed to serve only a single project. It will serve all LWS data product providers and users, connect different LWS program elements, and provide coherence to the LWS program. Because of its paramount importance to the LWS program, it is not too early to begin identifying tasks that any data system must accomplish, allocating responsibilities, and reaching community consensus. A prototype LWS data system can begin examining and testing possible solutions, salvaging relevant data sets, and supplying both current and heritage data sets to researchers, particularly those currently funded by the LWS Targeted Research and Technology Program. This section describes our views of how the functions assigned to the LWS Data System should be distributed amongst NASA management, LWS Data System managers, the PIs, and designated data centers.
Role of LWS Project Management. Because the LWS program emphasizes cross-disciplinary and correlative studies, NASA management must provide adequate resources and expertise for end-to-end data management. NASA must ensure that Announcements of Opportunity (AOs) for LWS missions include requirements for Project Data Management Plans (PDMPs) and that proposals submitted to the program include satisfactory responses. PDMPs must be based on NASA’s open data policy and LWS program objectives, which require timely delivery of scientifically meaningful observations together with metadata and supporting documentation to data users, relevant real-time observations to operational forecasters, and data products of general interest to the public.
NASA must allocate sufficient funds to ensure the successful completion of data management tasks. NASA managers must not allow instrument and spacecraft operations to terminate abruptly with these tasks left unfinished. They may rely upon LWS data system managers and the scientific community to monitor progress.
Role of LWS Data System Management. The LWS Data System will require a small management and administrative staff. Working together with PIs, designated archives, and interested members of the scientific community, the Data System Managers will direct data system activities and ensure proper communication between NASA headquarters, members of the data system, users, and affiliated non-NASA data sources.
Role of Designated Archives. With the exception of deep archiving, archive functions within the LWS Data System should be periodically competed. Designated archives must maintain the data sets returned by the LWS project once individual missions have ended, and catalogue their holdings. Furthermore, they must provide the various user communities (researchers, forecasters, educators, and the general public) with comprehensive and comprehensible WWW interfaces to LWS data sets. It is likely that they will develop value-added services, data products, and functionality.
Role of PI teams. PI teams will play a key role within the LWS data system. They possess the unique knowledge required to interpret the observations and develop the software to interpret high-resolution observations. They should be asked to accept a paradigm shift towards full and free access to their data following an initial brief validation period. In contrast to the situation that has hitherto prevailed, they should:
a. Provide access to the most recent versions of research quality data, processing software, metadata, and documentation.
b. Develop WWW-based tools that grant both team members and outsiders similar views and access to the data.
c. Produce and deliver key parameters to the designated LWS data system archives.
d. Respond to questions concerning data quality and interpretation.
e. Carry out their responsibilities as stipulated in the PDMPs.
PI funding for research should be based substantially on how well PIs serve the larger community in these ways. The LWS Program Scientist should ensure (1) that all AOs include a clear statement indicating that PI status is a public trust, and (2) that Project Scientists receive the resources and power to reward the PIs who do the most for the community.
1. Establish a Prototype LWS Science Data System
2. Require Project Data Management Plans (PDMPs)
a. Instrument and spacecraft AOs must require end-to-end PDMPs.
b. Review panels must explicitly evaluate proposal PDMPs.
c. Routinely monitor and reward compliance with PDMPs.
d. Taper (rather than cut) off funding at the end of projects so that they can properly archive their high-resolution scientific data sets.
e. Do not rely on free help from interested scientists and archivists to preserve valuable data sets.
3. Establish a firmer LWS WWW presence for the project and more specifically for the LWS Data System.
a. Improve visibility with readable descriptions of LWS objectives.
b. Provide frequent updates on LWS implementation plans.
c. Receive community comments on plans.
d. Provide contact points and describe roles of LWS program officials.
e. Encourage joint research by describing the projects funded by the LWS Targeted Research and Technology program.
f. Support the LWS Targeted Research and Technology program by establishing a prototype LWS Data System.
AISRP – Applied Information Systems Research Program
CCMC – Community Coordinated Modeling Center
CDAWeb – Coordinated Data Analysis Web
COHOWeb – COordinated Heliospheric Observations Web
FITS – Flexible Image Transport System
NSSDC – National Space Science Data Center
PDMP – Project Data Management Plan
SPDF – Space Physics Data Facility
SSCWeb – Satellite Situation Center Web