NASA Heliophysics
Science Data Management Policy
Version 1.0
25 June 2007
Change Log
6/25/07 Release of version 1.0

To: The Heliophysics Community
The Heliophysics Science Data Management Policy is an important step forward in the evolution of the Heliophysics Data Environment (HPDE) that is the public face of NASA's Heliophysics Great Observatory. This policy provides a blueprint for the HPDE, tracing the data lifecycle from measurements to final archives. Our new environment, in which data are efficiently served through Virtual Observatories from distributed active and (longer-term) resident archives, should provide the infrastructure needed to achieve our scientific goals and objectives. Multi-instrument and multi-mission studies of the Sun and its effects on the heliosphere and on the magnetospheres and ionospheres of the planets will be facilitated by the approach presented here, thus enabling the attack on the next generation of HP science problems, including the understanding needed for robust space weather prediction and the related exploration of our solar system.
This data policy is vital to our research community, and it incorporates the community's input throughout. This is a living document, to be modified as needed as our science program evolves. We welcome your feedback, as only through such interaction will the HPDE continue to be responsive to community needs.
Richard R. Fisher
Director, Heliophysics Division
Contents
Executive Summary
1. Introduction, Purpose, and Scope
2. The Components of the Data Environment and their Roles
3. The Mission/Data/Review Lifecycle
4. The Role of Standards: Formats, Data Model
5. Final Archiving and Continued Serving of Data
6. Plans for HP Science Data Management Policy Review and Revision
References
Appendix A: The Heliophysics Data Environment Rules of the Road
Appendix B: A Framework for Space and Solar Physics Virtual Observatories (Executive Summary)
Appendix C: A Space and Solar Physics Data Model from the SPASE Consortium (Executive Summary)
Appendix D: Project Data Management Plans
Appendix E: Mission Archive Plans
Appendix F: Resident Archive Functions
Figure 1: Heliophysics Mission Data Lifecycle
Executive Summary
Heliophysics (HP) research seeks to determine and model the nature and dynamical interactions of the Sun, the heliosphere, and the plasma environments of the planets based on data from a fleet of spacecraft termed the "Heliophysics Great Observatory." Achieving the desired understanding requires easy access to data and tools from a distributed set of active archives, each of which has its own architecture and formats. This Policy document provides HP policy and guidelines for preparing, accessing, using, and archiving HP data throughout its lifetime. The basic principles for the HP Data Environment (DE) are the involvement of scientists in each stage of the process, and the acceptance of the goal of openly accessible data that are independently scientifically usable. The data environment described here is guided by a "top-down" vision provided by HQ with community input, but it is implemented from the bottom up, built from peer-reviewed data systems driven by community needs and founded on community-based standards. Consistent with this approach, data providers and data users share responsibility for the quality and proper use of the data for research.
Spaceflight projects ("missions") are the core of the HPDE. The proposals for these define the science goals that determine the required data products. The production and serving of the data from missions are governed by the Project Data Management Plan (PDMP) until after the termination of the active phase of a mission. A Mission Archive Plan, initially formulated at the first Senior Review for an extended mission and in effect until the mission's end, will guide the preparation of lasting data products. After a mission ends, its data will typically remain accessible through a Resident Archive that maintains easy access to data and to expertise for its use. Permanent archiving, with continued but unsupported access, is a natural endpoint in the data cycle rather than a special task.
Missions are to make their data available both directly and via Virtual Observatories (VOs) that will provide one-stop access to data from many missions along with tools for cross-mission analysis and visualization. NASA HQ will provide the vision for the DE and ensure, with community input from the HP Data and Computing Working Group (HPDCWG) and Senior Reviews of both missions and Data Centers, that the vision is carried out. NASA Data Centers will provide various services, including archives and cross-disciplinary access to data and services. All components of the HPDE will involve competitive selections and reviews to assure the best quality.
This document provides an overview of the components of the HPDE and their relationships, a timeline of significant events in the data lifecycle, guidelines for the preparation of Project Data Management Plans and Mission Archive Plans, guidelines for the long-term serving and archiving of data, and a plan for keeping this Data Policy updated in light of changing technology and community needs. The document is intended for all those who deal with HP data including people developing or managing missions, anyone providing or serving HP data (such as missions, virtual observatories, and data centers), those proposing for missions or data environment components, and users of the data.
1. Introduction, Purpose, and Scope
NASA's Science Mission Directorate conducts scientific exploration that is enabled by access to space [1]. The NASA Strategic Objective for Heliophysics, in particular, is to "explore the Sun-Earth system to understand the Sun and its effects on Earth, on the Solar System, and on the space environment conditions that will be experienced by explorers, and demonstrate technologies that can improve future operational systems" [2]. Our solar system is governed by the Sun through gravity, radiation, and streams and gusts of solar wind and magnetic fields that interact with the fields and atmospheres of planetary bodies. The space weather produced by these solar influences is seen in the ozone layer; in climate change; and in effects on radio and radar transmissions, electrical power grids, and the electronics of spacecraft. HP seeks to understand how and why the Sun varies, how planetary systems respond, and how human activities are affected. As we reach beyond the confines of Earth, this science will enable the space weather predictions necessary to safeguard the outward journey of human and robotic explorers.
A key step toward the goals of the HP missions is the production and analysis of high-quality data from space platforms. This is no longer a matter of a PI team executing an experiment in space; our research goals now require an integration of data from the many instruments and missions comprising the "Heliophysics Great Observatory" as well as complementary sources of data used to perform Heliophysics research. The success of this effort depends on having scientific involvement in all stages [3, 4] of data production, dissemination, and archiving, with a close collaboration between scientific and technical teams. Two overarching principles also essential to achieving the goals of current Heliophysics programs are:
This view involves the data users from the general science community as responsible partners in the improvement of the data environment and of the data products themselves, as detailed in the HP Data Environment Rules of the Road (Appendix A). To assure responsiveness to the community and high quality, all aspects of the HPDE will involve competition through proposals or periodic reviews that include both quality assessment and consideration of plans for, and the use of, community feedback; in this sense the HPDE is "market-driven."
The data lifecycle for HP missions envisioned here starts with the science goals for a new mission. These goals yield a set of constraints on what quantities need to be measured and at what sensitivity and resolution to answer the questions posed. The measurements are embodied in a set of data products, first articulated in the mission proposal, and later made more precise in the PDMP that forms the basis for data gathering, reduction, and serving for the active mission. In the early phases of an active mission, data reduction routines may change continually, and the best products may only be produced for time intervals of high interest. The experience gained here leads to the ability to routinely produce high-quality, carefully documented data, served to the community using easily understood formats and delivery mechanisms. As the mission ages, and especially in the extended phase, usage typically decreases and the argument for further data refinement becomes less compelling. At this stage, a Mission Archive Plan will guide the preparation of a final set of best products and documentation that will continue to serve the community well beyond the end of the mission. The maintenance and serving of the mission archive can be performed by a "Resident Archive" (RA) associated with mission data experts. At some point the support of the datasets through an RA may no longer be cost effective, but the data product files will still be useful as served from a permanent archive. The final transition should be quite simple when the other steps in the process are performed well.
Technological advances and increasing data volumes lead naturally to a distributed data environment, with many remote data archives at provider and other archive sites linked through software services known as Virtual Observatories (VOs). The VOs do not replace the primary repositories, but they enhance the ability to obtain and use data efficiently across a broad range of observatories, instruments, and data formats. VOs will allow scientists to access data from many missions from one Web location, or even from their own software applications directly. The use of Internet-based services founded on community-evolved practices and standards allows for a "bottom-up" implementation of specific, peer-reviewed VOs and other services all working toward a "top-down" vision provided by NASA HQ with community input. To be effective, the VO groups need to work together and with the data providers and users to establish and maintain standards that allow effective communication and interoperability. Traditionally, collaboration with other nations and agencies has fostered the inclusion of essential non-NASA holdings in the HP Data Environment, and NASA HP will continue to encourage such interactions.
The purpose of this document is to present the HP Science Data Management Policy, which is based on the above principles and overview. This policy describes the philosophy for science data management for the science programs sponsored by NASA's HP Division, and its scope encompasses all phases of the mission and data life cycles. In particular, it provides:
2. The Components of the Data Environment and their Roles
The HPDE is that portion of NASA's HP Division concerned with the processing, serving, storing, and preserving of data and associated data analysis tools from HP missions and the broader HP community. It will increasingly involve the "data" produced by models as well; these are also essential to the HP goals outlined above, but they will not be the focus of this document. This section provides an overview of the relevant HP components and their role in the data environment.
2.1 Heliophysics Division Overview
As stated in the Heliophysics Roadmap [2], the HP endeavor relies on a number of major mission programs. The Solar Terrestrial Probes focus primarily on fundamental science questions. Living With a Star missions and partnerships target knowledge of processes that directly affect life and society. The flexible Explorer program provides an efficient means of achieving urgent strategic goals that is responsive to new knowledge, technology, and priorities. Challenging Flagship and Partnership missions address important goals that cannot be funded in the baseline program. The New Millennium Program provides essential tests of new technologies using low-cost spaceflight missions. The Heliophysics Great Observatory coordinates new and existing mission elements to confront broader problems. Complementary to the missions, scientific research programs provide data analysis and theory to understand the data from the missions as well as rapid access to space measurements through the rocket and balloon programs.
What follows spells out the specific components of the HPDE and their roles and responsibilities in enabling the data flowing from the above missions to be used effectively for scientific research.
2.2 Spaceflight Projects
Spaceflight projects ("missions") are the core of the HPDE, as the providers of the data that lead to understanding. The missions should manage data so as to facilitate achievement of the mission's science goals. For the success of the DE, missions must provide open community access to the products required to attain the science goals of the proposed mission as soon as practicable, as well as to higher-level data products. This access should be through an active archive of the mission's own design, made to be compatible with the overall HPDE framework.
Each mission is expected to manage its own data and systems; this may be done using its own "in-house" systems, or in collaboration with other data systems or centers. The creation and maintenance of the data archive will be the responsibility of each mission during the prime mission phase (Phase E). The assignment of maintenance responsibilities to the mission elements (including instrument teams) will be chosen to promote efficient processing and distribution of science data as a means of meeting the mission's Level-1 requirements. These assignments are made early in the mission's development and documented in the Project Data Management Plan (PDMP; see below). Missions need to create and provide access to supporting material (documentation, software) required to ensure independent data usability.
Missions should adhere to such data and metadata standards as are practical. (See the note on standards below.)
These standards can evolve over the life of the mission. Furthermore, the uses of the data evolve as the mission matures. The underlying information technology that hosts a mission's data environment evolves in ways not envisioned during a mission's development phases. Capturing the evolved mission data system will be carried out by creating, in time for the first Senior Review for an extended mission, a Mission Archive Plan (MAP) and adhering to it thereafter. The MAP focuses on the content of the data and metadata files at the end of the mission. The MAP will depict the status of the mission's science data (science quality, documentation, formats, standards, and essential data analysis tools) in the final mission archive. The MAP should show the path to creating the mission's Resident Archive(s). The MAP will be updated as the mission progresses into and through its extended mission phase.
2.3 Virtual Observatories
A recent development in the HPDE is the introduction of Virtual Observatories (VOs). The idea of VOs started as a "Digital Sky" project intended to give astronomers searchable, virtual (electronic) access to all observations of any region in all wavelengths through the use of the Internet to reach archives of data located worldwide. This project became the National Virtual Observatory in the US, and the International Virtual Observatory Alliance worldwide. Solar and space physics are now embarking on a similar unification of access to the many HP Great Observatory spacecraft and ground-based instruments. The HP VOs will primarily be intended to provide simple, uniform access to data from distributed, heterogeneous sources, but they will also enable services, such as visualization or format translation, that enhance the use of these data.
Accomplishing this unification requires a coordinated effort to link data and service providers to scientific users through software that uses nearly universal language descriptions to give a uniform face to an underlying heterogeneous and distributed set of resources. The software services that accomplish this task of linking users to data and services have now become generally known as VOs. The VOs do not typically hold data, but constructing VOs requires strong interaction with data providers to achieve the desired VO goal of uniform descriptions of data products and seamless access to them. The access to products may involve the development of visualization and query services beyond those needed for basic browse and access, but the primary goal is to make the data from many sources easily found and available in convenient form for scientific use.
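As an illustration only, the "uniform face" described above can be sketched as a thin layer that translates one common query into each archive's native terms. Everything in the sketch is hypothetical: the two archives, their catalog layouts, the URLs, and the uniform measurement name "MagneticField" are invented for the example and do not describe any actual VO or archive interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Granule:
    """A data file described in uniform terms, wherever it is stored."""
    archive: str
    measurement: str
    date: str  # ISO 8601 date, e.g. "2007-06-25"
    url: str


# Two hypothetical archives, each with its own native catalog layout.
ARCHIVE_A = [
    {"param": "B_field", "day": "2007-06-25",
     "href": "https://archive-a.example/b_20070625.cdf"},
    {"param": "B_field", "day": "2007-06-26",
     "href": "https://archive-a.example/b_20070626.cdf"},
]
ARCHIVE_B = [
    ("magnetic", "20070625", "https://archive-b.example/mag/20070625.fits"),
    ("euv_image", "20070625", "https://archive-b.example/euv/20070625.fits"),
]


def search_a(measurement: str, date: str) -> List[Granule]:
    """Adapter: translate the uniform query into archive A's native terms."""
    native = {"MagneticField": "B_field"}.get(measurement)
    return [Granule("A", measurement, rec["day"], rec["href"])
            for rec in ARCHIVE_A
            if rec["param"] == native and rec["day"] == date]


def search_b(measurement: str, date: str) -> List[Granule]:
    """Adapter: archive B uses different names and a compact date format."""
    native = {"MagneticField": "magnetic"}.get(measurement)
    compact = date.replace("-", "")
    return [Granule("B", measurement, date, url)
            for (kind, day, url) in ARCHIVE_B
            if kind == native and day == compact]


# The registry of adapters; the archives themselves keep holding the data.
ADAPTERS: Dict[str, Callable[[str, str], List[Granule]]] = {
    "A": search_a,
    "B": search_b,
}


def vo_search(measurement: str, date: str) -> List[Granule]:
    """Fan one uniform query out to every registered archive."""
    results: List[Granule] = []
    for name in sorted(ADAPTERS):
        results.extend(ADAPTERS[name](measurement, date))
    return results
```

A call such as vo_search("MagneticField", "2007-06-25") returns descriptions of matching files from both archives without the user needing to know either archive's native naming or date conventions; the data files stay where they are.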
The HP VOs are being initiated through calls in the annual ROSES solicitations for discipline-specific VxOs (e.g., x = M for Magnetospheric or S for Solar); these proposals define the scope of each specific effort. Having VxOs will allow subfields to organize their data and approach in a way that best suits that field. NASA HQ is promoting and guiding the overall integration of the VxOs. There is now a community-wide data model (the "SPASE" Data Model; see below and Appendix C) that provides the essential set of uniform terms needed to ensure interoperability. The VxOs are expected to work together and with the other elements of the HPDE. Conversely, new NASA missions should work with the VxOs to promote the distribution of their data. Appendix B presents the Executive Summary of a community-based defining document for VOs, along with a link to the full document.
2.4 Resident Archives
The concept of resident archives recognizes that the data from a mission are often of great current interest long after the mission has ceased actively collecting data. Formalizing this post-mission phase will also allow for a smoother transition to a useful permanent archiving of data products. In particular, the role of the RA will be primarily to securely hold and serve data effectively, providing support to the community for the data use. This "component" of the HPDE is not really separate from the others, in that the team involved would usually include a subset of the original mission team. RAs are described in more detail in Appendix F. The HP RA concept parallels to some extent the Planetary Data System (PDS) "data nodes" and Astrophysical Science Archive Research Centers (SARCs) but with an emphasis on keeping data near its producers, and without a preset determination of where particular datasets will reside. The serving from one RA of datasets that cross multiple missions (united, for example, by measurement type, common team members, or subfield) will be a natural part of the HPDE, depending on what is most effective for the particular data.
2.4 NASA Data Centers
The NASA Data Centers form an important aspect of the HPDE. These consist of the Solar Data Analysis Center (SDAC), the Space Physics Data Facility (SPDF), and the National Space Science Data Center (NSSDC). The latter has the responsibility for assuring that NASA data are preserved over the long term, and its role includes archiving data from Planetary Science and Astrophysics missions in addition to HP missions. For HP missions, this role will in many cases be changing from one of physically gathering and storing data to one of providing assurance that the data are preserved through the management of RAs. Part of this is expected to involve knowing what data are and should be available across the HP disciplines, based on PDMPs, VxO registries, and other input; this information should be made readily accessible. As the process of serving data during active missions becomes more uniform and better documented with the help of VOs and related activity, the problem of archiving for the long term should become relatively easy. Resident Archives may hold data for longer times than has conventionally been the case with inactive missions, but if an RA is no longer cost effective, then the physical data will pass to a facility determined in collaboration with the NSSDC.
The SDAC and SPDF, which traditionally have been active data repositories, have increasingly become centers of excellence in providing multi-project, cross-disciplinary access to data and tools to support the broad range of science possible with the "Heliophysics Great Observatory." They also produce and maintain such things as the Common Data Format (CDF) software for making and using files in a self-documenting format and the SolarSoft set of routines for solar data analysis. The role of these two centers in ingesting data directly will be decided in the future on a case-by-case basis, in collaboration with particular projects that may find it beneficial to use the existing expertise to solve data storage and serving problems. Among other similar roles, the SPDF functions as the active archive, effectively acting as a Resident Archive in many cases, for Space Physics data held or to be held at the NSSDC. SDAC will serve data from some of the new solar missions in addition to serving, e.g., much of the SOHO image data; it, too, may ultimately take on a Resident Archive role in such cases.
Likewise, SDAC and SPDF can assist in the long-term maintenance of the VxO infrastructure that is now just emerging. In the solar case the core functions of VSO, the first of the VxOs, are now funded as a part of the SDAC functions. The VxOs, selected originally via the competed programs of ROSES, may need to be sustained in the longer term by a periodically reviewed arrangement, yet to be determined. One example of such a transition is the transfer of basic support of the Virtual Space Physics Observatory to SPDF. The model for other VOs has yet to be worked out, but HQ will assure that VxO functions will persist beyond the current startup grants.
It is useful to note that many other data centers exist that provide data relevant to HP science goals. In the US, the PDS, for example, provides access to NASA Planetary data; the NGDC provides a link to essential NOAA data; and CEDARWeb provides key ground-based data from NSF. Worldwide there are data centers associated with ESA (e.g., the CDPP in France) in Europe, the CSA and related efforts such as GAIA in Canada, and JAXA (e.g., the DARTS active archive) in Japan. There are virtual observatory efforts such as the European EGSO and other projects underway that will link VOs and these data centers to each other to produce a world-wide analogue of NASA's HP Great Observatory, and the HPDE is working to make this larger goal a reality.
2.5 NASA HQ, HQ Program Offices, and Program Scientist
The role of NASA HQ has been mentioned above in various
places, and obviously includes approving and overseeing missions and
proposals. NASA HQ will continue
to develop, with input from community groups, the overall philosophy and
direction of the HPDE, as expressed in this Data Policy document. HQ will also do all it can, subject to
available resources, to support the HPDE, ensuring an architecture through
which data and supporting material will be community-accessible and preserved
for the long term, and that will evolve as technologies and requirements
evolve. HQ will be responsible for
convening and using the results from Senior Reviews and NRA reviews to
establish projects and priorities for the HPDE.
The NASA Program Offices oversee the design and
implementation of missions, and thus are the primary point of contact between
the missions and HQ. Each Program
Office (e.g., LWS or STP) allocates budgets and oversees contracts with the
projects that make up the program.
The contracts include, in addition to the primary hardware deliverables
and related measurement requirements, the requirements for data provision,
access, and delivery. This Data
Policy is designed to provide guidelines consistent with the contractual
requirements, but provides, in addition, recommendations designed to lead to
the best return on the investment in the data.
HPDE program integration is currently being facilitated by
an HPDE Program Scientist who oversees the competition for DE components such
as VxOs, RAs, and related tools, as well as the progress of the resulting
selected projects. The PS
maintains an HPDE Web site with overviews of the DE, current events of interest,
and links to the HPDE component activities. He reports to and gets input from the HPDCWG on the
directions of the HPDE, including on the definition and maintenance of
standards such as a Data Model and formats; ensures that VxOs, data
repositories, and other interested parties meet and otherwise interact
frequently enough to assure an integrated data environment; and works with
other nations and agencies to assure the HPDE includes all components needed
for success. In the longer term,
the role of the HPDE Program Scientist may be changed or replaced as the
structure of the DE evolves.
2.6 HP Data and Computing Working Group
The HP Data and Computing Working Group is the principal community group discussing the HPDE. (The HPDCWG also addresses high-performance computing for HP, but this is not dealt with here.) In light of their assessment of community needs, the HPDCWG will review this Data Policy; hear presentations of and comment on PDMPs and MAPs; review the progress of components of the DE, such as VxOs, as needed; consider the content of calls for DE-related SR&T- and TR&T-related work; and provide findings to help the HPDE be responsive to the community. The HPDCWG reports findings to the HPDE Program Managers and Scientist. The Chair of the HPDCWG reports to the Heliophysics Science Subcommittee.
2.7 Heliophysics Community: NRA-based work
The general community of
Heliophysics researchers is naturally involved in the HPDE as the user of data
and services, and thus it is the ultimate arbiter of what works or needs
improvement. This will be true for
VxOs, where the different initial interfaces and services will thrive or
decline based on use, and each VO can learn from the others. In addition, the open data policy makes
the community into a collective source of validation and verification of data
quality. While it is the
responsibility of the instrument teams to produce high quality data, the users
of the data will be essential critics, and research projects where reputations
are at stake provide strong motivation for correctness. While some users will use data assuming
it is all valid, others will be appropriately wary and will be one of the best
sources of data quality improvement, as has been found by many teams who have
operated this way. The data quality and validity will also be addressed by the
community as part of mission Senior Reviews and RA reviews.
In addition to fulfilling their
role of maintaining data quality, community members may also help by providing
specific products or services. All
aspects of the HPDE involve elements of competition that maintains the quality
of the DE, but certain aspects of the HPDE will be competed through the
standard NRA process (ROSES), which facilitates the broadest possible
involvement. The NRA-initiated
work will include the establishment of VxOs and initial RAs, the development of
value-added services (e.g., data mining services), the construction of general
data analysis and visualization tools, and the restoration of datasets or their
preparation for easy Internet access.
When tools or services prove to be of general value to the community, they can be transitioned to support by Data Centers or other long-term means. The SPDF and the SDAC may serve, in some cases, as the mechanism of this long-term support. The Senior Reviews for the Data and Modeling Centers will review the appropriateness and effectiveness of each of the ongoing elements of the HPDE funded in this way.
3. The Mission/Data/Review Lifecycle
The data lifecycle of a mission is shown in Fig. 1. At the inception of a new scientific mission, the goals of the investigations are laid out. From this, the concepts for the scientific instruments and the mission operations scenarios are developed. These steps lead directly to the specifications of the instruments and the data type, sensitivity, and resolution requirements. The Project Data Management Plan (PDMP) captures the architecture and implementation of the processing and distribution of mission data. [See Appendix D.]
The schedule of significant data-related events in the life
cycle of a spacecraft project is as follows:
NASA HQ convenes, usually at two-to-four-year intervals,
senior reviews for HP missions and for its Data and Modeling Centers. These have become distinct review
processes, as the nature of the activity is different in the two cases. These reviews provide community input
through a panel selected for its relevant expertise. Based on these reviews, other NASA priorities, and the
realities of the funding situation, HQ enters into contracts with each of the
missions or data centers.
Other reviews may be needed if the above processes do not
provide sufficient oversight of RAs and VxOs. Reviews of these activities, or subsets of them, will be
convened by HQ as needed.
4. The Role of Standards: Formats, Data Model
The most important "standard" for the HPDE is a standard of behavior, namely, the acceptance of the need for open, independently usable data. In an era of distributed datasets and heterogeneous infrastructures for different missions, it will also be essential that each part of the DE be committed to working together with the other parts. Competitive proposals for DE components, and reviews of the components, should strongly take into account the degree to which an effort takes a collaborative view and engages the community in making improvements.
The HPDE will benefit greatly from more conventional standards, but experience has shown that if these are imposed by bodies without community input they tend to be ignored. Thus the standards adopted in this DE will be based on utility as determined by actual implementation. In many areas, such as the communication between VxOs or between VxOs and repositories, the standards are negotiated within the context of their use.
The formats for processing and storing of data by a PI team are prescribed to meet mission needs. Files for distribution to the scientific community at large should employ a common, supported, easily used format. There are a number of data formats in common use, such as HDF5 (primarily in Earth Science; netCDF is now related to this), FITS (e.g., in Astronomy and Solar Physics), CDF (increasingly common in Space Physics), and various forms of ASCII files with headers and/or independent documentation. XML-based formats may arise, such as those used in the VOTables of the astronomical National Virtual Observatory and elsewhere, although none of these has been used as yet for large NASA HP data archives.
The terms used in self-documenting files or even internal to
a VxO need not be standardized, but mappings to the SPASE Data Model
terminology for use by VOs and other services should be provided, usually with
the help of an appropriate VxO. The SPASE Data Model is the product of a broad
consortium of space and solar physics scientists and technologists, and has
been agreed upon by the initial VxOs as a language for interoperability. Similar models have existed
for a long time for the PDS and more recently for the NVO. This community-based
standard will be maintained by the existing consortium (see www.spase-group.org),
which has open membership and welcomes input. The consortium now has an official release and mechanisms
for timely updates. Modest funding
to support SPASE efforts was initially provided through an NRA grant, but will
continue as part of the NSSDC budget.
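As an illustration of such a mapping, the following Python sketch translates a provider's local metadata keys into data-model terms. The local keys, the sample record, and the helper name `to_spase_terms` are hypothetical; the target names are only meant to resemble SPASE Data Model terms and should be checked against the current SPASE release before use.

```python
# Hypothetical mapping from a provider's local metadata keys to
# SPASE-style terms. The target names are illustrative assumptions and
# should be verified against the current SPASE Data Model release.
LOCAL_TO_SPASE = {
    "dataset_name": "ResourceName",
    "t_start": "StartDate",
    "t_stop": "StopDate",
    "sample_period": "Cadence",
    "quantity": "MeasurementType",
}


def to_spase_terms(local_record):
    """Return (mapped, unmapped): translated terms and leftover local keys."""
    mapped = {}
    unmapped = []
    for key, value in local_record.items():
        if key in LOCAL_TO_SPASE:
            mapped[LOCAL_TO_SPASE[key]] = value
        else:
            unmapped.append(key)  # local-only term; flag it for review
    return mapped, sorted(unmapped)


# Made-up sample record using the hypothetical local keys above.
record = {
    "dataset_name": "Example magnetometer data",
    "t_start": "1992-01-01",
    "t_stop": "2005-12-31",
    "sample_period": "PT3S",
    "quality_flag_scheme": "local-v2",
}
mapped, unmapped = to_spase_terms(record)
```

Returning the unmapped keys separately keeps local terminology intact while making visible which terms still need a mapping, which is the kind of gap a VxO could help close.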
5. Final Archiving and Continued Serving of Data
As mentioned above, the NSSDC manages permanent archiving facilities for HP (and other) data. Older data are useful for long-term studies and for unique characteristics such as specific instrumentation or regions sampled. Thus, it is useful to have a place where data are not just preserved but also served. The NSSDC has responsibility for assuring long-term data preservation and distribution. The NSSDC will ensure the maintenance of the permanent archive; the physical arrangements for such storage will be made in whatever manner is most economical, secure, and accessible.
Each mission will consult with the NSSDC concerning the
final archiving of mission data.
The path can include the holding of mission data in a Resident Archive,
but eventually the data may only be served by a data facility under the
supervision of or in coordination with the NSSDC.
The data carrying the full science potential of a mission, not irreversibly transformed, should be archived along with tools for their reduction to science products and documented algorithms for this process. Relevant engineering and "housekeeping" data should also be preserved. However, much more important is the archiving of the best calibrated, useable products from a mission and the associated documentation. Long-term archiving and serving of data cannot be based on serving products using on-the-fly data reduction. In general, but especially for long-term and nonspecialist use, it is desirable to have data products that are ready for use, and thus despiked, corrected for backgrounds, etc., and not dependent on specialized software packages. Lower-level products and the software and algorithms to use them should be archived, but these become increasingly difficult to use over time. Other products, such as browse plots and event lists, provide increased utility and their archiving is encouraged.
6. Plans for HP Science Data Management Policy Review and Revision
This Policy document should be posted publicly and be reviewed on the same timescale as the Data and Modeling Centers, or as deemed necessary by HQ in consultation with the HPDCWG. At such times the proposed revisions should be submitted for comment and suggestions to the HPDCWG, the HP missions, and to the HP community at large, including partners from other organizations. The final decision on changes rests with HQ management.
References
1. NASA Science Plan 2007, available at http://science.hq.nasa.gov/strategy/index.html; direct link http://science.hq.nasa.gov/strategy/Science_Plan_07.pdf
2. Heliophysics Roadmap, available at http://sec.gsfc.nasa.gov/sec_roadmap.htm; direct link http://sec.gsfc.nasa.gov/Roadmap_FINALscr.pdf
3. Data Management and Computation Volume 1: Issues and Recommendations, Committee on Data Management and Computation (CODMAC), R. Bernstein et al., National Academy Press, 1982.
4. Report and Recommendations of the LWS Science Data System Planning Team, D. G. Sibeck and T. Kucera, January 2002. Available at http://hpde.gsfc.nasa.gov; direct link http://hpde.gsfc.nasa.gov/LWS_Data_System_Final.html
The Heliophysics Data Environment website (http://hpde.gsfc.nasa.gov) provides a great deal of background on the data environment, with descriptions of recent activities, and annotated links to significant documents and to VOs.
Appendix A: The Heliophysics Data Environment "Rules of the Road"
(In what follows, "PI" may actually be an instrument lead in the case of PI-class missions.)
1. The Principal Investigator
(PI) shall, in a timely manner, make available to the science data user
community (Users) data and access methods to reach the scientifically useful
data and provide analysis tools equivalent to the level that the PI uses.
2. The PI shall make available
appropriate data products to the public that assist the PI's EPO
responsibilities.
3. The PI shall assure all
scientifically important data and supporting material are archived to ensure
long-term accessibility of the data and their correct and independent
usability.
4. The PI or the appropriate VxO
shall inform Users of updates to processing software and calibrations via
metadata and other appropriate documentation.
5. Users should consult with the
PI to ensure that the Users are accessing the most recent available versions of
the data and analysis routines.
VxOs should facilitate this, serving as the contact point between PI and
users in most cases.
6. Browse products are not intended
for science analysis or publication and should not be used for those purposes
without consent of the PI.
7. Users should acknowledge the
sources of data used in all publications, presentations, and reports. In some journals, this can now be done
through formal citation of the data product in the reference list.
8. Users are encouraged to provide
the PI a copy of each manuscript that uses the PI's data upon submission of
that manuscript for consideration of publication. On publication the citation should
be transmitted to the PI and any other providers of data. (The community needs to work to find
ways to make this easy/automatic.)
9. Users are encouraged to make
tools of general utility and/or value added data products widely available to
the community. Users are encouraged to notify the PI of such utilities or
products. The User should also clearly label the product as being different
from the original PI-produced data product.
Appendix B: A Framework for Space and Solar Physics Virtual Observatories
Results from a Community Workshop sponsored by NASA's Living With a Star Program, 27-29 October 2004
(Complete document at: http://hpde.gsfc.nasa.gov/VO_Framework_7_Jan_05.doc)
Executive Summary
The new challenges in solar and space physics, including linking solar phenomena to human consequences as studied in NASA's Living With a Star program, will require unprecedented integration of data and models across many missions, data centers, agencies, and countries. Accomplishing this requires a coordinated effort to link data and service providers to scientific users through software that uses nearly universal language descriptions to give a uniform face to an underlying heterogeneous and distributed set of resources. Such three-part entities—front-end software linked to repositories and services through "gateways" or "brokers"—represent a generalization of the ideas behind the "virtual observatory" (VO) intended to give astronomers virtual access to all observations of the sky. This workshop, held in Greenbelt, MD on 27-29 October 2004, brought together nearly 100 space and solar physicists and technologists, along with Earth scientists and astronomers, to come to basic agreements on how to proceed to build a robust data environment for future space and solar physics research based on the virtual observatory paradigm. Some of the main ideas had been in the community by other names for over a decade, but new Internet connectivity, greater emphasis on global problems to be solved with multiple spacecraft and models, and increased support by agencies has brought us to a point where the need and means are clearer for realizing an integrated data environment.
The workshop consisted of a set of plenary talks (available on a link at http://hpde.gsfc.nasa.gov, which also includes many presented posters and other background) that gave an overview of current efforts and issues, followed by 1-1/2 days of working groups and plenary sessions designed to clarify and elaborate the vision and plans. The above three-part VO structure was followed by the existing VOs, although the details differed. There are beginnings of integration of the current efforts, and the connections are becoming more direct. The workshop agreed on the need for agreement on at least a core of common "data model" terms, such as presented by SPASE and EGSO, although all agreed that specific communities, represented by "VxOs" ("x" being the community), would have some terms specific to their needs. Data models are much farther along in describing data products than services. The roles of the resource providers and the VOs were delineated at the workshop, with VxOs being mainly responsible for uniformity of access across providers and for higher level services and the providers for basic data quality and access, although the VO data environment should provide considerable flexibility in what tasks are performed by which parts of the system. It was agreed that a level of metadata management services, generally invisible to users, would be essential. Science services, such as format and coordinate translations, visualization, and higher-order queries, were seen as highly desirable but not part of the central core services which consisted of finding and accessing resources in uniform ways.
It was agreed that the current VO groups should continue to coordinate their efforts. In the short term this will be on an informal basis, but longer term there should be a coordinating group consisting of VO representatives (scientific and technical) and users from the scientific and broader community. Initially these may be agency specific groups, but interagency and international coordination, which is a natural outgrowth of much current work, will be needed and should continue to be part of workshops and other efforts. While the data environment is becoming established, data and service providers can be describing their resources in uniform data model terms and providing feedback to groups working on the data models; making data and services machine-accessible with APIs or other means as current resources allow; and linking to current VOs or making VxO alliances. In addition to continuing to coordinate their efforts, the VOs should seek community feedback on current VO interfaces and other issues.
Appendix C: A Space and Solar Physics Data Model from the SPASE Consortium
(Complete document at: http://www.spase-group.org/data/doc/spase-1_1_0.pdf)
Executive Summary
The Solar and Space Physics communities need improved
methods and procedures to facilitate finding, retrieving, formatting, and
obtaining basic information about data essential for their research. With the
increasing requirement for data from multiple sources, this need has become
increasingly important. It has been long recognized that a uniform method to
describe data and other resources is the key to taking the next steps in
improving the data environment.
The SPASE (Space Physics Archive Search and Extract) Data Model provides
a basic set of terms and values organized in a simple and homogeneous way, to
facilitate access to Solar and Space Physics resources. The SPASE Data Model is
comparable to the data models developed by the Planetary Data System (PDS) and
the International Virtual Observatory Alliance (IVOA) for planetary and
astronomical data, respectively. The SPASE Model will provide the detailed
information at the parameter level required for Solar and Space Physics
applications.
The SPASE consortium is an international team of space and
solar physicists and information scientists. It first examined many existing
data models, but found none to be adequate. A set of terms based on a
half-dozen or so of the most complete of such models was refined based on
applying the model at various levels of detail to a large number of existing
products to arrive at the current version. The major creators of SPASE-based
product descriptions are expected to be individual data and model providers,
data centers and domain-based Virtual Observatories ("VxOs"). The SPASE Data
Model will continue to evolve in a controlled way as data and service providers
and benefiting researchers suggest improvements to extend its framework of
common standards. Success of the model will be measured by the extent of
community support and use.
The present Data Model provides enough detail to allow a
scientist to understand the content of Data Products (e.g., a set of files for
3-second resolution Geotail magnetic field data for 1992 to 2005), together with
essential retrieval and contact information. A typical use would be to have a
collection of descriptions stored in one or more related internet-based
registries of products; these could be queried with specifically designed
search engines which link users to the data they need.
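A minimal sketch of this usage pattern follows, assuming an in-memory list stands in for a registry. The class `ProductDescription`, its fields, the `example://` identifiers, the URLs, and the exact coverage dates are hypothetical and far simpler than actual SPASE descriptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProductDescription:
    """Hypothetical, heavily simplified product description."""
    resource_id: str
    measurement_type: str
    start: date
    stop: date
    access_url: str


def find_products(registry, measurement_type, start, stop):
    """Return descriptions of the given type whose coverage overlaps [start, stop]."""
    return [
        p for p in registry
        if p.measurement_type == measurement_type
        and p.start <= stop and p.stop >= start
    ]


# Stand-in registry; identifiers, dates, and URLs are made up for illustration.
registry = [
    ProductDescription("example://geotail/mag/3sec", "MagneticField",
                       date(1992, 1, 1), date(2005, 12, 31),
                       "https://example.org/geotail/mag"),
    ProductDescription("example://other/plasma", "ThermalPlasma",
                       date(1998, 1, 1), date(2001, 12, 31),
                       "https://example.org/other/plasma"),
]
hits = find_products(registry, "MagneticField",
                     date(2000, 1, 1), date(2000, 12, 31))
```

Each hit carries the retrieval information (here `access_url`) alongside the descriptive terms, which is what lets a search service link a user directly to the data.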
The Data Model also provides constructs for describing
components of a data delivery system. This includes repositories, registries
and services. This document provides potential users of SPASE with the Data
Model for review and use. The
document has an overview of the origins and the concepts of the data model, and
presents the set of elements in a hierarchy that shows the natural
relationships among them. Also
included are usage suggestions, pedagogic examples, and a complete set of
definitions of terms and enumerated lists.
Appendix D: Project Data Management Plans
The Project Data Management Plan (PDMP) is the interface document between NASA, the mission systems, and the instrument teams that describes the science and ancillary data associated with the mission and how the data will be managed. This document describes how the mission will meet the Level-1 requirements that address the preparation and distribution of processed science data for the general community.
The science teams (instrument
providers and Project Scientists) for each mission will develop a PDMP that
defines the data, processing approach and implementation, data and
documentation products, data availability, and storage and archival
strategies. It will also define
the access method(s) for the HP scientific community.
Signers of these documents will include the Project Manager, Project Scientist, and each Principal Investigator or Instrument Lead. Others may also need to sign this plan, depending on the Project-specific situation.
Typically, the PDMP will be available in draft form at the time of Preliminary Design Review for the mission, and signed at the time of Critical Design Review. The PDMP may be revised from time to time, but it will be augmented and eventually superseded by a Mission Archive Plan. The HPDCWG will generally review and comment on the PDMP.
Each data provider will be expected to generate and make available metadata and other supporting material on the data products, spacecraft, and instrumentation appropriate to their investigation. The details of these will be defined during discussions with the Project and Program personnel during the drafting of the PDMP. The intent of such metadata and materials will be to make the data correctly and independently useable for science investigations.
Each PDMP will:
Examples of information that are appropriate for each data provider to include in a PDMP are:
• Data flow from telemetry to science level products
• Specifications of data (including levels as defined by the mission) and estimates of data volume and frequency
• Proposed data distribution capacity
• Identification of data that are made available to the public
• Description of the means by which data are made available to the public
• Schedule for making these data available
• Definitive orbit and attitude data disposition (generation, capture, distribution, and storage)
• Engineering telemetry disposition (e.g., capture, distribution, archive)
• Calibration data disposition
• Description of documentation to be provided on datasets, instruments, and spacecraft relevant to data usability
• Metadata schemes to be employed, and the relation to the SPASE Data Model
• Data format specification or references (e.g., FITS, CDF, HDF, Documented ASCII, etc.)
• Processing and analysis tools to be made available, and the method for this
• Reprocessing strategy, if appropriate
• Catalogues of data or events that will be produced
• Technical support that will be provided for data use
• Back-up strategy to be implemented (routine and catastrophic)
• Plan for long-term data serving and preservation
Appendix E: Mission Archive Plan
A Mission Archive Plan (MAP) will be prepared by a mission
team before it enters into its extended phase. The purpose of the plan is to lay out those steps needed to
be completed by the mission team to ensure that the appropriate mission data
archives have been prepared prior to the termination of the mission. The plan will be able to address advances
in Information Technology that have occurred since preparation of the PDMP and
the development of its data system.
Also the plan will be able to adopt new developments in the architecture
of HPDE.
The plan will describe the current state of the mission's
scientifically relevant data products and describe the steps needed to complete
the mission archive, including the final list of products. The MAP should contain a roadmap for
creating or using existing resident archives of mission data in the post-operations
phase. The implementation of the
MAP will be completed prior to planned termination of the mission or as soon as
possible after an unplanned termination of the mission. The MAP will be submitted as part of a
mission's proposal to the senior reviews of the Heliophysics operating missions. Once the plan has been reviewed by the senior review,
subsequent oversight of its implementation will be provided by the
mission's project scientist and Mission PI (if applicable). The plan will be updated periodically
during the extended phase.
The MAP should include:
• An assessment of the status of existing data, ephemeris, attitude, engineering, and any other (e.g., browse, higher-level, event list, or combined) products, and of the documentation associated with the production and validation of these.
• An assessment of the status of the relevant documentation of the spacecraft, instruments, and instrument calibrations.
• A realistic plan and schedule for producing a set of final data and ancillary products (not "level zero plus software"), with a complete list of these products and their formats. Any provision of other than calibrated, highest resolution products (in addition to possible lower-resolution or higher level products) should be justified.
• A listing of the documentation to be provided on all products, instruments, and calibrations, and a plan for providing these to users such that they will be able to assess the utility of the scientific data. The relationship of metadata to the SPASE data model should be discussed.
• A listing of all analysis tools to be provided to the community, and details of how they are to be served.
• Details of how the data are to be served, including through VOs, and how this serving can be maintained for the long term through Resident and/or Permanent Archives.
Appendix F: Resident Archive Functions
A resident archive (RA) will be created by a once-active
mission to continue to serve mission data or a subset of a mission's data (e.g.,
data products for a single instrument) after the mission has ended. This arrangement is intended to keep
those most familiar with the data and its caveats involved such that a user
will have access to expert assistance in using the data for research. Typically, a mission PI or instrument
lead would be the PI on an RA proposal, but there is no restriction on who can
apply or on possible arrangements with other RAs or data centers.
The prioritized functions of the resident archive are as
follows:
1. Ensure that the mission data are served to the general community in an efficient and scientifically useful manner consistent with the community data environment guidelines (including as stated here and elsewhere in this Data Policy);
2. Maintain the integrity of the data by safeguarding against data loss; this could be achieved by the use of mirror sites (e.g., the NSSDC), as well as with such tools as checksums;
3. Provide expert assistance with data issues;
4. Preserve and serve documentation of the data (including mission, instrument, and PI information) as required to maintain independent usability;
5. Obtain community feedback to ensure success;
6. Make sure the data will be archived and available after the RA is no longer deemed useful or cost effective (e.g., transfer to another RA or a long-term repository by agreement with the NSSDC).
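As one illustration of the checksum approach mentioned in item 2, the following Python sketch records a checksum for every file under an archive directory and later reports files that are missing or altered. The function names, the manifest layout, and the choice of SHA-256 are assumptions made for this example, not requirements of this policy.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX style) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose checksum changed."""
    root = Path(root)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A manifest built once at the primary site could be shipped with the data to a mirror site, where running `verify` on the copy would confirm that the transfer and subsequent storage left every file intact.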
The data services provided by the resident archive must be coordinated with the HP virtual observatories and be vital to on-going Heliophysics research activities or important to future research activities.
The data services to be maintained include the open, electronic distribution of RA data, the serving of the metadata for the RA data sets, and the provision of documentation describing the resident data including calibration and validation procedures and methodologies. The maintenance of RAs includes ensuring adequate security is provided to preclude the irreversible loss of mission data. The RA is expected to provide user support for both virtual observatories and general members of the research community at a level commensurate with the budget limitations. The resident archive should include software tools with user documentation needed for accessing and displaying the resident data.
The RA should maintain reserves such that if the RA maintenance award is not renewed or is subsumed under another RA structure, the RA would transfer the data to the other RA or ensure permanent archiving by arrangement with the NSSDC. The RA proposal should include a plan for such transfer to another archive in a manner that will still allow basic data access.
RAs are not intended to generate significant upgrades to the data sets, reprocess data, upgrade data processing algorithms, or provide new data products derived from the resident data. These types of post-mission data activities need to be funded from other sources. On the other hand, maintaining a resident archive could include "loading" newly derived data products into the archive with appropriate changes to metadata, documentation, web interfaces, etc.

Figure 1: HP Mission Data Lifecycle
Acronym list
AA Active Archive (serves data during the active mission lifetime)
API Application Programmer Interface
CDF Common Data Format (self-documenting format commonly used in space physics)
DE Data Environment
FITS Flexible Image Transport System (self-documenting format commonly used in solar physics and astronomy)
HDF Hierarchical Data Format (self-documenting format commonly used in earth science)
HP Heliophysics
HPDCWG Heliophysics Data and Computing Working Group
IVOA International Virtual Observatory Alliance (astronomy)
MAP Mission Archive Plan (post PDMP plan for data product generation, etc.)
netCDF network Common Data Format (self-documenting format used in some areas of space and earth science)
NSSDC National Space Science Data Center (NASA's permanent archiving facility)
NVO National Virtual Observatory (astronomy)
PDMP Project Data Management Plan
PDS Planetary Data System
RA Resident Archive (serves data after the end of a mission)
SARC Science Archive Research Center (astronomical NASA data center)
SDAC Solar Data Analysis Center (NASA solar data center)
SPASE Space Physics Archive, Search, and Extract (provider of a community Data Model)
SPDF Space Physics Data Facility (NASA Data Center for space physics)
VO Virtual Observatory
VxO VO for the "x" community