Discovering Services with a VO Registry

In the preceding section, Data Access Services, we showed how you could retrieve data from various archives using standard VO services they provide. To search a particular archive, one needs the base URL of that archives service. This section shows how to discover VO services (and their base URLs) using what is known as a VO Registry service.

A VO Registry is a on-line database of all data collections and services know to the Virtual Observatory. Using the pyvo.registry module, you can search discover archive services based on type, waveband, and topic keywords.

Simple Service Discovery

The most common use of the registry is to find archives with VO data access services, and the simplest way to do this is to use the regsearch() function. For example, to find data collections that contain searchable infrared images:

>>> services = vo.regsearch(servicetype='sia', waveband='infrared')
>>> len(services)
15

Aha! Perhaps you didn’t realize there were that many. What collections are these?

>>> for service in services:
... print coll.title
Chandra X-ray Observatory Data Archive
Chandra Source Catalog
Swift BAT All-Sky Survey:   keV
GRANAT/SIGMA
HEAO 1A
ROSAT High Resolution Image Pointed Observations Mosaic: Intensity
INTEGRAL/Spectral Imager Galactic Center Survey
Nine Year INTEGRAL IBIS  keV Galactic Plane Survey:
PSPC summed pointed observations, 1 degree cutoff,
PSPC summed pointed observations, 2 degree cutoff,
PSPC summed pointed observations, 0.6 degree cutoff,
ROSAT All-Sky X-ray Survey  Band:
ROSAT All-Sky X-ray Background Survey: Band
RXTE Allsky  keV
ROSAT Survey and Pointed Images

As you can gather, each record in the registry search results represents a different service (in this case, an image service). Included in the record is the all-important base URL for the service:

>>> services[0].access_url
'http://cda.harvard.edu/cxcsiap/queryImages?'

However, it’s not necessary to keep track of that URL because you can now search that collection directly via the registry record:

>>> images = services[0].search(pos=(350.85, 58.815), size=0.25)
>>> len(images)
474

(See Simple Image Access (SIA) to learn what to do with image search results.)

Other types of services via the servicetype parameter:

set servicetype to... ...to find:
sia Simple Image Access (SIA) services
ssa Simple Spectral Access (SSA) services
conesearch Simple Cone Search (SCS) services
slap Simple Line Access (SLA) services
tap Table Access Protocol (TAP) services

For example, to find all known Cone Search services:

>>> cats = vo.regsearch(servicetype='conesearch')
>>> len(cats)
18189

Wow, that’s a lot of catalogs. (Most of these are from the Vizier Catalog Archive; every Vizier catalog that includes a position is available as a Cone Search service.) For just catalogs related to blazars:

>>> cats = vo.regsearch(keywords=['blazar'], servicetype='conesearch')
>>> len(cats)
146

How about blazars observed with Fermi?

>>> cats = vo.regsearch(keywords=['blazar', 'Fermi'], servicetype='conesearch')
>>> len(cats)
244

Sometimes you may be looking for a particular catalog or image collections that you already know exists, and you just need to learn the base URL for the service. The keywords parameter can be used to find it. For example, suppose you want to get cutout images from the NRAO VLA Sky Survey (NVSS):

>>> colls = vo.regsearch(keywords=["NVSS"], servicetype='sia')
>>> for coll in colls:
...     print coll.res_title
NVSS
Sydney University Molonglo Sky Survey

Obviously, the first record is the NVSS image archive. The SUMSS collection was matched as well because its description in the registry happens to include the string, “NVSS”.

If you want to limit the search results to a certain datamodel, include the datamodel parameter:

>>> obscores = vo.regsearch(datamodel="obscore")

Registry Search Result Data

As you can see from the examples above, a search will often return more than one record, and so sometimes you need to review some of the resource metadata to determine which one or ones you want. You may have noticed that the results behave similarly to the results from the data access services (see Simple Image Access (SIA)). Like them, registry search results are returned as a RegistryResults instance, and each record is represented as a RegistryResource instance.

A RegistryRecord record acts like a dictionary where the keys are the column names from the results table; using our NVSS example from the previous section,

>>> nvss = colls[0]
>>> nvss.keys()
['cap_index', 'res_description', 'intf_type', 'standard_id', 'cap_index_',
 'url_use', 'res_type', 'intf_role', 'cap_description', 'wsdl_url',
 'source_format', 'res_version', 'ivoid__', 'content_level', 'source_value',
 'std_version', 'updated', 'short_name', 'query_type', 'creator_seq',
 'intf_index', 'content_type', 'harvested_from', 'res_title',
 'region_of_regard', 'created', 'rights', 'waveband', 'reference_url', 'ivoid',
 'cap_type', 'access_url', 'ivoid_', 'result_type']
>>> nvss['waveband']
('Radio',)

Some of the more useful items are available as properties:

ivoid the IVOA identifier for the resource.
res_type the resource types that characterize this resource.
short_name the short name for the resource
res_title the title of the resource
content_levels a list of content level labels that describe the intended audience for this resource.
res_description the textual description of the resource.
reference_url URL pointing to a human-readable document describing this resource.
creators The creator(s) of the resource in the ordergiven by the resource record author
content_types the IVOA identifier of the service standard it supports
source_format the format of source_value.
region_of_regard numeric value representing the angle, given in decimal degrees, by which a positional query against this resource should be “blurred” in order to get an appropriate match.
waveband a list of names of the wavebands that the resource provides data for
access_url the URL that can be used to access the service resource
standard_id the IVOA standard identifier

If you are looking for a particular data collection or catalog, as we did above when we looked for the NVSS archive, often simply reviewing the titles is sufficient. Other times, particularly when you are not sure what you are looking for, it helps to look deeper.

The resource description, available via the res_description property, tends to be the most revealing. It contains a paragraph (or two) summarizing the catalog or data collection. It will often describe the scientific intent behind the collection.

The short_name can also be helpful, as well. This name is meant to be short–16 characters or fewer; consequently, the value is often includes the abbreviation for the project or observatory that produced the collection or catalog.

A selection of the resource metadata, including the title, shortname and desription, can be printed out in a summary form with the describe() function.

>>> nvss.describe()
Image Data Service
NVSS
Short Name: NVSS
IVOA Identifier: ivo://nasa.heasarc/skyview/nvss
Base URL: http://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=nvss&

The NRAO VLA Sky Survey is currently underway at the VLA and data is made
available to the public as soon as processed.  <i> SkyView </i> has copied the
NVSS intensity data from the NRAO FTP site.  The full NVSS survey data
includes information on other Stokes parameters. Note that <i> SkyView </i>
may be slightly out of date with regard to the latest releases of NVSS data.
The current information was copied in November 1997.

Observations for the 1.4 GHz NRAO VLA Sky Survey (NVSS) began in 1993
September and should cover the sky north of -40 deg declination (82% of the
celestial sphere) before the end of 1996.  The principal data products will
be: <ol> <li> A set of 2326 continuum map "cubes," each covering 4 deg X 4 deg
with three planes containing Stokes I, Q, and U images.  These maps were made
with a relatively large restoring beam (45 arcsec FWHM) to yield the high
surface-brightness sensitivity needed for completeness and photometric
accuracy.  Their rms brightness fluctuations are about 0.45 mJy/beam = 0.14 K
(Stokes I) and 0.29 mJy/beam = 0.09 K (Stokes Q and U).  The rms uncertainties
in right ascension and declination vary from 0.3 arcsec for strong (S > 30
mJy) point sources to 5 arcsec for the faintest (S = 2.5 mJy) detectable
sources.

<li>  Lists of discrete sources. </ol>

The NVSS is being made as a service to the astronomical community, and the
data products are being released as soon as they are produced and verified.
<P> The NVSS survey is included on the <b>SkyView High Resolution Radio
Coverage </b>map <http://skyview.gsfc.nasa.gov/images/high_res_radio.jpg>.
This map shows coverage on an Aitoff projection of the sky in equatorial
coordinates.

Subjects: NVSS
Waveband Coverage: radio

As the examples in this chapter suggest, queries to the registry are often done interactively. You will find the need to review the results by eye, to further refine the collections and catalogs that you discover. In the last section of this chapter, we present a few tips for working with the registry within scripts in a non-interactive context.

Working with Service Objects from the Registry

In the previous chapter, Data Access Services, we introduced the Service classes (e.g. SIAService). These are classes whose instances represent a particular service, and its most important function is to remember the base URL for the service and allow us to query it without having to pass around the URL directly. Further, in the section, Service Objects, we saw how we can create service objects directly from a registry search record. Here’s a refresher example, based on the NVSS example from the previous section:

>>> nvss = colls[0].service  # converts record to serviec object
>>> nvss.baseurl
'http://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=nvss&'
>>> query = nvss.create_query(size=0.25, format="image/fits")

Thus, not only does this service instance contain the base URL but it also includes all of the metadata from the registry that desribes the service. With this service object, we can either call its search function directly or create query objects to get cutouts for a whole list of sources.

Retrieving a Service By Its Identifier

Our discussion of service metadata offers an opportunity to highlight another important property, the service’s IVOA Identifier (sometimes referred to as its ivoid). This is a globally-unique identifier that takes the form of a URI:

>>> colls = vo.regsearch(keywords=["NVSS"], servicetype='sia')
>>> for coll in colls:
... print coll.identifier
ivo://nasa.heasarc/skyview/nvss
ivo://nasa.heasarc/skyview/sumss

This identifier can be used to uniquely retrieve a service desription from the registry.

>>> nvss = vo.registry.ivoid2service('ivo://nasa.heasarc/skyview/nvss')
>>> nvss.title, nvss.baseurl
('NVSS', 'http://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=nvss&')
>>> # search the service in one call
>>> cutouts1 = nvss.search(pos=(148.8888, 69.065) size=0.2)
>>> nvssq = nvss.create_query(size=0.2)  # or create a query object
>>> nvssq.pos = (350.85, 58.815)
>>> cutouts2 = nvssq.execute()

Note

If you want to keep a reference to a single service (say, as part of a list of favorite services), it is better to save the identifier than the base URL. Over time, a service’s base URL can change; however, the identifier will stay the same.

As we end this discussion of the service objects, you can hopefully see that there is a straight-forward chain of discovery classes that connect the registry down through to a dataset. Spelled out in all its detail, it looks like this:

services = vo.regsearch(keywords=["NVSS"],
                    servicetype='sia')          # RegistryResults
nvss = services[0]                              # RegistryResource
nvsss = nvss.service                            # SIAService
nq = nvss.create_query(pos=(350.85, 58.815),
                    size=0.25,
                    format="image/fits")        # SIAQuery
images = nq.execute()                           # SIAResults
firstim = images[0]                             # SIARecord

Most of the time, it’s not necessary to follow all these steps yourself, so there are functions and methods that provide syntactic shortcuts. However, when you need some finer control over the process, it is possible to jump off the fast track and work directly with an underlying object.

Tips for Accessing the Registry from Scripts

As we’ve seen from the examples in this chapter, discovering and selecting services from the registry is often an interative process, particulary when you are not sure what you are looking for and you use the registry as a tool for exploration. In this mode, you will find yourself reviewing registry search results by eye to focus in on those data collections and services of interest.

However, there are a few use cases where non-interactive registry queries–i.e., queries that you can run blindly from a script–work well:

  1. Taking an inventory of all data available for particular postion and/or topic.
  2. Compiling a list of catalogs that include columns that contain particular kinds of data.
  3. Recalling a service of set of services by their IVOA identifiers.
  4. Look for new catalogs or data collections related to a particular topic and recently added to the VO.

The Data Inventory

The first example in the chapter, Getting Started With PyVO, is an example of creating an inventory of a available data. In that case, it was an inventory of available X-ray images of the Cas A supernova remnant. We didn’t actually download these images; instead, we created a table describing the images along with the URL for downloading them later, as desired.

The Hunt for Measurements

You may be creating your own catalog of objects selected for a particular science study. You may want to fill out the columns of your source table with attributes of interest, such as photometry measurements. To do this, you’ll need to find the catalogs that have this data. One simple recipe for doing this would be:

  1. query the registry for all catalogs related to your science using the keywords, waveband, and servicetype as applicable.
  2. For each catalog found, run a metadata search (which just returns an empty table).
  3. Search the columns of each table and find those where the name, ucd, or utype attributes contain the string “mag”.

The selection of columns is somewhat crude for more detailed kinds of data. Using the UCD label, it’s possible to identify columns with particular kinds of magnitudes (e.g J, V, bolometric, etc.) as well as of other types of quantities, such as redshift. See the CDS UCD Info page for a list of ucds that you can look for.

Recalling a Favorite Service

In the previous section, Retrieving a Service By Its Identifier, we discussed how one might create a list of favorite services which include their IVOA Identifiers. Each can be resolved into a service object using the ivoid2service() so that the service can be searched. You may, for example, want to re-search a set of archives periodically to determine if it has any new data since the last time you checked.

Discovering New Additions to the VO

In a similar vein, you may be interested in knowing when new catalogs or data collections, particularly any related to a topic of interest, become available in VO. Here’s a recipe for a script that you would run periodically which can accomplish this:

  1. Execute a registry query that looks for potentially interesting catalogs and collections.
  2. Extract the list of IVOA identifiers returned in the results.
  3. From disk, open the registry search results saved from the previous run of the script and extract the identifiers.
  4. Compare the two lists of identifiers, finding those that appear in the new results that are not in the previous results. These represent the new additions to the VO.
  5. Create a union of the two search result tables and save that as the latest result.
  6. Report the new additions.