What is HEALPix?
HEALPix stands for Hierarchical Equal Area isoLatitude Pixelisation and is an algorithm for dividing up the sky in equal-area pixels. Each pixel can in turn be partitioned to 4 equal-area pixels and so forth providing a progressively finer resolution with each level.
Diagram of progressively smaller HEALPix pixels. Credit: NASA JPL |
There are two main Python packages for working with HEALPix- healpy and astropy-healpix. I found the later easier to use, and the licensing is more permissive. It's fairly straightforward to go back and forth between astronomical coordinates and HEALPix integers and to calculate a set of pixels that are inside a search area:
from astropy_healpix import HEALPix, pixel_resolution_to_nside from astropy.coordinates import ICRS, SkyCoord from astropy import units as u # nside required for chosen resolution resolution = 10 * u.arcsec nside = pixel_resolution_to_nside(resolution, round='up') # HEALPix object with that resolution hp = HEALPix(nside=nside, order='nested', frame=ICRS()) # HEALPix to SkyCoord object coords = hp.healpix_to_skycoord([42]) print(coords) # Output SkyCoord (ICRS): (ra, dec) in deg [(44.99038696, 0.00932548)] # SkyCoord object to HEALPix coords = SkyCoord(ra=34*u.deg, dec=-23*u.deg) print(hp.skycoord_to_healpix(coords)) # Output 9542850888 # Example cone search coords = SkyCoord(ra=34*u.deg, dec=-23*u.deg) hp_to_search = hp.cone_search_skycoord(coords, radius=5 * u.arcmin) print(len(hp_to_search)) print(hp_to_search[0:10]) # Output 6988 [9542850888 9542850889 9542850891 9542850890 9542850847 9542850845 9542850839 9542850882 9542850883 9542850886]
In this example, I used a resolution element of at most 10 arcsecond to initiate my HEALPix object. The smaller the resolution, the larger the integer result for a given coordinate and the more values will be returned for a cone search. This particular case yielded nearly 7,000 healpix values for a cone search of 5 arcminutes. As mentioned earlier, I have a Jupyter notebook that walks you through a bit more detail on this.
Updating our NoSQL Database
An advantage of HEALPix is that it's a single integer that describes a position in the sky and, for a given resolution level, reference frame, and ordering scheme, is unique. As such, you can represent any coordinates you have as a single number (or list of numbers if you are describing an arbitrary shape) and this works very nicely when implementing as an index in any database. Since, we've been working with MongoDB, we'll update our BrownDwarf database we created in the first tutorial to include HEALPix values for all our documents.
Updating everything is actually fairly trivial:
import pymongo client = pymongo.MongoClient() # default connection (ie, local) db = client['test'] # database dwarfs = db.dwarfs # collection # Loop over those without coords.healpix and set the value cursor = dwarfs.find({'coords.healpix': {'$exists': False}}) for doc in cursor: coords = SkyCoord(ra=doc['coords']['ra']*u.deg, dec=doc['coords']['dec']*u.deg) healpix = int(hp.skycoord_to_healpix(coords)) dwarfs.update_one({'_id': doc['_id']}, {'$set': {'coords.healpix': healpix}}) # Create an index on the HEALPix values for faster queries if 'healpix' not in dwarfs.index_information(): dwarfs.create_index([('coords.healpix', pymongo.ASCENDING)], name='healpix', background=True)
And with that done, we can create a cone search function to easily implement a way to search for objects in our database by their astrophysical coordinates:
def cone_search(ra, dec, radius, collection=dwarfs, field='coords.healpix'): # Function to perform a simple cone search against a MongoDB collection # ra, dec in degrees; radius in arcseconds coords = SkyCoord(ra=ra*u.deg, dec=dec*u.deg) hp_to_search = hp.cone_search_skycoord(coords, radius=radius * u.arcsec) cursor = collection.find({'coords.healpix': {'$in': [int(h) for h in hp_to_search]}}) return cursor # Example use cursor = cone_search(181.9, -39.5, 180.) for doc in cursor: print(doc['source_id'], doc['coords']) # Output 11 {'ra': 181.889, 'dec': -39.548, 'healpix': 6443584085}
A fair warning: computing healpix values for a large region of the sky can be a bit time consuming, so you may want to implement cuts in your cone search functions to prevent large queries. Furthermore, you'll need to use the same healpix parameters as otherwise the values will be different and you won't match the correct entries in your database.
What About MOC?
Users familiar with HEALPix may also know about MOC- Multi-Order Coverage maps. These are representations of the spatial coverage of some field of view in a hierarchical format based on HEALPix. As a standard, they use nested numbering in ICRS equatorial coordinates. The key aspect is that if all healpix values at some level are included in the coverage map, MOC would store the lowest level possible. So a representation of MOC would be something like {10: [342, 343], 11: [1145, 1146, 1150]}. I've made up the numbers in this example, but what this means is that at level 10 it includes healpix values of 342 and 343, but at level 11 it has 1145, 1146, and 1150. Each level 10 pixel could be written as 4 level 11 pixels, but by writing it in this fashion it becomes far more compact. You effectively only save the largest possible pixels that encompass your area.
Unfortunately, this does not lend itself very well to the simple cone search functionality we devised above. While python packages exist to represent targets and fields of view as MOC, it's not clear to me that saving a JSON representation of a MOC in a MongoDB database would lead to an efficient search. In my mind, the only way to search for it would be to read out all stored values in the database, reconstruct the MOC objects, and then perform an intersection for your target. If you have millions of sources in your database, I imagine this can become an expensive operation. If you are an expert on MOC or are curious to try this out, let me know. It may be a worthwhile project to investigate and I have a hack day coming up as part of Python in Astronomy 2019. If you'll be attending and interested in NoSQL databases, feel free to touch base with me.
Final Thoughts
In this tutorial, we briefly covered HEALPix to get an integer representation of two-dimension coordinates in space. By doing this, we were able to update our MongoDB database to store these values for our targets and build a simple client-side function to retrieve documents that match a given set of HEALPix values. Because these are just integer numbers, it's trivial to set up in most databases and provides a fast way to implement cone searches. This works for more than just point sources too, since, if your database can store arrays (like MongoDB can), you can store a list of HEALPix values representing the area of your object. This could be used, for example, in our FITS header database to represent the sky coverage of each individual image.
Some databases, including MongoDB, have built-in methods to do spatial indexing. I hope to have a tutorial shortly about MongoDB's geospatial queries to guide you into using their methods to perform similar type of searches as the ones we worked through here.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.