Skip to the content.

Application

Proposal title

Astronomy Using Unevenly Sampled Data

Organization

OpenAstronomy(StingraySoftware)

Summary

Analyzing unevenly sampled data cannot be done with methods of the Fourier domain. We use the Lomb Scargle method to account for the unevenness of the data. I will be implementing the Lomb Scargle cross spectrum and power spectrum from scratch. Then I will be creating wrapper classes and helper functions that will provide a sensible API to the user. The Lomb Scargle cross spectrum implementation can be optimized by using JAX. Reason for choosing to do the power spectrum from scratch is that it allows us to get a raw PSD to which we can apply stingray’s methods of normalisation and astropy’s implementation does not have that option, and their normalisations are different from the ones used in stingray, and we can use the \(\mathcal{O} (N\log{}N)\) algorithm proposed by Press and Rybicki. There would also be a need of helper functions to create objects of these classes automatically from various input types similar to the existing PSDs. Extra parameter of maximum frequency is to be given as it cannot be inferred from the data as equal to 1/time between two data points. Further the frequency resolution cannot be inferred due to the same reason. The cross spectrum class must be compatible with statistic functions already existing in stingray such as the time lag, phase lag and coherence (which exist only for cross spectra) (if this turns out to not be possible then create new similar statistic functions).

Contributor Information

Background

I am a second year undergraduate student of Chaitanya Bharathi Institute of Technology studying Artificial Intelligence and Data Science. I have a lot of hobbies out of which some of them include astrophotography, Guitar(Rock and Pop Trinity Grade 8) and chess. Furthermore, I originally wanted to study a course in Computational Astronomy/Astrophysics as it is the confluence of all my interests but could not do it due to lack of availability of such a course in my country. I am passionate about using, propagating and contributing to open source software. I also like to do astrophotography in my free time, though I cannot really produce great images as I am limited by a Bortle 9 sky and lack of a tracker.

Relevant Experience

Interest in Open Astronomy

I’ve always been interested in astronomy. I also delved into astrophotography recently. Furthermore, I originally wanted to study astrophysics/astronomy along with computer science as my major and minor but the institutes in my country neither offer such flexibility nor the courses in one institute. Not only that, but I have done a few courses on Coursera on astronomy such as Data Driven Astronomy and Analyzing The Universe. Furthermore, I am passionate about using and propagating open source software, and I feel this is a multi-faceted way of giving back to the community that I like so much. Likewise, I feel my skill set is best suited for OpenAstronomy compared to any other project as I have a decent amount of domain as well as programming and software design knowledge. I am always looking to learn more. I view research work and work that aids in other’s research as a noble pursuit. Furthermore, I hope OpenAstronomy will not only be my journey into the world of FOSS but also to research, and it will help me gain experience for higher studies.

Detailed Description

The project involves time series analysis of unevenly sampled data. The goal is to create the classes that generates Lomb-Scargle Power spectrum and Crossspectrum of the data respectively and match these implementations with the structure of existing functionalities of Powerspectrum and Crosspectrum classes as well as the normalisations performed by them.

Time-lag, phase lag and coherence of light curves are also to be found out in the case of cross spectra.

The implementation of cross spectrum is to be done from scratch in Python as it only exists in MATLAB and FORTRAN courtesy of Dr. Jeffrey D. Scargle (Studies in astronomical time series analysis. III-Fourier transforms, autocorrelation functions, and cross-correlation functions of unevenly spaced data).

The implementation of power spectrum may not be required since a compatible implementation exists in scipy.signal.LombScargle. We can refer to the original paper by Jeffrey D. Scargle Studies in astronomical time series analysis. II-Statistical aspects of Spectral Analysis of Unevenly Spaced Data. But it is useful to implement it from scratch as we can use the optimizations introduced by William H. Press and George B. Rybicki as shown below.

A deep dive into Lomb Scargle Method and various optimizations

Referring Review paper by Jacob T. VanderPlas

The Lomb Scargle method uses the mean of the sampling intervals to account for the uneven sampling intervals. Lomb Scargle method is equivalently interpreted as a Fourier method as well as a least squares method. The generalized periodogram form ensures time shift invariance.

The least squares approach works by minimizing the chi^2 test statistic of a function \(y(t;f) = A_f \sin(2\pi f(t-\phi_f))\) where amplitude \(A_f\) and phase \(\phi_f\) vary with frequency.

\[\chi^2 = \sum_n (y_n - y(t_n;f))^2\]

The periodogram using this approach is given by \(P(f) = \frac{\chi^2-\chi^2(f)}{2}\) where \(\chi^2(f)\) is the \(\chi^2\) test statistic for the given frequency and \(\chi^2\) is the non varying reference model.

The periodogram can also be derived by extending the classical Schuster periodogram by generalizing it to unevenly sampled data which comes from the fourier domain of methods.

Given by \(P(f) = \frac{1}{2} [{ \frac{ (\sum_n{g_n\cos(2\pi f[t_n-\tau])})^2} {\sum_n\cos^2(2\pi f[t_n-\tau])} + \frac{(\sum_ng_n\sin(2\pi f[t_n-\tau]))^2} {\sum_n\sin^2(2\pi f[t_n-\tau]) }}]\) where \(\ \tau =\frac{1}{4\pi f} \tan^{-1} {\frac{\sum_n\sin(4\pi f_n)}{\sum_n{\cos(4\pi f_n)}}}\)

We can modify the least squares approach to account for Gaussian errors. Where \(\chi^2\) test statistic is modified as follows

\[\chi^2(f) = \sum_n \frac{(y_n - y_{model}(t_n;f))^2}{\sigma_n^2}\]

Another important modification made is to add an offset term to the sinusoidal model at each frequency in the least squares approach.

\[y_{model}(t;f) = y_0(f) + A_f \sin(2\pi f(t-\tau))\]

The \(y_{model}\) can be further generalized by adding K - 1 terms to it.

\[y_{model}(t;f) = A_f^0 + \sum{K}_{k=1} A_f^{k} \sin(2\pi kf(t-\phi_f^k))\]

These implementations are \(\mathcal{O}(N^2)\)

The above can be further optimized by using the optimizations introduced by William H. Press and George B. Rybicki which brings down the overall time complexity of the algorithm to \(\mathcal{O}(N\log{}N)\)

The optimizations basically include the following:

The following are the practical considerations when implementing the method

My approach

Proposed Structure

Filename Description
utils.py helper functions
LombScargle.py Implementation of the power spectrum and cross spectrum classes

Additions to utils.py

Helper functions like ACF, CCF that work with unevenly sampled data.

Lomb Scargle Cross Spectrum class

Wraps the Lomb Scargle cross spectrum in our own implementation. Handles input of two light curves, eventlists, timearray and light curve iterable. Needs to perform data validity checks and normalisation(leahy/absolute rms/fractional rms).

Class parameters

Parameter Description Default
data1 light curve or eventlist None
data2 light curve or eventlist None
power_type real or complex power all
norm Normalisation methods(frac RMS, abs RMS, leahy) Fractional RMS
dt time resolution only needed for eventlist objects Automatically selected sensible value
gti time intervals which have good data (if not none overrides common gtis from both light curves) Common gti from both light curves
skip_checks flag to skip checks False
maxfreq maximum frequency to check for Automatically chosen sensible value
df frequency resolution Automatically chosen sensible value

Class attributes

Attribute Description
freq array of mid bin frequencies that the lsft samples
power array of cross spectra (complex numbers)
power error uncertainties in power
m the number of averages cross spectra amplitudes in each bin
n the number of data points/time bins in one segment of the light curves
k the rebinning scheme if object has been rebinned otherwise is set to 1
nphots1 the total number of photons in light curve 1
nphots2 the total number of photons in light curve 2

NOTE : This is a rough outline and the attributes and parameters may change depending on further research at the time of implementation

Lomb Scargle Power Spectrum

Using our own implementation of Press and Rybicki’s fast Lomb Scargle power spectrum. Handles input of one light curve. Helper functions that allow creation of PSD from timearray, eventlists and light curve iterable must be created. And must perform validity checks and normalisation(leahy/absolute rms/fractional rms).

Class parameters

Parameter Description Default
data light curve or eventlist None
power_type real or all powers all
norm Normalisation methods(frac RMS, abs RMS, leahy) Fractional RMS
dt time resolution only needed for eventlist objects Automatically chosen sensible value
gti time intervals which have good data (if not None overrides existing gti in light curve) None
skip_checks flag to skip checks False
df frequency resolution Automatically chosen sensible value
maxfreq maximum frequency to check for Automatically chosen sensible value

Class attributes

Attribute Description
freq array of mid bin frequencies that the lsft samples
power array of cross spectra (complex numbers)
power error uncertainties in power
m the number of averages cross spectra amplitudes in each bin
n the number of data points/time bins in one segment of the light curves
k the rebinning scheme if object has been rebinned otherwise is set to 1
nphots the total number of photons in light curve

NOTE : This is a rough outline and the attributes and parameters may change depending on further research at the time of implementation

Deliverables

  1. Implementation of Lomb Scargle Cross Spectrum along with helper functions, statistical functions, user facing wrapper class, tests and documentation
  2. Implementation of Lomb Scargle Power Spectrum along with helper functions, statistical functions, user facing wrapper class, tests and documentation

Description and Timeline

Period Description
May 4th - 28th Getting to know mentors, reading documentation, setup development environment, contributing fixes to smaller issues to get familiar with the codebase and getting up to speed with working on the project.
May 29th - June 4th Implementing the Lomb Scargle Crossspectrum
June 5th - June 19th Implementing the Lomb Scargle Crossspectrum class
June 19th - July 2nd Implementing Lomb Scargle Power Spectrum class
July 3rd - July 16th Implementing the helper functions
July 14th Midterm evaluation deadline
July 31st - August 13th JAX optimization, tests and documentation : tests for the wrapper classes as well as the core implementation , documentation for the API and how to use the class
August 13th - August 21st Buffer period and improving code quality
August 21st - August 28th Final evaluation week
August 28th - September 4th Mentor final evaluation week
September 5th - September 24th Buffer period, comprehensive tests and documentation

GSoC

I have not participated previously in GSoC. I am not applying to any other projects as this project has piqued my interest the most and aligns the most with my knowledge and skill set.

Schedule availability

I don’t really have any plans for holidays during the time of the program. There are barely any holidays between my 4th and 5th semesters which would be around 10-15 days in late August early September. I would be willing to work during these days as well if need be. I would have slightly reduced output from 13th July to Mid/Late August due to semester end exams. My window of maximum productivity would be from May 29th ~ June 25th. I would not like to put behind any work for the buffer period as it clashes with my semester end exams. I would most likely be to work at full productivity from roughly 11th August onward assuming exam schedule would be similar to current semester. Likewise, I would try and cover the work of later weeks early if any unforeseen circumstances arise.