sciapy.regress

SCIAMACHY regression modelling

This module contains regression models and methods to analyse SCIAMACHY level 2 number density data (only NO for now).

sciapy.regress.load_data

SCIAMACHY data interface

Data loading and selection routines for SCIAMACHY data regression fits. Includes reading the solar and geomagnetic index files used as proxies.

sciapy.regress.load_data.load_scia_dzm(filename, alt, lat, tfmt='jyear', scale=1, subsample_factor=1, subsample_method='greedy', akd_threshold=0.002, cnt_threshold=0, center=False, season=None, SPEs=False)[source]

Load SCIAMACHY daily zonal mean data

Interface function for SCIAMACHY daily zonal mean data files version 6.x. Uses xarray [1] to load and select the data. Possible selections are by hemispheric summer (NH summer ~ SH winter and SH summer ~ NH winter) and exclusion of strong solar proton events (SPE).

[1]https://xarray.pydata.org
Parameters:
  • filename (str) – The input filename
  • alt (float) – The altitude
  • lat (float) – The longitude
  • tfmt (string, optional) – The astropy.time “Time Format” for the time units, for example, “jyear”, “decimalyear”, “jd”, “mjd”, etc. See: http://docs.astropy.org/en/stable/time/index.html#time-format Default: “jyear”
  • scale (float, optional) – Scale factor of the data (default: 1)
  • subsample_factor (int, optional) – Factor to subsample the data by (see subsample_method) (default: 1 (no subsampling))
  • subsample_method (“equal”, “greedy”, or “random”, optional) – Method for subsampling the data (see subsample_factor). “equal” for equally spaced subsampling, “greedy” for selecting the data based on their uncertainty, and “random” for uniform random subsampling. (default: “greedy”)
  • center (bool, optional) – Center the data by subtracting the global mean. (default: False)
  • season (“summerNH”, “summerSH”, or None, optional) – Select the named season or None for all data (default: None)
  • SPEs (bool, optional) – Set to True to exclude strong SPE events (default: False)
Returns:

(times, dens, errs) – The measurement times according to the tfmt keyword, the number densities, and their uncertainties.

Return type:

tuple of (N,) array_like

sciapy.regress.load_data.load_solar_gm_table(filename, cols, names, sep='\t', tfmt='jyear')[source]

Load proxy tables from ascii files

This function wraps numpy.genfromtxt() [2] with pre-defined settings to match the file format. It explicitly returns the times as UTC decimal years or julian epochs.

[2]https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
Parameters:
  • filename (str) – The file to read
  • cols (sequence) – The columns in the file to use, passed to numpy.genfromtxt’s usecols keyword. Should be at least of length 2, indicating time values in the first column, e.g., cols=[0, 1].
  • names (sequence) – The column names, passed as names to numpy.genfromtxt(). Should be at least of length 2, naming the proxy values in the second column, e.g., names=[“time”, “proxy”].
  • sep (str, optional) – The column separator character, passed as sep. Default: ` `
  • tfmt (str, optional) – The astropy.time “Time Format” for the time units, for example, “jyear”, “decimalyear”, “jd”, “mjd”, etc. See: http://docs.astropy.org/en/stable/time/index.html#time-format Default: “jyear”
Returns:

  • times (array_like) – The proxy times according to the tfmt keyword (UTC) as numpy.ndarray from the first column of “cols”.
  • values (tuple of array_like) – The proxy values combined into a structured array (numpy.recarray) with fields set according to “names[1:]” and “cols[1:]” passed above.

sciapy.regress.mcmc

SCIAMACHY data regression MCMC sampler

Markov Chain Monte Carlo (MCMC) routines to sample the posterior probability of SCIAMACHY data regression fits. Uses the emcee.EnsembleSampler [3] do do the real work.

[3]https://emcee.readthedocs.io
sciapy.regress.mcmc.mcmc_sample_model(model, y, beta=1.0, nwalkers=100, nburnin=200, nprod=800, nthreads=1, optimized=False, bounds=None, return_logpost=False, show_progress=False, progress_mod=10)[source]

Markov Chain Monte Carlo sampling interface

MCMC sampling interface to sample posterior probabilities using the emcee.EnsembleSampler [4].

[4]https://emcee.readthedocs.io
Arguments:
  • model (celerite.GP, george.GP, or sciapy.regress.RegressionModel instance) – The model to draw posterior samples from. It should provide either log_likelihood() and log_prior() functions or be directly callable via __call__().
  • y ((N,) array_like) – The data to condition the probabilities on.
  • beta (float, optional) – Tempering factor for the probability, default: 1.
  • nwalkers (int, optional) – The number of MCMC walkers (default: 100). If this number is smaller than 4 times the number of parameters, it is multiplied by the number of parameters. Otherwise it specifies the number of parameters directly.
  • nburnin (int, optional) – The number of burn-in samples to draw, default: 200.
  • nprod (int, optional) – The number of production samples to draw, default: 800.
  • nthreads (int, optional) – The number of threads to use with a multiprocessing.Pool, used as pool for emcee.EnsembleSampler. Default: 1.
  • optimized (bool, optional) – Indicate whether the actual (starting) position was determined with an optimization algorithm beforehand. If False (the default), a pre-burn-in run optimizes the starting position. Sampling continues from there with the normal burn-in and production runs. In that case, latin hypercube sampling is used to distribute the walker starting positions equally in parameter space.
  • bounds (iterable, optional) – The parameter bounds as a list of (min, max) entries. Default: None
  • return_logpost (bool, optional) – Indicate whether or not to return the sampled log probabilities as well. Default: False
  • show_progress (bool, optional) – Print the percentage of samples every progress_mod samples. Default: False
  • progress_mod (int, optional) – Interval in samples to print the percentage of samples. Default: 10
Returns:

samples or (samples, logpost) – (nwalkers * nprod, ndim) array of the sampled parameters from the production run if return_logpost is False. A tuple of an (nwalkers * nprod, ndim) array (the same as above) and an (nwalkers,) array with the second entry containing the log posterior probabilities if return_logpost is True.

Return type:

array_like or tuple