sciapy.regress¶

SCIAMACHY regression modelling

This module contains regression models and methods to analyse SCIAMACHY level 2 number density data (only NO for now).

sciapy.regress.load_data¶

SCIAMACHY data interface

Data loading and selection routines for SCIAMACHY data regression fits. Includes reading the solar and geomagnetic index files used as proxies.

sciapy.regress.load_data.load_scia_dzm(filename, alt, lat, tfmt='jyear', scale=1, subsample_factor=1, subsample_method='greedy', akd_threshold=0.002, cnt_threshold=0, center=False, season=None, SPEs=False)[source]¶

Load SCIAMACHY daily zonal mean data

Interface function for SCIAMACHY daily zonal mean data files version 6.x. Uses xarray [1] to load and select the data. Possible selections are by hemispheric summer (NH summer ~ SH winter and SH summer ~ NH winter) and exclusion of strong solar proton events (SPE).

[1]	https://xarray.pydata.org

Parameters:	filename (str) – The input filename alt (float) – The altitude lat (float) – The longitude tfmt (string, optional) – The astropy.time “Time Format” for the time units, for example, “jyear”, “decimalyear”, “jd”, “mjd”, etc. See: http://docs.astropy.org/en/stable/time/index.html#time-format Default: “jyear” scale (float, optional) – Scale factor of the data (default: 1) subsample_factor (int, optional) – Factor to subsample the data by (see subsample_method) (default: 1 (no subsampling)) subsample_method (“equal”, “greedy”, or “random”, optional) – Method for subsampling the data (see subsample_factor). “equal” for equally spaced subsampling, “greedy” for selecting the data based on their uncertainty, and “random” for uniform random subsampling. (default: “greedy”) center (bool, optional) – Center the data by subtracting the global mean. (default: False) season (“summerNH”, “summerSH”, or None, optional) – Select the named season or None for all data (default: None) SPEs (bool, optional) – Set to True to exclude strong SPE events (default: False)
Returns:	(times, dens, errs) – The measurement times according to the tfmt keyword, the number densities, and their uncertainties.
Return type:	tuple of (N,) array_like

sciapy.regress.load_data.load_solar_gm_table(filename, cols, names, sep='\t', tfmt='jyear')[source]¶

Load proxy tables from ascii files

This function wraps numpy.genfromtxt() [2] with pre-defined settings to match the file format. It explicitly returns the times as UTC decimal years or julian epochs.

[2]	https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html

Parameters:

filename (str) – The file to read
cols (sequence) – The columns in the file to use, passed to numpy.genfromtxt’s usecols keyword. Should be at least of length 2, indicating time values in the first column, e.g., cols=[0, 1].
names (sequence) – The column names, passed as names to numpy.genfromtxt(). Should be at least of length 2, naming the proxy values in the second column, e.g., names=[“time”, “proxy”].
sep (str, optional) – The column separator character, passed as sep. Default: ` `
tfmt (str, optional) – The astropy.time “Time Format” for the time units, for example, “jyear”, “decimalyear”, “jd”, “mjd”, etc. See: http://docs.astropy.org/en/stable/time/index.html#time-format Default: “jyear”

Returns:

times (array_like) – The proxy times according to the tfmt keyword (UTC) as numpy.ndarray from the first column of “cols”.
values (tuple of array_like) – The proxy values combined into a structured array (numpy.recarray) with fields set according to “names[1:]” and “cols[1:]” passed above.

sciapy.regress.mcmc¶

SCIAMACHY data regression MCMC sampler

Markov Chain Monte Carlo (MCMC) routines to sample the posterior probability of SCIAMACHY data regression fits. Uses the emcee.EnsembleSampler [3] do do the real work.

[3]	https://emcee.readthedocs.io

sciapy.regress.mcmc.mcmc_sample_model(model, y, beta=1.0, nwalkers=100, nburnin=200, nprod=800, nthreads=1, optimized=False, bounds=None, return_logpost=False, show_progress=False, progress_mod=10)[source]¶

Markov Chain Monte Carlo sampling interface

MCMC sampling interface to sample posterior probabilities using the emcee.EnsembleSampler [4].

[4]	https://emcee.readthedocs.io

Arguments:	model (celerite.GP, george.GP, or sciapy.regress.RegressionModel instance) – The model to draw posterior samples from. It should provide either log_likelihood() and log_prior() functions or be directly callable via __call__(). y ((N,) array_like) – The data to condition the probabilities on. beta (float, optional) – Tempering factor for the probability, default: 1. nwalkers (int, optional) – The number of MCMC walkers (default: 100). If this number is smaller than 4 times the number of parameters, it is multiplied by the number of parameters. Otherwise it specifies the number of parameters directly. nburnin (int, optional) – The number of burn-in samples to draw, default: 200. nprod (int, optional) – The number of production samples to draw, default: 800. nthreads (int, optional) – The number of threads to use with a multiprocessing.Pool, used as pool for emcee.EnsembleSampler. Default: 1. optimized (bool, optional) – Indicate whether the actual (starting) position was determined with an optimization algorithm beforehand. If False (the default), a pre-burn-in run optimizes the starting position. Sampling continues from there with the normal burn-in and production runs. In that case, latin hypercube sampling is used to distribute the walker starting positions equally in parameter space. bounds (iterable, optional) – The parameter bounds as a list of (min, max) entries. Default: None return_logpost (bool, optional) – Indicate whether or not to return the sampled log probabilities as well. Default: False show_progress (bool, optional) – Print the percentage of samples every progress_mod samples. Default: False progress_mod (int, optional) – Interval in samples to print the percentage of samples. Default: 10
Returns:	samples or (samples, logpost) – (nwalkers * nprod, ndim) array of the sampled parameters from the production run if return_logpost is False. A tuple of an (nwalkers * nprod, ndim) array (the same as above) and an (nwalkers,) array with the second entry containing the log posterior probabilities if return_logpost is True.
Return type:	array_like or tuple