sciapy.regress¶
SCIAMACHY regression modelling
This module contains regression models and methods to analyse SCIAMACHY level 2 number density data (only NO for now).
sciapy.regress.load_data¶
SCIAMACHY data interface
Data loading and selection routines for SCIAMACHY data regression fits. Includes reading the solar and geomagnetic index files used as proxies.
sciapy.regress.load_data.load_scia_dzm(filename, alt, lat, tfmt='jyear', scale=1, subsample_factor=1, subsample_method='greedy', akd_threshold=0.002, cnt_threshold=0, center=False, season=None, SPEs=False)[source]¶
Load SCIAMACHY daily zonal mean data
Interface function for SCIAMACHY daily zonal mean data files version 6.x. Uses xarray [1] to load and select the data. Possible selections are by hemispheric summer (NH summer ~ SH winter and SH summer ~ NH winter) and exclusion of strong solar proton events (SPEs).
[1] https://xarray.pydata.org
Parameters:
- filename (str) – The input filename
- alt (float) – The altitude
- lat (float) – The latitude
- tfmt (string, optional) – The astropy.time “Time Format” for the time units, for example, “jyear”, “decimalyear”, “jd”, “mjd”, etc. See: http://docs.astropy.org/en/stable/time/index.html#time-format Default: “jyear”
- scale (float, optional) – Scale factor of the data (default: 1)
- subsample_factor (int, optional) – Factor to subsample the data by (see subsample_method) (default: 1 (no subsampling))
- subsample_method (“equal”, “greedy”, or “random”, optional) – Method for subsampling the data (see subsample_factor). “equal” for equally spaced subsampling, “greedy” for selecting the data based on their uncertainty, and “random” for uniform random subsampling. (default: “greedy”)
- center (bool, optional) – Center the data by subtracting the global mean. (default: False)
- season (“summerNH”, “summerSH”, or None, optional) – Select the named season or None for all data (default: None)
- SPEs (bool, optional) – Set to True to exclude strong SPE events (default: False)
Returns: (times, dens, errs) – The measurement times according to the tfmt keyword, the number densities, and their uncertainties.
Return type: tuple of (N,) array_like
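The three subsample_method choices can be illustrated with a short pure-Python sketch. The subsample() function below is a hypothetical stand-in inferred from the parameter descriptions above, not the actual sciapy implementation:

```python
import random

def subsample(times, dens, errs, factor, method="greedy", seed=None):
    # Reduce the data to roughly n // factor points with one of three strategies.
    n = len(times)
    keep = max(1, n // factor)
    if method == "equal":
        # equally spaced indices across the record
        idx = list(range(0, n, factor))[:keep]
    elif method == "greedy":
        # prefer the points with the smallest uncertainties
        idx = sorted(sorted(range(n), key=lambda i: errs[i])[:keep])
    elif method == "random":
        # uniform random subsample without replacement
        rng = random.Random(seed)
        idx = sorted(rng.sample(range(n), keep))
    else:
        raise ValueError("unknown method: %s" % method)
    pick = lambda a: [a[i] for i in idx]
    return pick(times), pick(dens), pick(errs)

# toy data: 10 points whose uncertainties shrink with time
times = [2002.0 + 0.1 * i for i in range(10)]
dens = [1.0e9] * 10
errs = [1.0 / (i + 1) for i in range(10)]

t_eq, _, _ = subsample(times, dens, errs, 2, method="equal")
t_gr, _, e_gr = subsample(times, dens, errs, 2, method="greedy")
print(len(t_eq), len(t_gr))  # 5 5
# "greedy" kept only the most precise (lowest-uncertainty) points
print(max(e_gr) <= min(set(errs) - set(e_gr)))  # True
```

With subsample_factor=1 (the default) all points are kept, matching the "no subsampling" behaviour documented above.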
sciapy.regress.load_data.load_solar_gm_table(filename, cols, names, sep='\t', tfmt='jyear')[source]¶
Load proxy tables from ascii files
This function wraps numpy.genfromtxt() [2] with pre-defined settings to match the file format. It explicitly returns the times as UTC decimal years or Julian epochs.
[2] https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
Parameters:
- filename (str) – The file to read
- cols (sequence) – The columns in the file to use, passed to numpy.genfromtxt’s usecols keyword. Should be at least of length 2, indicating time values in the first column, e.g., cols=[0, 1].
- names (sequence) – The column names, passed as names to numpy.genfromtxt(). Should be at least of length 2, naming the proxy values in the second column, e.g., names=[“time”, “proxy”].
- sep (str, optional) – The column separator character, passed on to numpy.genfromtxt(). Default: '\t' (tab)
- tfmt (str, optional) – The astropy.time “Time Format” for the time units, for example, “jyear”, “decimalyear”, “jd”, “mjd”, etc. See: http://docs.astropy.org/en/stable/time/index.html#time-format Default: “jyear”
Returns:
- times (array_like) – The proxy times according to the tfmt keyword (UTC) as a numpy.ndarray from the first column of “cols”.
- values (tuple of array_like) – The proxy values combined into a structured array (numpy.recarray) with fields set according to “names[1:]” and “cols[1:]” passed above.
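In spirit, the loader does something like the following pure-Python sketch. The real function delegates to numpy.genfromtxt() and astropy.time; load_proxy_table() and the two-column toy table here are illustrative only:

```python
import io

def load_proxy_table(fileobj, cols, names, sep="\t"):
    # Pure-Python stand-in for sciapy's load_solar_gm_table().
    # The first entry of `cols`/`names` is the time column, the rest
    # are the proxy values, mirroring the documented convention.
    times = []
    values = {name: [] for name in names[1:]}
    for line in fileobj:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank/comment lines
            continue
        fields = line.split(sep)
        times.append(float(fields[cols[0]]))
        for col, name in zip(cols[1:], names[1:]):
            values[name].append(float(fields[col]))
    return times, values

# toy proxy table: decimal-year time in column 0, index value in column 1
table = io.StringIO("# time\tproxy\n2002.5\t6.1\n2002.6\t6.3\n")
times, values = load_proxy_table(table, cols=[0, 1], names=["time", "proxy"])
print(times)             # [2002.5, 2002.6]
print(values["proxy"])   # [6.1, 6.3]
```

The real function additionally converts the times to the requested astropy.time format (tfmt) and packs the value columns into a numpy.recarray instead of a plain dict.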
sciapy.regress.mcmc¶
SCIAMACHY data regression MCMC sampler
Markov Chain Monte Carlo (MCMC) routines to sample the posterior probability of SCIAMACHY data regression fits. Uses the emcee.EnsembleSampler [3] to do the real work.
[3] https://emcee.readthedocs.io
sciapy.regress.mcmc.mcmc_sample_model(model, y, beta=1.0, nwalkers=100, nburnin=200, nprod=800, nthreads=1, optimized=False, bounds=None, return_logpost=False, show_progress=False, progress_mod=10)[source]¶
Markov Chain Monte Carlo sampling interface
MCMC sampling interface to sample posterior probabilities using the emcee.EnsembleSampler [4].
[4] https://emcee.readthedocs.io
Parameters:
- model (celerite.GP, george.GP, or sciapy.regress.RegressionModel instance) – The model to draw posterior samples from. It should provide either log_likelihood() and log_prior() methods or be directly callable via __call__().
- y ((N,) array_like) – The data to condition the probabilities on.
- beta (float, optional) – Tempering factor for the probability, default: 1.
- nwalkers (int, optional) – The number of MCMC walkers (default: 100). If this number is smaller than 4 times the number of parameters, it is multiplied by the number of parameters. Otherwise it specifies the number of walkers directly.
- nburnin (int, optional) – The number of burn-in samples to draw, default: 200.
- nprod (int, optional) – The number of production samples to draw, default: 800.
- nthreads (int, optional) – The number of threads to use with a multiprocessing.Pool, used as pool for emcee.EnsembleSampler. Default: 1.
- optimized (bool, optional) – Indicate whether the (starting) position was already determined by an optimization algorithm beforehand. If False (the default), a pre-burn-in run optimizes the starting position, and sampling continues from there with the normal burn-in and production runs. In that case, Latin hypercube sampling is used to distribute the walker starting positions evenly in parameter space.
- bounds (iterable, optional) – The parameter bounds as a list of (min, max) entries. Default: None
- return_logpost (bool, optional) – Indicate whether or not to return the sampled log probabilities as well. Default: False
- show_progress (bool, optional) – Print the percentage of samples every progress_mod samples. Default: False
- progress_mod (int, optional) – Interval in samples to print the percentage of samples. Default: 10
Returns: samples or (samples, logpost) – If return_logpost is False, an (nwalkers * nprod, ndim) array of the parameters sampled during the production run. If return_logpost is True, a tuple of that array and an (nwalkers * nprod,) array containing the corresponding log posterior probabilities.
Return type: array_like or tuple
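The Latin hypercube initialization used when optimized=False can be sketched in pure Python. This is an illustrative stand-in, not the sciapy implementation: each parameter axis within bounds is split into nwalkers equal strata and every stratum is used exactly once, so the walkers cover parameter space more evenly than a plain uniform draw.

```python
import random

def latin_hypercube(nwalkers, bounds, seed=None):
    # One starting position per walker. For each dimension, assign the
    # walkers a shuffled permutation of the strata so the axes are
    # decoupled, then jitter uniformly within each stratum.
    rng = random.Random(seed)
    ndim = len(bounds)
    samples = [[0.0] * ndim for _ in range(nwalkers)]
    for d, (lo, hi) in enumerate(bounds):
        strata = list(range(nwalkers))
        rng.shuffle(strata)
        width = (hi - lo) / nwalkers
        for w in range(nwalkers):
            samples[w][d] = lo + (strata[w] + rng.random()) * width
    return samples

# toy setup: 4 walkers, 2 parameters, bounds as in the `bounds` keyword
p0 = latin_hypercube(4, [(0.0, 1.0), (-10.0, 10.0)], seed=42)
print(len(p0), len(p0[0]))  # 4 2
# each quarter of the first axis contains exactly one walker
print(sorted(int(x[0] * 4) for x in p0))  # [0, 1, 2, 3]
```

A list like p0 is the shape of starting ensemble emcee.EnsembleSampler expects: nwalkers rows of ndim parameters, spread across the prior volume before the burn-in runs begin.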