How to: use panaSKImg for analysis

pysimdamicm is a Python3 package for skipper CCDs that covers both simulations and analysis. Its initial scope was simulations only, with the main objective of mimicking the data acquisition process of DAMIC-M as well as the cluster reconstruction process. The scope of the framework has since been extended to analysis. pysimdamicm contains a set of tools to process skipper CCD images and to reproduce or mimic the response of a CCD-based detector.

This document presents only information for analysis. Find documentation for simulations here.

panaSKImg, the main python script for analysis, stands for Python tools for ANAlysis of SKipper IMaGes. It is a python3 script to study skipper images during commissioning and analysis, as well as for cluster reconstruction. It can analyze a single image or a set of images, or run as a DQM (data quality monitor) producer.

Framework Design

As in simulations, the philosophy is to run a sequence of processes on an image with a specific objective: for instance, fitting the dark current, subtracting the pedestal, finding clusters, etc. Each process (SKImageProcess abstract class) corresponds to the implementation of a single action on the input data.

The process chain (or sequence of processes) is defined and configured through a configuration file, which also describes the properties of the input image(s). The process manager (ProcessManager class) configures each process of the chain according to the parameters listed in the configuration JSON file (Config class). Then, the process manager applies each process to the image (RawData class) sequentially, in the same order they appear in the sequence. Note that the order of the processes is really important, as the input of one process may rely on the output of a previous one. How to set the process sequence can be found here (search for sequence).

This can be seen in the following sketch of panaSKImg:


...

    # Set process manager
    pman = ProcessManager()
    # set the configuration for all processes listed in config (from JSON)
    pman.set_configuration(config)

    # instantiation of a RawData object: image
    rdata = rawdata.BuilderRawData(fits_name, config.configuration['input'])
    rdata.prepare_data()

    # sequentially apply the set of processes to rdata
    _ = pman.execute_process(rdata)

    # manager for the output root file
    outman = OutputDataManager(out_cluster_file)
    # create a ROOT file with the clusters
    outman.fill_cluster_collection_tree("clustersRec",run_id,evt_clusters)
    # close output manager
    outman.close()

    ...
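The design described above can be illustrated with a small self-contained sketch. It does not use pysimdamicm: `Process`, `SubtractPedestal`, `ClipNegative` and `run_sequence` are hypothetical stand-ins for the abstract process class, two concrete processes and the process manager, and the image is a plain list of pixel values. Its only purpose is to show how a configured sequence is applied in order, and why that order matters.

```python
from abc import ABC, abstractmethod


class Process(ABC):
    """Hypothetical stand-in for an abstract process: one action on the data."""

    @abstractmethod
    def execute(self, image):
        """Return the transformed image."""


class SubtractPedestal(Process):
    def __init__(self, pedestal):
        self.pedestal = pedestal

    def execute(self, image):
        return [pix - self.pedestal for pix in image]


class ClipNegative(Process):
    def execute(self, image):
        return [max(pix, 0) for pix in image]


def run_sequence(sequence, config, image):
    """Build each process named in `sequence` from `config`, then apply them in order."""
    available = {"SubtractPedestal": SubtractPedestal, "ClipNegative": ClipNegative}
    for name in sequence:
        process = available[name](**config.get(name, {}))
        image = process.execute(image)
    return image


# Toy configuration and image (values are made up for the illustration)
config = {"SubtractPedestal": {"pedestal": 100}}
image = [98, 100, 105, 230]

forward = run_sequence(["SubtractPedestal", "ClipNegative"], config, image)
backward = run_sequence(["ClipNegative", "SubtractPedestal"], config, image)
print(forward)   # [0, 0, 5, 130]
print(backward)  # [-2, 0, 5, 130]
```

The two results differ only because the same processes ran in a different order, which is the reason the sequence parameter deserves attention.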

List of available processes for panaSKImg

All available processes can be found in pysimdamicm.processes, which has been divided into four groups:

a) Low level process: skipper_comissioning (abstract class SKImageProcess)

More detailed information on each of these processes can be found on the official web page of pysimdamicm.

  • FitCalibrationConstant: to measure linearity with single electron peaks (info here)

  • RNvsNskipsPlot: to study the noise versus the number of skip measurements (info here)

  • FFTNoisePlot: to study the noise by using the FFT (info here)

  • ChargeLossPlot: to study the charge losses between single skip measurements (info here)

b) High level process: skipper_analysis (abstract class SKImageProcess)

A list of jupyter notebooks for these processes can be found at https://gev.uchicago.edu/compton/

  • CompressSkipperProcess: find notebook here

  • PedestalSubtractionProcess: see notebook here

  • CalibrationProcess (coming soon)

  • FitDarkCurrentProcess: find notebook here

c) Reconstruction process: reconstruction (abstract class DigitizeProcess)

A list of jupyter notebooks for these processes can be found at https://gev.uchicago.edu/compton/ and on the official web page.

  • SignalPatternRecognition

  • ClusterFinder

  • CreateFitsImage

d) Detector Response process: detector_response (abstract class DigitizeProcess)

These classes are beyond the scope of this document; see the simulations documentation for more information.

How to run panaSKImg

There are several ways to run panaSKImg:

  1. analysis mode: run a sequence of processes on one or several images
  2. DQM mode: process a set of images to create a collection of plots (and a pdf report) to control the data taking
  3. HELP mode: display help for the command-line options, or list the full set of available processes

The following sections give more details on each running mode.

1. analysis mode

  1. Run a sequence of processes over an image, using a json configuration file (--json) and displaying the plots (--display). Add --verbose to also report extra information during the execution.

    [user@dm ~]$ panaSKImg --json config_file.json --display /data/compton/data/Image_489.fits
    
  2. Many of the configuration parameters in the json file can also be set from the command line (see HELP mode). Values passed on the command line take preference over those given in the configuration file. For example, the same as before but using a different sequence (-s) of processes than the one in the json file:

    [user@dm ~]$ panaSKImg --json config_file.json --display /data/compton/data/Image_489.fits -s CompressSkipperProcess,PedestalSubtractionProcess
    
  3. Run the same process chain on all images in the directory /data/compton/data, except (--skip) those containing the string _598 or _631:

    [user@dm ~]$ panaSKImg --json config_file.json "/data/compton/data/*.fits" --skip _598 _631
    

    Note the use of " to quote the file name pattern (which also contains the directory where the images are).
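The selection made in the third example can be mimicked in a few lines of Python. This is only a sketch of the behaviour described above (expand the quoted pattern, then drop every file containing one of the --skip strings). `select_images` is a hypothetical helper, not part of pysimdamicm, and matching against the file name rather than the full path is an assumption.

```python
import glob
import os
import tempfile


def select_images(pattern, skip=()):
    """Expand a file name pattern and drop files whose name contains a skip string."""
    selected = []
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        if any(token in name for token in skip):
            continue
        selected.append(path)
    return selected


# Demonstration on empty dummy files created in a temporary directory
with tempfile.TemporaryDirectory() as data_dir:
    for name in ("Image_489.fits", "Image_598.fits", "Image_631.fits", "notes.txt"):
        open(os.path.join(data_dir, name), "w").close()

    kept = select_images(os.path.join(data_dir, "*.fits"), skip=("_598", "_631"))
    kept_names = [os.path.basename(path) for path in kept]
    print(kept_names)  # ['Image_489.fits']
```

Quoting the pattern on the command line keeps the shell from expanding it, so the program receives the pattern itself as a single argument.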

See the json file documentation (pay special attention to the sequence parameter). Be careful: the default values in the default configuration JSON file (provided by pysimdamicm) are optimized for the Compton data, and the default sequence may not be appropriate for your data.
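The precedence rule mentioned in the second example (command-line values win over the json file) can be sketched with argparse. This is an assumption about how such a merge could be written, not the actual panaSKImg code: the flat `sequence` and `display` keys and the `merge` helper are hypothetical, and the real json layout is described in the json file documentation.

```python
import argparse
import json


def merge(json_text, argv):
    """Return the configuration after overriding json values with command-line ones."""
    config = json.loads(json_text)

    parser = argparse.ArgumentParser()
    # default=None makes "option not given" distinguishable from a real value
    parser.add_argument("-s", "--sequence", default=None)
    parser.add_argument("--display", action="store_true", default=None)
    args = parser.parse_args(argv)

    for key, value in vars(args).items():
        if value is not None:
            config[key] = value
    return config


json_text = '{"sequence": "CompressSkipperProcess,FitDarkCurrentProcess", "display": false}'
new_sequence = "CompressSkipperProcess,PedestalSubtractionProcess"

from_json = merge(json_text, [])
overridden = merge(json_text, ["-s", new_sequence, "--display"])
reordered = merge(json_text, ["--display", "-s", new_sequence])

print(from_json["sequence"])   # sequence from the json text
print(overridden["sequence"])  # sequence from the command line
print(overridden == reordered) # option order makes no difference
```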

Additional useful command-line options

In all the examples you can also use:

  • --display to display the final plot of each process in the sequence
  • --verbose to display the intermediate ones.
  • --save-img if you want to save the images created by each process
  • --save-plots to record the plots

Some processes may not implement any plot. Plot names are created automatically from the name of the image and the process class name, and the plots are recorded in the directory given by the option -o (the current working directory by default).

An example:

Using a user json file (--json), the plots are recorded (--save-plots) instead of displayed, in the directory "/data/compton/avgimg" (-o), where the images (if any) are also recorded (--save-img):

[user@dm ~]$ panaSKImg --json config_file.json -o /data/compton/avgimg --save-img --save-plots /data/compton/data/Image_489.fits

The order of the command-line options does not matter.

2. DQM mode

This mode creates a set of plots (called monitor elements) to control the quality of the data taking. The automatic publication of these plots on a web page is coming soon, as well as their storage in a MongoDB database (which will allow comparing results between runs).

See all plots generated by this mode on the Compton setup web site.

[user@dm ~]$ panaSKImg --dqm --run 59 --json config_file.json --skip _full_1.fits --image-HR skip_2000 --me-ref /path/to/data/run030/me/summary_run030.pkl  --dqm-data-dir /path/to/DataTaking/Run_59  "Image*fits" -o .

The dot in -o . refers to the current working directory (as in shell commands).

OUTPUTS

This mode creates the following structure of directories under the output directory given by the option -o:

output_directory
|
|_run059
    |
    |_avgimg                  where averaged images will be stored
    |_recon                   where the ROOT files for cluster reconstruction will be recorded
    |_logs
    |_others
    |_me                      where all plots will live
       |_dcfit               only plots from the dark current fit (see --image-HR) will live
          |
          |_051_MEFitDCMu0PerRow.png
          |_ ...
       |_pcd                 only plots from the gaussian fit to the single electron peak
          |
          |_MEFitDC_PCD_ovs_Image_Am241_Source_33_10.png
          |_ ...
       |_sed                 pixel charge distribution plot and related
          |
          |_043_MESinglePED.png
          |_ ...
       |
       |_001_MEMeanDCperRow_median.png
       |_001_MEMeanDCperRow.png
       |_...
       |_MEMaskedPixels_mask.png
       |_me_summary_run059.pkl             all information related with all ME in a pickle format
       |_mask_run059.fits                  mask of the run
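For scripts that need to locate DQM outputs, the layout above can be reproduced with pathlib. This sketch only mirrors the directory names shown in the tree. The zero-padded `runXXX` form is inferred from the `run059` example, and `dqm_directories` is a hypothetical helper, not a pysimdamicm function.

```python
import tempfile
from pathlib import Path


def dqm_directories(output_dir, run):
    """Return the directories of the DQM layout for a given run number."""
    run_dir = Path(output_dir) / f"run{run:03d}"
    me_dir = run_dir / "me"
    return {
        "avgimg": run_dir / "avgimg",  # averaged images
        "recon": run_dir / "recon",    # ROOT files from cluster reconstruction
        "logs": run_dir / "logs",
        "others": run_dir / "others",
        "me": me_dir,                  # all monitor element plots
        "dcfit": me_dir / "dcfit",
        "pcd": me_dir / "pcd",
        "sed": me_dir / "sed",
    }


with tempfile.TemporaryDirectory() as output_dir:
    dirs = dqm_directories(output_dir, run=59)
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)

    # Where the summary pickle of run 59 is expected according to the tree
    summary = dirs["me"] / "me_summary_run059.pkl"
    relative = summary.relative_to(output_dir).as_posix()
    print(relative)  # run059/me/me_summary_run059.pkl
```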

OPTIONS

Let's see option by option:

  • --dqm: to also call the manager of the monitor elements, pysimdamicm.dqm.dqm_manager. For now, all monitor elements defined in pysimdamicm.dqm.me are produced; there is no way to deactivate any of them yet (coming soon).

  • --run: run number (this mode assumes your data taking is organized in runs, each containing a set of images taken under the same detector conditions)

  • --dqm-data-dir: directory where data for --run can be found.

  • [NOT MANDATORY] --json: the same as before, a configuration json file to properly set information on the processes, as well as on the input data properties.

  • [NOT MANDATORY] --skip: a list of strings to skip images

  • [NOT MANDATORY] --image-HR: some of the monitor elements do not apply to the full set of images because they may require an image with high resolution (i.e. a large number of skip measurements). This option points to that image (in the compton setup it corresponds to the image with 2000 skips). This image is used to get the calibration constant (or gain) as well as to fit the dark current per row.

  • [NOT MANDATORY] --me-ref: points to a pickle file (such as me_summary_run059.pkl) already generated by the DQM for another run, to be used as reference. In this case, all plots will contain results for the current run as well as for the reference run.

  • [NOT MANDATORY] -o: where all outputs should be stored

More detailed documentation for this running mode is coming soon.

3. HELP mode

Which processes are currently available? This documentation may not be as up to date as the implemented processes.

To list all processes just do:


[user@dm ~]$ panaSKImg --list-processes help

     CalibrationProcess
     ChargeLossPlot
     ClusterFinder
     CompressSkipperProcess
     ContinuousReadout
     CreateFitsImage
     DarkCurrent
     Diffusion
     ElectronicNoise
     FFTNoisePlot
     FitCalibrationConstant
     FitDarkCurrentProcess
     PedestalSubtractionProcess
     PixelSaturation
     PixelizeSignal
     RNvsNskipsPlot
     SignalPatternRecognition

Available command-line options related to the different processes.

As already mentioned, a command-line option exists for almost every parameter of the configuration json file. To see all these parameters and a short description, just run

[user@dm ~]$ panaSKImg --help

This will display a really long list of command-line options and a short description of each of them. All these options are explained in the howto document of the process to which they refer (see links at the beginning).

The options are grouped by processes as follows

...

*** For Process RNvsNskipsPlot (readout noise study) ************************** :
  --n-skips N_SKIPS_PER_BLOCK
                        RNvsNskipsPlot. Number of skips specifying the
                        incrementation
  --is-blank            RNvsNskipsPlot. Set if the input image is a blank
                        image (if not overscan region will be used)

*** For Process FitCalibrationConstant ************************** :
  --n-peaks N_PEAKS     Number of peaks to be fitted for the lineality study
                        (to estimate the calibration constant)
  --calibration CALIBRATION
                        Starting point for the calibration fitting process
                        (important when fit does not converge)

...

The line

*** For Process RNvsNskipsPlot (readout noise study) ************************** :

is used to separate the different groups and to display the scope of each command-line option, i.e. the name of the process class where the options live.

Here is the full set of options displayed by the --help command-line option:
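Headers of this form are what Python's argparse prints for an argument group: the group title followed by a colon. The sketch below reproduces the two-option excerpt under the assumption that panaSKImg builds its options with argument groups. It is an illustration, not the actual source, and the `int` type given to --n-skips is a guess.

```python
import argparse

parser = argparse.ArgumentParser(prog="panaSKImg")

# The group title becomes the separator line; argparse appends the final ':'
group = parser.add_argument_group(
    "*** For Process RNvsNskipsPlot (readout noise study) ************************** "
)
group.add_argument(
    "--n-skips",
    dest="n_skips_per_block",  # dest also sets the N_SKIPS_PER_BLOCK placeholder
    type=int,                  # assumption: the option takes an integer
    help="RNvsNskipsPlot. Number of skips specifying the incrementation",
)
group.add_argument(
    "--is-blank",
    action="store_true",
    help="RNvsNskipsPlot. Set if the input image is a blank image "
    "(if not overscan region will be used)",
)

help_text = parser.format_help()
print(help_text)

# The grouped options are parsed like any other option
args = parser.parse_args(["--n-skips", "50", "--is-blank"])
print(args.n_skips_per_block, args.is_blank)
```

Grouping affects only how the help is displayed; on the command line the options of all groups can be mixed freely.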

usage: panaSKImg [-h] [--skip SKIP [SKIP ...]] [-o OUTPUT] [-e EXTENSION]
                 [-j JSONFILE] [--mask MASK] [--dqm]
                 [--dqm-data-dir DQM_DATA_DIR] [--image-HR IMAGE_HR]
                 [--run RUN] [--me-ref ME_REF] [--all-me] [--invert]
                 [--skip-start ID_SKIP_START] [--skip-end ID_SKIP_END]
                 [--row-start ID_ROW_START] [--row-end ID_ROW_END]
                 [--col-start ID_COL_START] [--col-end ID_COL_END]
                 [-s SEQUENCE] [--list-processes] [--save-img] [--save-plots]
                 [--display] [--verbose] [--cal CALIBRATION]
                 [--func-to-compress FUNC_TO_COMPRESS [FUNC_TO_COMPRESS ...]]
                 [--method METHOD] [--in-overscan] [--axis AXIS]
                 [--n-sigma-win-fit N_SIGMA_WIN_FIT]
                 [--n-sigma-to-mask N_SIGMA_TO_MASK] [--show-fit]
                 [--skip-id-list SKIP_ID_LIST [SKIP_ID_LIST ...]]
                 [--skip-id-baseline SKIP_ID_BASELINE] [--histequ]
                 [--gray-palette] [--n-skips N_SKIPS_PER_BLOCK] [--is-blank]
                 [--n-peaks N_PEAKS] [--calibration CALIBRATION]
                 [--dc-axis DC_AXIS] [--n-elec N_ELEC]
                 [--n-sigma-fit N_SIGMA_FIT] [--mu-gauss MU_GAUSS]
                 [--sigma-gauss SIGMA_GAUSS] [--lambda-poisson LAMBDA_POISSON]
                 [--fit-options FIT_OPTIONS] [--do-calibration]
                 infile

positional arguments:
  infile                Input CCD Image or a pattern file name for multiple
                        inputs, in this case use "" to quote the expression.
                        If extension is not 0, see -e

optional arguments:
  -h, --help            show this help message and exit
  --skip SKIP [SKIP ...]
                        List of strings (for instance '_full_1.fits'). When
                        `infile` is not a single file but a multiple inputs,
                        this option can be used to ignore a set offiles that
                        contains one of these stings
  -o OUTPUT, --ouptut OUTPUT
                        Directory for the outputs (plots and images will be
                        both recorded here)
  -e EXTENSION, --ext EXTENSION
                        Extension to load when more than one image is
                        available in the input file (for instance in nulti-
                        extension fits files)
  -j JSONFILE, --json JSONFILE
                        Configuration JSON file. Run `panaSKImg --json help`
                        to list all configuration parameters. Some of the
                        parameters in the json file can be also pass by
                        command line (see process options)

*** Options for CLUSTERING ************************** :
  --mask MASK           binary-data(1/0) fits file to mask data before
                        clustering

*** Options for DQM ************************** :
  --dqm                 Set to run as a DQM: creation of ME for a given run,
                        store plots and images.When running in DQM mode, a
                        pre-defined directory structure will be created under
                        the directory--output: <output>/runXXX/avgimg for
                        compressed image fits file, <output>/runXXX/me for the
                        ME outputs and <output>/runXXX/logs for the log output
                        files
  --dqm-data-dir DQM_DATA_DIR
                        Absolute or relative path pointing to the directory
                        where the data of the run is.In this case, the
                        mandatory argument (input file name) is used as a
                        pattern file name to select onlythose file under
                        --dqm-data-dir that follows this regular expression
                        (for instance *Source*fits).
  --image-HR IMAGE_HR   Image with high single-electron resolution, i.e. the
                        one with the highest numberof single skip measurements
                        to be used for the FitDarkCurrent (assuming it is in
                        thesame directory as data)
  --run RUN             Run number ID
  --me-ref ME_REF       Absolute or relative path to the me_summary pickle
                        file to use as reference
  --all-me              Set to RUN all monitors elements over all images
                        (Mostly affecting MEFitDC)

*** Related to the input data: JSON[input][image] ************************** :
  --invert              [BOOL] Use to invert the pixel charge (in ADUs) to be
                        proportional to the ionizing charge
                        ('correct_polarity' option in the json file)
  --skip-start ID_SKIP_START
                        First skip to start with (for all processes, if a
                        process use an specific starting point include
                        id_skip_start in its scope in the json file)
  --skip-end ID_SKIP_END
                        First skip to start with (for all processes, if a
                        process use an specific starting point include
                        id_skip_start in its scope in the json file). Note
                        that -1 means last value
  --row-start ID_ROW_START
                        First row to start with (for all processes, if a
                        process use an specific starting point include
                        id_row_start in its scope in the json file)
  --row-end ID_ROW_END  First row to start with (for all processes, if a
                        process use an specific starting point include
                        id_row_start in its scope in the json file). Note that
                        -1 means last value
  --col-start ID_COL_START
                        First col to start with (for all processes, if a
                        process use an specific starting point include
                        id_col_start in its scope in the json file)
  --col-end ID_COL_END  First col to start with (for all processes, if a
                        process use an specific starting point include
                        id_col_start in its scope in the json file). Note that
                        -1 means last value

*** Common to all PROCESS ************************** :
  -s SEQUENCE, --sequence SEQUENCE
                        Coma-separated list of process names (in this case,
                        sequence from json file will be ignored)
  --list-processes      List all available process names
  --save-img            [BOOL] Set to save intermediate images as fits files
  --save-plots          [BOOL] Set to save plots as eps and pdf files
  --display             [BOOL] Running in debug mode (all plots will be also
                        display)
  --verbose             [BOOL] Report extra information/plots during execution
                        (for all booked processes)
  --cal CALIBRATION     Calibration constant to start with (several process
                        has this parameter)

*** For Process CompressSkipperProcess ************************** :
  --func-to-compress FUNC_TO_COMPRESS [FUNC_TO_COMPRESS ...]
                        CompressSkipperProcess. List of functions to reduce
                        the single skipper images into a single one (functions
                        must exist in the numpy package)

*** For Process PedestalSubtractionProcess ************************** :
  --method METHOD       Method to use to compute the pedestal
  --in-overscan         Set to use the full image to estimate the pedestal,
                        instead of only the overscan region
  --axis AXIS           Axis in which the overscan should be computed:
                        row/col/both/none
  --n-sigma-win-fit N_SIGMA_WIN_FIT
                        Number of sigmas to define the spectral window to fit
                        a gaussian to single electron peaks
  --n-sigma-to-mask N_SIGMA_TO_MASK
                        Number of sigmas to define the maximum pixel charge to
                        take into account to estimate the pedestal
  --show-fit            Set to show up several extra plots and information

*** For Process ChargeLossPlot ************************** :
  --skip-id-list SKIP_ID_LIST [SKIP_ID_LIST ...]
                        List of skip index to be display to search for charge
                        loss/gain
  --skip-id-baseline SKIP_ID_BASELINE
                        Index of the single skip image to be used as baseline
                        image
  --histequ             The image will be displayed after equalization
  --gray-palette        Set palettte to gray colors

*** For Process RNvsNskipsPlot (readout noise study) ************************** :
  --n-skips N_SKIPS_PER_BLOCK
                        RNvsNskipsPlot. Number of skips specifying the
                        incrementation
  --is-blank            RNvsNskipsPlot. Set if the input image is a blank
                        image (if not overscan region will be used)

*** For Process FitCalibrationConstant ************************** :
  --n-peaks N_PEAKS     Number of peaks to be fitted for the lineality study
                        (to estimate the calibration constant)
  --calibration CALIBRATION
                        Starting point for the calibration fitting process
                        (important when fit does not converge)

*** For Process FitDarkCurrentProcess ************************** :
  --dc-axis DC_AXIS     Axis to fit the dark current.
  --n-elec N_ELEC       Number of single eletron peaks to use to fit the dark
                        current
  --n-sigma-fit N_SIGMA_FIT
                        Number of sigmas to define the spectral window to
                        estimate the initial values of all paramters to be fit
  --mu-gauss MU_GAUSS   Initial value for the position of the single electron
                        peak at 0 electrons
  --sigma-gauss SIGMA_GAUSS
                        Initial value for the electronic noise (e-/pix)
  --lambda-poisson LAMBDA_POISSON
                        Initial value for the dark current (e-/pix)
  --fit-options FIT_OPTIONS
                        Options for the fitting see ROOT::TGraph::Fit
  --do-calibration      Set if the calibration constant from user should be
                        used (in this case, use calibration option to pass its
                        value)
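Several of the range options above (--skip-end, --row-end, --col-end) state that -1 means the last value. One way such a convention can be mapped onto Python slicing is sketched below. `resolve_range` is a hypothetical helper, and treating the end index as inclusive is an assumption that the help text does not settle.

```python
def resolve_range(start, end, size):
    """Turn (start, end) into a slice, reading end == -1 as 'up to the last element'."""
    stop = size if end == -1 else end + 1  # assumption: end index is inclusive
    if not 0 <= start < stop <= size:
        raise ValueError(f"invalid range {start}..{end} for size {size}")
    return slice(start, stop)


skips = list(range(10))  # stand-in for 10 skip measurements of one pixel

all_skips = skips[resolve_range(0, -1, len(skips))]
without_first = skips[resolve_range(1, -1, len(skips))]
first_five = skips[resolve_range(0, 4, len(skips))]
print(len(all_skips), without_first[0], first_five)  # 10 1 [0, 1, 2, 3, 4]
```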