The main objective of this document is to present the RawData class without going into too much detail, while covering the concepts needed to understand the philosophy of panaSKImg.
For this howto it is important to have looked at the documentation on the configuration parameters (see the previous howto in this series). It is not necessary to know them in detail, only that they exist, what their format is and what they are used for.
Here we will learn how a fits file image is represented within pysimdamicm. The data (i.e. the fits file image) is an instance of the RawData class, which represents our understanding of a CCD skipper image:
RawData
There is a large set of parameters needed to properly interpret the input data. All of them were introduced in the Config howto; see the input section of the configuration JSON file. In short, we have
Ncols
Nrows
Nskips
Npbin
Nsbin
axis_to_compress
id_col_end
id_col_start
id_row_end
id_row_start
id_skip_end
id_skip_start
correct_leach_bug
correct_polarity
Some of them will be explained again throughout this document.
RawData
An instance of RawData has several data members. The most important are:
image data member
This is created during instantiation and contains all pixel charge values in a 3D array: (rows, cols, skips). The third dimension holds the skipper measurements, so the k-th measurement of the pixel at position (y,x) is
image[row,col,k]
mask_image_active_region
mask_image_overscan_cols, mask_image_overscan_rows
mask_image_prescan_cols, mask_image_prescan_rows
These are the 5 regions that any CCD image can have: the active region, the overscan in columns and in rows (pixels that receive no exposure and therefore provide a snapshot of the baseline, or pedestal, of the image) and the prescan in columns and in rows.
These data members are created by the method RawData.prepare_data
and are $\color{red}{\text{boolean arrays of dimension 2}}$ with shape (number of rows, number of columns):
(Nrows,Ncols)
Note that all these data members have the same shape: they are just masks of True and False values that define which pixels belong to which region.
Throughout panaSKImg, masked arrays are used to select any of these regions. For instance, computing the average pixel charge in the overscan on columns is as simple as
numpy.ma.array( RawData.image, mask=RawData.mask_image_overscan_cols).mean()
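The one-liner above can be tried without any fits file. The following is a minimal sketch on synthetic data, not the pysimdamicm implementation: the array sizes and the name mask_overscan_cols are assumptions made for illustration, and the mask follows the numpy convention in which True marks an excluded pixel. Because the image is 3D and the region mask is 2D, the sketch repeats the mask along the skip axis before building the masked array.

```python
import numpy as np

# Synthetic stand-in for RawData.image: 4 rows, 6 columns, 3 skips.
rng = np.random.default_rng(0)
image = rng.normal(loc=1000.0, scale=5.0, size=(4, 6, 3))

# Hypothetical 2D region mask with shape (Nrows, Ncols). Following numpy's
# convention, True means "masked" (excluded), so only the last two columns
# (our pretend overscan) are set to False.
mask_overscan_cols = np.ones((4, 6), dtype=bool)
mask_overscan_cols[:, 4:] = False

# Repeat the 2D mask along the skip axis so it matches the 3D image shape.
mask_3d = np.repeat(mask_overscan_cols[:, :, np.newaxis], image.shape[2], axis=2)

# Mean over the unmasked pixels only, i.e. over the overscan columns.
overscan_mean = np.ma.array(image, mask=mask_3d).mean()
print(overscan_mean)
```

The result is the same as slicing the overscan columns by hand with image[:, 4:, :].mean(); the masked-array form has the advantage that the array keeps its original shape.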
In this howto there are two main sections:
Image as the instance of the RawData class
Learn a little bit more about the data members of RawData, and how to get the mask for each region
In this howto we will use the CompressSkipperProcess class to get the averaged image and play a little with the masks for the several regions.
This howto is done entirely from a Python interpreter, meaning that we will not use panaSKImg, since the main objective is to learn about the data structure as an instance of RawData.
Instantiation of RawData
image data member
RawData for already compressed images (averaged image)
%matplotlib inline
import pysimdamicm as ccd
import numpy as np
from matplotlib import pyplot as plt
The data we will use throughout this series of How to notebooks is stored in fits format, with file names following the pattern Image_Am241_Source_27_XXX.fits, where XXX is the file number.
To access, for instance, file number 25, we will use the string method format as follows:
path_to_raw_data
img_pattern_file_name
file_name
# where raw data is
path_to_raw_data = "/media/ncastello/WORK/damicm/compton/calidaq_backup/DataTaking/Am241/Run_55"
# pattern of the file name ({} is the placeholder for the file number)
img_pattern_file_name = "Image_Am241_Source_55_{}.fits"
# absolute path to the file name
file_name = "{}/{}".format(path_to_raw_data,img_pattern_file_name)
# see what the concatenation looks like
print( file_name )
Note that this string contains the symbol {}, which tells format where its arguments should be placed.
As an example, the name for file number 25 can be built as follows
print( "Name for file number 25 is: ", file_name.format(25) )
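The same pattern can be reused to build the names of several files at once. This is a small sketch using a made-up directory and file pattern (both are placeholders, not the paths of this howto); it only relies on the standard str.format method.

```python
# Hypothetical directory and file pattern, used only for illustration.
pattern = "{}/{}".format("/path/to/run", "Image_{}.fits")

# The placeholder that survives the first call is filled in by the second one.
names = [pattern.format(number) for number in (24, 25, 26)]

for name in names:
    print(name)
```

Braces contained in the arguments of the first format call are copied verbatim, which is why the {} of the file pattern is still available for the file number.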
RawData
BuilderRawData function
In this section we will learn how data from a fits file is structured as a RawData object.
The raw data file can be loaded by using the BuilderRawData function provided by pysimdamicm.io.rawdata. There are two parametrized constructors for this class: one that takes only the file name (file_name), and one that also takes a configuration (file_config).
With the former there is no information about how the data is recorded in the fits file, so the information needed to properly interpret the data will be missing (for instance the extent of the prescan and overscan regions, or the number of skips).
All these parameters can be defined in a dictionary and passed to the constructor as a second argument (we will learn how to do that later on).
One way to see what the constructor looks like is by typing
help(ccd.io.rawdata.BuilderRawData)
Using the string file_name
pointing to a fits file data, we can load this existing FITS file:
rdata = ccd.io.rawdata.BuilderRawData( file_name.format(25) )
This will return an object of class FitsRawData. BuilderRawData is a general class that accepts different formats for the input data. For each format there is a class that properly reads the input data.
In this example, the input file is recorded as a fits file and BuilderRawData returns a FitsRawData object. A FitsRawData is the highest level component of the RawData structure, consisting of a series of attributes (some of them coming from the header of the fits file, in this example) and a data array image containing the data of the image.
type(rdata)
RawData has a useful method, RawData.info(), which summarizes the content of the loaded and semi-processed fits data file:
rdata.info()
From this long list, the most relevant attributes of rdata
are:
When a full row is on the serial register, we can read more columns than the real size of our CCD. If we start reading before the real pixels arrive, we get a region called prescan on cols. Just after this set of non-physical pixels, the real data from row r starts. At the end of the row, we can continue to read n more pixels, which make up the overscan on cols region. The same can be done on rows: read n rows before the real data starts (known as prescan on rows) or after all the real pixels have been read (known as overscan on rows). This is information that we must set to properly define our data set:
n_cols_overscan
: defines which columns correspond to the overscan on cols, i.e. [full_data_size_on_cols - n_cols_overscan, full_data_size_on_cols)
n_cols_prescan
: defines which columns correspond to the prescan on cols, i.e. [0, n_cols_prescan)
n_rows_prescan
: defines which rows correspond to the prescan on rows, i.e. [0, n_rows_prescan)
n_rows_overscan
: defines which rows correspond to the overscan on rows, i.e. [full_data_size_on_rows - n_rows_overscan, full_data_size_on_rows)
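To make the half-open intervals above concrete, here is a minimal sketch that turns the four counts into boolean region masks. The helper build_region_masks is hypothetical and is not how pysimdamicm builds its masks; it only illustrates the index arithmetic, assuming the numpy convention in which True marks a pixel that is excluded from the region.

```python
import numpy as np

def build_region_masks(n_rows, n_cols, n_rows_prescan, n_rows_overscan,
                       n_cols_prescan, n_cols_overscan):
    """Hypothetical helper: one boolean mask per region, True = NOT in the region."""
    rows = np.arange(n_rows)[:, np.newaxis]
    cols = np.arange(n_cols)[np.newaxis, :]
    shape = (n_rows, n_cols)

    # Membership of every pixel in each half-open index range.
    in_prescan_rows = np.broadcast_to(rows < n_rows_prescan, shape)
    in_overscan_rows = np.broadcast_to(rows >= n_rows - n_rows_overscan, shape)
    in_prescan_cols = np.broadcast_to(cols < n_cols_prescan, shape)
    in_overscan_cols = np.broadcast_to(cols >= n_cols - n_cols_overscan, shape)

    # The active region is whatever belongs to none of the scan regions.
    in_active = ~(in_prescan_rows | in_overscan_rows
                  | in_prescan_cols | in_overscan_cols)

    # Invert membership so that True means "masked out".
    return {
        "active_region": ~in_active,
        "prescan_rows": ~in_prescan_rows,
        "overscan_rows": ~in_overscan_rows,
        "prescan_cols": ~in_prescan_cols,
        "overscan_cols": ~in_overscan_cols,
    }

masks = build_region_masks(n_rows=5, n_cols=10, n_rows_prescan=1,
                           n_rows_overscan=0, n_cols_prescan=2,
                           n_cols_overscan=3)
print({name: int((~mask).sum()) for name, mask in masks.items()})
```

The printed dictionary gives the number of pixels that belong to each region in this toy geometry.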
For that we also need to know the full size of the data set: [full_data_size_on_rows, full_data_size_on_cols]. This is read from the header of the fits file:
ncols_keyword
: the name of the keyword that gives the number of columns in the fits header; this value is then stored as ncols
nrows_keyword
--> nrows
(same as before but for rows)
If the data corresponds to a skipper image, we must know how many skips have been taken and along which axis (axis_to_compress). This information is also in the header of the fits file:
nskips_keyword
: the name of the variable in the fits header that gives the number of skips, which is stored as nskips. From it the variable nallcols is created if axis_to_compress is 1 (meaning single skips are measured along columns), or nallrows if axis_to_compress is 0 (meaning single skips are measured along rows).
Other useful parameters are the ones that allow the user to use only a specific region of the data that does not correspond to the sensitive region, nor to the over- or pre-scan regions. The same applies to the skips. These are
id_col_start and id_col_end
id_row_start and id_row_end
id_skip_start and id_skip_end
There are two parameters that are setup dependent:
correct_leach_bug
: set to correct for a bug in the data taking (some of the columns are misplaced). Note that this is active by default, and the data will be corrected for that bug unless you specify otherwise.
correct_polarity
: set to invert the signal (pixels with no signal are recorded with the maximum ADC value, and a negative pulse occurs on the passage of an ionizing particle, so that a pixel with charge has a lower ADC value)
All these parameters can be defined by the user and passed to BuilderRawData in dictionary format.
Our data file can also be loaded with the correct configuration.
An example of what the dictionary for the input rawdata configuration looks like can be found at
`ccd.__path__[0]`/json/panaSKImg_configuration.json. To load this configuration JSON file we can use the class Config from pysimdamicm.utils.config.
cfg_file = "{}/json/panaSKImg_configuration.json".format(ccd.__path__[0])
print( cfg_file )
cfg = ccd.utils.config.Config(cfg_file, False)
These warnings will be removed in the next release of pysimdamicm (so you can just ignore them).
This returns a Config object, with the following attributes and data members:
print( [attr for attr in dir(cfg) if attr[0]!='_'] )
The one we care about for this session is the dictionary cfg.configuration, where all parameters from the JSON file are loaded as a Python dictionary. There are two main sections:
cfg.configuration.keys()
Here input is the section relevant for correctly interpreting the input data file. It is another dictionary containing three more sections:
convention: where all the necessary keywords should be given (variables with relevant information that are recorded in the fits file header)
image: parameters to define the over- and pre-scan regions, the skip direction, ...
scp: not relevant here
cfg.configuration['input'].keys()
cfg.configuration['input']['image']
cfg.configuration['input']['convention']
All these parameters are attributes that we already saw in the output of rdata.info(). All of them become attributes of the RawData object.
The file we will use in this example has its own values for these parameters (the header keywords are the ones in cfg.configuration['input']['convention']).
So, to properly interpret the data in our fits file, we will change these values in the configuration:
cfg.configuration['input']['image']['n_cols_prescan'] = 3
Check that your changes have been applied:
cfg.configuration['input']['image']
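The nested layout of the configuration can be mimicked with a plain dictionary. The sketch below is an assumption-laden illustration: the JSON fragment is invented (only the section names input, convention and image, and the keys Nskips, n_cols_prescan and n_cols_overscan come from this howto, and the values are made up), and it uses the standard json module rather than the Config class.

```python
import json

# Invented configuration fragment; the real file contains many more entries.
config_text = """
{
  "input": {
    "convention": {"Nskips": "NDCMS"},
    "image": {"n_cols_prescan": 0, "n_cols_overscan": 15}
  }
}
"""

configuration = json.loads(config_text)

# Nested sections are ordinary dictionaries, so a value is updated in place.
image_cfg = configuration["input"]["image"]
image_cfg["n_cols_prescan"] = 3

# The change is visible through the top-level dictionary as well.
print(configuration["input"]["image"])
```

Because image_cfg refers to the same dictionary object as configuration["input"]["image"], no copy has to be written back after the update.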
Once the dictionary used to interpret the data has been updated, load the data file with the second parametrized constructor of BuilderRawData
(see 1.2).
rdata = ccd.io.rawdata.BuilderRawData( file_name.format(25), cfg.configuration['input'])
If we display the relevant attributes of rdata, we can see that they are different from the previous ones:
rdata.info()
rdata.image: the raw data
The data from the fits file is loaded as a numpy.ndarray under the attribute image of the rdata object. From the output of rdata.info() one can see whether the data was loaded correctly.
In our example, the loaded data is interpreted as a np.ndarray of 3 dimensions: 150 rows, 275 columns and 64 skips. See the last line of the previous output or just type
rdata.image.shape
Up to this point, you have only loaded the data as it is in the fits file and reshaped it to put the skips in an extra dimension (2D --> 3D), in order to exploit the capabilities of numpy.
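The 2D --> 3D step can be pictured with a tiny array. This is only a sketch under a stated assumption: that the skips are measured along columns and that the consecutive measurements of one pixel sit next to each other in the 2D array. The actual reshaping done by pysimdamicm may differ in its details.

```python
import numpy as np

n_rows, n_cols, n_skips = 2, 3, 4

# Stand-in for the 2D array stored in the fits file: every physical pixel
# contributes n_skips consecutive entries along the column axis (assumption).
raw_2d = np.arange(n_rows * n_cols * n_skips).reshape(n_rows, n_cols * n_skips)

# Move the skips into their own axis: (rows, cols * skips) -> (rows, cols, skips).
image_3d = raw_2d.reshape(n_rows, n_cols, n_skips)

print(raw_2d.shape, "->", image_3d.shape)
print(image_3d[0, 1, :])  # all skip measurements of the pixel at row 0, column 1
```

With this layout, image_3d[row, col, k] is the k-th measurement of the pixel at (row, col), which is the indexing used in the rest of this howto.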
For instance, let's say you want to know the mean of all skip values per pixel. With explicit loops you would do:
mean_pixel_charge_f = np.zeros(rdata.image.shape[:2], dtype=np.float64)
# loop over every pixel and average its skip measurements
for r in range(rdata.image.shape[0]):
    for c in range(rdata.image.shape[1]):
        mean_pixel_charge_f[r,c] = np.mean( rdata.image[r,c,:] )
mean_pixel_charge_f.shape
A pixel in the image mean_pixel_charge_f is the mean of all single skips taken for this pixel (in our example, the mean of the 64 different measurements).
Or you can just take advantage of the nature of the rdata.image object
mean_pixel_charge = np.mean(rdata.image, axis=2)
#plt.figure("Comparing results of mean values per row from two different methods")
#plt.scatter( mean_charge_row, mean_charge_row_f )
mean_pixel_charge.shape
fig, ax = plt.subplots()
iax = ax.imshow( mean_pixel_charge_f - mean_pixel_charge, aspect='auto' )
fig.colorbar(iax)
From the image, we see that both procedures give the same output. Another way to check is to sum up all the differences and verify that the result is zero
( mean_pixel_charge_f - mean_pixel_charge ).sum()
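A signed sum can be zero even when two arrays differ, because positive and negative differences cancel. A slightly more robust check, sketched here on synthetic data with names chosen for illustration, is to look at the largest absolute difference or to use np.allclose.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.normal(size=(5, 7, 8))  # synthetic (rows, cols, skips) array

# Mean per pixel with explicit loops ...
loop_mean = np.zeros(image.shape[:2])
for r in range(image.shape[0]):
    for c in range(image.shape[1]):
        loop_mean[r, c] = image[r, c, :].mean()

# ... and with a single vectorized call.
vector_mean = image.mean(axis=2)

max_abs_diff = np.abs(loop_mean - vector_mean).max()
same = np.allclose(loop_mean, vector_mean)
print(max_abs_diff, same)

# Why the signed sum alone can mislead: these arrays differ everywhere,
# yet their differences cancel exactly.
a = np.array([1.0, -1.0])
b = np.zeros(2)
signed_sum = (a - b).sum()
print(signed_sum, np.allclose(a, b))
```

np.allclose also tolerates the tiny floating-point differences that can appear when the same quantity is computed in two different orders.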
There is a process that performs this operation of compressing the skips to end up with a single image of dimension 2: CompressSkipperProcess. We will explore this class in more detail in the How to Process for Analysis.
Here is an example of how to do it:
comp = ccd.processes.skipper_analysis.CompressSkipperProcess()
comp.info()
With this process we can limit the range of skips to take into account. For each statistical function listed in the attribute func_to_compress (a statistic that must exist in numpy), a new image will be created and returned as a new attribute on the object rdata.
Let's say we only want to consider skip measurements from skip id 10 up to skip id 63, using only the mean function:
comp.id_skip_start = 10
comp.id_skip_end = 63
comp.func_to_compress = ['mean']
Now we only need to execute this process on our data:
comp.execute_process( rdata )
This process returns the compressed image as a new attribute on our data: rdata.image_mean_compressed
rdata.image_mean_compressed.shape
When the object mean_pixel_charge was computed, the full set of skips was used, but not here. In this example, the skipper compression process takes into account only skip measurements from 10 to 63, so the two images differ:
(rdata.image_mean_compressed - mean_pixel_charge).sum()
To get the same results just consider all skips
comp.id_skip_end = -1
comp.id_skip_start = 0
comp.execute_process(rdata)
(rdata.image_mean_compressed - mean_pixel_charge).sum()
And now we get the same result!
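The effect of the skip range can be reproduced with plain numpy slicing. The helper compress_skips below is hypothetical and is not the CompressSkipperProcess implementation; in particular, treating id_skip_end as inclusive and -1 as "up to the last skip" are assumptions made for this sketch.

```python
import numpy as np

def compress_skips(image, id_skip_start=0, id_skip_end=-1, func=np.mean):
    """Hypothetical helper: reduce the skip axis of a (rows, cols, skips) array."""
    n_skips = image.shape[2]
    # Assumption: id_skip_end is inclusive and a negative value means "last skip".
    last = n_skips - 1 if id_skip_end < 0 else id_skip_end
    return func(image[:, :, id_skip_start:last + 1], axis=2)

rng = np.random.default_rng(2)
image = rng.normal(size=(3, 4, 8))  # synthetic image with 8 skips

full = compress_skips(image)                                    # all skips
partial = compress_skips(image, id_skip_start=2, id_skip_end=6)  # skips 2..6

print(np.allclose(full, image.mean(axis=2)))
print(np.allclose(partial, full))
```

Discarding the first skips changes the per-pixel mean, which is why the compressed image only matches the full average when every skip is included.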
If you set the attribute comp.__verbose__, you will get an output image:
comp.__verbose__
comp.__verbose__ = True
comp.execute_process(rdata)
The warning appears only because one of the axes was re-used (this is not important right now, and the warning will be removed in the next version of pysimdamicm).
RawData for already compressed images
Up to now, we have learned how to read our data properly using a configuration file. We have seen that the method used to load the image reshapes it into an array of 3 dimensions: rows, cols and skips (in this order).
To properly load an already compressed image, the keyword NDCMS in the fits file header must be set to 1. So you can just change the parameter cfg.configuration['input']['convention']['Nskips'] to point to a header variable with value int(1). If the image you use is the output of the previous process, CompressSkipperProcess, you do not need to do anything because this is already done!
For this, I will use an image that was compressed and stored by CompressSkipperProcess.
As before, I will use one variable for the path and another for the pattern of the fits file name. You could instead use a single variable with the absolute path to the fits file (I am using two in case we want to play with other images from the same path).
path_to_avgimg = "/data/workdir/compton/data/calidaq_backup/DataTaking/Am241/outputs/run027/avgimg"
patter_file_name = "Image_Am241_Source_27_{}_compressed.fits"
cfile_name = "{}/{}".format(path_to_avgimg,patter_file_name)
cdata = ccd.io.rawdata.BuilderRawData(cfile_name.format(205),cfg.configuration['input'])
cdata.image.shape
The data is then a 2D ndarray.
Note that this time a new image has also been created:
cdata.image_mean_compressed
This is because some processes (see How to Process for Analysis) will search for that attribute.
cdata.image_mean_compressed.shape
Both images, image
and image_mean_compressed
are the same:
(cdata.image_mean_compressed - cdata.image).sum()
comp.execute_process(cdata)
As we should expect!
The number of skips in the input fits file header (NDCMS) was changed to 1. The data member image_header of the RawData object contains the header of the input fits file. Let's see the values of these variables
print(repr(cdata.image_header))
We can see here that the parameter 'NDCMS' is set to 1 and that a new one, 'NSKIPS', has been included, holding the number of measurements of the original raw data.
Note that we did not change the configuration options: 'Nskips' points to 'NDCMS', which is now set to 1, as it should be.
Up to now we have learned how to load our data with the correct configuration options. By default, when a RawData object is created it only reads the header and the data from the fits file, and reshapes the data to properly differentiate between rows, columns and skips (if necessary).
But how do we access the different regions of the data? How do we get only the overscan on rows, or the active region?
If the configuration options have the correct values for these regions, you just need to run
rdata.prepare_data()
If the configuration options do not have the correct values and you do not want to reload the data, you can just change the values on the rdata object.
Let's assume that the overscan on cols comprises the last 30 columns, instead of the 15 we reported through the configuration options.
rdata.n_cols_overscan
rdata.n_cols_overscan = 30
rdata.n_cols_overscan
rdata.prepare_data()
This method creates several attributes on the object rdata that point to the different regions. These are
mask_image_active_region
mask_image_overscan_cols
mask_image_overscan_rows
mask_image_prescan_cols
mask_image_prescan_rows
These objects are boolean np.ndarray
with the same shape as rdata.image
. Pixels with value True are masked, while pixels with value False are not masked.
type(rdata.mask_image_prescan_cols)
rdata.mask_image_prescan_cols
Internally, when any function or process needs to access a specific region of the CCD, it creates a masked numpy array using these boolean arrays as the mask.
plt.figure("sensitive region")
plt.imshow(np.ma.array( rdata.image_mean_compressed, mask= rdata.mask_image_active_region))
print( " There are {} pixels that have been masked".format(rdata.mask_image_active_region.sum()))
plt.figure("overscan on cols")
plt.imshow(np.ma.array( rdata.image_mean_compressed, mask= rdata.mask_image_overscan_cols))
print( " There are {} pixels that have been masked".format(rdata.mask_image_overscan_cols.sum()))
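Since True marks a masked pixel, summing a mask counts the excluded pixels, and inverting it counts the selected ones. The following sketch uses a small synthetic image and a made-up mask to show both counts, together with compressed(), which extracts the unmasked values as a flat array.

```python
import numpy as np

image = np.arange(12, dtype=float).reshape(3, 4)

# Made-up region mask: everything is masked except the last column.
mask = np.ones((3, 4), dtype=bool)
mask[:, 3] = False

masked = np.ma.array(image, mask=mask)

n_masked = int(mask.sum())        # pixels excluded from the region
n_selected = int((~mask).sum())   # pixels that belong to the region
selected_values = masked.compressed()  # 1D array with the unmasked values only

print(n_masked, n_selected, selected_values)
```

The same numbers are available from the masked array itself through masked.count(), which returns the number of unmasked entries.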
Instead of using functions from the main numpy package, it is recommended to use the module ma from numpy (i.e. numpy.ma) to properly perform any numerical operation without worrying about masked values.
Let's display only the first row of the overscan masked image
mimg = np.ma.array( rdata.image_mean_compressed, mask= rdata.mask_image_overscan_cols)
mimg[0,:]
This object has two main attributes: data and mask. The masked pixels are displayed as '--' and will not be used.
Although part of the data is masked, the array still keeps its original shape
mimg.shape
mimg**0.5
np.mean( mimg )
np.median( mimg )
This warning tells us that the function np.median does not take the mask into account, so we get an undesired value for the median of our masked image. The proper way to operate on masked arrays is therefore to use the module ma, as already mentioned!
We see that some numpy functions are overridden to work with masked arrays, but not all of them. So be really careful: when the input array is masked, use ma functions instead of plain numpy ones.
np.ma.median( mimg ) - np.median( mimg )
np.mean(mimg) - mimg.mean()
np.mean(mimg) - np.ma.mean(mimg)
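The comparison above can be reproduced on a toy array where the expected answer is obvious. This is a minimal sketch with invented values: the second row is masked, so any mask-aware statistic should only see the values of the first row.

```python
import numpy as np

values = np.array([[1.0, 2.0, 3.0],
                   [100.0, 200.0, 300.0]])
mask = np.array([[False, False, False],
                 [True, True, True]])   # True = masked, as in numpy.ma
toy = np.ma.array(values, mask=mask)

# Mask-aware statistics only use the first row.
masked_median = np.ma.median(toy)
masked_mean = toy.mean()

# Reference computed on the unmasked values extracted explicitly.
reference_median = np.median(toy.compressed())

print(masked_median, masked_mean, reference_median)
```

Extracting the unmasked values with compressed() before calling a plain numpy function is a safe fallback whenever no numpy.ma equivalent is available.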