Re: [aroma.affymetrix] Problems extracting intensity values after normalization

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Re: [aroma.affymetrix] Problems extracting intensity values after normalization

Henrik Bengtsson
Hi.

On Tue, Sep 7, 2010 at 6:51 AM,  <[hidden email]> wrote:
> Hi Henrik,
>
> Thanks for your response.  Let me clarify what I'm trying to do:  For each
> array I want to be able to compute basic summary statistics (min, median,
> max, etc.) of the probe-level intensities, both before and after
> normalization.

Preprocessing is more than normalization of probe signals, it is also
probe summarization and for CRMA (v1 & v2) even normalization of
summarized signals.

>
> My goal here is to do some basic quality control.  The CRMAv2 vignette shows
> how to produce density plots for the probe-level intensities both before and
> after normalization, and clearly this is useful.  Using extractAffyBatch(),
> I can extract the probe-level intensities *before* normalization, and thus
> produce spatial plots of the intensities and compute the summary statistics.

Yes, extractAffyBatch() can be called on the raw CEL set ("dsR"), the
allelic cross-talk calibration CEL set ("dsC"), and the nucleotide
position sequence normalized CEL set ("dsN").  After the latter step,
probe signals are summarized over probe sets and you can no longer use
extractAffyBatch().

For probe-summarized signals (and beyond), I gave in my earlier reply
some suggestion on how to access such signals.

Also, it is not clear how one should compare probe signals with
probe-summarized signals.

>  However, I'd like to be able to get the same information after
> normalization.  Given that you're much more well-versed in these matters
> than I am, I'd be happy to hear any suggestions you have regarding quality
> control.

You can look at simple QC plots such as plotDensity(ds) and
boxplot(ds) where 'ds' are data sets.  Here both CEL set containing
probe signals and chip-effect sets containing probe summarized signals
works.

/Henrik

>
> Thanks again,
>
> Vonn
>
>> On Fri, Sep 3, 2010 at 10:47 AM, Vonn <[hidden email]> wrote:
>>>
>>> Hi All,
>>>
>>> I'm interested in doing some basic quality control with Affy SNP6.0
>>> chips.  In particular, I'd like to get summary statistics and plot
>>> densities of the intensity values for each array both before and after
>>> normalization (allelic crosstalk calibration and base position
>>> normalization).  Plotting the densities pre/post-normalization is no
>>> problem.  Moreover, I can use extractAffyBatch() to get the intensity
>>> values and summary statistics before normalizing.  However, I get an
>>> error about a corrupted CEL file when I try to use extractAffyBatch()
>>> post-normalization.
>>>
>>> Is there a way to extract the intensity values without using
>>> extractAffyBatch()?
>>
>> Actually, it does not make sense to try to extract an AffyBatch from a
>> ChipEffectSet.  The reason is that an AffyBatch should hold
>> probe-level signals, while a ChipEffectSet contains summarized
>> signals.  In the next release of aroma.affymetrix, extractAffyBatch()
>> will throw a more informative error message explaining this.
>>
>> I'd like to ask you what do you want to do in the next step, because
>> depending on what you want to do, there are different ways of
>> extract/working with/exporting the summarized signals.
>>
>> For instance, you may want to use the extractDataFrame() method, cf.
>>
>>  http://www.aroma-project.org/howtos/extractDataFrame
>>
>> Notes: This will load all requested data into memory, so you probably
>> want to subset by units and/or samples when using that method.  Also,
>> the first time you use this method for a new chip type, it will take
>> some time, because it first has to do some formatting of the
>> annotation data.  Just be patient; you can use verbose=-50 to see
>> what's going on.
>>
>> See also extractTheta(), cf.
>>
>>  http://www.aroma-project.org/howtos/extractTheta
>>
>> If you want to export your data to tab-delimited files, there is also
>> writeDataFrame(), cf.
>>
>>  http:/www.aroma-project.org/howtos/writeDataFrame
>>
>> There are more options, but I leave it there until I know what you are
>> really after.
>>
>> Hope this helps
>>
>> Henrik
>>
>>>
>>> My code, traceback, and sessionInfo are below.
>>>
>>> Thanks in advance,
>>>
>>> Vonn
>>>
>>>> #Load aroma
>>>> library(aroma.affymetrix)
>>>
>>> Loading required package: R.utils
>>> Loading required package: R.oo
>>> Loading required package: R.methodsS3
>>> R.methodsS3 v1.2.0 (2010-03-13) successfully loaded. See ?R.methodsS3
>>> for help.
>>> R.oo v1.7.3 (2010-06-04) successfully loaded. See ?R.oo for help.
>>> R.utils v1.5.0 (2010-08-04) successfully loaded. See ?R.utils for
>>> help.
>>> Loading required package: R.filesets
>>> Loading required package: digest
>>> R.filesets v0.8.3 (2010-07-06) successfully loaded. See ?R.filesets
>>> for help.
>>> Loading required package: aroma.core
>>> Loading required package: R.cache
>>> R.cache v0.3.0 (2010-03-13) successfully loaded. See ?R.cache for
>>> help.
>>> Loading required package: R.rsp
>>> R.rsp v0.3.6 (2009-09-16) successfully loaded. See ?R.rsp for help.
>>>  Type browseRsp() to open the RSP main menu in your browser.
>>> Loading required package: matrixStats
>>> matrixStats v0.2.1 (2010-04-05) successfully loaded. See ?matrixStats
>>> for help.
>>> Loading required package: aroma.light
>>> aroma.light v1.16.1 (2010-06-23) successfully loaded. See ?aroma.light
>>> for help.
>>> aroma.core v1.7.0 (2010-07-26) successfully loaded. See ?aroma.core
>>> for help.
>>> Loading required package: aroma.apd
>>> Loading required package: R.huge
>>> R.huge v0.2.0 (2009-10-16) successfully loaded. See ?R.huge for help.
>>> Loading required package: affxparser
>>> aroma.apd v0.1.7 (2009-10-16) successfully loaded. See ?aroma.apd for
>>> help.
>>> aroma.affymetrix v1.7.0 (2010-07-26) successfully loaded. See ?
>>> aroma.affymetrix for help.
>>>>
>>>> log = verbose = Arguments$getVerbose(-8, timestamp = TRUE)
>>>> options(digits = 4)
>>>>
>>>> #Test to make sure things are working
>>>> cdf = AffymetrixCdfFile$byChipType("GenomeWideSNP_6", tags = "Full")
>>>> print(cdf)
>>>
>>> AffymetrixCdfFile:
>>> Path: annotationData/chipTypes/GenomeWideSNP_6
>>> Filename: GenomeWideSNP_6,Full.cdf
>>> Filesize: 470.44MB
>>> Chip type: GenomeWideSNP_6,Full
>>> RAM: 0.00MB
>>> File format: v4 (binary; XDA)
>>> Dimension: 2572x2680
>>> Number of cells: 6892960
>>> Number of units: 1881415
>>> Cells per unit: 3.66
>>> Number of QC units: 4
>>>>
>>>> gi = getGenomeInformation(cdf)
>>>> print(gi)
>>>
>>> UgpGenomeInformation:
>>> Name: GenomeWideSNP_6
>>> Tags: Full,na30,hg18,HB20100215
>>> Full name: GenomeWideSNP_6,Full,na30,hg18,HB20100215
>>> Pathname: annotationData/chipTypes/GenomeWideSNP_6/
>>> GenomeWideSNP_6,Full,na30,hg18,HB20100215.ugp
>>> File size: 8.97 MB (9407867 bytes)
>>> RAM: 0.00 MB
>>> Chip type: GenomeWideSNP_6,Full
>>>>
>>>> si = getSnpInformation(cdf)
>>>> print(si)
>>>
>>> UflSnpInformation:
>>> Name: GenomeWideSNP_6
>>> Tags: Full,na30,hg18,HB20100215
>>> Full name: GenomeWideSNP_6,Full,na30,hg18,HB20100215
>>> Pathname: annotationData/chipTypes/GenomeWideSNP_6/
>>> GenomeWideSNP_6,Full,na30,hg18,HB20100215.ufl
>>> File size: 7.18 MB (7526452 bytes)
>>> RAM: 0.00 MB
>>> Chip type: GenomeWideSNP_6,Full
>>> Number of enzymes: 2
>>>>
>>>> acs = AromaCellSequenceFile$byChipType(getChipType(cdf, fullname =
>>>> FALSE))
>>>> print(acs)
>>>
>>> AromaCellSequenceFile:
>>> Name: GenomeWideSNP_6
>>> Tags: HB20080710
>>> Full name: GenomeWideSNP_6,HB20080710
>>> Pathname: annotationData/chipTypes/GenomeWideSNP_6/
>>> GenomeWideSNP_6,HB20080710.acs
>>> File size: 170.92 MB (179217531 bytes)
>>> RAM: 0.00 MB
>>> Number of data rows: 6892960
>>> File format: v1
>>> Dimensions: 6892960x26
>>> Column classes: raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw,
>>> raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw, raw,
>>> raw
>>> Number of bytes per column: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>>> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
>>> Footer: <createdOn>20080710 22:47:02 PDT</
>>> createdOn><platform>Affymetrix</platform><chipType>GenomeWideSNP_6</
>>> chipType><srcFile><filename>GenomeWideSNP_6.probe_tab</
>>> filename><filesize>341479928</
>>> filesize><checksum>2037c033c09fd8f7c06bd042a77aef15</checksum></
>>> srcFile><srcFile2><filename>GenomeWideSNP_6.CN_probe_tab</
>>> filename><filesize>96968290</
>>> filesize><checksum>3dc2d3178f5eafdbea9c8b6eca88a89c</checksum></
>>> srcFile2>
>>> Chip type: GenomeWideSNP_6
>>> Platform: Affymetrix
>>>>
>>>> #Load CEL files in the folder "testing."  Note that inside the working
>>>> directory we have the
>>>> #path rawData//testing//GenomeWideSNP_6, which contains the appropriate
>>>> CEL files.
>>>> csR = AffymetrixCelSet$byName("smalltesting1", cdf = cdf)
>>>> print(csR)
>>>
>>> AffymetrixCelSet:
>>> Name: smalltesting1
>>> Tags:
>>> Path: rawData/smalltesting1/GenomeWideSNP_6
>>> Platform: Affymetrix
>>> Chip type: GenomeWideSNP_6,Full
>>> Number of arrays: 9
>>> Names: sample1, sample10, ..., sample9
>>> Time period: 2010-04-20 13:23:11 -- 2010-04-20 18:43:03
>>> Total file size: 592.90MB
>>> RAM: 0.01MB
>>>>
>>>> #Create AffyBatch object for csR
>>>> ab = extractAffyBatch(csR)
>>>
>>> Loading required package: affy
>>> Loading required package: Biobase
>>>
>>> Welcome to Bioconductor
>>>
>>>  Vignettes contain introductory material. To view, type
>>>  'openVignette()'. To cite Bioconductor, see
>>>  'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>
>>>
>>> Attaching package: 'Biobase'
>>>
>>> The following object(s) are masked from 'package:matrixStats':
>>>
>>>    anyMissing, rowMedians
>>>
>>>
>>> Attaching package: 'affy'
>>>
>>> The following object(s) are masked from 'package:aroma.light':
>>>
>>>    plotDensity
>>>
>>>
>>>        The following object(s) are masked _by_ package:Biobase :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_
>>> package:aroma.affymetrix :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_ package:aroma.apd :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_ package:R.huge :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_ package:aroma.core :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_ package:aroma.light :
>>>
>>>         .Depends plotDensity
>>>
>>>
>>>        The following object(s) are masked _by_ package:matrixStats :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_ package:R.rsp :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_ package:R.cache :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_ package:R.filesets :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_ package:R.utils :
>>>
>>>         .Depends
>>>
>>>
>>>        The following object(s) are masked _by_ package:R.oo :
>>>
>>>         .Depends
>>>
>>> Loading required package: genomewidesnp6,fullcdf
>>> Warning messages:
>>> 1: In fcn(...) : Packages reordered in search path: package:affy
>>> 2: In extractAffyBatch.AffymetrixCelSet(csR) :
>>>  CDF enviroment package 'genomewidesnp6,fullcdf' not installed. The
>>> 'affy' package will later try to download it from Bioconductor and
>>> install it.
>>>>
>>>> #########################
>>>> #########################
>>>> #####Now for normalization.  As per CMRAv2.0, we perform allelic
>>>> #####crosstalk calibration and normalization for nucleotide-position
>>>> #####probe sequence effects.
>>>> #########################
>>>> #########################
>>>> acc = AllelicCrosstalkCalibration(csR, model = "CRMAv2")
>>>> csC = process(acc, verbose = verbose)
>>>
>>> 20100903 13:35:10|Calibrating data set for allelic cross talk...
>>> 20100903 13:35:11| Already calibrated
>>> 20100903 13:35:11|Calibrating data set for allelic cross talk...done
>>>>
>>>> bpn = BasePositionNormalization(csC, target = "zero")
>>>> csN = process(bpn, verbose = verbose)
>>>
>>> 20100903 13:35:11|Normalization data set for probe-sequence effects...
>>> 20100903 13:35:11| Already normalized
>>> 20100903 13:35:11|Normalization data set for probe-sequence
>>> effects...done
>>>>
>>>> #Now repeat the above QC steps based on the normalized data
>>>> abN = extractAffyBatch(csN)
>>>
>>> Loading required package: genomewidesnp6,fullcdf
>>> Error in read.affybatch(filenames = l$filenames, phenoData = l
>>> $phenoData,  :
>>>  It appears that the file probeData/smalltesting1,ACC,ra,-XY,BPN,-XY/
>>> GenomeWideSNP_6/sample1.CEL is corrupted.
>>> In addition: Warning message:
>>> In extractAffyBatch.AffymetrixCelSet(csN) :
>>>  CDF enviroment package 'genomewidesnp6,fullcdf' not installed. The
>>> 'affy' package will later try to download it from Bioconductor and
>>> install it.
>>>>
>>>> traceback()
>>>
>>> 5: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra,
>>>       ref.cdfName, dim.intensity, verbose, PACKAGE = "affyio")
>>> 4: read.affybatch(filenames = l$filenames, phenoData = l$phenoData,
>>>       description = l$description, notes = notes, compress =
>>> compress,
>>>       rm.mask = rm.mask, rm.outliers = rm.outliers, rm.extra =
>>> rm.extra,
>>>       verbose = verbose, sd = sd, cdfname = cdfname)
>>> 3: ReadAffy(filenames = filenames, sampleNames = sampleNames, ...,
>>>       verbose = as.logical(verbose))
>>> 2: extractAffyBatch.AffymetrixCelSet(csN)
>>> 1: extractAffyBatch(csN)
>>>>
>>>> sessionInfo()
>>>
>>> R version 2.11.1 (2010-05-31)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252
>>> [2] LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods
>>> base
>>>
>>> other attached packages:
>>>  [1] Biobase_2.6.1          aroma.affymetrix_1.7.0
>>> aroma.apd_0.1.7
>>>  [4] affxparser_1.18.0      R.huge_0.2.0
>>> aroma.core_1.7.0
>>>  [7] aroma.light_1.16.1     matrixStats_0.2.1
>>> R.rsp_0.3.6
>>> [10] R.cache_0.3.0          R.filesets_0.8.3
>>> digest_0.4.2
>>> [13] R.utils_1.5.0          R.oo_1.7.3
>>> affy_1.24.2
>>> [16] R.methodsS3_1.2.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.14.0        preprocessCore_1.8.0 tools_2.11.1
>>>>
>>>
>>> --
>>> When reporting problems on aroma.affymetrix, make sure 1) to run the
>>> latest version of the package, 2) to report the output of sessionInfo() and
>>> traceback(), and 3) to post a complete code example.
>>>
>>>
>>> You received this message because you are subscribed to the Google Groups
>>> "aroma.affymetrix" group with website http://www.aroma-project.org/.
>>> To post to this group, send email to [hidden email]
>>> To unsubscribe and other options, go to
>>> http://www.aroma-project.org/forum/
>>>
>>
>> --
>> When reporting problems on aroma.affymetrix, make sure 1) to run the
>> latest version of the package, 2) to report the output of sessionInfo() and
>> traceback(), and 3) to post a complete code example.
>>
>>
>> You received this message because you are subscribed to the Google Groups
>> "aroma.affymetrix" group with website http://www.aroma-project.org/.
>> To post to this group, send email to [hidden email]
>> To unsubscribe and other options, go to
>> http://www.aroma-project.org/forum/
>>
>>
>
>
> --
> When reporting problems on aroma.affymetrix, make sure 1) to run the latest
> version of the package, 2) to report the output of sessionInfo() and
> traceback(), and 3) to post a complete code example.
>
>
> You received this message because you are subscribed to the Google Groups
> "aroma.affymetrix" group with website http://www.aroma-project.org/.
> To post to this group, send email to [hidden email]
> To unsubscribe and other options, go to http://www.aroma-project.org/forum/
>

--
When reporting problems on aroma.affymetrix, make sure 1) to run the latest version of the package, 2) to report the output of sessionInfo() and traceback(), and 3) to post a complete code example.


You received this message because you are subscribed to the Google Groups "aroma.affymetrix" group with website http://www.aroma-project.org/.
To post to this group, send email to [hidden email]
To unsubscribe and other options, go to http://www.aroma-project.org/forum/