Data Science Study Protocols for Investigating Lifetime and Degradation of PV Technology Systems

Nicholas R. Wheeler1, Yifan Xu3, Abdulkerim Gok2, Ian Kidd2, Laura S. Bruckman2, Jiayang Sun3 and Roger H. French1,2

1Department of Macromolecular Science and Engineering, Case Western Reserve University
2Department of Material Science and Engineering, Case Western Reserve University
3Department of Epidemiology and Biostatistics, Case Western Reserve University

Abstract — The reliability of photovoltaic (PV) technology systems is a major concern to the PV industry, and the focus of much recent research activity. To ensure that these efforts return the maximum value for the resources invested, it is critical to design study protocols that follow good statistical design principles, ensuring simplicity, avoidance of biases, minimization of missing data, reproducibility, relevance to the scientific objective, and replication with an adequate sample size. With pre-knowledge of certain aspects of a proposed study, data science study protocols may be specified that aim to determine the sample size required to adequately address the research objective. We describe the process of designing such a study protocol for an example PV technology, based upon expected uncertainties calculated from a pilot study. This represents a methodological approach to defining scientific studies that balances cost against potential information yield.

Index Terms — photovoltaic systems, regression analysis, enterprise resource planning, knowledge management

I. INTRODUCTION AND BACKGROUND

The reliability of PV technology systems is of the utmost importance to the continued growth and well-being of the PV industry.[1, 2, 3] Research efforts in this area are expanding correspondingly[4, 5, 6, 7, 8], under the combined influence of many major international interests.[9] Furthermore, long-established reliability determination methods are under question[10], encouraging a myriad of unique and novel PV degradation studies to be conducted outside of pre-existing frameworks.

While this divergence from traditional methods represents great opportunity, there is also risk. Something that can be overlooked when planning the collection of scientific data is the design of a proper, statistically informed study protocol, especially regarding sample size determination for the experiments. Oftentimes data are first collected without justification of the sample size or other experimental parameters, and then afterwards analyzed in a variety of ways in hopes that something significant will become apparent. An unbiased approach of searching for unexpected significance is essential, and performing experiments where the outcome is not known beforehand is key to scientific progress. However, creating an arbitrarily defined dataset from a haphazardly planned study is risky. A lack of data may lead to failure to detect statistically significant differences, and an overabundance of data beyond what is needed to address research questions is a waste of valuable human and material resources. In either case, these non-optimally structured studies would represent an opportunity to reallocate resources for valuable efficiency gains.

In the field of clinical research and epidemiology, where the results of resource-intensive experiments inform critically important decisions, statistical power analysis is often a standard and essential step in study design.[11] Statistical power analysis is used to find how likely an appropriate statistical test is to detect an effect of a reasonable size, given the number of subjects available and an estimate of the variability in the data. In this way it can be used to estimate beforehand the likelihood that a study will produce scientifically meaningful results which successfully address the intended research question. In many epidemiological studies, two or more well defined groups of subjects are compared and statistically informed conclusions are drawn about whether there is a significant difference in certain characteristic(s) between them. A quintessential example of an epidemiological medical study is the evaluation of a newly proposed treatment through the comparison of its therapeutic efficacy to that of a well established pre-existing treatment. Two groups of randomly selected subjects, each receiving one of the two treatments being compared, are monitored over time to generate the data used for analysis. This is analogous to typical PV reliability studies, where new technologies are evaluated against pre-existing counterparts, or the degradation resulting from accelerated environmental exposure tests in the laboratory is compared to degradation caused by real world use. Given these similarities, statistical power analysis techniques that are used to plan epidemiological studies can also be applied to PV reliability studies.

The statistical power of a study is the probability of correctly rejecting the null hypothesis and detecting a statistically significant result (not committing a type II error). Estimated statistical power is an established metric for evaluating proposed medical studies, and statistical power analysis is a major topic of interest in the field of statistics research.[12, 13] Different study designs require different statistical power analyses. In the study shown in this article, the statistical power is connected to the sample size through other values such as the desired significance level α, the data variance, the frequency of data acquisition, and the effect size (essential difference) that is scientifically or operationally important to detect. A pilot study is used to estimate the variances from which the statistical power of the study is calculated (see Figure 1).
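To make this connection concrete, the following minimal Python sketch (our illustration, not code from the paper) approximates the power to detect a difference d between the mean degradation slopes of two equally sized groups. It assumes a random-slope longitudinal model in which each subject's fitted slope carries a between-subject variance component plus a residual component that shrinks as the measurement times spread out; all function and variable names here are our assumptions:

    import numpy as np
    from scipy.stats import norm

    def slope_power(n_per_group, times, d, sigma_w2, sigma_b2, alpha=0.05):
        # Approximate two-sided power for detecting a difference d between
        # the mean slopes of two groups measured at the same time points.
        # Assumed model: each subject's fitted slope has variance
        #   sigma_b2 + sigma_w2 / Sxx, with Sxx = sum((t - mean(t))^2),
        # i.e. a between-subject slope variance plus a within-subject term.
        times = np.asarray(times, dtype=float)
        sxx = np.sum((times - times.mean()) ** 2)
        var_slope = sigma_b2 + sigma_w2 / sxx
        se_diff = np.sqrt(2.0 * var_slope / n_per_group)  # SE of slope difference
        z_crit = norm.ppf(1.0 - alpha / 2.0)              # two-sided critical value
        return norm.cdf(abs(d) / se_diff - z_crit)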

With statistical power analysis in mind, data science study protocols for investigating PV lifetime and degradation science[6, 7] may be developed that are explicitly designed to balance adequate sample size and sufficient statistical power, reducing the probability of type II error to an acceptable level. Appropriate numbers of samples and measurements for essential difference sensitivities of interest can be estimated and optimized, defining studies expected to generate reasonable and justifiable datasets for addressing PV reliability research questions.

Figure 1: Left: Definitions of terminologies in a statistical test. Right: An illustration of power and significance level in a simple statistical test, where the left and right bell curves are the densities of the test statistic under the null hypothesis and the alternative hypothesis, respectively.[14]

II. EXPERIMENTAL METHODS

Towards the goal of designing a PV reliability study with good statistical practices, a pilot study has been performed on a benchmark PV backsheet material (insert tradename and details here) provided by DuPont. 10 samples apiece were exposed to "ASTMG154" (ASTM G154 cycle 4)[15] and "HotQUV" (based on ASTM G154 cycle 4, without the condensing humidity step: 1.55 W/m2/nm at 340 nm, 70°C) conditions for 2000 hours, with measurements every 500 hours. The yellowness index (YI), calculated per ASTM E313 using D65/10[16], of the samples was recorded at every measurement period. The linear portion (from 500-2000 hours) of this sample response was analyzed to estimate variances for use in statistical power analysis.

The parameter of interest used to calculate the essential difference was the rate of change of the YI response, which describes spectral characteristics relating to the yellowness in the visible color of the samples. This rate of change was obtained from the pilot study for each exposure type via linear regression. The 'between-subject variance' and 'within-subject variance' for the YI data were calculated using a linear mixed-effects model. These variances, combined with the essential difference and other proposed study parameters (total time, number of measurements, sample size), allow the estimated statistical power of a proposed study to be calculated.
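The paper does not specify its software, but as a sketch, the two variance components could be estimated from the pilot data with a linear mixed-effects model in Python via statsmodels; the file name and column names (YI, hours, sample_id) are hypothetical, and in practice the model would be fit per exposure condition or with exposure as a covariate:

    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per (sample, measurement time); the file name is hypothetical.
    df = pd.read_csv("pilot_yi.csv")

    # Random intercept and random slope in hours for each sample.
    model = smf.mixedlm("YI ~ hours", data=df, groups=df["sample_id"],
                        re_formula="~hours")
    fit = model.fit()
    sigma_b2 = fit.cov_re.loc["hours", "hours"]  # between-subject slope variance
    sigma_w2 = fit.scale                         # within-subject (residual) variance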

Figure 2: Data from a pilot study of a benchmark PV backsheet material under the two exposure conditions "ASTMG154" (blue triangles) and "HotQUV" (red circles). The linear regression models used to estimate the essential difference in rate of change of the measured YI values are overlaid as dotted lines.

III. RESULTS

The YI values of the pilot study samples under both exposure conditions between 500-2000 hours are shown in Figure 2. The difference between the two YI slopes observed in the pilot study is 0.18, which roughly suggests a range of essential difference that the planned study should be sensitive enough to detect.

A subsequent study using the same study parameters as the pilot study is estimated to have a statistical power of 0.58. This is quite low, and indicates that a study with the parameters of the pilot study would not be statistically powerful enough to detect the apparently observed effect with a high degree of probability.
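As a usage sketch, this pilot power estimate could be computed with the slope_power function above once the fitted variances are substituted; the variance values below are placeholders, since the paper reports only the resulting power:

    times = [500, 1000, 1500, 2000]   # hours: linear region of the pilot study
    # sigma_w2 and sigma_b2 below are placeholders, not the paper's estimates.
    p = slope_power(n_per_group=10, times=times, d=0.18,
                    sigma_w2=1.0, sigma_b2=0.005)
    print(f"estimated power: {p:.2f}")  # ~0.58 with the actual pilot variances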

Figure 3: Two-dimensional statistical power curves relating the number of samples and measurements to the essential difference, for a series of statistical powers. A point representing the parameters of the pilot study (estimated statistical power of 0.58) is shown as a small red dot which falls within the larger red circle located on the curve for 0.60 statistical power. Other colored circles highlight adjustments to single parameters of the pilot study that would increase the estimated statistical power outcome to fall on the correspondingly colored curves.

IV. DISCUSSION

The pilot study is meant to give an estimate of the variances and essential difference for the purposes of designing a more statistically appropriate study. Statistical power analysis can be used to indicate changes in study parameters that would adjust the estimated statistical power of the study into the desired range. With the variances estimated from the pilot study and an effect size of 0.18, 45 samples are required to achieve a 0.90 statistical power, which can be rounded up to 46 total samples, or 23 per exposure condition.
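Inverting the power calculation for sample size can be done with a simple search over n, as in this sketch built on slope_power; with the paper's variance estimates this should land near 23 samples per exposure condition:

    def required_n(target_power, times, d, sigma_w2, sigma_b2, alpha=0.05):
        # Smallest per-group sample size whose estimated power meets the target.
        n = 2
        while slope_power(n, times, d, sigma_w2, sigma_b2, alpha) < target_power:
            n += 1
        return n

    # e.g. required_n(0.90, times, d=0.18, sigma_w2=sigma_w2, sigma_b2=sigma_b2)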

In our study, statistical power, effect size, sample size, alpha, observation frequency and total experiment time are interrelated. One is determined given values of the others, and so the relationships between parameters can be investigated by varying two parameters while leaving the rest constant. Curves of essential difference versus number of samples for a series of given statistical powers are shown in Figure 3. As would be expected, a higher number of samples is required for a higher statistical power at any given essential difference.

Another parameter that may be selectively varied is the number of measurements taken over the total study time. Examining curves of number of measurements versus number of samples (Figure 3) shows that for these experiments a study with more frequent measurements over the same time period would achieve a higher statistical power with the same number of samples. For the variances estimated from this pilot study, calculating the number of measurements required for 0.90 statistical power results in an estimate of 13 equally spaced measurement steps.
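A sketch of this trade-off under the same assumed model: adding equally spaced measurements over a fixed window increases the spread of the time points, shrinking the within-subject contribution to the slope variance (the between-subject component sets a floor that more measurements cannot remove); the variances are again placeholders:

    for m in (5, 9, 13):
        times_m = np.linspace(500, 2000, m)  # m equally spaced measurements
        p = slope_power(n_per_group=10, times=times_m, d=0.18,
                        sigma_w2=1.0, sigma_b2=0.005)
        print(f"{m} measurements -> power {p:.3f}")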

It is interesting to note that at lower essential differences, the gap in the required number of samples or measurements between one statistical power curve and the next is pronounced. Increasing the number of samples and measurements has a lesser effect on statistical power when targeting smaller essential differences. This reflects the intuition that larger effect sizes are easier to detect.

Figure 4: A surface relating the total number of samples and number of measurements to corresponding essential differences at a statistical power of 0.90. The red point corresponds to the parameters of the pilot study, and the magenta points to the parameters of hypothetical studies where single parameters have been adjusted to increase the estimated statistical power to 0.90.

When other variables are fixed, the balance between sample size, essential difference and measurement frequency for a desired statistical power can be comprehensively mapped to provide insight into how varying the parameters of a study impacts its estimated statistical power. This is shown in Figure 4, where a three-dimensional surface has been generated that corresponds to 0.90 statistical power. Given the results of the pilot study, an optimized data science study protocol of this benchmark PV backsheet material under these two exposure conditions (aiming for a statistical power of 0.90) could feature 23 samples per exposure condition, or 13 measurements. Additionally, adjusting the essential difference of the study to 0.27 would achieve 0.90 statistical power with the current number of samples and measurements, or two or three of these parameters could be adjusted simultaneously. To reach a fully optimized protocol, the desired statistical power and essential difference sensitivity would be chosen, all the costs of additional samples would be weighed against all the costs of additional measurements, and an optimum balance of adjustments between these three parameters would be defined that efficiently and effectively addresses the research question of the study.
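Under the same assumed model, the minimum detectable essential difference has a closed form obtained by inverting the power expression, which makes tracing a constant-power surface like Figure 4 a matter of evaluating it over a grid of sample sizes and measurement counts; this remains our sketch, with placeholder variances, not the paper's code:

    def detectable_d(n_per_group, times, sigma_w2, sigma_b2,
                     power=0.90, alpha=0.05):
        # Invert power = Phi(|d|/SE - z_crit): d = SE * (z_crit + z_power).
        times = np.asarray(times, dtype=float)
        sxx = np.sum((times - times.mean()) ** 2)
        se = np.sqrt(2.0 * (sigma_b2 + sigma_w2 / sxx) / n_per_group)
        return se * (norm.ppf(1.0 - alpha / 2.0) + norm.ppf(power))

    # Grid over per-group sample size and measurement count:
    for n in (10, 23, 40):
        for m in (5, 9, 13):
            d_min = detectable_d(n, np.linspace(500, 2000, m), 1.0, 0.005)
            print(f"n={n}, m={m}: detectable d = {d_min:.3f}")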

As seen in Figure 3, the balance between essential difference and sample size for various statistical power levels can be plotted as a series of curves. In this way the planned studies' parameters can be visualized in two dimensions and divided into banded regions corresponding to estimated statistical power. These can be thought of as two-dimensional cross sections describing a slice of a three-dimensional volume where the three study parameters are mapped along three axes and divided into volumetric regions corresponding to estimated statistical power. For any particular statistical power, a surface describing the required parameter values to achieve that statistical power can be calculated.

An acceptable statistical power region for this study, given its controlled conditions, has been chosen to be between 0.90 and 0.95, meaning that a study whose estimated statistical power falls outside this range is non-ideally designed. The estimated statistical power expected to be achieved by the study can be adjusted by varying one or more of the study parameters while fixing the others. The process of optimizing the study is then a question of which parameters may need to be treated as fixed, and which are more or less expensive to adjust than the others.

Imagine a case where the parameters of a planned study resulted in an estimated statistical power of 0.80, below the 0.90 lower threshold for the desired statistical power surface. Any of the three parameters could be adjusted to bring the study back into the desired state with respect to the estimated statistical power outcome. Sample size could be increased, if it were cost effective to purchase, store, and handle more samples. The number of measurements could be increased, if it were cost effective to use equipment and handle the current number of samples more frequently. Lastly, the minimal essential difference the study aims to detect could be increased, though this change needs careful consideration to ensure its scientific justification.

These changes could be made in combination, as well as adjusted to counteract one another. Perhaps it is most cost effective to actually reduce the sample size, while compensating for lost statistical power with an increase in the number of measurements, to reach an ideal estimated statistical power for a study with highly expensive samples. Or, if a degradation study is already ongoing and the number of samples cannot be changed, both the essential difference of interest and the number of measurements could be increased by continuing the study further forward in time and stress dose in order to push samples to higher levels of degradation and gather additional data. The route to an optimized study will be highly individual and dependent upon the associated costs and considerations of that particular study.

Though this example of study planning features a single data type, the rate of change of the YI response, the methodology could be applied to studies featuring multiple data types as well. If this were the case, pilot studies would be conducted for each data type, or at least those expected to require the highest number of samples and measurements. The entire study could be optimized on the basis of the most demanding data type, or particularly difficult data types could be excluded from the study if it was deemed too expensive to include them.

Using these methods, a researcher planning a PV degradation study can answer the questions of how many samples and measurements are needed to detect a particular essential difference at a specific statistical power level. Conversely, the question of how small an essential difference a study is capable of drawing statistically significant conclusions about, given its sample size and number of measurements, can also be addressed. These thought processes can be applied to studies in their planning stages, as well as currently ongoing studies, to adjust their parameters towards acceptable levels of statistical significance.

V. CONCLUSION

The field of PV degradation is important and growing. It is imperative to keep good scientific practice in mind while this process unfolds and strive to produce valid and useful progress to build upon. Methodologies have been developed by statisticians to help structure studies to produce statistically significant results. These tools can be applied to PV degradation research to develop data science study protocols, which represent a rational and justifiable approach to planning PV degradation studies. This work demonstrates the planning of one such data science study protocol, using statistical power analysis informed by an initial pilot study, aiming to assess the effect of exposure type on a benchmark PV backsheet material. The methodology is easily adaptable to other studies exploring the degradation of other PV technologies.

Acknowledgements: Thanks to DuPont for collaboration and samples, and to BAPVC (Bay Area Photovoltaic Consortium Prime Award No. DE-EE0004946) for funding.

REFERENCES

[1] Eric Fitz. The bottom line impact of PV module reliability. Renewable Energy World, 2011.

[2] Todd Woody. Solar industry anxious over defective panels. The New York Times, May 2013.

[3] Dirk C. Jordan. Photovoltaic degradation risk. National Renewable Energy Laboratory, Golden, CO, 2012.

[4] Kara A. Shell, Scott A. Brown, Mark A. Schuetz, Robert J. Davis, and Roger H. French. Design and performance of a low-cost acrylic reflector for a ~7x concentrating photovoltaic module. pages 81080A–81080A–8, September 2011.

[5] R.H. French, J.M. Rodríguez-Parada, M.K. Yang, R.A. Derryberry, and N.T. Pfeiffenberger. Optical properties of polymeric materials for concentrator photovoltaic systems. Solar Energy Materials and Solar Cells, 95(8):2077–2086, August 2011.

[6] L. S. Bruckman, N. R. Wheeler, J. Ma, E. Wang, C. K. Wang, I. Chou, Jiayang Sun, and R. H. French. Statistical and domain analytics applied to PV module lifetime and degradation science. IEEE Access, 1:384–403, 2013.

[7] Myles P. Murray, Laura S. Bruckman, and Roger H. French. Photodegradation in a stress and response framework: poly(methyl methacrylate) for solar mirrors and lens. Journal of Photonics for Energy, 2(1):022004–022004, 2012.

[8] Yang Hu, M.A. Hosain, T. Jain, Y.R. Gunapati, L. Elkin, G.Q. Zhang, and R.H. French. Global SunFarm data acquisition network, energy CRADLE, and time series analysis. In 2013 IEEE Energytech, pages 1–5, May 2013.

[9] International PV module quality assurance task force, established by NREL, AIST, PVTEC, US DOE, EU JRC, SEMI PV group to develop PV lifetime qualification testing. 2012.

[10] Kam L. Wong. What is wrong with the existing reliability prediction methods? Quality and Reliability Engineering International, 6(4):251–257, September 1990.

[11] Shein-Chung Chow, Jun Shao, and Hansheng Wang, editors. Sample size calculations in clinical research. Number 11 in Biostatistics. Marcel Dekker, New York, 2003.

[12] Ding-Geng Chen and Karl E. Peace. Clinical trial data analysis using R. CRC Press, Boca Raton, Florida, 2011.

[13] Garrett M. Fitzmaurice, Nan M. Laird, and James H. Ware. Applied Longitudinal Analysis. John Wiley & Sons, August 2011.

[14] Kristoffer Magnusson. Creating a typical textbook illustration of statistical power using either ggplot or base graphics, 2013.

[15] G03 Committee. Practice for operating fluorescent light apparatus for UV exposure of nonmetallic materials. Technical report, ASTM International, 1900.

[16] E12 Committee. Practice for calculating yellowness and whiteness indices from instrumentally measured color coordinates. Technical report, ASTM International, 2010.
