software citation, reuse and metadata considerations: an exploratory study examining lammps

Software Citation, Reuse and Metadata Considerations: An Exploratory Study Examining LAMMPS Kai Li, Jane Greenberg and Xia Lin College of Computing and Informatics Drexel University 10/17/2016 ASIS&T Conference 2016


TRANSCRIPT

Page 1: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Software Citation, Reuse and Metadata Considerations:

An Exploratory Study Examining LAMMPS

Kai Li, Jane Greenberg and Xia Lin

College of Computing and Informatics

Drexel University

10/17/2016

ASIS&T Conference 2016

Page 2: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Acknowledgement

Page 3: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

The paper is available at: https://goo.gl/uEP9B7

Page 4: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Research questions

• How is the software LAMMPS described to be (re-)used through citation or mention in scientific studies?

– What are the use types of LAMMPS?

– What metadata standards can be found in the natural-language mentions of the software?

Page 5: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Conceptual model

[Diagram. Nodes: Paper, Software, Study. Relations: Describes; Uses; Cites or mentions the use of software, the software itself, or other possibilities.]

Page 6: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Background

• Research data should be properly cited in research outputs.

• “Software as data”:

– Scientific software is a unique research data object in terms of its positions and functions in the research infrastructure/pipeline.

Page 7: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Software reuse or software use?

Page 8: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

What is cited: dataset vs. data paper?

• In established data repositories, datasets are normally accompanied by official citation instructions, including author, title, date, and DOI, and other descriptive metadata elements.

• A data paper is a “searchable metadata document, describing a particular dataset or a group of datasets, published in the form of a peer-reviewed article in a scholarly journal.” (Chavan & Penev, 2011)

– There is also an increasing number of software papers, parallel to the format of the data paper.

Page 9: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

LAMMPS

• LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) is a molecular dynamics program created under an agreement between Sandia National Laboratories, Lawrence Livermore National Laboratory and three companies.

• It was released as open source code in 2004. Since then, new features, including those developed by third parties, have been integrated into the package.

Page 10: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Official citation of LAMMPS

Page 11: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Research method

• Sample: the 400 most-cited papers on Google Scholar that cite Plimpton’s original paper (Plimpton, 1995).

• All sentences about LAMMPS in the sampled papers were extracted and coded manually using content analysis.

– Classification scheme of reuse types

– Metadata elements about LAMMPS and the reuse of LAMMPS
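The extraction step could be supported by a small script that pulls candidate sentences out of a paper's full text before manual coding. The following is a minimal sketch only: the function name `extract_mentions`, the regular expressions, and the sample text (apart from the sentence quoted from McMahon, Cheung & Troisi, 2011) are illustrative assumptions, not the procedure used in the study, where extraction and coding were done manually.

```python
import re

# Assumption: a naive sentence splitter. It breaks after ., ! or ? when the
# next token starts with a capital letter, a quote, or a bracket.
SENTENCE_END = re.compile(r'(?<=[.!?])\s+(?=[A-Z"\[])')

# Assumption: only sentences that name the software are treated as mentions.
MENTION = re.compile(r"\bLAMMPS\b", re.IGNORECASE)


def extract_mentions(text: str) -> list[str]:
    """Return the sentences in `text` that mention LAMMPS by name."""
    sentences = SENTENCE_END.split(text.strip())
    return [s.strip() for s in sentences if MENTION.search(s)]


# Hypothetical paper text; only the middle sentence is a real quotation.
sample = (
    "The films were equilibrated at 300 K. "
    "LAMMPS was used for all MD simulations. "
    "Results are shown in Fig. 2."
)

for sentence in extract_mentions(sample):
    print(sentence)
```

A name-based filter like this would miss mentions that refer to the software only through a citation, for example "a parallel MD code, [25]", so the output is a starting point for manual coding rather than a replacement for it.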

Page 12: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Scheme of reuse types of LAMMPS

Category: Definition

Unspecified reuse: The paper reuses LAMMPS as a whole in the main study, or does not specify which other type of reuse applies.

Modified reuse: The paper uses a modified version of LAMMPS in the main study. The modification may or may not be specified in the paper.

Benchmark: The paper only uses LAMMPS (original or modified version) in the background study.

Cite (or non-use): The paper does not use LAMMPS per se, but only cites either the software or Plimpton’s paper. This includes papers that only use the method presented in the original paper.

Page 13: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Examples of “Unspecified reuse” type

• “LAMMPS was used for all MD simulations. [37]” (McMahon, Cheung & Troisi, 2011)

• "LAMMPS [28,29] (Large-Scale Atomic/Molecular Massively Parallel Simulator), developed at Sandia National Laboratories, was used to model [0001] oriented ZnO NWs with diameters ranging from 5 to 20 nm." (Agrawal, Peng & Espinosa, 2009)

Page 14: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Examples of “Modified reuse” type

• "The annealing simulations were performed with LAMMPS (large-scale atomic/molecular massively parallel simulator) code from Plimpton at Sandia (modified to handle our force fields). " (Jang et al., 2004)

• "We would like to thank E. Charlaix and P.-F. Gobin for introducing us to this subject, and Dr. S.J. Plimpton for making publicly available a parallel MD code, [25] a modified version of which was used in the present simulations." (Barrat & Bocquet, 1999)

Page 15: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Examples of “Benchmark” type

• "To demonstrate what one should expect of a precise MD trajectory, the same simulation run is performed using LAMMPS again, but this time on four processor cores in parallel. ... We have compared our GPU implementation against LAMMPS running on a fast parallel cluster, see Fig. 8, and we have shown that the GPU performs at the same level as up to 36 processor cores." (Anderson, Lorenz & Travesset, 2008)

• "In order to compare our GPU version to a well-optimized sequential code, we have also compared our CUDA implementation to LAMMPS." (Liu et al., 2008)

Page 16: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Examples of “Cite” type

• "Research by Plimpton [38], Plimpton and Hendrickson [37], and Hwang et al. [19] shows that this method provides a better speedup than RD, and can be used with good speedups up to hundreds of processors. " (Kale et al., 1999)

• "Software such as LAMMPS [212], IMD [213] and DL_POLY [214] are publicly available to perform large-scale MD simulations on parallel platforms." (Mishin, Asta & Li, 2010)

Page 17: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Category occurrence

Unspecified reuse; 305

Modified reuse; 29

Benchmark; 11

Cite; 55
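The counts above can be turned into shares of the sample with a few lines of code. This is a small sketch for illustration; the variable names are assumptions, and the only data used are the four category counts reported on this slide.

```python
# Category counts as reported on the slide.
counts = {
    "Unspecified reuse": 305,
    "Modified reuse": 29,
    "Benchmark": 11,
    "Cite": 55,
}

# The four categories together should cover the whole sample.
total = sum(counts.values())

# Share of each category as a percentage of the sample.
shares = {category: 100 * n / total for category, n in counts.items()}

print(f"Total papers: {total}")
for category, n in counts.items():
    print(f"{category:<18}{n:>4}{shares[category]:>8.2f}%")
```

The counts sum to 400, matching the sample size, and roughly three quarters of the sampled papers fall into the "Unspecified reuse" category.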

Page 18: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Metadata elements in mentions

• This study focuses on the following three metadata elements:

– Version

– Parallel/part code

– Simulation model used

Page 19: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Version of LAMMPS

• Because of the nature of the citation of LAMMPS, version information is not included in most of the sampled papers: only five papers include any version information, and only two of those give it in an accurate and full form.

Page 20: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Parallel/part code of LAMMPS

• Out of the three earliest parallel code packages, WARP and ParaDyn were found to be mentioned in the papers; but GranFlow wasn’t mentioned in any sampled paper.

• All three of these packages were integrated into LAMMPS in 2001 (“LAMMPS history,” n.d.).

Page 21: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Summary of parallel code mentions

Code package (number of papers): Papers

WARP (13): Ji & Park, 2006; Park, 2006; Park, Gall & Zimmerman, 2005, 2006; Park & Zimmerman, 2005, 2006; Liang & Zhou, 2006; Tschopp & McDowell, 2008a, 2008b; Tschopp, Spearot & McDowell, 2007; Tschopp, Tucker & McDowell, 2007, 2008

ParaDyn (5): Cao & Ma, 2008; Cao & Wei, 2006, 2007a, 2007b; Cao, Wei & Ma, 2008

Page 22: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Simulation model used

• Software can be seen as a body of code in which research methods are implemented.

• Research methods connected to scientific software, as a type of research object, should be traced and studied.

Page 23: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Summary of the top simulation models mentioned

Model: Occurrence

Adaptive Intermolecular Reactive Empirical Bond Order (AIREBO) potential: 15

Embedded atom method (EAM) potential: 15

Nosé-Hoover thermostat: 10

Reactive force field (ReaxFF): 7

Velocity-Verlet algorithm: 6

Page 24: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Conclusions

• There are different kinds of semantic elements (reuse type, version, software relationship, method) in the citation/mention of LAMMPS in research papers.

• The current practice of recording such information is highly incomplete, inconsistent, and sometimes confusing.
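One way to make these semantic elements explicit is to record each coded mention as a structured record. The sketch below is an assumption about how such a record might look, not a schema proposed in the study: the class names `ReuseType` and `SoftwareMention`, the field names, and the `missing_elements` helper are all hypothetical. The example record is coded from the single quoted sentence "LAMMPS was used for all MD simulations." only, not from the full paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ReuseType(Enum):
    """The four reuse categories from the classification scheme."""
    UNSPECIFIED = "Unspecified reuse"
    MODIFIED = "Modified reuse"
    BENCHMARK = "Benchmark"
    CITE = "Cite (or non-use)"


@dataclass
class SoftwareMention:
    """Hypothetical record of one coded mention of LAMMPS."""
    paper: str
    reuse_type: ReuseType
    version: Optional[str] = None                               # version element
    related_packages: list[str] = field(default_factory=list)   # software relationship
    methods: list[str] = field(default_factory=list)            # simulation model or method

    def missing_elements(self) -> list[str]:
        """List the optional elements that the mention leaves unrecorded."""
        missing = []
        if self.version is None:
            missing.append("version")
        if not self.related_packages:
            missing.append("related_packages")
        if not self.methods:
            missing.append("methods")
        return missing


# Coded from the quoted sentence alone; it names no version, package, or method.
mention = SoftwareMention(
    paper="McMahon, Cheung & Troisi (2011)",
    reuse_type=ReuseType.UNSPECIFIED,
)
print(mention.missing_elements())
```

Keeping the optional elements as explicit fields makes incompleteness visible: a record with many missing elements corresponds to the kind of sparse mention the conclusions describe.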

Page 25: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

Implications

• To represent software and software use in scientific studies, what elements and/or relationships should be included in future standards of software citation?

– Metaphor matters! (Parsons & Fox, 2013)

Page 26: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

REFERENCES

• Agrawal, R., Peng, B., & Espinosa, H. D. (2009). Experimental-computational investigation of ZnO nanowires strength and fracture. Nano Letters, 9(12), 4177–4183.

• Anderson, J. A., Lorenz, C. D., & Travesset, A. (2008). General purpose molecular dynamics simulations fully implemented on graphics processing units. Journal of Computational Physics, 227(10), 5342–5359.

• Barrat, J.-L., & Bocquet, L. (1999). Influence of wetting properties on hydrodynamic boundary conditions at a fluid/solid interface. Faraday Discussions, 112, 119–128.

• Chavan, V., & Penev, L. (2011). The data paper: a mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics, 12(15), 1.

• Jang, S. S., Molinero, V., Cagin, T., & Goddard, W. A. (2004). Nanophase-segregation and transport in Nafion 117 from molecular dynamics simulations: effect of monomeric sequence. The Journal of Physical Chemistry B, 108(10), 3149–3157.

• Kalé, L., Skeel, R., Bhandarkar, M., Brunner, R., Gursoy, A., Krawetz, N., … Schulten, K. (1999). NAMD2: greater scalability for parallel molecular dynamics. Journal of Computational Physics, 151(1), 283–312.

• Liu, W., Schmidt, B., Voss, G., & Müller-Wittig, W. (2008). Accelerating molecular dynamics simulations using Graphics Processing Units with CUDA. Computer Physics Communications, 179(9), 634–641.

• McMahon, D. P., Cheung, D. L., & Troisi, A. (2011). Why holes and electrons separate so well in polymer/fullerene photovoltaic cells. The Journal of Physical Chemistry Letters, 2(21), 2737–2741.

• Mishin, Y., Asta, M., & Li, J. (2010). Atomistic modeling of interfaces and their impact on microstructure and properties. Acta Materialia, 58(4), 1117–1151.

• Parsons, M. A., & Fox, P. A. (2013). Is data publication the right metaphor? Data Science Journal, 12(0), WDS32–WDS46.

• Plimpton, S. (1995). Fast Parallel Algorithms for Short-Range Molecular Dynamics. Journal of Computational Physics, 117(1), 1–19.

Page 27: Software Citation, Reuse and Metadata Considerations:  An Exploratory Study Examining LAMMPS

QUESTION TIME

Or you can also send any question to [email protected]