Post on 04-Jan-2016
G.A.S.
Read, Write, Analyze, Manipulate
Generalized Atomic Systems: A Tool Kit for Atomistic Simulation Data
Michael Waters
Katie Sebeck
2/20/2013
2
Overview
• Traditional Workflow in Molecular Dynamics
• Defining the Problem
• An Interchangeable Approach
• Aiding Analysis
• Current Usage
3
Basics of Atomistic Simulations
• Atoms in boxes
  • Positions
    • Updated by iteratively solving F=ma according to empirical force fields
  • Velocity
  • Type, charge, etc.
• System-wide data
  • Simulation box
  • Number of atoms
  • Temperature, energy, pair potentials…
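The "iteratively solving F=ma" step can be illustrated with a velocity Verlet integrator, a scheme widely used in molecular dynamics. This is only a sketch under stated assumptions: the slides do not say which integrator any particular code uses, and the one-dimensional spring force and the function names below are hypothetical, chosen to keep the example short.

```python
def harmonic_force(x, k=1.0):
    """Hypothetical force field for illustration: a 1D spring, F = -k x."""
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Advance one particle by n_steps of size dt using velocity Verlet."""
    a = force(x) / mass  # a = F / m
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # position update
        a_new = force(x) / mass             # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt      # velocity update with averaged acceleration
        a = a_new
    return x, v


x, v = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v * v + 0.5 * x * x  # kinetic + potential; starts at 0.5
```

With a small time step the total energy stays close to its starting value, which is a quick sanity check for any integrator of this kind.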
Molecular Dynamics Data
• Input
  • Initial System Data
    • Coordinates, types, charges, mass
    • Interatomic bonds, angles
  • Run Time Instructions
  • Interaction Potential
    • Equations (form, coefficients)
• Output
  • System
    • Run Data (CPU rate, memory usage)
    • System variables (pressure, stress, temperature)
    • Atomic trajectories
    • Atomic characteristics (charge, type)
  • Per-Atom
    • Force, PE, KE, stress
  • Processed
    • Neighbor lists
    • Time averaged: mean squared displacement, radial distribution function
ALL molecular dynamics data can be contained in ASCII text files
4
5
A Brief Guide to Atomistic File Types
pdb, xyz, mol, cfg, sfd, gro, mdl, LAMMPS read_data, ccm, xsd, cif, car…
6
Through a Traditional Workflow
Generate input file(s)
Run simulation
Analyze output
Visualization/ plotting analysis
• Control file
• Structure file
• Format depends on program
n=16, 500 Chains, rho=0.7918
8000 atoms
3 atom types
7500 bonds
1 bond types
0 angles
0 dihedrals
0 impropers
0 92.055 xlo xhi
0 70.395 ylo yhi
0 37.905 zlo zhi
Masses
1 14.002
2 14.002
3 63.54
Atoms
1 1 2 1.80500000000000 1.80500000000000 1.80500000000000
2 1 1 2.65313400000000 3.07841000000000 1.80500000000000
units real
timestep 1.0
atom_style bond
dimension 3
boundary p p p
#---------------Coordinates and Bonds --------------
lattice fcc 1.0
region 1 block -9.025 -1.805 0 70.395 0 37.905 #N=28
read_data n28lat
pair_style lj/cut 9.805
pair_coeff 1 1 0.1431 3.923
pair_coeff 2 2 0.1432 3.923
pair_coeff 3 3 4.72 2.616
pair_modify mix arithmetic
bond_style harmonic
bond_coeff 1 41.82 1.54
group alkane type 1 2
group copper type 3
neighbor 1.0 bin
thermo 1
thermo_style custom step temp pe ke etotal
#minimize 1.0e-4 1.0e-6 100 1000
fix hope all nve
run 100000
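In a LAMMPS data file such as the structure file excerpted on this slide, each header line pairs a count with a keyword, so the header can be read with a few lines of Python. The sketch below assumes one count-keyword pair per line; `parse_header_counts` is a hypothetical helper name, not part of LAMMPS or of GAS.

```python
HEADER = """\
8000 atoms
3 atom types
7500 bonds
1 bond types
0 angles
0 dihedrals
0 impropers
"""


def parse_header_counts(text):
    """Map each header keyword (e.g. 'atom types') to its integer count."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines of the form "<integer> <keyword ...>".
        if len(parts) >= 2 and parts[0].isdigit():
            counts[" ".join(parts[1:])] = int(parts[0])
    return counts


counts = parse_header_counts(HEADER)
```

Even a parser this small has to be rewritten for every format that lays out its header differently, which is the cost the rest of the talk is concerned with.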
7
Through a Traditional Workflow
Generate input file(s)
Run simulation
Analyze output
Visualization/ plotting analysis
• Information about simulation run in control file
• Hardware, software version metadata formatting depends on system configuration
• Produces output of overall run statistics
Loop time of 3515.13 on 32 procs for 50000 steps with 107008 atoms
Pair time (%) = 1108.83 (31.5444)
Bond time (%) = 78.4225 (2.231)
Neigh time (%) = 162.274 (4.61645)
Comm time (%) = 1270 (36.1294)
Outpt time (%) = 523.248 (14.8856)
Other time (%) = 372.363 (10.5931)
Nlocal: 3344 ave 8049 max 0 min
Histogram: 16 0 0 0 0 0 2 6 3 5
Nghost: 7940.66 ave 15817 max 0 min
Histogram: 8 4 4 0 0 0 0 0 8 8
Neighs: 862976 ave 2.19776e+06 max 0 min
Histogram: 16 0 0 0 0 2 2 6 2 4
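Run statistics like the timing breakdown on this slide are regular enough to extract with one regular expression. The sketch below assumes the "Name time (%) = seconds (percent)" layout shown here; other program versions may format this section differently, and `parse_timings` is a hypothetical helper name.

```python
import re

LOG = """\
Loop time of 3515.13 on 32 procs for 50000 steps with 107008 atoms
Pair time (%) = 1108.83 (31.5444)
Bond time (%) = 78.4225 (2.231)
Neigh time (%) = 162.274 (4.61645)
Comm time (%) = 1270 (36.1294)
Outpt time (%) = 523.248 (14.8856)
Other time (%) = 372.363 (10.5931)
"""

# One match per line of the form "Name time (%) = seconds (percent)".
TIMING = re.compile(r"^(\w+)\s+time \(%\) = ([\d.eE+-]+) \(([\d.eE+-]+)\)$", re.M)


def parse_timings(log_text):
    """Return {category: (seconds, percent)} for each timing line."""
    return {
        name: (float(seconds), float(percent))
        for name, seconds, percent in TIMING.findall(log_text)
    }


timings = parse_timings(LOG)
slowest = max(timings, key=lambda name: timings[name][0])  # category with the most time
```

For this run the largest share of wall time is communication rather than the pair computation, which is the kind of observation that is easy to miss when the log is only read by eye.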
8
Through a Traditional Workflow
Generate input file(s)
Run simulation
Analyze output
Visualization/ plotting analysis
• Output files generally dictated by control file
• Final structure file
• System properties log
• Other run-time analysis outputs
• HIGHLY VARIED FORMATTING!
• Quantitative analysis of output by scripting, MATLAB or Excel
9
Through a Traditional Workflow
Generate input file(s)
Run simulation
Analyze output
Visualization/ plotting analysis
• Output structure file may or may not be in a format which can be fed into visualization software
• Many software options available:
  • VMD
  • Avogadro
  • POV-Ray
  • VESTA
  • …
• Analysis output may or may not be in a format which can be parsed by plotting software
10
An Endless Series of Parsing Problems
• Input file
  • Convert from something you can manipulate/generate to something the code can read
• Output analysis
  • Typically requires writing new parsing routines
  • Different codes require rewriting scripts
• Visualizations
  • May require extracting data from other files manually
  • Most visualization code is already equipped to parse a variety of file types
11
Data from Legacy Code
• Locally developed molecular dynamics code, FLX
• Trying to port data into another code, LAMMPS
• Ctrl+C, Ctrl+V and lots of manual editing…
  • Very time consuming for each file
12
Obstacles to Data Sharing and Reuse
• Energy barrier of converting file formats
  • Example: A file downloaded directly from Protein Data Bank (.pdb) may not be readable by MD code (LAMMPS)
• Extracting relevant quantities from available data sets
  • Parsing rules not always clear if unfamiliar with the format
  • Formats not always well documented
13
Problem Statement
• Too much redundant work
• Too little documentation or code clarity
• Too much time spent manipulating data formatting
• How can we fix this?
14
Our Approach: Interchangeable Libraries
• We created a General Atomic System (GAS) class
• All file read functions generate a GAS object
• GAS objects are accepted by
  • Write file functions
  • Analysis functions
  • Manipulation functions
G.A.S.
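One way to picture this design: each format needs one reader and one writer that meet at a common object, instead of a separate converter for every pair of formats. The sketch below is an assumption about how such a layout could look, not the authors' code. The `GAS` class here is a bare stand-in, and `READERS`, `WRITERS`, `convert`, and the field names are hypothetical. The writer uses the numeric atom type as the XYZ label because the example input carries no element symbols.

```python
class GAS:
    """Bare stand-in for a generalized atomic system; field names are assumed."""

    def __init__(self):
        self.atom_type = []
        self.x, self.y, self.z = [], [], []


def read_lammps_atoms(text):
    """Reader: parse 'id molecule type x y z' rows into a GAS object."""
    gas = GAS()
    for line in text.strip().splitlines():
        _atom_id, _molecule, atom_type, x, y, z = line.split()
        gas.atom_type.append(int(atom_type))
        gas.x.append(float(x))
        gas.y.append(float(y))
        gas.z.append(float(z))
    return gas


def write_xyz(gas, comment=""):
    """Writer: emit XYZ text (atom count, comment line, one row per atom)."""
    lines = [str(len(gas.x)), comment]
    for t, x, y, z in zip(gas.atom_type, gas.x, gas.y, gas.z):
        lines.append(f"{t} {x:.3f} {y:.3f} {z:.3f}")
    return "\n".join(lines) + "\n"


READERS = {"lammps_atoms": read_lammps_atoms}  # every reader returns a GAS object
WRITERS = {"xyz": write_xyz}                   # every writer accepts a GAS object


def convert(text, source, target):
    """Convert between any registered formats through the common object."""
    return WRITERS[target](READERS[source](text))


ATOMS = """\
1 1 2 1.805 1.805 1.805
2 1 1 2.653134 3.07841 1.805
"""
xyz_text = convert(ATOMS, "lammps_atoms", "xyz")
```

Adding a new format under this layout means registering one reader and one writer; every existing analysis and conversion path then works with it unchanged.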
15
Examining Existing Standards for Commonalities
Positions Type Number of atoms
16
Examining Existing Standards for Commonalities
Positions Type Number of atoms
17
Examining Existing Standards for Commonalities
Positions Type Number of atoms/ end of atoms section
18
Creating a Common Data Structure
• GAS class contains
  • System data
  • Internal functions
• Trivial ontology
  • Simplicity in data structure is flexibility
  • Internal functions should be as reliable as possible
  • Obvious and explicit naming schemes
19
Ontological Details
GAS
• System Data
  • .number_of_atoms
  • .x, .y, .z
  • .atomic_number
  • … and many more
• Internal Functions
  • .update_number_of_atoms
  • .fill_id_list
  • .sort_by_id
  • … and many more
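The attribute and function names on this slide suggest a class along the following lines. This is a guess at the shape, not the authors' implementation: the slide gives only the names, so the `id` list, the list-based storage, and the behavior of each method are assumptions.

```python
class GAS:
    """Sketch of a generalized atomic system container (behavior assumed)."""

    def __init__(self):
        self.number_of_atoms = 0
        self.id = []                 # assumed per-atom ID list
        self.x, self.y, self.z = [], [], []
        self.atomic_number = []

    def update_number_of_atoms(self):
        """Recount atoms from the coordinate list."""
        self.number_of_atoms = len(self.x)

    def fill_id_list(self):
        """Assign sequential IDs 1..N (assumed behavior)."""
        self.update_number_of_atoms()
        self.id = list(range(1, self.number_of_atoms + 1))

    def sort_by_id(self):
        """Reorder every per-atom list so that IDs ascend."""
        order = sorted(range(len(self.id)), key=self.id.__getitem__)
        for name in ("id", "x", "y", "z", "atomic_number"):
            values = getattr(self, name)
            setattr(self, name, [values[i] for i in order])


gas = GAS()
gas.id = [2, 1]
gas.x, gas.y, gas.z = [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]
gas.atomic_number = [1, 6]
gas.sort_by_id()
gas.update_number_of_atoms()
```

Keeping the data as plain, explicitly named attributes is what lets readers, writers, and analysis functions agree on one structure without further coordination.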
20
User Time Savings
• From read_data to xyz: timing comparisons
  • Manual copy-paste, eliminating excess columns: 2.15 minutes
  • Calling functions, including typing out calls: 1.05 minutes
  • Actual function timing: ~6 seconds
21
Aiding Analysis
• With all data in standard structure:
  • Write all analysis based on this format
  • Input format independent
• Allows reuse of analysis functions
  • Reuse begs for optimization
  • Intended reuse encourages documentation
• Nested analyses now possible
• Modularization saves:
  • Time
  • Effort
  • Error
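A small sketch of what format-independent, nested analysis can look like: the functions below touch only `.x`, `.y`, `.z`, so they work on any object that exposes those lists regardless of which file it came from, and the second analysis reuses the first. The function names are hypothetical and the quantities are unweighted (no masses), which is a simplification for the example.

```python
import math
from types import SimpleNamespace


def center_of_geometry(system):
    """Mean position of all atoms; needs only .x, .y, .z lists."""
    n = len(system.x)
    return (sum(system.x) / n, sum(system.y) / n, sum(system.z) / n)


def radius_of_gyration(system):
    """Nested analysis: reuses center_of_geometry instead of recomputing it."""
    cx, cy, cz = center_of_geometry(system)
    squared = [
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
        for x, y, z in zip(system.x, system.y, system.z)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Stand-in for a GAS object produced by any file reader (attribute names assumed).
system = SimpleNamespace(x=[0.0, 2.0], y=[0.0, 0.0], z=[0.0, 0.0])
center = center_of_geometry(system)
rg = radius_of_gyration(system)
```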
22
Traditional Scripting Problems
• Scripts typically used for:
  • Quantitative analysis
  • Modifying files to be parsed by various software
• Rewriting input/output handling for each script
• MATLAB, sed, awk and grep are not the friendliest or fastest parsing tools
• Lack of commenting
• Can only be applied to specific file types or a single file
23
Examples of Scripting
2.5 seconds
24
The Python Version…
0.4 seconds
Once a function is written, it can be called in just a few lines by ANY GAS system containing sufficient information
25
CC BY-NC-SA http://www.flickr.com/photos/katieharbath/
26
User Time Savings
• Open source and custom function libraries instead of MATLAB allow for brute force parallelization and shifting of load to external resources
• Faster run times: 2.5 seconds using bash versus 0.4 seconds in Python
• Faster coding times
  • Reuse of functions without additional modifications needed
  • Eliminating redundant coding efforts
• Use of common language promotes code reusability
  • Writing code for “future” self as well as others
27
Ways We’re Using GAS
• Polymerization
  • Analyze pair-pair distances
  • Alter system topology
  • Automatically generate system readable file
• Iterative system analysis
  • Quantitative analysis of a series of files
    • Radial distribution functions
    • Density profile
    • Bond length distributions
  • Automatically generates easily parsed output files
  • Automatic movie rendering
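As an illustration of one analysis named above, here is a minimal radial distribution function for a single frame in an orthorhombic periodic box. It is a sketch rather than the GAS implementation: it takes plain lists of positions instead of a GAS object, uses a simple all-pairs loop, and the function names are hypothetical.

```python
import math


def minimum_image_distance(a, b, box):
    """Distance between points a and b in an orthorhombic periodic box."""
    total = 0.0
    for ai, bi, length in zip(a, b, box):
        d = bi - ai
        d -= length * round(d / length)  # fold the separation into [-L/2, L/2]
        total += d * d
    return math.sqrt(total)


def radial_distribution(positions, box, r_max, n_bins):
    """Return (bin centers, g(r)); r_max should not exceed half the shortest box edge."""
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):  # each unique pair once
            r = minimum_image_distance(positions[i], positions[j], box)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 1
    density = n / (box[0] * box[1] * box[2])
    centers, g = [], []
    for k, count in enumerate(counts):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        centers.append(0.5 * (r_lo + r_hi))
        # Factor 2 because each pair was counted once but contributes to both atoms.
        g.append(2.0 * count / (n * density * shell))
    return centers, g


box = (10.0, 10.0, 10.0)
positions = [(0.5, 0.0, 0.0), (9.3, 0.0, 0.0)]  # neighbors across the periodic boundary
centers, g = radial_distribution(positions, box, r_max=5.0, n_bins=5)
```

Running the same function over a series of files and averaging the results gives the time-averaged curve; the all-pairs loop scales quadratically, so a neighbor list would be the natural optimization for large systems.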
28
Automatic Movie Rendering
29
System Manipulation: Unwrapping Coordinates
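The slide does not say which unwrapping procedure is used. One common meaning is removing the jumps that appear in a trajectory when an atom crosses a periodic boundary, and that variant can be sketched as follows. The function name is hypothetical, the example is one-dimensional, and it assumes the atom moves less than half a box length between consecutive frames.

```python
def unwrap_trajectory(frames, box_length):
    """Unwrap the wrapped 1D coordinate of one atom across successive frames."""
    unwrapped = [frames[0]]
    for previous, current in zip(frames, frames[1:]):
        step = current - previous
        step -= box_length * round(step / box_length)  # minimum-image displacement
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped


wrapped = [9.0, 9.8, 0.6, 1.4]  # atom leaves a box of length 10 and re-enters at 0
unwrapped = unwrap_trajectory(wrapped, 10.0)
```

Unwrapped coordinates are what quantities such as mean squared displacement need, since the wrapped positions would show a spurious jump of nearly a full box length.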
30
Moving Forward
• More file formats
• More advanced analysis methods and functions
• Density functional theory support
• Non-spherical particles
• Collaboration with other groups
• Better metadata integration
31
Final Thoughts
• Our lives are much better
• Our code is much more consistent
• Future users have a hope of understanding what we did
• If you want people to use it, it needs to be USEFUL and EASY
G.A.S.