Molecular data features

From VisItusers.org

In recent years, support in VisIt has grown greatly in the areas of atomic and molecular visualization and analysis. This comes in the form of internal data model support, new plots and operators, new and upgraded analysis features, a basic understanding of atomic characteristics, and a variety of file format readers.

Molecular Plots and Operators

The Molecule Plot

The Molecule plot takes as input data with atoms and bonds (stored internally as Vertices and Lines in a VTK PolyData structure) and renders it as spheres and lines/cylinders.

Examples

Molecule Plot of crotamine, colored by element type, atoms shown with covalent radius, and no bonds.

Molecule plot of crotamine, colored by residue type, atoms proportional to covalent radius emulating the CPK style, bonds colored with adjacent atom color.

Molecule plot of crotamine, colored by a scalar quantity, no atoms shown, and bonds drawn as cylinders.

Molecule plot of crotamine, colored by backbone, atoms at same width as thicker cylinder-shaped bonds.

Atoms tab

In the first tab are the options for how atoms are drawn. The options are:

  • Draw atoms as. The value "Spheres" means to draw spheres using 3D geometry. "Sphere Impostors" means to draw them using a single flat polygon with an image of a sphere -- this requires support from graphics hardware and can introduce some minor graphical artifacts, but is very fast. The value "None" means that you are only interested in seeing the bonds, and would like the atoms themselves not to be drawn.
  • Atom sphere quality. When drawing atoms as "Spheres", this determines the number of polygons used to draw the atom geometry. "Low" corresponds to about a dozen polygons per sphere, "Medium" is several dozen, "High" a couple hundred, and "Super" is about a thousand.
  • Radius based on. "Scalar variable" uses a nodal variable on the data set to determine the radius. "Covalent radius" and "Atomic radius" are atomic properties, looked up in a table built into VisIt. "Fixed value" simply uses the value in the text field below as the radius. Note that "Covalent radius" and "Atomic radius" require a discrete nodal field called "element" that contains the atomic number. Also note that some default values assume units of Angstroms, since much molecular data uses them; depending on your data, you may need to change the atom and bond radii.
    • Variable for atom radius. This value only applies when "Radius based on" is set to "Scalar variable" and determines which variable shall be used (and multiplied by the scale factor below) as the value for the radius of the rendered atoms.
    • Atom radius scale factor. This value applies when "Radius based on" is not set to "Fixed value". It multiplies the base radius, whether that comes from the atomic/covalent radius or from a scalar variable. Note that the atomic and covalent radii are in Angstroms, so if your data is in other units, you should apply the appropriate conversion factor here.
    • Fixed atom radius. This value only applies when "Radius based on" is set to "Fixed value". It is the actual radius you want to use to draw the atoms in world coordinate units.
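The way these settings combine can be sketched in Python as below. This is an illustration, not VisIt's code: the function name atom_radius, the mode strings, and the small table of approximate covalent radii (in Angstroms) are assumptions, and VisIt's built-in lookup table may use different values. The "Atomic radius" option would follow the same path as "covalent", only with a different table.

```python
# Hypothetical subset of approximate covalent radii in Angstroms, keyed by
# atomic number. Illustrative only; VisIt ships its own built-in table.
COVALENT_RADIUS = {1: 0.31, 6: 0.76, 7: 0.71, 8: 0.66, 14: 1.11}


def atom_radius(mode, element=None, scalar=None, scale_factor=1.0, fixed_radius=None):
    """Return the rendered radius of one atom (sketch of the Atoms tab logic).

    mode is "scalar", "covalent", or "fixed" (hypothetical names).
    """
    if mode == "fixed":
        # "Fixed value": used as-is, in world units; the scale factor is ignored.
        return fixed_radius
    if mode == "covalent":
        # Needs the discrete nodal "element" field (atomic number).
        base = COVALENT_RADIUS[element]
    elif mode == "scalar":
        # "Scalar variable": the chosen nodal variable supplies the base radius.
        base = scalar
    else:
        raise ValueError(f"unknown mode: {mode}")
    # The scale factor applies to every mode except "fixed", e.g. to convert
    # Angstrom radii into the units of your data.
    return base * scale_factor
```

For data stored in nanometres, for example, a scale factor of 0.1 turns the Angstrom-based oxygen radius into roughly 0.066 world units.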

Bonds tab

The second tab shows the options for how bonds are drawn. These options are:

  • Draw bonds as. The value "None" means you are only interested in seeing the atoms and would like any bonds to be hidden. "Lines" uses geometric lines with no 3D shading, and "Cylinders" uses 3D geometry with 3D shading. "Lines" is much faster but "Cylinders" looks better.
    • Bond cylinder quality. When "Draw bonds as" is set to "Cylinders", this determines the number of polygons used to draw the bonds. "Low" is about three polygons, and "High" is about twenty.
    • Bond radius. When "Draw bonds as" is set to "Cylinders", this determines the thickness of the cylinder in world coordinate units. Note that defaults for these values were chosen due to molecular data commonly being in units of Angstroms. Depending on your data, you may need to change the radius used for rendering atoms and bonds.
    • Bond line width. When "Draw bonds as" is set to "Lines", this determines the thickness of the line used to draw the bonds in terms of a number of pixels.
    • Bond line style. When "Draw bonds as" is set to "Lines", this determines the style of the line used to draw the bonds.
  • Color bonds by. This can be set to "Adjacent atom color", which means that each half of the bond is drawn using the color of the atom to which it is attached. Or, it can be set to a "Single color" chosen at the color selector just to the right of this checkbox.
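With "Adjacent atom color", each bond is effectively drawn as two half-bonds meeting at the bond midpoint. A minimal Python sketch of that split follows; the function names and the tuple-based segment representation are assumptions for illustration, not VisIt's internal data structures.

```python
def split_bond(p0, p1, color0, color1):
    """Split the bond p0-p1 into two colored half-segments.

    Each half runs from one atom to the midpoint and takes the color of the
    atom it is attached to.
    """
    mid = tuple((a + b) / 2.0 for a, b in zip(p0, p1))
    return [(p0, mid, color0), (mid, p1, color1)]


def bond_segments(points, bonds, colors, single_color=None):
    """Return (start, end, color) segments for all bonds.

    single_color=None means "Adjacent atom color"; otherwise every bond is a
    single segment drawn in single_color.
    """
    segments = []
    for i, j in bonds:
        if single_color is None:
            segments.extend(split_bond(points[i], points[j], colors[i], colors[j]))
        else:
            segments.append((points[i], points[j], single_color))
    return segments
```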

Colors tab

The third tab shows the various options for coloring the atoms and bonds. These are:

  • The Discrete colors group is for values which take on integral values. When VisIt encounters a discrete-valued variable, it determines which one of these color tables to use based on the variable name ("element" and "restype", specifically).
    • Element and Residue are specific cases with their own settings, because widely used conventional color tables exist for them. VisIt provides some of these color tables.
    • Other discrete fields catches anything which is not an element or residue type.
  • The Continuous colors group is for values which take on real values.
    • Color table for scalars can be set to any color table, typically a continuous one.
    • The Clamp minimum and Clamp maximum check boxes, along with their values, narrow the continuous field to a range of particular interest. The full color table is then spread across that range, and anything outside it is clamped to the colors at the minimum/maximum extremes of the selected color table.
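The effect of the clamp settings can be expressed as a mapping from a scalar value to a position in [0, 1] along the color table. The Python sketch below assumes a plain linear normalization with clamping, which is the general technique described above rather than VisIt's exact code; the function name is hypothetical, and None stands for an unchecked clamp box, in which case the data's own extremum is used.

```python
def color_table_position(value, data_min, data_max, clamp_min=None, clamp_max=None):
    """Map value to [0, 1] along a continuous color table.

    clamp_min/clamp_max of None mean the box is unchecked and the data
    extremum is used instead.
    """
    lo = data_min if clamp_min is None else clamp_min
    hi = data_max if clamp_max is None else clamp_max
    if hi <= lo:
        return 0.0  # degenerate range: everything gets the first color
    t = (value - lo) / (hi - lo)
    # Values outside [lo, hi] receive the color at the nearest extreme.
    return min(1.0, max(0.0, t))
```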

The CreateBonds Operator

The CreateBonds operator lets you specify ranges of distances for various types of atoms and uses those ranges to create bonds. By default, it creates a bond between a Hydrogen atom and an atom of any other species if they are separated by a distance between 0.4 and 1.2 units (e.g. Angstroms), and between a pair of atoms (not including H) if they are between 0.4 and 1.9 units apart. This works well for organic molecules. However, for the data in the image below, the default distances were not useful, so the values were changed to create a bond including H for distances between 0.4 and 1.5 units, and for other species between 0.4 and 2.5 units.

In the above image, you see the attributes window for the CreateBonds operator. The options are:

  • Variable for atomic number. This defaults to element as per the convention, but can be set to any integral variable corresponding to the atomic number of each atom.
  • Maximum bonds per atom. If you specify the wrong distance, each atom might try to bond to many other atoms. To keep an error like this from causing a severe hit to memory and performance, this setting will stop the process before it gets out of hand. The default value is "10", and could safely be set lower in many cases, but is user-settable for unusual cases where >10 bonds are needed on some atoms.
  • The Bonds list contains the bonding pair specifications for the algorithm. Each row contains the species "1st" and "2nd", and the "Min" and "Max" distances which could be considered a bond. Note that (a) a "*" matches any species, (b) it does not matter which species is "1st" and which is "2nd", because bonds are not directional, and (c) the first match in the list is taken, even if later lines also match, which allows you to place more specific rules above less specific ones.
    • For example, if the first line is "H", "*", "0.4", "1.2", it specifies that the algorithm should create a bond between two atoms if either one is Hydrogen and the distance between them is between 0.4 and 1.2 spatial units.
    • As a follow-on to this example, suppose the second line is "*", "*", "0.4", and "1.9". Now suppose two atoms exist, H and O, and they are separated by a distance of 1.5 units. Because one is H, it will match the first line, determine the distance is too great (since it's greater than 1.2), and so it will not create a bond between this atom pair. Since this atom pair matched the first line, the second line is not considered, even though the atom pair matches its criteria.
  • Just below the actual bonding rules list are several buttons:
    • The New button creates a new rule, and Del deletes the currently selected rule.
    • Up moves the currently selected rule up in the list, and Down moves the currently selected rule later in the list. Recall that the order of rules matters because only the first match is considered.
  • The Details section contains controls to set the values for a rule.
    • The "1st" and "2nd" controls pop up the species selection widget shown at the right.
      • To get a wildcard which will match any type of atom, choose "Match any element" at the bottom; it's selectable just as any individual species in the periodic table.
      • Also, note that there is the possibility for some "hinting" to help guide your selection to the viable types of atoms. (This depends on conditions like file format support.) For example, in this screenshot, the "H" and "Si" elements are in boldface, since the file contains only those types of atoms (see the above examples in this section).
    • The "Min" and "Max" fields are standard text widgets.
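The matching rules above (wildcards, order-independent pairs, first match wins, and a per-atom bond cap) can be sketched in Python as a brute-force pair search. This is an illustration under assumptions, not VisIt's implementation: species are element symbols, the rule list mirrors the default distances quoted earlier, and since the text does not say which bonds are kept when an atom reaches the cap, the sketch simply skips further bonds for that atom. A practical implementation would use a spatial search structure rather than testing all N² pairs.

```python
import math
from itertools import combinations

# (first, second, min_dist, max_dist); "*" matches any species.
DEFAULT_RULES = [
    ("H", "*", 0.4, 1.2),
    ("*", "*", 0.4, 1.9),
]


def _matches(rule_species, species):
    return rule_species == "*" or rule_species == species


def find_rule(rules, s1, s2):
    """Return (min, max) of the first rule matching the pair in either order."""
    for first, second, lo, hi in rules:
        if (_matches(first, s1) and _matches(second, s2)) or (
            _matches(first, s2) and _matches(second, s1)
        ):
            return lo, hi
    return None


def create_bonds(atoms, rules=DEFAULT_RULES, max_bonds_per_atom=10):
    """atoms: list of (species, (x, y, z)). Returns a list of (i, j) bonds."""
    bonds = []
    count = [0] * len(atoms)
    for i, j in combinations(range(len(atoms)), 2):
        (s1, p1), (s2, p2) = atoms[i], atoms[j]
        rule = find_rule(rules, s1, s2)
        if rule is None:
            continue
        lo, hi = rule
        # Only the first matching rule is consulted, even if it rejects the pair.
        if not (lo <= math.dist(p1, p2) <= hi):
            continue
        # Assumption: once an atom reaches the cap, further bonds to it are skipped.
        if count[i] >= max_bonds_per_atom or count[j] >= max_bonds_per_atom:
            continue
        bonds.append((i, j))
        count[i] += 1
        count[j] += 1
    return bonds
```

With these rules, an H and an O atom 1.5 units apart stay unbonded, because the first rule matches and rejects the distance, exactly as in the example above.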

The Replicate Operator

The image at the left shows the Replicate operator attributes. The options in this window are:

  • Use provided unit cell vectors. Some file formats specify the vectors for the unit cell (sometimes called "direct lattice" vectors) containing the molecular data in the file. If they are present and this box is checked, then it will use those values instead of the ones specified in this window.
  • Vector for X, Y, and Z are the actual vectors describing the amount to displace for a replication in each of the three axes. (The X, Y, and Z labels are only for disambiguation; there is no requirement that the actual vectors specified be related to their name.)
  • Replications in X, Y, and Z specify the total number of instances of the data set to create. E.g. 1,1,1 specifies the original data set with no replications. 2,1,1 specifies a total of two instances: one is the original, and the other is a new one displaced by one times the "X" vector.
  • Merge into one block when possible. This flag specifies that the output of this operator should be created in a single "chunk", and helps with correct operation of the CreateBonds operator. It is recommended to leave this enabled.
  • For molecular data, periodically replicate atoms at unit cell boundaries. With periodic boundary conditions, atoms at the boundaries of the unit cell are, by definition, logically present at the matching opposite boundaries as well. Checking this flag creates those extra atom instances which, after replication, still fall within the unit cell's inclusive boundaries.
    • For example, in a periodic unit cell with origin [0,0,0] and dimensions [1,1,1], suppose there is an atom centered on the minimum-Z face, i.e. located at [0.5, 0.5, 0]. Due to the periodic boundary conditions, there should be another instance of this atom on the maximum-Z face, i.e. at [0.5, 0.5, 1]. If you set the number of Z replications to at least 2, the operator will create this other instance of the atom as desired. However, it will also create any atoms which lie in the replicated cell between z=1 and z=2. Sometimes you want to replicate just those atoms which are still within the original unit cell after replication (within epsilon). By checking this flag but leaving the number of replications at 1,1,1, you get the instance of the atom at [0.5, 0.5, 1] without the other atoms at z>1.
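The replication counts and the boundary flag can be sketched in Python as below. The sketch works in fractional coordinates (multiples of the unit cell vectors) so that the inclusive-boundary test is simple; that choice, the function name, the epsilon value, and the assumption that input atoms lie inside the cell are all illustrative and not VisIt's implementation. "Merge into one block" is not modeled.

```python
from itertools import product


def replicate(frac_atoms, origin, vectors, counts=(1, 1, 1),
              periodic_boundary=False, eps=1e-6):
    """Replicate atoms given in fractional (unit cell) coordinates.

    frac_atoms: list of (species, (fa, fb, fc)), each fraction in [0, 1].
    vectors: the three unit cell vectors (the "X", "Y", "Z" vectors).
    counts: total instances along each vector (1,1,1 = original only).
    periodic_boundary: also create images that land on the inclusive
    boundary of the replicated region.
    Returns a list of (species, (x, y, z)) in world coordinates.
    """
    out = []
    for species, frac in frac_atoms:
        if periodic_boundary:
            # Try one extra shift on each side; keep only images inside [0, n].
            shift_ranges = [range(-1, n + 1) for n in counts]
        else:
            shift_ranges = [range(n) for n in counts]
        for shift in product(*shift_ranges):
            f = [fi + si for fi, si in zip(frac, shift)]
            if periodic_boundary and not all(
                -eps <= fi <= n + eps for fi, n in zip(f, counts)
            ):
                continue
            # world = origin + fa*a + fb*b + fc*c
            pos = tuple(
                origin[d] + sum(f[k] * vectors[k][d] for k in range(3))
                for d in range(3)
            )
            out.append((species, pos))
    return out
```

For the unit-cube example above (vectors equal to the identity), calling it on an atom at (0.5, 0.5, 0.0) with periodic_boundary=True and counts 1,1,1 yields the original atom plus its image at [0.5, 0.5, 1], while counts=(1, 1, 2) without the flag yields the atoms at z=0 and z=1.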

Replicate and CreateBonds Examples

(A)
This image shows the original data set, with the original data set's unit cell drawn. (The unit cell happens to be orthogonal, but is not axis-aligned.) No replications and no bond creation have yet been applied.
(B)
In this image, the Replicate operator was applied, with no replications (i.e. X/Y/Z replication counts remaining at 1,1,1), but with the "periodically replicate atoms at unit cell boundaries" feature enabled.
(C)
Now, the replication values have been changed in this image to "2,1,1", with the replication vectors being used as-specified in the file to correspond to the unit cell of the problem.
(D)
This image shows the incorrect result (missing bonds between unit cell instances) occurring in two conditions: either the CreateBonds operator was applied before replication, or the Replicate operator did not have the "Merge into one block" box checked.
(E)
This shows the correct behavior: the "Merge into one block" box was checked, and the CreateBonds operator was applied after replication, thus allowing bonds to span unit cell instances.

Other Plots and Operators

The following images show plots and operators you might use to explore your data apart from the Molecule plot and related operators. These examples show charge density and force vectors associated with the raw molecular positions and species, all combined in the same window as a Molecule plot.

Pseudocolor Plot and ThreeSlice Operator

In this image, the charge density grid is shown using the Pseudocolor Plot, with moderate transparency, after applying the ThreeSlice operator to the grid around a point near the center of the molecule.

Contour Plot on a 3D Structured Grid

In this image, a Contour plot has been applied to the charge density grid, with a single low-density value and some transparency so that the molecule itself is still visible. Note that if you have more than one variable on your grid, for more flexibility you might choose to use the Isosurface operator on one variable and color the result using the Pseudocolor plot of a second variable.

Volume Plot of the 3D Grid

This shows a Volume plot of charge density. Note that the Volume plot has a continuously adjustable opacity and by nature allows farther parts of the data to show through to the front, allowing the whole data set to be involved in the final picture.

Isocontour Lines on a Slice

Here, we used the Contour plot on a slice through the data, with a thicker line width, and a continuous color table to show the increasing charge density.

Vector Plot of Forces on Point Data

This image shows a Vector plot of the force vectors on the atomic data itself. Vectors are both colored and sized using the magnitude of the force vector.


Analysis Capabilities

Subset Selection

The screenshot at left shows the same plot in two windows, but with different subset selection. The top image shows the standard Molecule plot of a data set. The bottom shows the Molecule plot, but with the "Subset" set to de-select Oxygen atoms.

Various file format readers may present a different set of subsets to the user through VisIt. For example, the Protein Data Bank reader presents compounds, residues, and atom types. The VASP reader presents only the atom type, but restricts the choice to the elements actually present in the file (while the PDB reader presents all 100+ element types).

Atomic Color Tables

VisIt includes a variety of color tables, some for continuous variables, and some for discrete variables. For molecular plots, such as ones coloring atoms by their species, VisIt includes color tables which match up with residue types or atomic numbers and have similar colors to conventional ones used. The ones included with VisIt for atomic numbers are called "cpk_rasmol" and "cpk_jmol", and for residue types are "amino_rasmol" and "amino_shapely".

However, you can also create your own. The easiest way is to start by selecting one of these, typing a new name, e.g. "my_atom_colors", and clicking the "New" button. This makes a copy of the selected color table with the new name. You can then edit the colors at will, and when you Save Settings (in the Options menu), it will keep your new color tables in future sessions.

Note that in the screenshot to the left, you see one of the features of the color table editor for atomic data: hint labels for the colors in the grid. Normally these are displayed as numbers, but for atomic color tables the editor displays each element's symbol instead. VisIt treats a color table as an atomic one if its number of colors matches that of the provided atomic number color tables (109 in versions before 2.0, and 110 starting with version 2.0). So if you are creating a new atomic color table, make sure it has the correct number of color values.

Expressions

Basic Expression Support

Numeric expressions, created in VisIt's Expressions window, are compatible with molecular data types. For example, if you create the variable "zcoord" as a Scalar, defined as "coords(mesh)[2]" (where "mesh" is the name of the mesh in your data file containing the atomic data), it creates a new variable, centered at the atoms, holding the Z coordinate of each atom.

Molecule plot of "degree(mesh)-1" (subtracting 1 because the atom itself is a cell in VTK).

Molecule plot of the X coordinates of the atoms via the expression "coords(mesh)[0]".
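Why subtract 1 in "degree(mesh)-1"? Each atom is stored as a VTK vertex cell in addition to being an endpoint of its bond (line) cells, so the number of cells touching an atom is one more than its bond count. The Python sketch below assumes cells are given as lists of point indices, in the style of the VERTICES and LINES sections of a legacy VTK file; it illustrates the counting only and is not VisIt's implementation of degree(). The function names are hypothetical.

```python
def node_degree(n_points, cells):
    """Count how many cells reference each point (what degree() reports)."""
    degree = [0] * n_points
    for cell in cells:
        for point_id in set(cell):
            degree[point_id] += 1
    return degree


def bond_count(n_points, vertices, lines):
    """Bonds per atom: the degree minus the atom's own vertex cell."""
    return [d - 1 for d in node_degree(n_points, vertices + lines)]
```

For a water molecule with vertices [[0], [1], [2]] and bonds [[0, 1], [0, 2]], the degrees are 3, 2, 2 and the bond counts 2, 1, 1.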

Enumerate Expression

One useful expression for some molecular data files is the "enumerate" expression. The most common use case is a data file that contains only a species type index, such as {0, 1, 2, etc.}, with no way to map this index to an actual atomic number. Some molecular operations in VisIt require an atomic number (often called "element") and will not work on such data. In this case, you can use the "enumerate" function to map, e.g., "0" to "14" (Si), "1" to "80" (Hg), etc. Typically you want to call this new scalar variable "element", as this is the convention VisIt follows by default for this variable (though in some plots/operators you can specify a different one).

For example, the LAMMPS readers and VASP POSCAR reader do not have intrinsic knowledge of which type of atom in the file maps to which atomic number -- but they do report the atom type (0,1,2...) as a variable called "species". To enable the VisIt features which use atomic number, define a new expression, called "element", of type "Scalar Mesh Variable", with the definition "enumerate(species, [14,80,8])", which maps the first type to Si, the second to Hg, and the third to O.
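Conceptually, enumerate is a table lookup: the value of species selects an entry in the list. A plain-Python equivalent of "enumerate(species, [14,80,8])" is sketched below; it illustrates the semantics only, assumes 0-based species indices as described above, and the function name and small symbol table are hypothetical. (In VisIt's Python CLI, the expression itself would normally be defined with DefineScalarExpression("element", "enumerate(species, [14,80,8])").)

```python
SYMBOLS = {1: "H", 8: "O", 14: "Si", 80: "Hg"}  # small illustrative subset


def enumerate_expr(values, table):
    """Python analogue of enumerate(var, [v0, v1, ...]).

    Each index value i (0, 1, 2, ...) is replaced by table[i]; int() allows
    the index to arrive as a float, as nodal fields often do.
    """
    return [table[int(v)] for v in values]
```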

Molecule plot of "species" directly from the file. Note that it is simply a continuous scalar field as far as VisIt is concerned, and can't be used for atomic properties.

Molecule plot of the "element" expression defined as an enumeration of "species". Note that the Molecule plot can use this element variable to determine atomic radius.

Enhanced Rendering

Plot Quality

Most plots have a number of options which can increase their quality at the cost of performance. Some examples follow.

Molecule Plot Quality

The first example, on the left (before) vs. on the right (after), shows what increasing the atom and bond rendering quality can do in the Molecule plot.

Vector Plot Quality

This second example, left (before) vs. right (after), shows what using cylinders for stems, and higher polygon count vector heads, does for the Vector plot.

Annotations

This example shows the same plot before and after modifying various annotation properties, such as:

  • switching to a darker, gradient background
  • turning off the 3D bounding box, coordinate axes, and triad
  • disabling database and user information
  • moving the legend, changing its orientation and size
  • adding a time slider progress bar, and text showing the time value

File Export

VisIt can save windows not just in image formats like PNG and JPEG, but also as data files which can be imported into other tools. Some of these data types can be imported back into VisIt, or into other visualization and rendering tools that may offer different rendering features.

POV-Ray

After composing your plots in VisIt, one of the data file types you can export is a set of POV-Ray scene description files. These are commented and organized so that users can tweak them to achieve results beyond what a real-time rendering tool can produce. See below for an example.

This image on the left shows a set of atoms and geometry rendered with POV-Ray.

This image is a closeup of the previous one, showing reflection, refraction, shadows, and varying surface characteristics.

Data File Formats

VisIt contains readers for over 100 different scientific, code-specific, and other general file formats. Below are listed several of the most specific to molecular data.

Note that many of these formats have lax naming restrictions, and VisIt may not automatically detect the file type. To force VisIt to try your desired file reader first (as named in the section headings below), pass that reader's name to the "-assume_format" command-line option when launching VisIt. For example, "visit -assume_format CTRL" will try the LMTO CTRL reader before falling back to its automatic detection code, and "visit -assume_format LAMMPS" will try the two LAMMPS readers first.

VASP (CHGCAR, POSCAR, OUTCAR) File Formats

The VASP code (http://cms.mpi.univie.ac.at/vasp/), as described in the link, is "a package for performing ab-initio quantum-mechanical molecular dynamics (MD) using pseudopotentials and a plane wave basis set." Its output is ASCII text in several files, and the VASP reader in VisIt supports "OUTCAR" and "POSCAR" for various flavors of atomic positions and variables, and "CHGCAR" for charge density grids.

Since the charge density grids can get very large, the VisIt CHGCAR reader is actually parallelized to help speed the ASCII-binary conversion process on multi-node machines when using the MPI-enabled version of VisIt's computation engine. It will decompose the grid into as many domains as you have processors, and each will read and process its chunk of data. Since this is an ASCII format, the speedup for the I/O portion will not scale to large numbers of processors, but the decomposition will also help the rest of the pipeline scale in parallel for other compute-intensive operations.
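One simple way to split a grid across processors is a slab decomposition along one axis, as in the Python sketch below. This is an assumption for illustration: the text above does not say how the CHGCAR reader actually partitions the grid, and the function name is hypothetical.

```python
def slab_decomposition(nz, nprocs):
    """Split nz grid planes into nprocs contiguous [start, stop) slabs.

    When nz is not divisible by nprocs, the first ranks get one extra plane.
    """
    base, extra = divmod(nz, nprocs)
    slabs = []
    start = 0
    for rank in range(nprocs):
        stop = start + base + (1 if rank < extra else 0)
        slabs.append((start, stop))
        start = stop
    return slabs
```

For example, 10 planes over 4 processors gives slabs of 3, 3, 2, and 2 planes, so each processor reads and converts only its own block of values.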

LAMMPS (input structure and output dump) File Formats

LAMMPS is the "Large-scale Atomic/Molecular Massively Parallel Simulator". See http://lammps.sandia.gov/ for more information. The VisIt LAMMPS reader supports two flavors of data files used with LAMMPS.

The first is the output dump file in Atom style, usually ending in ".dump". Here's a small example of that format with three variables per atom (the final three columns):

ITEM: TIMESTEP
1500
ITEM: NUMBER OF ATOMS
5
ITEM: BOX BOUNDS
0.0 2.0
0.0 3.0
0.0 2.5
ITEM: ATOMS
2 1  0.0 0.0 1.0  0 0 0
4 1  2.0 3.0 2.5  0 0 0
1 2  1.4 0.7 0.0  0 3 1
3 2  0.3 1.0 0.5  0 1 7
5 2  1.7 2.0 0.2  0 7 7

In this example, the atoms with IDs 2 and 4 are of the first species type, and those with IDs 1, 3, and 5 are of the second. So you'll need to create an enumerate expression to produce the atomic numbers needed for various molecular operations. For example, create a variable called "element", of type Scalar Mesh Variable, and define it as "enumerate(species, [1, 8])"; this maps the first species to hydrogen and the second to oxygen.

Note that the LAMMPS atom-style dump format has changed: the ITEM line with ATOMS now specifies the columns which were written out. To continue supporting the old atom-style dump format, the reader assumes a format string of "id type x y z" (i.e. unscaled atom coordinates) if the line contains only the word "ATOMS" with no format specified. The new default is "id type xs ys zs" (scaled atom coordinates) for the updated format. See the LAMMPS documentation of the "dump" command for details.
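A minimal Python sketch of reading one timestep of such a dump, following the rules described above: if the ITEM: ATOMS line names no columns, "id type x y z" is assumed, and scaled xs/ys/zs values are converted to box coordinates with x = xlo + xs * (xhi - xlo). The function name and the "var0", "var1", ... names for unlabeled extra columns are hypothetical; this is not the VisIt reader, and it handles only orthogonal boxes and a single timestep.

```python
def parse_lammps_dump(text):
    """Parse one timestep of an atom-style LAMMPS dump (sketch)."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    result = {"bounds": [], "atoms": []}
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("ITEM: TIMESTEP"):
            result["timestep"] = int(lines[i + 1])
            i += 2
        elif line.startswith("ITEM: NUMBER OF ATOMS"):
            result["natoms"] = int(lines[i + 1])
            i += 2
        elif line.startswith("ITEM: BOX BOUNDS"):
            # One "lo hi" line per axis (orthogonal boxes only).
            result["bounds"] = [tuple(map(float, lines[i + k].split()[:2])) for k in (1, 2, 3)]
            i += 4
        elif line.startswith("ITEM: ATOMS"):
            # Old format: no column names, so assume unscaled "id type x y z".
            columns = line.split()[2:] or ["id", "type", "x", "y", "z"]
            for row in lines[i + 1 : i + 1 + result["natoms"]]:
                values = row.split()
                extra = len(values) - len(columns)
                names = columns + [f"var{k}" for k in range(extra)]
                atom = dict(zip(names, map(float, values)))
                for axis, (lo, hi) in zip("xyz", result["bounds"]):
                    if axis + "s" in atom:  # scaled coordinate in [0, 1]
                        atom[axis] = lo + atom.pop(axis + "s") * (hi - lo)
                atom["id"], atom["type"] = int(atom["id"]), int(atom["type"])
                result["atoms"].append(atom)
            i += 1 + result["natoms"]
        else:
            i += 1
    return result
```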

The second format is the input format used for the LAMMPS "read_data" command. Its file extension is not standardized, but can sometimes be ".eam", ".meam", and ".rigid".

Position data on strange chemical

       5       atoms
       2       atom types
       0.0 2.0     xlo xhi
       0.0 3.0     ylo yhi
       0.0 2.5     zlo zhi

Atoms

   2    1      0.0           0.0           1.0
   4    1      2.0           3.0           2.5
   1    2      1.4           0.7           0.0
   3    2      0.3           1.0           0.5
   5    2      1.7           2.0           0.2

(As an aside, note that there is a "proper" EAM file containing pair potentials. Though the "EAM" refers to the embedded atom potential method in both usages, these are different files.)
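The read_data example above can be parsed with a short Python sketch like the one below. It assumes the "Atoms" rows use the "id type x y z" layout of the example (LAMMPS atom_style atomic); other atom styles add columns such as a molecule ID or charge, and other sections (Masses, Velocities, ...) are ignored. The function name and returned structure are hypothetical.

```python
def parse_lammps_data(text):
    """Parse the header and an atomic-style Atoms section of a read_data file."""
    lines = text.splitlines()
    # The first line is a free-form title/comment.
    info = {"title": lines[0].strip(), "bounds": {}, "atoms": []}
    i = 1
    while i < len(lines):
        words = lines[i].split()
        if len(words) == 2 and words[1] == "atoms":
            info["natoms"] = int(words[0])
        elif len(words) == 3 and words[1:] == ["atom", "types"]:
            info["ntypes"] = int(words[0])
        elif len(words) == 4 and words[3].endswith("hi"):
            # e.g. "0.0 2.0 xlo xhi" -> bounds["x"] = (0.0, 2.0)
            info["bounds"][words[2][0]] = (float(words[0]), float(words[1]))
        elif words and words[0] == "Atoms":
            i += 1
            # Skip the blank line(s) that follow the section keyword.
            while i < len(lines) and not lines[i].strip():
                i += 1
            while i < len(lines) and lines[i].strip() and len(info["atoms"]) < info["natoms"]:
                atom_id, atom_type, x, y, z = lines[i].split()[:5]
                info["atoms"].append((int(atom_id), int(atom_type), float(x), float(y), float(z)))
                i += 1
            continue
        i += 1
    return info
```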

The ProteinDataBank (.pdb) File Format

The Protein Data Bank (PDB) archive, at http://www.rcsb.org/, contains molecular files in a standard ASCII format. The format, however, is used for a wide range of molecular data, not just proteins. See http://www.wwpdb.org/docs.html for a full description of the file format. This PDB reader supports the ATOM, HETATM, HETNAM, MODEL/ENDMDL, TITLE, SOURCE, CONECT, and COMPND directives.

This is a simple example of a 2-compound, 4-element type data file with a single model.

COMPND    First
ATOM      1  N   TYR A   1      27.557 -46.589  10.074  1.00  0.00           N  
ATOM      2  H   TYR A   1      28.603 -46.872   9.068  1.00  0.00           H  
COMPND    Second
ATOM      3  C   TYR A   1      29.675 -45.772   8.980  1.00  0.00           C  
ATOM      4  O   TYR A   1      30.403 -45.678   7.992  1.00  0.00           O
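PDB records are fixed-column text, so a reader slices each line by column rather than splitting on whitespace. The Python sketch below extracts ATOM/HETATM records from a file like the one above and tags each atom with the most recent COMPND text, roughly how compounds could be exposed as subsets. The column positions follow the published PDB format description; the function name, the dictionary keys, and the fallback of taking the element from the first letter of the atom name when columns 77-78 are blank are assumptions for illustration, not the VisIt reader's logic.

```python
def parse_pdb_atoms(text):
    """Extract ATOM/HETATM records, tagging each with the current COMPND text.

    Fixed PDB columns (1-based, inclusive): record name 1-6, serial 7-11,
    atom name 13-16, residue name 18-20, chain 22, residue number 23-26,
    x/y/z 31-38/39-46/47-54, element 77-78. COMPND text starts at column 11.
    """
    atoms = []
    compound = None
    for line in text.splitlines():
        record = line[0:6].strip()
        if record == "COMPND":
            compound = line[10:].strip()
        elif record in ("ATOM", "HETATM"):
            name = line[12:16].strip()
            element = line[76:78].strip() or name[:1]  # crude fallback
            atoms.append({
                "serial": int(line[6:11]),
                "name": name,
                "residue": line[17:20].strip(),
                "chain": line[21:22].strip(),
                "res_seq": int(line[22:26]),
                "x": float(line[30:38]),
                "y": float(line[38:46]),
                "z": float(line[46:54]),
                "element": element,
                "compound": compound,
            })
    return atoms
```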

The XYZ File Format

The .xyz file format is a simple ASCII format used for describing atom positions, species, possibly variables, and possibly with multiple time steps. Here's a simple example file:

   3
Some file comment
H      22.3844     2.0352     0.0000
O      18.4512     3.5123     0.0000
Cu     14.2455     6.1056     7.3436


Note that the first line lists the number of atoms, the second is a comment (or blank), and the third starts the data. Each data line contains the element name, then the X, Y, and Z coordinates. You may have several variables after the Z coordinate; VisIt allows up to 6 extra variables. Below is an example with three extra variables, which will be called "var0" through "var2" inside VisIt, and can be combined into vectors or included in any other plotting or analysis operation VisIt supports.

   3

H      22.3844     2.0352     0.0000     7   7.8    8
O      18.4512     3.5123     0.0000    12   1.6    9
Cu     14.2455     6.1056     7.3436    10   1.4   10

To support multiple timesteps in a single file, simply concatenate each timestep at the end of the previous one, with no blank lines or other separators. Starting with version 1.12, the VisIt XYZ reader also supports atomic numbers instead of element symbols in the first column and also supports the rather dissimilar CrystalMaker flavor of .xyz file (which we don't describe here).
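The layout described above (atom count, comment line, data rows with optional extra columns, and timesteps simply concatenated) can be read with a Python sketch like the following. The function name and return structure are illustrative assumptions; it does not handle the CrystalMaker flavor, and the cap of six extra columns mirrors the limit stated above rather than VisIt's actual parsing code.

```python
def parse_xyz(text, max_extra=6):
    """Parse a (possibly multi-timestep) XYZ file into a list of frames.

    Each frame is a list of (element, (x, y, z), extras). The first column may
    be an element symbol or an atomic number; extras correspond to var0, var1, ...
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate stray blank lines between or after frames
            continue
        natoms = int(lines[i])
        # lines[i + 1] is the comment line (may be blank); data follows it.
        frame = []
        for row in lines[i + 2 : i + 2 + natoms]:
            fields = row.split()
            first = fields[0]
            element = int(first) if first.isdigit() else first
            xyz = tuple(float(v) for v in fields[1:4])
            extras = [float(v) for v in fields[4 : 4 + max_extra]]
            frame.append((element, xyz, extras))
        frames.append(frame)
        i += 2 + natoms
    return frames
```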

Wikipedia has a page on the format here: http://en.wikipedia.org/wiki/XYZ_file_format, though it does not mention the possibility of extra variables or multiple timesteps, both of which are supported by VisIt.

The LMTO CTRL File Format

The CTRL file is a format used by the STUTTGART TB-LMTO program (http://www.fkf.mpg.de/andersen/LMTODOC/LMTODOC.html). LMTO is the linear muffin-tin orbital method used in density functional theory (DFT). This CTRL reader supports the STRUC, CLASS, SITE, ALAT, and PLAT file categories. (See this page for more details.)

Using the VTK File Format for Molecular Data

The VTK file format is well-understood by VisIt, as it is the underlying low-level data model for many of its internal data types. The VTK structure best suited to molecular data is "vtkPolyData", where the vertices are the atoms, lines are the bonds (if desired), and fields on the atoms are point data fields. An approximation of a water molecule in the ASCII VTK file format is shown below:

# vtk DataFile Version 3.0
vtk output
ASCII
DATASET POLYDATA

POINTS 3 float
1.0 0.5 1.5
0.2 0.1 0.8
0.4 0.2 2.3

LINES 2 6
2 0 1
2 0 2

VERTICES 3 6
1 0 
1 1 
1 2 

POINT_DATA 3
SCALARS element float
LOOKUP_TABLE default
8 1 1
SCALARS somefield float
LOOKUP_TABLE default
0.687 0.262 0.185

If you have no bonds in the file, or would prefer to use the "CreateBonds" operator to generate them inside VisIt, simply drop the three lines of text in the "LINES" section of the file. For more detailed information about the VTK formats, see http://www.vtk.org/VTK/img/file-formats.pdf. Note that the so-called "Legacy" formats are both simpler and may be more widely supported than the more recent, and more complex, XML formats.
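If you generate such files from your own tools, a small writer is enough. The Python sketch below emits the same legacy POLYDATA layout as the water example above, and leaves out the LINES section when no bonds are given so that CreateBonds can create them in VisIt. The function name and arguments are hypothetical; writing "element" as a float SCALARS field simply follows the example file.

```python
def molecule_to_vtk(points, elements, bonds=(), point_fields=None):
    """Return a legacy ASCII VTK POLYDATA string for a molecule.

    Atoms become VERTICES, bonds (pairs of point indices) become LINES, and
    "element" plus any extra per-atom fields are written as POINT_DATA.
    """
    point_fields = point_fields or {}
    n = len(points)
    out = ["# vtk DataFile Version 3.0", "vtk output", "ASCII", "DATASET POLYDATA", ""]
    out.append(f"POINTS {n} float")
    out += [" ".join(str(c) for c in p) for p in points]
    if bonds:  # omit LINES entirely to let CreateBonds generate the bonds
        # Each line cell is "2 <i> <j>", i.e. 3 integers per bond.
        out += ["", f"LINES {len(bonds)} {3 * len(bonds)}"]
        out += [f"2 {i} {j}" for i, j in bonds]
    # Each vertex cell is "1 <point id>", i.e. 2 integers per atom.
    out += ["", f"VERTICES {n} {2 * n}"]
    out += [f"1 {i}" for i in range(n)]
    out += ["", f"POINT_DATA {n}"]
    for name, values in {"element": elements, **point_fields}.items():
        out += [f"SCALARS {name} float", "LOOKUP_TABLE default"]
        out.append(" ".join(str(v) for v in values))
    return "\n".join(out) + "\n"
```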

Acknowledgements

This work was supported in part by the Department of Energy (DOE) Office of Basic Energy Sciences (BES), through the Center for Nanophase Materials Sciences (CNMS) and Oak Ridge National Laboratory (ORNL), as well as the Advanced Simulation and Computing (ASC) Program through Lawrence Livermore National Laboratory (LLNL).
