-
Notifications
You must be signed in to change notification settings - Fork 20
Chemspecies update #130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chemspecies update #130
Changes from 11 commits
8dad06e
b7436ad
3182af8
9364ac5
d470df7
182df35
50b2427
5448ea8
ce4bd7b
0f01b12
f1e9ff4
1a23198
7287bd8
4f786e2
7df11a4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,6 +49,7 @@ velocity | |
set_velocity! | ||
atomic_number | ||
atomic_symbol | ||
atom_name | ||
element_symbol | ||
element | ||
``` | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,7 +15,8 @@ export bounding_box, | |
atomic_number, | ||
atomic_symbol, | ||
atomkeys, | ||
hasatomkey | ||
hasatomkey, | ||
atom_name | ||
|
||
|
||
# | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,9 +11,15 @@ | |
|
||
export ChemicalSpecies | ||
|
||
# masses are saved in a different file to not clutter this file | ||
include("isotope_masses.jl") | ||
|
||
""" | ||
Encodes a chemical species by wrapping an integer that represents the atomic | ||
number, the number of protons, and additional unspecified information as a `UInt32`. | ||
number, the number of protons, and additional name as max 4 characters. | ||
|
||
The name variable can be used as atom name in PDB format or | ||
some other way to mark same kind of atoms from one another. | ||
|
||
Constructors for standard chemical elements | ||
```julia | ||
|
@@ -35,13 +41,44 @@ | |
ChemicalSpecies(6; n_neutrons = 7) | ||
ChemicalSpecies(:C13) | ||
# deuterium | ||
ChemicalSpecies(:D) | ||
ChemicalSpecies(:D) | ||
``` | ||
|
||
Constructors for names | ||
```julia | ||
ChemicalSpecies(:C; atom_name=:MyC) | ||
ChemicalSpecies(:C13; atom_name=:MyC) | ||
``` | ||
|
||
Comparisons with different isotopes and names | ||
```julia | ||
# true | ||
ChemicalSpecies(:C13) == ChemicalSpecies(:C) | ||
|
||
# false | ||
ChemicalSpecies(:C12) == ChemicalSpecies(:C13) | ||
|
||
# true | ||
ChemicalSpecies(:C; atom_name=:MyC) == ChemicalSpecies(:C) | ||
|
||
# true | ||
ChemicalSpecies(:C12; atom_name=:MyC) == ChemicalSpecies(:C12) | ||
|
||
# false | ||
ChemicalSpecies(:C; atom_name=:MyC) == ChemicalSpecies(:C12) | ||
|
||
# true | ||
ChemicalSpecies(:C12; atom_name=:MyC) == ChemicalSpecies(:C) | ||
|
||
# true | ||
ChemicalSpecies(:C; atom_name=:MyC) == ChemicalSpecies(:C12; atom_name=:MyC) | ||
``` | ||
|
||
""" | ||
struct ChemicalSpecies | ||
atomic_number::Int16 # = Z = number of protons | ||
nneut::Int16 # number of neutrons | ||
info::UInt32 | ||
n_neutrons::Int16 # number of neutrons | ||
name::UInt32 | ||
end | ||
|
||
|
||
|
@@ -51,86 +88,93 @@ | |
|
||
Base.Broadcast.broadcastable(s::ChemicalSpecies) = Ref(s) | ||
|
||
# better to convert z -> symbol to catch special cases such as D; e.g. | ||
# Should ChemicalSpecies(z) == ChemicalSpecies(z,z,0)? For H this is false. | ||
function ChemicalSpecies(z::Integer; kwargs...) | ||
ChemicalSpecies(_chem_el_info[z].symbol; kwargs...) | ||
function ChemicalSpecies(asymbol::Symbol; atom_name::Symbol=Symbol(""), n_neutrons::Int=-1) | ||
str_symbol = String(asymbol) | ||
tmp = 0 | ||
if length(str_symbol) > 1 && isnumeric(str_symbol[end]) | ||
# we exclude the case where n_neutrons > 99 | ||
if length(str_symbol) > 2 && isnumeric(str_symbol[end]) | ||
tmp = parse(Int, str_symbol[end-1:end]) | ||
str_symbol = str_symbol[1:end-2] | ||
else | ||
tmp = parse(Int, str_symbol[end]) | ||
str_symbol = str_symbol[1:end-1] | ||
end | ||
end | ||
asymbol = Symbol(str_symbol) | ||
z = haskey(_sym2z, asymbol) ? _sym2z[asymbol] : 0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Use 3-argument get with default More broadly: is this a good idea, I mean typos and small missmatches lead to a zero here, which could lead to subtle bugs, no? Same with the other lookups There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes the subtle bugs are possible here. We could also have it error, if symbol is not recognized and is not There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added a check for symbols and now symbols that are not recognized throw an ArgumentError. |
||
n_neutrons = tmp == 0 ? n_neutrons : tmp - z | ||
if asymbol in [:D, :T] | ||
z = 1 | ||
n_neutrons = asymbol == :D ? 1 : 2 | ||
end | ||
return ChemicalSpecies(Int(z); atom_name=atom_name, n_neutrons=n_neutrons) | ||
end | ||
|
||
function ChemicalSpecies(z::Integer; atom_name::Symbol=Symbol(""), n_neutrons::Int=-1) | ||
atom_name = String(atom_name) | ||
if length(atom_name) > 4 | ||
throw(ArgumentError("atom_name has to be max 4 characters")) | ||
end | ||
if length(atom_name) == 0 | ||
return ChemicalSpecies(z, n_neutrons, zero(UInt32),) | ||
end | ||
tmp = repeat(' ', 4 - length(atom_name)) * atom_name | ||
tmp2 = SVector{4, UInt8}( tmp[1], tmp[2], tmp[3], tmp[4] ) | ||
name = reinterpret(reshape, UInt32, tmp2)[1] | ||
return ChemicalSpecies(z, n_neutrons, name) | ||
end | ||
|
||
ChemicalSpecies(sym::ChemicalSpecies) = sym | ||
|
||
==(a::ChemicalSpecies, sym::Symbol) = | ||
((a == ChemicalSpecies(sym)) && (Symbol(a) == sym)) | ||
|
||
# -------- fast access to the periodic table | ||
|
||
if length(PeriodicTable.elements) != maximum(el.number for el in PeriodicTable.elements) | ||
error("PeriodicTable.elements is not sorted by atomic number") | ||
function ==(cs1::ChemicalSpecies, cs2::ChemicalSpecies) | ||
if cs1.atomic_number != cs2.atomic_number | ||
return false | ||
elseif (cs1.n_neutrons >= 0 && cs2.n_neutrons >= 0) && cs1.n_neutrons != cs2.n_neutrons | ||
return false | ||
elseif (cs1.name != 0 && cs2.name != 0) && cs1.name != cs2.name | ||
return false | ||
elseif (cs1.n_neutrons < 0 && cs1.name == 0) || (cs2.n_neutrons < 0 && cs2.name == 0) | ||
return true | ||
elseif cs1.n_neutrons != cs2.n_neutrons && cs1.name != cs2.name | ||
return false | ||
else | ||
return true | ||
end | ||
end | ||
|
||
if !all(el.number == i for (i, el) in enumerate(PeriodicTable.elements)) | ||
error("PeriodicTable.elements is not sorted by atomic number") | ||
end | ||
# -------- fast access to the periodic table | ||
|
||
const _chem_el_info = [ | ||
(symbol = Symbol(PeriodicTable.elements[z].symbol), | ||
atomic_mass = PeriodicTable.elements[z].atomic_mass, ) | ||
for z in 1:length(PeriodicTable.elements) | ||
] | ||
const _sym2z = Dict{Symbol, UInt8}( | ||
Symbol(el.symbol) => el.number for el in PeriodicTable.elements | ||
) | ||
|
||
const _sym2z = Dict{Symbol, UInt8}() | ||
for z in 1:length(_chem_el_info) | ||
_sym2z[_chem_el_info[z].symbol] = z | ||
end | ||
const _z2sym = Dict{UInt8, Symbol}( | ||
el.number => Symbol(el.symbol) for el in PeriodicTable.elements | ||
) | ||
|
||
function _nneut_default(z::Integer) | ||
nplusp = round(Int, ustrip(u"u", _chem_el_info[z].atomic_mass)) | ||
return nplusp - z | ||
end | ||
const _z2mass = Dict{UInt8, typeof(PeriodicTable.elements[1].atomic_mass)}( | ||
el.number => el.atomic_mass for el in PeriodicTable.elements | ||
) | ||
|
||
function ChemicalSpecies(sym::Symbol; n_neutrons = -1, info = 0) | ||
_islett(c::Char) = 'A' <= uppercase(c) <= 'Z' | ||
|
||
# TODO - special-casing deuterium to make tests pass | ||
# this should be handled better | ||
if sym == :D | ||
return ChemicalSpecies(1, 1, info) | ||
function Base.Symbol(element::ChemicalSpecies) | ||
tmp = element.atomic_number == 0 ? :X : _z2sym[element.atomic_number] | ||
if element.n_neutrons < 0 | ||
return tmp | ||
end | ||
|
||
# number of neutrons is explicitly specified | ||
if n_neutrons != -1 | ||
if !( all(_islett, String(sym)) && n_neutrons >= 0) | ||
throw(ArgumentError("Invalid arguments for ChemicalSpecies")) | ||
end | ||
Z = _sym2z[sym] | ||
return ChemicalSpecies(Z, n_neutrons, info) | ||
end | ||
|
||
# number of neutrons is encoded in the symbol | ||
str = String(sym) | ||
elem_str = str[1:findlast(_islett, str)] | ||
Z = _sym2z[Symbol(elem_str)] | ||
num_str = str[findlast(_islett, str)+1:end] | ||
if isempty(num_str) | ||
n_neutrons = _nneut_default(Z) | ||
else | ||
n_neutrons = parse(Int, num_str) - Z | ||
if element.atomic_number == 1 && element.n_neutrons == 1 | ||
return :D | ||
end | ||
return ChemicalSpecies(Z, n_neutrons, info) | ||
end | ||
|
||
function Base.Symbol(element::ChemicalSpecies) | ||
str = "$(_chem_el_info[element.atomic_number].symbol)" | ||
if element.nneut != _nneut_default(element.atomic_number) | ||
str *= "$(element.atomic_number + element.nneut)" | ||
if element.atomic_number == 1 && element.n_neutrons == 2 | ||
return :T | ||
end | ||
|
||
# TODO: again special-casing deuterium; to be fixed. | ||
if str == "H2" | ||
return :D | ||
end | ||
|
||
return Symbol(str) | ||
n = element.atomic_number + element.n_neutrons | ||
return Symbol("$tmp$n") | ||
end | ||
|
||
|
||
|
@@ -147,7 +191,21 @@ | |
|
||
Base.convert(::Type{Symbol}, element::ChemicalSpecies) = Symbol(element) | ||
|
||
mass(element::ChemicalSpecies) = _chem_el_info[element.atomic_number].atomic_mass | ||
function mass(element::ChemicalSpecies) | ||
if element.n_neutrons < 0 | ||
if haskey(_z2mass, element.atomic_number) | ||
return _z2mass[element.atomic_number] | ||
else | ||
return 0.0u"u" | ||
end | ||
end | ||
akey = (element.atomic_number, element.n_neutrons) | ||
if haskey(_isotope_masses, akey) | ||
return _isotope_masses[akey] * u"u" | ||
end | ||
return missing | ||
end | ||
|
||
|
||
rich_info(element::ChemicalSpecies) = PeriodicTable.elements[element.atomic_number] | ||
|
||
|
@@ -221,3 +279,29 @@ | |
this may be `:D` while `atomic_number` is still `1`. | ||
""" | ||
atomic_number(sys::AbstractSystem, index) = atomic_number.(species(sys, index)) | ||
|
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Quite a few newlines |
||
|
||
|
||
""" | ||
atom_name(species) | ||
atom_name(sys::AbstractSystem, i) | ||
|
||
Return atomic name (`Symbol`) for `species` or vector of names for `sys`. | ||
|
||
Defaults to [`atomic_symbol`](@ref), if `name` field is zero or not defined. | ||
""" | ||
function atom_name(cs::ChemicalSpecies) | ||
if cs.name == 0 | ||
return atomic_symbol(cs) | ||
else | ||
# filter first empty space characters | ||
as_characters = Char.( reinterpret(SVector{4, UInt8}, [cs.name])[1] ) | ||
tmp = String( filter( x -> ! isspace(x), as_characters ) ) | ||
return Symbol( tmp ) | ||
end | ||
end | ||
|
||
atom_name(at) = atomic_symbol(at) | ||
|
||
atom_name(sys::AbstractSystem, index) = atom_name.(species(sys, index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the drive-by comment, but why are you limiting the name to 4 characters? This is the limit for the obsolete PDB format, but for example the mmcif format that replaced it does not have any length limitation: https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_chem_comp_atom.atom_id.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to have a bits type you need to have fixed amount of bits. This leads to a limitation on character length, which for now is 4*8bit char. We could of course increase it later on (there are several ways to do this). But the main design goal was to have it as bits type, which e.g., allows moving atom types to GPU etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough!