-
Notifications
You must be signed in to change notification settings - Fork 20
Chemspecies update #130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Chemspecies update #130
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
8dad06e
basics
tjjarvinen b7436ad
isotope masses
tjjarvinen 3182af8
fix symbols
tjjarvinen 9364ac5
symbols work correctly
tjjarvinen d470df7
fix comparisons and add atomic_name
tjjarvinen 182df35
tests and corrections
tjjarvinen 50b2427
fix julia v1.9
tjjarvinen 5448ea8
change atomic_name to atom_name
tjjarvinen ce4bd7b
fixes and more tests
tjjarvinen 0f01b12
more tests
tjjarvinen f1e9ff4
fix doc build
tjjarvinen 1a23198
fix bug in species creation
tjjarvinen 7287bd8
implement requested changes
tjjarvinen 4f786e2
only allow recognized symbols
tjjarvinen 7df11a4
Merge branch 'master' into chemspecies_update
mfherbst File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,6 +49,7 @@ velocity | |
set_velocity! | ||
atomic_number | ||
atomic_symbol | ||
atom_name | ||
element_symbol | ||
element | ||
``` | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,7 +15,8 @@ export cell_vectors, | |
atomic_number, | ||
atomic_symbol, | ||
atomkeys, | ||
hasatomkey | ||
hasatomkey, | ||
atom_name | ||
|
||
|
||
# | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,9 +11,15 @@ | |
|
||
export ChemicalSpecies | ||
|
||
# masses are saved in a different file to not clutter this file | ||
include("isotope_masses.jl") | ||
|
||
""" | ||
Encodes a chemical species by wrapping an integer that represents the atomic | ||
number, the number of protons, and additional unspecified information as a `UInt32`. | ||
number, the number of protons, and additional name as max 4 characters. | ||
|
||
The name variable can be used as atom name in PDB format or | ||
some other way to mark same kind of atoms from one another. | ||
|
||
Constructors for standard chemical elements | ||
```julia | ||
|
@@ -35,13 +41,44 @@ | |
ChemicalSpecies(6; n_neutrons = 7) | ||
ChemicalSpecies(:C13) | ||
# deuterium | ||
ChemicalSpecies(:D) | ||
ChemicalSpecies(:D) | ||
``` | ||
|
||
Constructors for names | ||
```julia | ||
ChemicalSpecies(:C; atom_name=:MyC) | ||
ChemicalSpecies(:C13; atom_name=:MyC) | ||
``` | ||
|
||
Comparisons with different isotopes and names | ||
```julia | ||
# true | ||
ChemicalSpecies(:C13) == ChemicalSpecies(:C) | ||
|
||
# false | ||
ChemicalSpecies(:C12) == ChemicalSpecies(:C13) | ||
|
||
# true | ||
ChemicalSpecies(:C; atom_name=:MyC) == ChemicalSpecies(:C) | ||
|
||
# true | ||
ChemicalSpecies(:C12; atom_name=:MyC) == ChemicalSpecies(:C12) | ||
|
||
# false | ||
ChemicalSpecies(:C; atom_name=:MyC) == ChemicalSpecies(:C12) | ||
|
||
# true | ||
ChemicalSpecies(:C12; atom_name=:MyC) == ChemicalSpecies(:C) | ||
|
||
# true | ||
ChemicalSpecies(:C; atom_name=:MyC) == ChemicalSpecies(:C12; atom_name=:MyC) | ||
``` | ||
|
||
""" | ||
struct ChemicalSpecies | ||
atomic_number::Int16 # = Z = number of protons | ||
nneut::Int16 # number of neutrons | ||
info::UInt32 | ||
n_neutrons::Int16 # number of neutrons | ||
name::UInt32 | ||
end | ||
|
||
|
||
|
@@ -51,86 +88,105 @@ | |
|
||
Base.Broadcast.broadcastable(s::ChemicalSpecies) = Ref(s) | ||
|
||
# better to convert z -> symbol to catch special cases such as D; e.g. | ||
# Should ChemicalSpecies(z) == ChemicalSpecies(z,z,0)? For H this is false. | ||
function ChemicalSpecies(z::Integer; kwargs...) | ||
ChemicalSpecies(_chem_el_info[z].symbol; kwargs...) | ||
function ChemicalSpecies(asymbol::Symbol; atom_name::Symbol=Symbol(""), n_neutrons::Int=-1) | ||
str_symbol = String(asymbol) | ||
tmp = 0 | ||
if length(str_symbol) > 1 && isnumeric(str_symbol[end]) | ||
# We only consider cases where number of nucleons is < 1000 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
if length(str_symbol) > 2 && isnumeric(str_symbol[end-1]) | ||
if length(str_symbol) > 3 && isnumeric(str_symbol[end-2]) | ||
if isnumeric(str_symbol[end-3]) | ||
# we now have >= 1000 nucleons so we error | ||
throw( ArgumentError("Number of nucleons is >= 1000") ) | ||
end | ||
tmp = parse(Int, str_symbol[end-2:end]) | ||
str_symbol = str_symbol[1:end-3] | ||
else | ||
tmp = parse(Int, str_symbol[end-1:end]) | ||
str_symbol = str_symbol[1:end-2] | ||
end | ||
else | ||
tmp = parse(Int, str_symbol[end]) | ||
str_symbol = str_symbol[1:end-1] | ||
end | ||
end | ||
asymbol = Symbol(str_symbol) | ||
if ! haskey(_sym2z, asymbol) && ! ( asymbol in (:X, :D, :T) ) | ||
throw( ArgumentError("Could not recognize atom symbol $asymbol") ) | ||
end | ||
z = get(_sym2z, asymbol, 0) | ||
n_neutrons = tmp == 0 ? n_neutrons : tmp - z | ||
if asymbol in [:D, :T] | ||
z = 1 | ||
n_neutrons = asymbol == :D ? 1 : 2 | ||
end | ||
return ChemicalSpecies(Int(z); atom_name=atom_name, n_neutrons=n_neutrons) | ||
end | ||
|
||
function ChemicalSpecies(z::Integer; atom_name::Symbol=Symbol(""), n_neutrons::Int=-1) | ||
atom_name = String(atom_name) | ||
if length(atom_name) > 4 | ||
throw(ArgumentError("atom_name has to be max 4 characters")) | ||
end | ||
if length(atom_name) == 0 | ||
return ChemicalSpecies(z, n_neutrons, zero(UInt32),) | ||
end | ||
tmp = repeat(' ', 4 - length(atom_name)) * atom_name | ||
tmp2 = SVector{4, UInt8}( tmp[1], tmp[2], tmp[3], tmp[4] ) | ||
name = reinterpret(reshape, UInt32, tmp2)[1] | ||
return ChemicalSpecies(z, n_neutrons, name) | ||
end | ||
|
||
ChemicalSpecies(sym::ChemicalSpecies) = sym | ||
|
||
==(a::ChemicalSpecies, sym::Symbol) = | ||
((a == ChemicalSpecies(sym)) && (Symbol(a) == sym)) | ||
|
||
# -------- fast access to the periodic table | ||
|
||
if length(PeriodicTable.elements) != maximum(el.number for el in PeriodicTable.elements) | ||
error("PeriodicTable.elements is not sorted by atomic number") | ||
end | ||
|
||
if !all(el.number == i for (i, el) in enumerate(PeriodicTable.elements)) | ||
error("PeriodicTable.elements is not sorted by atomic number") | ||
function ==(cs1::ChemicalSpecies, cs2::ChemicalSpecies) | ||
if cs1.atomic_number != cs2.atomic_number | ||
return false | ||
elseif (cs1.n_neutrons >= 0 && cs2.n_neutrons >= 0) && cs1.n_neutrons != cs2.n_neutrons | ||
return false | ||
elseif (cs1.name != 0 && cs2.name != 0) && cs1.name != cs2.name | ||
return false | ||
elseif (cs1.n_neutrons < 0 && cs1.name == 0) || (cs2.n_neutrons < 0 && cs2.name == 0) | ||
return true | ||
elseif cs1.n_neutrons != cs2.n_neutrons && cs1.name != cs2.name | ||
return false | ||
else | ||
return true | ||
end | ||
end | ||
|
||
const _chem_el_info = [ | ||
(symbol = Symbol(PeriodicTable.elements[z].symbol), | ||
atomic_mass = PeriodicTable.elements[z].atomic_mass, ) | ||
for z in 1:length(PeriodicTable.elements) | ||
] | ||
# -------- fast access to the periodic table | ||
|
||
const _sym2z = Dict{Symbol, UInt8}() | ||
for z in 1:length(_chem_el_info) | ||
_sym2z[_chem_el_info[z].symbol] = z | ||
end | ||
const _sym2z = Dict{Symbol, UInt8}( | ||
Symbol(el.symbol) => el.number for el in PeriodicTable.elements | ||
) | ||
|
||
function _nneut_default(z::Integer) | ||
nplusp = round(Int, ustrip(u"u", _chem_el_info[z].atomic_mass)) | ||
return nplusp - z | ||
end | ||
const _z2sym = Dict{UInt8, Symbol}( | ||
el.number => Symbol(el.symbol) for el in PeriodicTable.elements | ||
) | ||
|
||
function ChemicalSpecies(sym::Symbol; n_neutrons = -1, info = 0) | ||
_islett(c::Char) = 'A' <= uppercase(c) <= 'Z' | ||
const _z2mass = Dict{UInt8, typeof(PeriodicTable.elements[1].atomic_mass)}( | ||
el.number => el.atomic_mass for el in PeriodicTable.elements | ||
) | ||
|
||
# TODO - special-casing deuterium to make tests pass | ||
# this should be handled better | ||
if sym == :D | ||
return ChemicalSpecies(1, 1, info) | ||
end | ||
|
||
# number of neutrons is explicitly specified | ||
if n_neutrons != -1 | ||
if !( all(_islett, String(sym)) && n_neutrons >= 0) | ||
throw(ArgumentError("Invalid arguments for ChemicalSpecies")) | ||
end | ||
Z = _sym2z[sym] | ||
return ChemicalSpecies(Z, n_neutrons, info) | ||
function Base.Symbol(element::ChemicalSpecies) | ||
tmp = get(_z2sym, element.atomic_number, :X) | ||
if element.n_neutrons < 0 | ||
return tmp | ||
end | ||
|
||
# number of neutrons is encoded in the symbol | ||
str = String(sym) | ||
elem_str = str[1:findlast(_islett, str)] | ||
Z = _sym2z[Symbol(elem_str)] | ||
num_str = str[findlast(_islett, str)+1:end] | ||
if isempty(num_str) | ||
n_neutrons = _nneut_default(Z) | ||
else | ||
n_neutrons = parse(Int, num_str) - Z | ||
if element.atomic_number == 1 && element.n_neutrons == 1 | ||
return :D | ||
end | ||
return ChemicalSpecies(Z, n_neutrons, info) | ||
end | ||
|
||
function Base.Symbol(element::ChemicalSpecies) | ||
str = "$(_chem_el_info[element.atomic_number].symbol)" | ||
if element.nneut != _nneut_default(element.atomic_number) | ||
str *= "$(element.atomic_number + element.nneut)" | ||
if element.atomic_number == 1 && element.n_neutrons == 2 | ||
return :T | ||
end | ||
|
||
# TODO: again special-casing deuterium; to be fixed. | ||
if str == "H2" | ||
return :D | ||
end | ||
|
||
return Symbol(str) | ||
n = element.atomic_number + element.n_neutrons | ||
return Symbol("$tmp$n") | ||
end | ||
|
||
|
||
|
@@ -147,7 +203,14 @@ | |
|
||
Base.convert(::Type{Symbol}, element::ChemicalSpecies) = Symbol(element) | ||
|
||
mass(element::ChemicalSpecies) = _chem_el_info[element.atomic_number].atomic_mass | ||
function mass(element::ChemicalSpecies) | ||
if element.n_neutrons < 0 | ||
return get(_z2mass, element.atomic_number, 0.0u"u") | ||
end | ||
akey = (element.atomic_number, element.n_neutrons) | ||
return get(_isotope_masses, akey, missing) * u"u" | ||
end | ||
|
||
|
||
rich_info(element::ChemicalSpecies) = PeriodicTable.elements[element.atomic_number] | ||
|
||
|
@@ -221,3 +284,27 @@ | |
this may be `:D` while `atomic_number` is still `1`. | ||
""" | ||
atomic_number(sys::AbstractSystem, index) = atomic_number.(species(sys, index)) | ||
|
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Quite a few newlines |
||
""" | ||
atom_name(species) | ||
atom_name(sys::AbstractSystem, i) | ||
|
||
Return atomic name (`Symbol`) for `species` or vector of names for `sys`. | ||
|
||
Defaults to [`atomic_symbol`](@ref), if `name` field is zero or not defined. | ||
""" | ||
function atom_name(cs::ChemicalSpecies) | ||
if cs.name == 0 | ||
return atomic_symbol(cs) | ||
else | ||
# filter first empty space characters | ||
as_characters = Char.( reinterpret(SVector{4, UInt8}, [cs.name])[1] ) | ||
tmp = String( filter( x -> ! isspace(x), as_characters ) ) | ||
return Symbol( tmp ) | ||
end | ||
end | ||
|
||
atom_name(at) = atomic_symbol(at) | ||
|
||
atom_name(sys::AbstractSystem, index) = atom_name.(species(sys, index)) |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the drive-by comment, but why are you limiting the name to 4 characters? This is the limit for the obsolete PDB format, but for example the mmcif format that replaced it does not have any length limitation: https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_chem_comp_atom.atom_id.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to have a bits type you need to have fixed amount of bits. This leads to a limitation on character length, which for now is 4*8bit char. We could of course increase it later on (there are several ways to do this). But the main design goal was to have it as bits type, which e.g., allows moving atom types to GPU etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough!