Skip to content

Chemspecies update #130

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Nov 21, 2024
1 change: 1 addition & 0 deletions docs/src/apireference.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ velocity
set_velocity!
atomic_number
atomic_symbol
atom_name
element_symbol
element
```
Expand Down
3 changes: 2 additions & 1 deletion src/implementation/atom.jl
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ species(atom::Atom) = atom.species

n_dimensions(::Atom{D}) where {D} = D

atom_name(atom::Atom) = atom_name(species(atom))
atomic_symbol(atom::Atom) = atomic_symbol(species(atom))
atomic_number(atom::Atom) = atomic_number(species(atom))
element(atom::Atom) = element(species(atom))
Expand Down Expand Up @@ -51,7 +52,7 @@ function Atom(identifier::AtomId,
position::AbstractVector{L},
velocity::AbstractVector{V} = _default_velocity(position);
species = ChemicalSpecies(identifier),
mass::M=element(species).atomic_mass,
mass::M=mass(species),
kwargs...) where {L <: Unitful.Length, V <: Unitful.Velocity, M <: Unitful.Mass}
Atom{length(position), L, V, M}(position, velocity, species,
mass, Dict(kwargs...))
Expand Down
3 changes: 2 additions & 1 deletion src/interface.jl
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,8 @@ export bounding_box,
atomic_number,
atomic_symbol,
atomkeys,
hasatomkey
hasatomkey,
atom_name


#
Expand Down
1 change: 1 addition & 0 deletions src/utils/atomview.jl
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ position(v::AtomView) = position(v.system, v.index)
mass(v::AtomView) = mass(v.system, v.index)
species(v::AtomView) = species(v.system, v.index)

atom_name(v::AtomView) = atom_name(species(v))
atomic_symbol(v::AtomView) = atomic_symbol(species(v))
atomic_number(v::AtomView) = atomic_number(species(v))
n_dimensions(v::AtomView) = n_dimensions(v.system)
Expand Down
218 changes: 151 additions & 67 deletions src/utils/chemspecies.jl
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,15 @@

export ChemicalSpecies

# masses are saved in a different file to not clutter this file
include("isotope_masses.jl")

"""
Encodes a chemical species by wrapping an integer that represents the atomic
number, the number of protons, and additional unspecified information as a `UInt32`.
number, the number of protons, and additional name as max 4 characters.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the drive-by comment, but why are you limiting the name to 4 characters? This is the limit for the obsolete PDB format, but for example the mmcif format that replaced it does not have any length limitation: https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_chem_comp_atom.atom_id.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to have a bits type you need to have fixed amount of bits. This leads to a limitation on character length, which for now is 4*8bit char. We could of course increase it later on (there are several ways to do this). But the main design goal was to have it as bits type, which e.g., allows moving atom types to GPU etc.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough!


The name variable can be used as atom name in PDB format or
some other way to mark same kind of atoms from one another.

Constructors for standard chemical elements
```julia
Expand All @@ -35,13 +41,44 @@
ChemicalSpecies(6; n_neutrons = 7)
ChemicalSpecies(:C13)
# deuterium
ChemicalSpecies(:D)
ChemicalSpecies(:D)
```

Constructors for names
```julia
ChemicalSpecies(:C; atom_name=:MyC)
ChemicalSpecies(:C13; atom_name=:MyC)
```

Comparisons with different isotopes and names
```julia
# true
ChemicalSpecies(:C13) == ChemicalSpecies(:C)

# false
ChemicalSpecies(:C12) == ChemicalSpecies(:C13)

# true
ChemicalSpecies(:C; atom_name=:MyC) == ChemicalSpecies(:C)

# true
ChemicalSpecies(:C12; atom_name=:MyC) == ChemicalSpecies(:C12)

# false
ChemicalSpecies(:C; atom_name=:MyC) == ChemicalSpecies(:C12)

# true
ChemicalSpecies(:C12; atom_name=:MyC) == ChemicalSpecies(:C)

# true
ChemicalSpecies(:C; atom_name=:MyC) == ChemicalSpecies(:C12; atom_name=:MyC)
```

"""
struct ChemicalSpecies
atomic_number::Int16 # = Z = number of protons
nneut::Int16 # number of neutrons
info::UInt32
n_neutrons::Int16 # number of neutrons
name::UInt32
end


Expand All @@ -51,86 +88,93 @@

Base.Broadcast.broadcastable(s::ChemicalSpecies) = Ref(s)

# better to convert z -> symbol to catch special cases such as D; e.g.
# Should ChemicalSpecies(z) == ChemicalSpecies(z,z,0)? For H this is false.
function ChemicalSpecies(z::Integer; kwargs...)
ChemicalSpecies(_chem_el_info[z].symbol; kwargs...)
function ChemicalSpecies(asymbol::Symbol; atom_name::Symbol=Symbol(""), n_neutrons::Int=-1)
str_symbol = String(asymbol)
tmp = 0
if length(str_symbol) > 1 && isnumeric(str_symbol[end])
# we exclude the case where n_neutrons > 99
if length(str_symbol) > 2 && isnumeric(str_symbol[end])
tmp = parse(Int, str_symbol[end-1:end])
str_symbol = str_symbol[1:end-2]
else
tmp = parse(Int, str_symbol[end])
str_symbol = str_symbol[1:end-1]
end
end
asymbol = Symbol(str_symbol)
z = haskey(_sym2z, asymbol) ? _sym2z[asymbol] : 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use 3-argument get with default

More broadly: is this a good idea, I mean typos and small missmatches lead to a zero here, which could lead to subtle bugs, no? Same with the other lookups

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the subtle bugs are possible here.

We could also have it error, if symbol is not recognized and is not :X. This would probably be the safest option.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check for symbols and now symbols that are not recognized throw an ArgumentError.

n_neutrons = tmp == 0 ? n_neutrons : tmp - z
if asymbol in [:D, :T]
z = 1
n_neutrons = asymbol == :D ? 1 : 2
end
return ChemicalSpecies(Int(z); atom_name=atom_name, n_neutrons=n_neutrons)
end

function ChemicalSpecies(z::Integer; atom_name::Symbol=Symbol(""), n_neutrons::Int=-1)
atom_name = String(atom_name)
if length(atom_name) > 4
throw(ArgumentError("atom_name has to be max 4 characters"))
end
if length(atom_name) == 0
return ChemicalSpecies(z, n_neutrons, zero(UInt32),)
end
tmp = repeat(' ', 4 - length(atom_name)) * atom_name
tmp2 = SVector{4, UInt8}( tmp[1], tmp[2], tmp[3], tmp[4] )
name = reinterpret(reshape, UInt32, tmp2)[1]
return ChemicalSpecies(z, n_neutrons, name)
end

ChemicalSpecies(sym::ChemicalSpecies) = sym

==(a::ChemicalSpecies, sym::Symbol) =
((a == ChemicalSpecies(sym)) && (Symbol(a) == sym))

# -------- fast access to the periodic table

if length(PeriodicTable.elements) != maximum(el.number for el in PeriodicTable.elements)
error("PeriodicTable.elements is not sorted by atomic number")
function ==(cs1::ChemicalSpecies, cs2::ChemicalSpecies)
if cs1.atomic_number != cs2.atomic_number
return false
elseif (cs1.n_neutrons >= 0 && cs2.n_neutrons >= 0) && cs1.n_neutrons != cs2.n_neutrons
return false
elseif (cs1.name != 0 && cs2.name != 0) && cs1.name != cs2.name
return false
elseif (cs1.n_neutrons < 0 && cs1.name == 0) || (cs2.n_neutrons < 0 && cs2.name == 0)
return true
elseif cs1.n_neutrons != cs2.n_neutrons && cs1.name != cs2.name
return false
else
return true
end
end

if !all(el.number == i for (i, el) in enumerate(PeriodicTable.elements))
error("PeriodicTable.elements is not sorted by atomic number")
end
# -------- fast access to the periodic table

const _chem_el_info = [
(symbol = Symbol(PeriodicTable.elements[z].symbol),
atomic_mass = PeriodicTable.elements[z].atomic_mass, )
for z in 1:length(PeriodicTable.elements)
]
const _sym2z = Dict{Symbol, UInt8}(
Symbol(el.symbol) => el.number for el in PeriodicTable.elements
)

const _sym2z = Dict{Symbol, UInt8}()
for z in 1:length(_chem_el_info)
_sym2z[_chem_el_info[z].symbol] = z
end
const _z2sym = Dict{UInt8, Symbol}(
el.number => Symbol(el.symbol) for el in PeriodicTable.elements
)

function _nneut_default(z::Integer)
nplusp = round(Int, ustrip(u"u", _chem_el_info[z].atomic_mass))
return nplusp - z
end
const _z2mass = Dict{UInt8, typeof(PeriodicTable.elements[1].atomic_mass)}(
el.number => el.atomic_mass for el in PeriodicTable.elements
)

function ChemicalSpecies(sym::Symbol; n_neutrons = -1, info = 0)
_islett(c::Char) = 'A' <= uppercase(c) <= 'Z'

# TODO - special-casing deuterium to make tests pass
# this should be handled better
if sym == :D
return ChemicalSpecies(1, 1, info)
function Base.Symbol(element::ChemicalSpecies)
tmp = element.atomic_number == 0 ? :X : _z2sym[element.atomic_number]
if element.n_neutrons < 0
return tmp
end

# number of neutrons is explicitly specified
if n_neutrons != -1
if !( all(_islett, String(sym)) && n_neutrons >= 0)
throw(ArgumentError("Invalid arguments for ChemicalSpecies"))
end
Z = _sym2z[sym]
return ChemicalSpecies(Z, n_neutrons, info)
end

# number of neutrons is encoded in the symbol
str = String(sym)
elem_str = str[1:findlast(_islett, str)]
Z = _sym2z[Symbol(elem_str)]
num_str = str[findlast(_islett, str)+1:end]
if isempty(num_str)
n_neutrons = _nneut_default(Z)
else
n_neutrons = parse(Int, num_str) - Z
if element.atomic_number == 1 && element.n_neutrons == 1
return :D
end
return ChemicalSpecies(Z, n_neutrons, info)
end

function Base.Symbol(element::ChemicalSpecies)
str = "$(_chem_el_info[element.atomic_number].symbol)"
if element.nneut != _nneut_default(element.atomic_number)
str *= "$(element.atomic_number + element.nneut)"
if element.atomic_number == 1 && element.n_neutrons == 2
return :T
end

# TODO: again special-casing deuterium; to be fixed.
if str == "H2"
return :D
end

return Symbol(str)
n = element.atomic_number + element.n_neutrons
return Symbol("$tmp$n")
end


Expand All @@ -147,7 +191,21 @@

Base.convert(::Type{Symbol}, element::ChemicalSpecies) = Symbol(element)

mass(element::ChemicalSpecies) = _chem_el_info[element.atomic_number].atomic_mass
function mass(element::ChemicalSpecies)
if element.n_neutrons < 0
if haskey(_z2mass, element.atomic_number)
return _z2mass[element.atomic_number]
else
return 0.0u"u"
end
end
akey = (element.atomic_number, element.n_neutrons)
if haskey(_isotope_masses, akey)
return _isotope_masses[akey] * u"u"
end
return missing
end


rich_info(element::ChemicalSpecies) = PeriodicTable.elements[element.atomic_number]

Expand Down Expand Up @@ -221,3 +279,29 @@
this may be `:D` while `atomic_number` is still `1`.
"""
atomic_number(sys::AbstractSystem, index) = atomic_number.(species(sys, index))


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few newlines



"""
atom_name(species)
atom_name(sys::AbstractSystem, i)

Return atomic name (`Symbol`) for `species` or vector of names for `sys`.

Defaults to [`atomic_symbol`](@ref), if `name` field is zero or not defined.
"""
function atom_name(cs::ChemicalSpecies)
if cs.name == 0
return atomic_symbol(cs)
else
# filter first empty space characters
as_characters = Char.( reinterpret(SVector{4, UInt8}, [cs.name])[1] )
tmp = String( filter( x -> ! isspace(x), as_characters ) )
return Symbol( tmp )
end
end

atom_name(at) = atomic_symbol(at)

Check warning on line 305 in src/utils/chemspecies.jl

View check run for this annotation

Codecov / codecov/patch

src/utils/chemspecies.jl#L305

Added line #L305 was not covered by tests

atom_name(sys::AbstractSystem, index) = atom_name.(species(sys, index))
Loading
Loading