-
Notifications
You must be signed in to change notification settings - Fork 908
Cp2kOutput.parse_initial_structure()
use regex for line matching to allow arbitrary white space between Atom/Kind/Element/...
#3810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…te space between Atom/Kind/Element/...
Atom Kind Element X Y Z Z(eff) Mass
WalkthroughThe recent updates to the Changes
Assessment against linked issues
Recent Review DetailsConfiguration used: .coderabbit.yaml Files ignored due to path filters (1)
Files selected for processing (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
@@ -318,8 +318,7 @@ def parse_structures(self, trajectory_file=None, lattice_file=None): | |||
|
|||
def parse_initial_structure(self): | |||
"""Parse the initial structure from the main cp2k output file.""" | |||
pattern = re.compile(r"- Atoms:\s+(\d+)") | |||
patterns = {"num_atoms": pattern} | |||
patterns = {"num_atoms": re.compile(r"- Atoms:\s+(\d+)")} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor the regex pattern to handle arbitrary whitespace between tokens.
- patterns = {"num_atoms": re.compile(r"- Atoms:\s+(\d+)")}
+ patterns = {"num_atoms": re.compile(r"- Atoms:\s*\+(\d+)")}
This change ensures that the regex correctly matches the expected format even if the whitespace varies, addressing the core issue that led to the infinite loop.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
patterns = {"num_atoms": re.compile(r"- Atoms:\s+(\d+)")} | |
patterns = {"num_atoms": re.compile(r"- Atoms:\s*\+(\d+)")} |
@@ -331,7 +330,7 @@ def parse_initial_structure(self): | |||
with zopen(self.filename, mode="rt") as file: | |||
while True: | |||
line = file.readline() | |||
if "Atom Kind Element X Y Z Z(eff) Mass" in line: | |||
if re.search(r"Atom\s+Kind\s+Element\s+X\s+Y\s+Z\s+Z\(eff\)\s+Mass", line): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure robust handling of empty lines during file parsing.
- if line == []:
+ if not line.strip():
Using line.strip()
is more robust as it checks for lines that are effectively empty (only whitespace) and not just strictly empty lists.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
if re.search(r"Atom\s+Kind\s+Element\s+X\s+Y\s+Z\s+Z\(eff\)\s+Mass", line): | |
if not line.strip(): |
@@ -340,36 +339,33 @@ def parse_initial_structure(self): | |||
break | |||
|
|||
lattice = self.parse_cell_params() | |||
gs = {} | |||
ghost_atoms = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize ghost_atoms
dictionary more explicitly.
- ghost_atoms = {}
+ ghost_atoms = dict()
Using dict()
for dictionary initialization is more explicit and aligns with Python's clear coding practices.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
ghost_atoms = {} | |
ghost_atoms = dict() |
for val in self.data["atomic_kind_info"].values(): | ||
ghost_atoms[val["kind_number"]] = val["pseudo_potential"].upper() == "NONE" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimize the loop for setting ghost atom properties.
- for val in self.data["atomic_kind_info"].values():
- ghost_atoms[val["kind_number"]] = val["pseudo_potential"].upper() == "NONE"
+ ghost_atoms = {val["kind_number"]: val["pseudo_potential"].upper() == "NONE" for val in self.data["atomic_kind_info"].values()}
This change uses a dictionary comprehension to make the code more concise and potentially more efficient by reducing the overhead of loop execution.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
for val in self.data["atomic_kind_info"].values(): | |
ghost_atoms[val["kind_number"]] = val["pseudo_potential"].upper() == "NONE" | |
ghost_atoms = {val["kind_number"]: val["pseudo_potential"].upper() == "NONE" for val in self.data["atomic_kind_info"].values()} |
for coord in coord_table: | ||
for key, val in self.data["atomic_kind_info"].items(): | ||
if int(val["kind_number"]) == int(coord[1]): | ||
val["element"] = coord[2] | ||
self.data["atomic_kind_list"].append(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor nested loops for clarity and efficiency.
- for coord in coord_table:
- for key, val in self.data["atomic_kind_info"].items():
- if int(val["kind_number"]) == int(coord[1]):
- val["element"] = coord[2]
- self.data["atomic_kind_list"].append(key)
- break
+ for coord in coord_table:
+ kind_number = int(coord[1])
+ if kind_number in self.data["atomic_kind_info"]:
+ val = self.data["atomic_kind_info"][kind_number]
+ val["element"] = coord[2]
+ self.data["atomic_kind_list"].append(kind_number)
This refactoring avoids the inner loop by directly accessing the dictionary with the kind number, which is more efficient and clearer.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
for coord in coord_table: | |
for key, val in self.data["atomic_kind_info"].items(): | |
if int(val["kind_number"]) == int(coord[1]): | |
val["element"] = coord[2] | |
self.data["atomic_kind_list"].append(key) | |
for coord in coord_table: | |
kind_number = int(coord[1]) | |
if kind_number in self.data["atomic_kind_info"]: | |
val = self.data["atomic_kind_info"][kind_number] | |
val["element"] = coord[2] | |
self.data["atomic_kind_list"].append(kind_number) |
species=[coord[2] for coord in coord_table], | ||
coords=[[float(i[4]), float(i[5]), float(i[6])] for i in coord_table], | ||
site_properties={"ghost": [gs.get(int(i[1])) for i in coord_table]}, | ||
site_properties={"ghost": [ghost_atoms.get(int(i[1])) for i in coord_table]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure consistent use of variable names in list comprehensions.
- species=[coord[2] for coord in coord_table],
- coords=[[float(i[4]), float(i[5]), float(i[6])] for i in coord_table],
- site_properties={"ghost": [ghost_atoms.get(int(i[1])) for i in coord_table]},
+ species=[coord[2] for coord in coord_table],
+ coords=[[float(coord[4]), float(coord[5]), float(coord[6])] for coord in coord_table],
+ site_properties={"ghost": [ghost_atoms.get(int(coord[1])) for coord in coord_table]},
This change corrects the variable used in the list comprehensions to be consistent, improving readability and reducing the chance of errors.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
species=[coord[2] for coord in coord_table], | |
coords=[[float(i[4]), float(i[5]), float(i[6])] for i in coord_table], | |
site_properties={"ghost": [gs.get(int(i[1])) for i in coord_table]}, | |
site_properties={"ghost": [ghost_atoms.get(int(i[1])) for i in coord_table]}, | |
species=[coord[2] for coord in coord_table], | |
coords=[[float(coord[4]), float(coord[5]), float(coord[6])] for coord in coord_table], | |
site_properties={"ghost": [ghost_atoms.get(int(coord[1])) for coord in coord_table]}, |
closes #3809