Skip to content

tickets/SP-2082: Support getting obsloctac visits from the pre-night sim archive #452

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 28, 2025

Conversation

ehneilsen
Copy link
Collaborator

I've reworked the start and end times of the obsloctap function to match what @rhiannonlynne proposed to Will:

you call rubin_sim.sim_archive.fetch_obslactap_visits() at any time,
and we return the visits starting from the next sunset after the current time through the 2nd sunrise after that.
So, if you happened to call during the night (after sunset, before sunrise), we will NOT return any visits from the current night, but will return the visits from the next two nights.

@ehneilsen ehneilsen requested a review from rhiannonlynne March 27, 2025 21:40
@ehneilsen ehneilsen marked this pull request as draft March 27, 2025 21:43
@ehneilsen ehneilsen marked this pull request as ready for review March 27, 2025 22:02
Copy link
Member

@rhiannonlynne rhiannonlynne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment and question that is optional, but the error message should be modified (I gave a suggestion) so that people don't think their misspelling means they should try installing lsst.resources.

col_str = ""
for colname in dbcols:
col_str += colname + ", "
col_str = col_str[0:-2] + " "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have recently found that
", ".join([colname for colname in dbcols])
should work for this in a one-liner.
Worth testing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just moving what was there, without thinking about it too hard.
What's the list comprehension for? Why not just:

", ".join(dbcols)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh true .. dbcols is already a list here. (was remembering this from a different situation, where the columns were coming back from a query to the consdb schema, and so was not a list).

except ModuleNotFoundError:
raise RuntimeError(
f"Cannot read visits from {db_con}."
"Maybe it does not exist, or maybe you need to install lsst.resources."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error message may need some tweaking.
By far the most common error people are going to make is to try to use a mis-spelled version of the filename, or try to access a database that isn't where they think it is.
Most of these people (who include not-us) will not be trying to load a file from an S3 bucket, but may not be savvy enough to realize that lsst.resources is irrelevant to what they need.
The error message also slightly conflates two different issues - having lsst.resources versus the file existing anywhere (since it could still be a bad or inaccessible uri even with lsst.resources)

Can we tell if the uri is a local file vs. a uri that should be read with lsst.resources? And then give an error message that is as simple as "FileNotFoundError".
If not, then we could indicate why we were trying to use lsst.resources ..

try:
            from lsst.resources import ResourcePath
except ModuleNotFoundError:
            raise RuntimeError(f"Could not find {db_con} as a local file; please install lsst.resources in order to read remote file")
# then try to read the remote file
 with ResourcePath(db_con).as_local() as local_db_path:
                with closing(sqlite3.connect(local_db_path.ospath)) as con:
                    sim_data = pd.read_sql(query, con).to_records(index=False)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error messages lists the two issues as possibilities because it can arise from either cause.

Can we tell if the uri is a local file vs. a uri that should be read with lsst.resources?

I started to write code to do this, and concluded that it was more effort than it was worth, thinking that the user would know whether the file they are trying to read was local so diagnostics to be more specific wouldn't be particularly useful.

A test that is usually right shouldn't be that hard, though: if db_con is a string that does not start with a URI "scheme" designation, its probably intended to be a file name. But, in principle this can fail, because linux files names can have colons in them, so really any URI can be a path to a local file. Naming a file or directory http: or s3: is a foolish and confusing thing to do, but it can happen. It can also fail if support is added for new schemes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, ok - I'm ok if we can't decide whether it's local or not ourselves, but the thing to do is to make it clear that we're using lsst.resources for reading remote files only (rather than someone who didn't know what lsst.resources is thinking that they needed to install it to read their misspelled database filename). Because the general users for rubin_sim may very well not know what lsst.resources is or how to install it.


sim_metadata = read_archived_sim_metadata(
archive_uri, num_nights=max_simulation_age, compilation_resource=compilation_uri
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is sim_metadata a pandas dataframe? would it make sense to make it one, so that the section below becomes a query of the dataframe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sim_metadata is a list of dicts. It can be trivially turned into a dataframe, but doing that doesn't really simplify things here. Some of the elements we are concerned with are lists or dicts. Yes, a cell in a pandas DataFrame can be a dict or a list, but dealing with them is awkward, so using a DataFrame ends up making the code harder to read.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reworked it so it only tries using ResourcePath if db_con is either not a str, or it can be parsed with pythons URL parser to get a valid scheme. So, it will only mention ResourcePath if it actually needs it.

@ehneilsen ehneilsen merged commit cc8dc54 into main Mar 28, 2025
9 checks passed
@ehneilsen ehneilsen deleted the tickets/SP-2082 branch March 28, 2025 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants