Articles showing dummy date of 12/31/1969 in "Last Update" column #1832

Open
tonycpsu opened this issue Oct 14, 2024 · 38 comments
@tonycpsu

Describe the bug
Since the most recent update, some of my articles are showing a dummy date of 12/31/1969 in the "Last Update" column.

To Reproduce
Not sure what caused it, but it seemed to appear around when I upgraded to Version 3.9.3 :fd467d52: (8375)

I use Inoreader as my feed backend, if that is relevant.

Screenshots
image

Please complete the following information:

  • Vienna version: Version 3.9.3 :fd467d52: (8375)
  • OS version: 14.1.1 (23B81)

Additional information:

@tonycpsu
Author

Downgrading to 3.9.2 seems to fix this, FWIW.

@TAKeanice
Contributor

Thank you. We thought we had made sure that no default date of 0 (i.e. 1.1.1970 00:00 UTC, the Unix epoch) could appear anywhere. Can you tell us the feed source as well? Then we can refine the feature to avoid that behavior.
Can you add the column for the creation date and see what it shows in cases where the updated date is 0?

@tonycpsu
Author

Feed source is Inoreader -- it seems to affect all of my feeds, not a single RSS source.

Here's what it looks like with 3.9.3 release with published date visible.

image

@TAKeanice TAKeanice self-assigned this Oct 19, 2024
@TAKeanice
Contributor

TAKeanice commented Oct 20, 2024

Screenshot 2024-10-20 at 22 12 17

Another anomaly here: the publication date is later than the last update.
This is the GitHub feed for recent commits to Vienna's master branch (an Atom feed).

@TAKeanice
Contributor

I was not able to find the source of these anomalies. According to the current code, if the feed source does not contain the last update date, it is set to the publication date if that exists, or to the date when the article was received. For the publication date, if the feed does not contain it, the updated date (if it exists) or the current date is used, whichever is earlier.
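
The fallback rules described above can be sketched roughly as follows. This is an illustrative Python sketch, not Vienna's actual Objective-C code; the function name `resolve_dates` and its parameters are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def resolve_dates(
    feed_published: Optional[datetime],
    feed_updated: Optional[datetime],
    received: datetime,
) -> Tuple[datetime, datetime]:
    """Return (publication_date, last_update) using the described fallbacks."""
    # Publication date: use the feed's value; otherwise take the earlier of
    # the feed's update date (if any) and the time of receipt.
    if feed_published is not None:
        published = feed_published
    elif feed_updated is not None:
        published = min(feed_updated, received)
    else:
        published = received

    # Last update: use the feed's value; otherwise the feed's publication
    # date if it exists, else the time of receipt.
    if feed_updated is not None:
        updated = feed_updated
    elif feed_published is not None:
        updated = feed_published
    else:
        updated = received
    return published, updated
```

Under these rules neither date can end up as 0 as long as `received` is a real timestamp, which suggests the epoch dates come from a different code path.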

@barijaona, @josh64x2 or @Eitot, maybe you can find a reason for the observed anomalies.

I am not sure about my anomaly, it could have been caused by using a development version of Vienna. @tonycpsu did you use any beta version?

@joostdekeijzer

Not sure if it's related, but using Version 3.9.3 :fd467d52: (8375) I get the date "01-01-1970, 01:00" for both the published and updated dates. The source of the feed seems correct?

I think this only happens after a refresh when an article was already there and is updated?

image

@TAKeanice
Contributor

TAKeanice commented Oct 26, 2024

Not sure if it's related

It certainly is.

I get the date "01-01-1970, 01:00" for both the published and updated dates. The source of the feed seems correct?

Looks like the source is correct; I have to check whether "pubDate" is taken into account, though. The date is the same as in the other cases: "0", but shown in your time zone.
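
To illustrate why the same stored value shows up as two different dates, here is a small Python sketch that renders timestamp 0 at two fixed UTC offsets. The offsets (UTC-6 and UTC+1) are assumptions chosen to match the times reported in this thread, not values taken from the reporters' systems.

```python
from datetime import datetime, timedelta, timezone

EPOCH_SECONDS = 0  # the "0" date discussed above


def render(seconds: float, utc_offset_hours: int) -> str:
    """Format a Unix timestamp as local time at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz=tz).strftime("%Y-%m-%d %H:%M")


print(render(EPOCH_SECONDS, -6))  # 1969-12-31 18:00
print(render(EPOCH_SECONDS, +1))  # 1970-01-01 01:00
```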

I think this only happens after a refresh when an article was already there and is updated?

That's a good hint and in my case, I had the same gut feeling but couldn't find the code causing that. I will try another time this weekend.

@TAKeanice
Contributor

I just realized that the RefreshManager is also manipulating the article dates, which may be causing effects that I have not considered so far.

@joostdekeijzer

Hi TAKeanice,

Just to inform you: with Version 3.9.4 :498937af: (8382) I'm still getting reset dates...

I see it in some WordPress blogs I'm following and, e.g., in the example below, which I know is an updated article.

Screenshot 2024-10-30 at 10 42 45

@TAKeanice
Contributor

Hi @joostdekeijzer , thank you for adding that example. I suspect the RefreshManager, which I didn't fix in the last update, is the culprit. I hope to find some time this weekend to figure it out and create a fix.

@barijaona
Member

I just subscribed to https://exiftool.org/rss.xml and I do not reproduce the issue…

Screenshot 2024-10-31 at 21 38 46

@joostdekeijzer

@barijaona

I just subscribed to https://exiftool.org/rss.xml and I do not reproduce the issue…

The issue occurs when an article in the feed is updated by that site. That does not happen very often (and ExifTool is not a very busy feed anyway).

Another feed where I just got the 1970 issue (after an article was refreshed) is https://feeds.feedburner.com/9To5Mac-MacAllDay .

@catxeger

Vienna 3.9.4 :498937af: (8382) - I'm seeing this issue regularly with what look like updated articles, as noted above. The dates are sometimes 1969-12-31 and sometimes 1970 (I don't have an example to hand). In earlier versions, the provided publication date seemed reasonably appropriate.

image

image

@brianallenlevine

If it helps, I've also been encountering "12/31/69, 6:00 PM" dates since a Vienna update or two ago. They recur regularly in certain feeds, including these two:

https://tidbits.com/feed/
https://blog.documentfoundation.org/feed/

I'm currently using Vienna Version 3.9.4 :498937af: (8382) on a current macOS Sonoma 14.7.1 (23H222).

I'll include a couple screen shots.

image image

(If the screenshots don't appear properly after this comment is posted, I'll try to add them again in a different fashion.)

@TAKeanice
Contributor

I have made the change I was talking about a couple of weeks ago; it's just waiting to be released. I'd suggest retesting afterwards.

605dc64

@brianallenlevine

Will do. Thanks, @TAKeanice!

@joostdekeijzer

Sorry to have to report that with Version 3.9.5 :7784c797: (8414) I'm still experiencing the issue 😢

@brianallenlevine

brianallenlevine commented Dec 11, 2024 via email

@TAKeanice
Contributor

Can you @brianallenlevine and @joostdekeijzer tell me whether you're seeing this with the old articles that you already received before the upgrade or new ones that were loaded afterwards?

The last update again didn't include a correction script, and I'm afraid some of the dates of older articles are not salvageable. There's no longer any code that I know of that would assign strange dates to new articles. That's why it's important for me to know whether the articles with wrong dates are old or new ones.

@brianallenlevine

@TAKeanice, I just looked again at the tidbits and documentfoundation links that I posted screenshots of earlier. I still saw the "12/31/69, 6:00 PM" dates in both feeds, in at least some of the same old articles. After this, I unsubscribed from and resubscribed to both feeds. In the newly resubscribed feeds, I saw fewer articles, with only valid dates.

I do have additional feeds with these invalid dates. Is there perhaps a simpler way of accomplishing an unsubscribe/resubscribe without manually unsubscribing and resubscribing each feed with bad dates? I've looked, but I'm not seeing anything.

Maybe export all my subscriptions, unsubscribe them all, and then re-import would do the trick? I don't mind losing unread/read status and such, if I can get rid of the bad dates.

Thanks!

--Brian

@joostdekeijzer

@TAKeanice I'm quite sure it's still happening on new articles. But I did not remove/resubscribe a feed.

See WordPress Core rss feed (Agenda for dec-11 article, published dec-10)

image

@brianallenlevine

brianallenlevine commented Dec 13, 2024 via email

@TAKeanice
Contributor

Thank you @brianallenlevine . Means one more round of investigation, pulling out some more gray hair and hopefully finding the place that still manipulates the dates. Can you post a screenshot?

@brianallenlevine

Here you go, @TAKeanice. Thanks so much for all your effort on this!

image

@joostdekeijzer

It's hard to test, but I get the impression that this issue has a bigger chance of happening on startup when you have "refresh on startup" enabled, or when you press the "Refresh all your subscriptions" button.

@catxeger

I'm still seeing the issue with Version 3.9.5 :7784c797: (8414), haven't hit 'refresh all your subscriptions', and it appears to be happening on newly retrieved articles (after 'mark all read' in a given feed).

Oddly, and visible in the screenshot, one of the articles has the wrong date -- 'Trudeau gave a speech to the Liberals' holiday party — but Freeland stole the show' is showing as '2024-06-23, 08:35', but a check of the website shows 'CBC News · Posted: Dec 18, 2024 3:26 PM EST | Last Updated: 8 hours ago' -- off by nearly 6 months!

image

@artcs

artcs commented Mar 6, 2025

I'm also seeing this issue on Version 3.9.5 :7784c797: (8414). AFAICT it happens if an article which existed before is updated with a new published and updated date. The source has correct dates.

Image

@barijaona barijaona self-assigned this Mar 12, 2025
@barijaona
Member

I think I found the cause:

in

-(BOOL)updateArticle:(Article *)existingArticle ofFolder:(NSInteger)folderID withArticle:(Article *)articleUpdate

existingArticle may be an incomplete version originating from -minimalCacheForFolder:, and may therefore have neither lastUpdate nor publicationDate.
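
A rough Python sketch of how an incomplete cached record could produce the epoch date. The `CachedArticle` class and `stored_timestamp` helper are hypothetical stand-ins, not Vienna's types. The fallback to 0 mimics Objective-C, where sending a message such as `timeIntervalSince1970` to `nil` returns 0; whether that is the exact mechanism here is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CachedArticle:
    """Hypothetical minimal cache entry: only the guid is always present."""
    guid: str
    last_update: Optional[datetime] = None
    publication_date: Optional[datetime] = None


def stored_timestamp(date: Optional[datetime]) -> float:
    # Mimics messaging nil in Objective-C: a missing date becomes 0.
    return date.timestamp() if date is not None else 0.0


minimal = CachedArticle(guid="example-guid")
print(stored_timestamp(minimal.last_update))  # 0.0
```

If a value like this is written back during an update, the article shows the epoch date even though the feed itself carried correct dates.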

@TAKeanice
Contributor

existingArticle may be an incomplete version originating from -minimalCacheForFolder:

Oh man... I would never have searched in that direction... Let's make sure existingArticle is complete then!

@barijaona
Member

A minimalist approach to article caching makes complete sense when you may have plenty of feeds to update simultaneously and you have to handle them in memory rather than on disk.

We just need to reproduce the date logic which exists in -addArticle:toFolder: to calculate the new last update date.

It would make sense to keep the existing publication date in the database.

@TAKeanice
Contributor

I don't think that loading an article from the DB when an update is saved to the DB anyway is too much of an effort.

barijaona added a commit to barijaona/vienna-rss that referenced this issue Mar 12, 2025
in `-updateArticle:ofFolder:withArticle:`, `existingArticle` may be an
incomplete version originating from `-minimalCacheForFolder:`, and may
therefore have neither `lastUpdate` nor `publicationDate`

To calculate the new last update date, we reproduce the date logic which
exists in -addArticle:toFolder:

Fix date 1.1.1970 or 31.12.1969 appearing in "Last Update" column
Issue ViennaRSS#1832
@barijaona
Member

It is a waste when you may have to handle tens of thousands of articles at a time during a "Refresh All" and you have the possibility to avoid this.

Look thoroughly at commit 9b4c126

@TAKeanice
Contributor

I'd rather not alter the last update date if we just don't know the original one because we got the ground truth from some cache. It may override semantically relevant information carried by that date. Assume a feed updates that date only for content changes, not for typo corrections - we would update the date anyway.

Maybe we can write a smarter update query that combines the query and the update into one, then no additional time is wasted by a second query.
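
One way such a combined statement might look, sketched with Python's `sqlite3` module. The table and column names (`articles`, `guid`, `title`, `last_update`) are made up for illustration and do not reflect Vienna's schema. `COALESCE` keeps the stored date whenever the incoming update carries none, so no separate read is needed.

```python
import sqlite3


def apply_update(conn, guid, title, last_update):
    """Update the title; overwrite last_update only if a new value is given."""
    conn.execute(
        "UPDATE articles "
        "SET title = ?, last_update = COALESCE(?, last_update) "
        "WHERE guid = ?",
        (title, last_update, guid),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (guid TEXT PRIMARY KEY, title TEXT, last_update REAL)"
)
conn.execute("INSERT INTO articles VALUES ('a1', 'Old title', 1729456337.0)")

# The update carries no date (None becomes NULL), so the stored one survives.
apply_update(conn, "a1", "New title", None)
row = conn.execute(
    "SELECT title, last_update FROM articles WHERE guid = 'a1'"
).fetchone()
print(row)  # ('New title', 1729456337.0)
```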

barijaona added a commit to barijaona/vienna-rss that referenced this issue Mar 15, 2025
@barijaona
Member

I don't really understand your concern.
If the feed provides a last update date, we just use it without questioning its semantics.
It's only when the feed does not provide an update date that we get back to exactly the same calculation that we used on initial publication.

@TAKeanice
Contributor

I am thinking about feeds that initially provide a last updated date, but whose later updates no longer include it. From what I saw in the code, that's possible for Atom feeds, for example.

barijaona added a commit to barijaona/vienna-rss that referenced this issue Mar 16, 2025
barijaona added a commit to barijaona/vienna-rss that referenced this issue Mar 16, 2025
barijaona added a commit to barijaona/vienna-rss that referenced this issue Mar 19, 2025
@barijaona
Member

barijaona commented Mar 20, 2025

Another edge case: Wired's RSS feed, where they change the title of an article and push it with a new publication date, while the guid reveals that it is in fact an update of a preexisting article.

https://www.wired.com/feed/rss
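
A hedged sketch of one possible way to treat this case: match on `guid`, keep the stored publication date, and treat the re-pushed publication date as the update date. This only illustrates the idea; the dictionary layout and the `merge_update` name are hypothetical, and the actual commit may handle it differently.

```python
from datetime import datetime, timezone


def merge_update(stored: dict, incoming: dict) -> dict:
    """Merge a re-pushed item into the stored article with the same guid."""
    if stored["guid"] != incoming["guid"]:
        raise ValueError("not the same article")
    merged = dict(stored)
    merged["title"] = incoming["title"]
    # Prefer an explicit update date; otherwise fall back to the new
    # publication date. The stored publication date is left untouched.
    new_date = incoming.get("updated") or incoming.get("published")
    if new_date is not None and new_date > stored["last_update"]:
        merged["last_update"] = new_date
    return merged
```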

barijaona added a commit to barijaona/vienna-rss that referenced this issue Mar 20, 2025
As suggested by @TAKeanice's comment in issue ViennaRSS#1832
ViennaRSS#1832 (comment)

However, handle the case where the title of an article is changed, and
the updated article is pushed with a new publication date, while the
`guid` value reveals that it is in fact an update of a preexisting article.
barijaona added a commit to barijaona/vienna-rss that referenced this issue Mar 24, 2025
barijaona added a commit to barijaona/vienna-rss that referenced this issue Mar 24, 2025
barijaona added a commit that referenced this issue Mar 26, 2025
Fix update date / Optimize cache

- Provides a much snappier user interface: prevents some temporary locks when a feed refresh and a user selecting a feed happen simultaneously
- Fix dates of year 1969 or 1970 in "Last Update" column (issue #1832)