How can I make sense of this weird looking link in the site's rss feed?

I’m working on making a discord bot that lets me know when a new post was created on this forum, and I had a question about something I found while working on that.

On this website’s RSS feed (vexforum.com/posts.rss), there’s a tag (apparently the only link-like tag my parser can easily find) with an odd structure. Instead of the usual vexforum.com/t/topic-name/topic-number/post-number, it has the format vexforum.com-post-number. Is there any way to make sense of that and generate a link to the post from it? For whatever reason, my parser can’t seem to find the link right above it.
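
Pulling the number out of that guid only takes a regular expression. Here is a minimal Python sketch. It assumes every guid looks like `vexforum.com-post-<number>`, based on the format described above, and that is a guess rather than a documented contract. `post_id_from_guid` and `GUID_PATTERN` are hypothetical names, not part of any library.

```python
import re
from typing import Optional

# Assumed guid shape, taken from the format described in the question:
# "<host>-post-<number>", e.g. "vexforum.com-post-123456".
GUID_PATTERN = re.compile(r"^(?P<host>.+)-post-(?P<post_id>\d+)$")


def post_id_from_guid(guid: str) -> Optional[int]:
    """Return the trailing post number from an RSS <guid>, or None if it doesn't match."""
    match = GUID_PATTERN.match(guid.strip())
    if match is None:
        return None
    return int(match.group("post_id"))


print(post_id_from_guid("vexforum.com-post-123456"))  # 123456
print(post_id_from_guid("not-a-guid"))  # None
```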

This link still seems usable if your parser can find the <pubDate> tag on the line below and just backtrack a few characters.
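
The backtracking idea can be done with plain string searches, without an XML library. Here is a minimal Python sketch. It assumes each item is available as raw text, with `<link>` appearing before `<pubDate>`. `link_before_pubdate` and the sample item (including its URL) are made up for illustration. The approach is brittle by nature: it stops working if the tag order or spelling changes.

```python
from typing import Optional


def link_before_pubdate(item_xml: str) -> Optional[str]:
    """Find <pubDate>, then search backwards for the nearest <link>...</link> pair."""
    pub_date_at = item_xml.find("<pubDate>")
    if pub_date_at == -1:
        return None
    # Closest closing </link> before <pubDate>.
    close_at = item_xml.rfind("</link>", 0, pub_date_at)
    if close_at == -1:
        return None
    # Matching opening <link> before that.
    open_at = item_xml.rfind("<link>", 0, close_at)
    if open_at == -1:
        return None
    return item_xml[open_at + len("<link>"):close_at].strip()


# Made-up sample item; the URL and IDs are placeholders.
SAMPLE_ITEM = """<item>
  <title>Example post</title>
  <link>https://vexforum.com/t/example-topic/12345/6</link>
  <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
  <guid isPermaLink="false">vexforum.com-post-123456</guid>
</item>"""

print(link_before_pubdate(SAMPLE_ITEM))  # https://vexforum.com/t/example-topic/12345/6
```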

It’s not finding that, and backtracking like that isn’t really something I know how to do. I’ve been playing around with this for a day or so, and the only thing it seems to be able to find is the guid.

Is there any way to take that number from the guid and turn it into a link to the post?

I really don’t think that’ll be possible. That number changes whenever a post is edited, or a new post comes along, so even if it is possible, it’ll be pretty complicated.

(Of course, wait for someone who actually knows what they’re talking about :slight_smile:)

GUID stands for Globally Unique Identifier. Even though you could parse it into a perfectly valid link, just don’t. The <link> tag is there for a reason.

Taran, obviously, you first want to figure out why you cannot parse the top link with the full URL. There is a good chance that you will run into another similar problem in the near future and you will have to figure it out anyway.
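
One plausible cause, assuming the bot reads the feed with an HTML parser (for example BeautifulSoup with its `html.parser` backend), is that HTML defines `<link>` as an empty element. In that case the URL after it is not treated as the tag’s text. RSS is XML, so an XML parser reads `<link>` as written. Here is a minimal sketch using Python’s standard-library `xml.etree.ElementTree`. `items_from_feed` and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Made-up sample feed; a real one has many more fields.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Example post</title>
      <link>https://vexforum.com/t/example-topic/12345/6</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 +0000</pubDate>
      <guid isPermaLink="false">vexforum.com-post-123456</guid>
    </item>
  </channel>
</rss>"""


def items_from_feed(feed_xml: Union[str, bytes]) -> List[Dict[str, str]]:
    """Parse an RSS 2.0 document and return link/guid/pubDate for each <item>."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the child is missing, so fall back to "".
            "link": (item.findtext("link") or "").strip(),
            "guid": (item.findtext("guid") or "").strip(),
            "pubDate": (item.findtext("pubDate") or "").strip(),
        })
    return items


for entry in items_from_feed(SAMPLE_FEED):
    print(entry["link"])
```

Against the live feed, the bytes from `urllib.request.urlopen("https://vexforum.com/posts.rss").read()` can be passed to `items_from_feed` directly. Namespaced elements, if the feed has any, are not matched by the plain tag names used here.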

However, if you can parse the guid string and extract the post ID, then I believe you can retrieve the post with this API call:

https://docs.discourse.org/#tag/Posts/paths/~1posts~1{id}.json/get
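
Here is a minimal sketch of that route. It fetches `/posts/{id}.json` and rebuilds a `/t/topic-slug/topic-id/post-number` link. The field names `topic_slug`, `topic_id` and `post_number` are assumptions about the response and should be checked against the linked API docs. `BASE_URL` and the helper names are hypothetical. The network call is kept separate so the link-building part can be tried offline.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "https://vexforum.com"  # assumed forum root


def post_json_url(post_id: int, base_url: str = BASE_URL) -> str:
    """URL of the single-post JSON endpoint for a given post ID."""
    return f"{base_url}/posts/{post_id}.json"


def fetch_post(post_id: int, base_url: str = BASE_URL) -> Dict[str, Any]:
    """Download and decode the post JSON (needs network access)."""
    with urllib.request.urlopen(post_json_url(post_id, base_url), timeout=10) as resp:
        return json.load(resp)


def permalink_from_post(post: Dict[str, Any], base_url: str = BASE_URL) -> str:
    """Build a /t/<slug>/<topic_id>/<post_number> link.

    Assumes the JSON carries topic_slug, topic_id and post_number.
    """
    return f"{base_url}/t/{post['topic_slug']}/{post['topic_id']}/{post['post_number']}"
```

Usage would be `permalink_from_post(fetch_post(post_id))`, with `post_id` taken from the guid.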

Then, if you are not afraid to dive deep into the Discourse source code, I would suggest taking a look at this file:

Thanks for the help, but it appears you might have been slightly too late…

My bot saw your message

This is, of course, nowhere near done, but I’m much closer than I was a mere hour ago.

I’m using this post as a test for my bot. I would appreciate it if it wasn’t flagged as off-topic.

I would probably approach this by scraping the number of posts in the thread (it’s on the sidebar; just find the HTML tag for it) and, whenever it changes, scraping the lowest post for data. I don’t know much about how RSS feeds work, though.

Scraping would also work, but since the RSS feed already contains the relevant data you would need to track updates, you most likely wouldn’t need to scrape.
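
Tracking updates from the feed alone can be as simple as remembering which guids have already been seen. Here is a minimal sketch. It assumes a post’s guid stays the same between polls; an earlier reply doubts this, so it is worth verifying. `new_guids` is a hypothetical helper, and the guids in the usage lines are made up.

```python
from typing import Iterable, List, Set


def new_guids(feed_guids: Iterable[str], seen: Set[str]) -> List[str]:
    """Return guids not seen before, in feed order, and record them as seen."""
    fresh = []
    for guid in feed_guids:
        if guid not in seen:
            seen.add(guid)
            fresh.append(guid)
    return fresh


seen: Set[str] = set()
print(new_guids(["vexforum.com-post-1", "vexforum.com-post-2"], seen))  # both are new
print(new_guids(["vexforum.com-post-3", "vexforum.com-post-2"], seen))  # only post-3
```

On the very first poll every item counts as new. A bot that should only announce posts made after it starts could seed `seen` from that first poll without sending messages.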

This really does make me question if this should be allowed.

I wouldn’t be too comfortable with a bot that records even deleted posts. What if I say I don’t give permission for my responses to be recorded by your bot or in your Discord server?

man are you gonna be mad when you hear about the Wayback Machine

The Wayback Machine has been going through many lawsuits, and I won’t be surprised if it gets shut down in the future. But regardless, doing illegal things is not a good idea, even if other people do them (this is a cliché; even you should likely know that).

For the record, these lawsuits concern archive.org’s book distribution services and are not expected to go anywhere.

I don’t think many people really care about banning the preservation of what people write on random forums.

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.