“Finnegans Wake is the greatest guidebook to media study ever fashioned by man.” - Marshall McLuhan, Newsweek Magazine, page 56, February 28, 1966.
I have never done LSD or any other illegal drugs, but I have read Finnegans Wake: www.LazyWake.com
Lemmy tester; “RocketDerp” is my username on GitHub.
Better file a bug about that
The bug has been open for weeks. It doesn’t seem to be a priority to undo the added HTML sanitization problems.
The code I pasted here on Lemmy is also in a GitHub comment without being munged: https://github.com/LemmyNet/lemmy/pull/3865#issuecomment-1683324467
It’s still mostly like the original function it was copied from; just some of the logic has been stripped out. But it’s probably worth looking at the bigger picture of just how much Rust conditional logic goes into building this SQL statement.
Thank you
Good results with this approach. I hadn’t considered RANK() OVER (PARTITION BY criteria_a), and it works like a champ. It moves the ORDER BY into the area of focus (criteria_a), performance seems decent enough, and the short statement isn’t difficult to read.
SELECT COUNT(ranked_recency.*) AS post_row_count
FROM (
    SELECT id, post_id, community_id, published,
        rank() OVER (
            PARTITION BY community_id
            ORDER BY published DESC, id DESC
        ) AS rank
    FROM post_aggregates
) ranked_recency
WHERE rank <= 1000;
Gives me the expected results over the 5+ million test rows I ran it against.
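For anyone who wants to poke at the window-function technique without a Postgres instance, here is a minimal sketch using Python’s sqlite3 (SQLite 3.25+ supports window functions). The table mirrors post_aggregates, but the data is made up and the per-community cutoff is 2 instead of 1000 for brevity:

```python
# Toy demonstration of keeping only the N most recent posts per community
# via rank() OVER (PARTITION BY ...). Data and N=2 are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_aggregates (
        id INTEGER PRIMARY KEY,
        community_id INTEGER,
        published TEXT
    )
""")
conn.executemany(
    "INSERT INTO post_aggregates VALUES (?, ?, ?)",
    [
        (1, 10, "2023-08-01"),
        (2, 10, "2023-08-02"),
        (3, 10, "2023-08-03"),
        (4, 20, "2023-08-01"),
        (5, 20, "2023-08-02"),
    ],
)

# Rank posts within each community by recency, then count the survivors.
(post_row_count,) = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT id,
               rank() OVER (
                   PARTITION BY community_id
                   ORDER BY published DESC, id DESC
               ) AS rnk
        FROM post_aggregates
    ) ranked_recency
    WHERE rnk <= 2
""").fetchone()
print(post_row_count)  # 2 newest from community 10 + both from community 20 = 4
```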
If you could elaborate on your idea of TOP, please do. I’m hoping there might be a way to push the LIMIT 1000 into the inner query so the outer query doesn’t need to WHERE-filter rank over so many results?
I found the full-table update wasn’t performing as badly as I thought; it was the API gateway timing out. I’m still generating larger amounts of test data to see how it performs in worst-case edge situations.
I agree there is potential to reuse the child_count from child/grandchild rows. But there has to be some sense to the order they are updated in, so that the deepest child gets its count updated first?
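One way to get that ordering is to process comments by descending path depth, so every comment’s subtree is fully counted before its parent absorbs it. A hypothetical sketch (the materialized-path layout mimics ltree paths like 0.1.2; this is not Lemmy’s actual update code):

```python
# Hypothetical bottom-up child_count computation: deepest comments first,
# so each parent can reuse the already-finished counts of its children.
comments = ["0.1", "0.1.2", "0.1.3", "0.1.2.4", "0.1.2.4.5"]

# child_count = number of descendants (children, grandchildren, ...).
child_count = {path: 0 for path in comments}

# More dots = deeper in the tree; process deepest paths first.
for path in sorted(comments, key=lambda p: p.count("."), reverse=True):
    parent = path.rsplit(".", 1)[0]
    if parent in child_count:
        # This child's subtree is complete, so the parent absorbs
        # the child itself plus all of the child's descendants.
        child_count[parent] += child_count[path] + 1

print(child_count["0.1"])    # root of the tree: 4 descendants
print(child_count["0.1.2"])  # mid-level comment: 2 descendants
```

The same idea in SQL terms would be updating rows ordered by path depth, deepest first, so no count is read before it is finished.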
So it turns out the query was finishing within minutes and it was the API gateway timing out. Too many Lemmy SQL statements in my head. On a test system, a second run updating all the comments took just under 17 seconds for 313,617 rows with some decent reply depth, so it isn’t as bad as I thought.
given that it traverses the entire comment × comment space every time a comment is added.
The second query I shared is only referenced for maintenance rebuild. The routine update of count does target only the tree that the reply is to:
SELECT c.id, c.path, count(c2.id) AS child_count
FROM comment c
JOIN comment c2 ON c2.path <@ c.path AND c2.path != c.path
WHERE c.path <@ '0.1'
GROUP BY c.id;
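For anyone without the Postgres ltree extension handy, the descendant-count query above can be roughly emulated in SQLite with string-prefix matching (a crude stand-in for `<@`; real ltree matching works label-by-label, so a raw prefix test can over-match, e.g. '0.10' against '0.1'):

```python
# Emulate "count descendants of each comment under path 0.1" with sqlite3.
# c2.path <@ c.path becomes a '.'-anchored prefix match, which also
# excludes the comment itself (like the c2.path != c.path condition).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, path TEXT)")
conn.executemany(
    "INSERT INTO comment VALUES (?, ?)",
    [(1, "0.1"), (2, "0.1.2"), (3, "0.1.3"), (4, "0.1.2.4"), (5, "0.2")],
)

rows = conn.execute("""
    SELECT c.id, c.path, count(c2.id) AS child_count
    FROM comment c
    JOIN comment c2 ON c2.path LIKE c.path || '.%'
    WHERE c.path = '0.1' OR c.path LIKE '0.1.%'
    GROUP BY c.id
""").fetchall()

counts = {row_id: child_count for row_id, _path, child_count in rows}
print(counts)  # {1: 3, 2: 1} -- inner join drops comments with no replies
```

As in the original query, the inner join means leaf comments (no replies) simply don’t appear in the result.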
I found a particularly complex tree with 300 comments. In the production database (with generated test data added for this particular comment tree), it takes 0.371 seconds every time a new comment is added. Here is the result of the SELECT pulled out without the UPDATE:
Obviously, with the UPDATE it will take longer than 0.371 seconds to execute.
I have, weeks ago.
I did share a link to GitHub; is that not good enough or something? Here is a screenshot for you.