A bit too long for my brain but nonetheless it's written in plain English, conveys the message very clearly, and is definitely a very good read. Thanks for sharing.
bahmanm
Husband, father, kabab lover, history buff, chess fan and software engineer. Believes creating software must resemble art: intuitive creation and joyful discovery.
Views are my own.
When I read the title, my immediate thought was "Mojolicious project renamed? To a name w/ an emoji!?" 😂
> We plan to open-source Mojo progressively over time
Yea, right! I can’t believe that there are people who prefer to work on/with a closed source programming language in 2023 (as if it’s the 80’s.)
> … can move faster than a community effort, so we will continue to incubate it within Modular until it's more complete.
Apparently it was “complete” enough to ask the same “community” for feedback.
I genuinely wonder how they managed to convince enthusiasts to give them free feedback/testing (on github/discord) for something whose source code they didn't even have access to.
PS: I didn’t downvote. I simply got upset to see this happening in 2023.
bahmanm@lemmy.ml to Programming@programming.dev • (How) do you automate setting up your local development environment? · English · 1 · 2 years ago

I've been using sdkman for about a decade now and am totally pleased w/ it. It does a very good job of managing JDK versions for you and much more, eg SBT, Gradle, Scala, Groovy, Leiningen, SpringBoot, …
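If you haven't seen it in action, a typical session looks roughly like this sketch (the version identifiers are only examples; `sdk list java` shows what's actually available):

```sh
# install sdkman itself
curl -s "https://get.sdkman.io" | bash

# install & activate toolchains
sdk install java 17.0.8-tem
sdk install gradle 8.2
sdk use java 17.0.8-tem
```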
Now, technically you could use sdkman in your CI/CD pipeline too but I’d find it a strong smell. I’ve always used dedicated images pre-configured for a particular JDK version in the pipeline.
bahmanm@lemmy.ml to Programming@programming.dev • (How) do you automate setting up your local development environment? · English · 8 · 2 years ago

I work primarily on the JVM & the projects (personal/corporate) I work w/ can be summarised as below:

1. Building & running the repo is done on the host using an SCM (software configuration management tool) such as Gradle or SBT.
2. The external dependencies of the repo, such as Redis, are managed via a `docker-compose.yml`.
3. The README contains a short series of commands to do different tasks RE (1)
However, one approach that I've always been fond of (& apply/advocate wherever I can) is to replace (3) w/ a `Makefile` containing a bunch of standard targets shared across all repos, eg `test`, `integration-test`. Then Makefiles are thinly customised to fit the repo's particular needs.

This has proven to be very helpful wrt cognitive load (and also CI/CD pipelines): ALL projects, regardless of the toolchain, use the same set of commands, namely

```
make test
make integration-test
make compose-up
make run
```
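For illustration, the skeleton of such a shared `Makefile` could look like the sketch below - the Gradle recipes are only examples of what an individual repo would customise:

```make
# Standard targets shared across all repos; each repo thinly
# customises the recipes to fit its own toolchain.
.PHONY: test integration-test compose-up run

test:
	./gradlew test

integration-test:
	./gradlew integrationTest

compose-up:
	docker-compose up --detach

run:
	./gradlew bootRun
```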
In short (quoting myself here):
Don’t repeat yourself. Make Make make things happen for you!
bahmanm@lemmy.ml OP to Programming@programming.dev • [ANN] persianutils 5.0-SNAPSHOT · English · 1 · 2 years ago

Since I haven't heard/read about any bugs, I plan to release v5.0.0 on the 13th (😬)
I’ll keep this post, well, posted 🙂
That’s a 404 I’m afraid.
Recently, I’ve found myself posting more often on Mastodon a Lemmy & blog way less - indeed credits go to Fediverse and the mods for making it a safe and welcoming place ❤
Here’s my latest one: https://www.bahmanm.com/2023/07/firefox-profiles-quickly-replicate-your-settings.html
It’s not self-hosted, rather I’m using Google’s blogspot. I used to host my own website and two dozens of clients’ and friends’ until a few years ago (using Plone and Zope.) But at some point, my priorities changed and I retired my rock-solid installations and switched to blogspot.
> could not resize shared memory

That means too many chunky parallel maintenance workers are using the memory at the same time (`max_parallel_maintenance_workers` and `maintenance_work_mem`.) `VACUUM`ing is a very important part of how PG works; can you try setting `max_parallel_maintenance_workers` to 1 or even 0 (disabling parallelism altogether) and retry the experiment?

> I did increase shared_buffers and effective_cache_size with no effect.
That probably rules out the theory of thrashed indices.
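To be concrete about the parallel-workers bit above, I mean something along these lines (`comment` is only an example table name):

```sql
-- affects the current session only
SET max_parallel_maintenance_workers = 0;

-- re-run the vacuum and watch the disk I/O afterwards;
-- "comment" is just an example table
VACUUM (VERBOSE, ANALYZE) comment;
```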
> https://ctxt.io/2/AABQciw3FA https://ctxt.io/2/AABQTprTEg https://ctxt.io/2/AABQKqOaEg
Since those stats are cumulative, it's hard to tell anything w/o knowing when the `SELECT` was run. It'd be very helpful if you could run those queries a few times w/ a 1min interval and share the output.

> I did install Prometheus with PG exporter and Grafana…Anything specific you can suggest that I should focus on?
I’d start w/ the 3 tables I mentioned in the previous point and try to find anomalies esp under different workloads. The rest, I’m afraid, is going to be a bit of an investigation and detective work.
If you like, you can give me access to the Grafana dashboard so I can take a look and we can take it from there. It’s going to be totally free of charge of course as I am quite interested in your problem: it’s both a challenge for me and helping a fellow Lemmy user. The only thing I ask is that we report back the results and solution here so that others can benefit from the work.
I used to be in a relatively similar position years ago so I totally relate to what you’ve got to do on a daily basis.
These are the titles that come to my mind (leaving the seniority level up to you):
- Software engineer
- Software engineer in Integrations
- Software engineer in Enterprise Applications
- ERP Implementation consultant
Oh, updated the link 🤦♂️
The stock Grafana dashboard for PG is a good starting point - at least, that's how I started. As you said, you can always add new metrics to your dashboard as you find you need them.
Don’t forget to install node-exporter too. It gives some important bits of info about the PG host. Again the stock dashboard is a decent one to start w/.
A few things off the top of my head in order of importance:

- How frequently do you `VACUUM` the database? Have you tried `VACUUM`ing a few times over a 5 min span & seeing if there are changes to the disk I/O afterwards?
- I've got no idea how Lemmy works but "seeding content", to my mind, possibly means a lot of `INSERT`/`UPDATE`s. Is that correct? If yes, there's a chance you may be thrashing your indices & invalidating them too frequently, which triggers a lot of rebuilding which could swallow a very large portion of the `shared_buffers`. To rule that out, you can simply bump `shared_buffers` (eg 16GB) & `effective_cache_size` and see if it makes any difference.
- Please include a bit more information about PG activity, namely from `pg_stat_activity`, `pg_stat_bgwriter` & `pg_stat_wal` (see the sketch after this list.)
- You've got quite a high value for `max_connections` - I don't believe that's the culprit here.

And finally, if possible, I'd highly recommend that you take a few minutes & install Prometheus, Prometheus node exporter, Prometheus PG exporter and Grafana to monitor the state of your deployment. It's way easier to find correlations between data points using the said toolset.
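Roughly the kind of snapshots I have in mind - run these a few times, ~1min apart, and diff the counters between runs (exact column sets vary a bit across PG versions; `pg_stat_wal` needs PG 14+):

```sql
-- what the backends are doing/waiting on right now
SELECT state, wait_event_type, count(*)
FROM pg_stat_activity
GROUP BY state, wait_event_type;

-- checkpoint & background-writer pressure (cumulative counters)
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint, buffers_clean
FROM pg_stat_bgwriter;

-- WAL volume (cumulative counters)
SELECT wal_records, wal_bytes, wal_buffers_full
FROM pg_stat_wal;
```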
bahmanm@lemmy.ml to PostgreSQL@programming.dev • Can I accomplish this in a single SQL statement? · English · 2 · 2 years ago

I'm not at my desk ATM but I think this is a prime use case for `crosstab`s.
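Something along these lines is what I mean - `crosstab` comes w/ the `tablefunc` extension; the table & columns below are entirely made up for illustration:

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- pivot rows of (row_name, category, value) into one column per category;
-- "monthly_sales" and its columns are hypothetical
SELECT *
FROM crosstab(
  $$ SELECT region, month, total FROM monthly_sales ORDER BY 1, 2 $$
) AS ct(region text, jan numeric, feb numeric, mar numeric);
```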
Bookmarked!
bahmanm@lemmy.ml to PostgreSQL@programming.dev • Lemmy server mass update of comment reply (child) count with PostgreSQL ltree structure · English · 2 · 2 years ago

Can you keep this thread posted please? Or you can share a PR link so I can follow the progress there. Am very interested.
bahmanm@lemmy.ml to PostgreSQL@programming.dev • Lemmy server mass update of comment reply (child) count with PostgreSQL ltree structure · English · 2 · 2 years ago

How did it go @RoundSparrow@lemmy.ml? Any breakthroughs?
bahmanm@lemmy.ml to PostgreSQL@programming.dev • Lemmy server mass update of comment reply (child) count with PostgreSQL ltree structure · English · 1 · 2 years ago

> potential to reuse

I have a feeling that it's going to make a noticeable difference; it's way cheaper than a `JOIN ... GROUP BY` query.
> order they are updated in so that the deepest child gets count updated first

Given the declarative nature of SQL, I'm afraid that's not possible - at least to my knowledge.

But worry not! That's why there are stored procedures in almost every RDBMS; to add an imperative flair to the engine.
In purely technical terms, implementing what you're thinking about is rather straight-forward in a stored procedure using a `CURSOR`. This could possibly be the quickest win (plus the idea of `COUNT(*)`, if applicable.)
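To illustrate - and only to illustrate, as the table/column names below are my guesses at lemmy's schema (assuming `comment.path` is an `ltree` and `comment_aggregates` has a `comment_id` FK):

```sql
CREATE OR REPLACE PROCEDURE rebuild_child_counts(p_post_id bigint)
LANGUAGE plpgsql AS $$
DECLARE
  -- deepest comments first, so each parent reads the
  -- already-updated counts of its immediate children
  cur CURSOR FOR
    SELECT c.id, c.path
    FROM comment c
    WHERE c.post_id = p_post_id
    ORDER BY nlevel(c.path) DESC;
BEGIN
  FOR rec IN cur LOOP
    UPDATE comment_aggregates ca
    SET child_count = (
      -- immediate children + the totals they've already accumulated
      SELECT count(*) + COALESCE(sum(cca.child_count), 0)
      FROM comment cc
      JOIN comment_aggregates cca ON cca.comment_id = cc.id
      WHERE cc.path ~ (rec.path::text || '.*{1}')::lquery
    )
    WHERE ca.comment_id = rec.id;
  END LOOP;
END;
$$;
```

Then it'd just be a matter of `CALL rebuild_child_counts(42);` per affected post.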
Now, I’d like to suggest a possibly longer route which I think may be more scalable. The idea is based around the fact that comments themselves are utterly more important than the number of child comments.
- The first priority should be to ensure `INSERT/UPDATE/SELECT` are super quick on `comment` and `post`.
- The second priority should be to ensure `child_count` is eventually correctly updated when (1) happens.
- The last priority should be to try to make (2) as fast as we can while making sure (3) doesn't interfere w/ (1) and (2) performance.
Before rambling on, I’d like to ask if you think the priorities make sense? If they do, I can elaborate on the implementation.
bahmanm@lemmy.ml to PostgreSQL@programming.dev • Lemmy server mass update of comment reply (child) count with PostgreSQL ltree structure · English · 2 · 2 years ago

First off, IIRC, `COUNT(*)` used to be slightly faster (~10-15%) than `COUNT(some_column)` in PG. There's a chance that recent versions of PG have fixed this inconsistency but it's still worth benchmarking.
Now to the query:
To my mind, we’ve already got
comment_aggregate
which is supposed to store the result of the query shared above, right? Why do we need to run thatSELECT
again instead of simply:-- pseudo-code SELECT ca.id, ca.child_count, ca.path FROM comment_aggregate ca WHERE ca.post_id = :post_id
I think I’m confusing matters here b/c I don’t know lemmy’s DB structure. Is there a link to an ERD/SQL/… you could share so I could take a look and leave more educated replies?
bahmanm@lemmy.ml to PostgreSQL@programming.dev • Lemmy server mass update of comment reply (child) count with PostgreSQL ltree structure · English · 1 · 2 years ago

DISCLAIMER: I've never looked at lemmy's code base. 🤦♂️

I think no matter any possible optimisation to the query (if any), the current design may not scale very well given it traverses the whole `comment X comment` space every time a comment is added. To my mind, it works well when there are many shallow comments (ie little nesting/threading) which might not be the best strategy for the content lemmy serves.

Can you share the structures of `comment_aggregates` and `comment`? I feel there's a good opportunity for denormalisation there which may mean better performance.

That said, here's one concrete idea that crossed my mind and could be worth benchmarking:

- Create an `AFTER UPDATE` trigger on `comment_aggregates` which updates a comment's immediate parent(s)' `child_count` (basically increment it by 1; sketched below.)
- Re-write the posted query to only update the immediate parents of the comment being added.

That results in the trigger being run exactly `m` times where `m` is the number of comments in the subtree the new comment was just added to.

Does that make sense?
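A rough sketch of the trigger idea - again, the schema is my guess (assuming an `ltree` `path` on `comment` and a `comment_id` FK on `comment_aggregates`):

```sql
CREATE OR REPLACE FUNCTION propagate_child_count() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- propagate increments only; the parent's UPDATE re-fires
  -- this trigger one level up the tree
  IF NEW.child_count > OLD.child_count THEN
    UPDATE comment_aggregates ca
    SET child_count = ca.child_count + 1
    FROM comment c, comment parent
    WHERE c.id = NEW.comment_id
      AND nlevel(c.path) > 1  -- a root comment has no parent
      AND parent.path = subpath(c.path, 0, nlevel(c.path) - 1)
      AND ca.comment_id = parent.id;
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER comment_child_count_updated
AFTER UPDATE OF child_count ON comment_aggregates
FOR EACH ROW EXECUTE FUNCTION propagate_child_count();
```

Note the cascade terminates on its own: once it reaches a root comment (`nlevel = 1`), no parent row matches, so no further `UPDATE` fires.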
Oh, I see. It kind of makes sense. Who’d need Jython when GraalVM’s out there nowadays!? Well, unless it’s a legacy app 😬
First off, I was ready to close the tab at the slightest suggestion of using Velocity as a metric. That didn’t happen 🙂
I like the idea that metrics should be contained and sustainable. Though I don’t agree w/ the suggested metrics.
In general, it seems they are all designed around the process and not the product. In particular, there's no mention of the "value unlocked" in each sprint: it's an important one for an Agile team as it holds Product accountable for understanding what the $$$ value of the team's effort is.
The suggested set, to my mind, is formed around the idea of a feature factory line and its efficiency (assuming that is measurable.) It leaves out the "meaning" of what the team achieves w/ that efficiency.
My 2 cents.
Good read nonetheless 👍 Got me thinking about this intriguing topic after a few years.