What is your opinion of the Large Language Model (LLM) argument made by Reddit?

FearTheCron@lemmy.world · 3 years ago

What is your opinion of the Large Language Model (LLM) argument made by Reddit?

msage@programming.dev · 3 years ago

Scraping open content is OK. Search engines have been doing that, it’s their main job.

LLMs wouldn’t exist without large inputs, hehe, and the internet is a good source for a big volume of language, most of which can even make sense.

I don’t feel like Reddit should be against LLMs, ignoring their bogus claims. At least I hope GitHub doesn’t share private and licenced repos.

FearTheCron@lemmy.world · 3 years ago

I was wondering if someone would bring up search engine indexing. Google certainly has the upper hand for LLM training data with Reddit’s new API change since they have the comments anyway. This is a big reason I fear these API changes: they very much concentrate power in the hands of already powerful companies.

msage@programming.dev · 3 years ago

Always has been meme.jpg

I really don’t think Reddit changed because of the AI, it’s just for the IPO, trying to pump and dump it sky high.

It’s really sad when you imagine what we could do as a species, if we could work together instead of trying to one-up each other.

It kind of brings me back to decentralized services, which for me is the ultimate freedom model, and I’m loving this alternative to Reddit.

FearTheCron@lemmy.world · 3 years ago

I hope cross posts are OK. But I am curious about Experienced Dev’s perspective on this as well since the question is rather technical.

Copying my opinion from the other thread in case you don’t want to look at it:

My personal opinion is that high API usage fees hurt open source LLMs (e.g. GPT4All). I would rather not see this new technology monopolized by those who can pay API fees.

Clifspeare@programming.dev · 3 years ago

I’d tend to agree. There are enough barriers to training large models without artificially increasing them just because the largest players can afford it.

jmk1ng@programming.dev · edit-2 3 years ago

I think Reddit does have a legitimate argument that the scales have tipped: Reddit has been eating the costs of “whales” abusing its APIs for for-profit use cases without being compensated at all.

3P apps using the API at no cost while simultaneously monetizing Reddit’s content by showing their own ads does seem to be taking advantage.

That said, the way Reddit approached this was so scorched-earth and boneheaded.

For example, Reddit gets tens of millions of dollars in free content moderation services from volunteers. The moderators of all their biggest subreddits rely on 3P moderation tools since Reddit’s are so poor.

So with the new API policy, they’re asking their unpaid moderators to PAY them for the privilege. It’s such a slap in the face.

Finally, to address the original question, Reddit should absolutely block API consumers who are just training their glorified chat bots to regurgitate plagiarized content.

flibbertigibbet@feddit.de · 3 years ago

I think the claim is nonsense. If that were their concern, they would instead change the usage agreement and maybe take some of the offenders to court.

What they actually did is everything in their power to drive mobile users to their mobile app. They want old-fashioned user tracking data for advertising and selling on, together with more in-app ads.

FearTheCron@lemmy.world · 3 years ago

I totally agree that Reddit’s motivation is probably not related to LLMs and the link I posted is more of an excuse than anything. However, I am curious what people think about data scraping and LLMs in general.