Reddit@lemmy.world by Blaze@lemmy.blahaj.zone
2 years

Reddit Will License Its Data to Train LLMs, So We Made a Firefox Extension That Lets You Replace Your Comments

theluddite.org English

cross-posted from: https://lemmy.ca/post/19946388

An anticapitalist tech blog. Embrace the technology that liberates us. Smash that which does not.

    • TropicalDingdong@lemmy.world
      2 years

      Reddit LLM:

      This

      This

      This

        • AwkwardLookMonkeyPuppet@lemmy.world
          2 years

          just Google it

            • TropicalDingdong@lemmy.world
              2 years

              Wow thanks kind stranger

                • gravitas_deficiency@sh.itjust.works
                  2 years

                  figured it out, it works now

              • Steve@startrek.website
                2 years

                This

                • Wav_function@lemmy.world
                  2 years

                  Ah the old LLM switcheroo

                • AwkwardLookMonkeyPuppet@lemmy.world
                  2 years

                  Reddit will not license their data, they will license your data. Reddit doesn’t have any data of its own.

                    • fine_sandy_bottom@discuss.tchncs.de
                      2 years

                      Oh my sweet summer child.

                      They have all your data.

                        • AwkwardLookMonkeyPuppet@lemmy.world
                          2 years

                          I think you misunderstood my statement

                            • fine_sandy_bottom@discuss.tchncs.de
                              2 years

                              I don’t think I did.

                              You’re saying they’re licensing your data.

                              I’m saying (very ineloquently) that you assigned the rights to your data to them when you posted it. It was never yours from the moment it was created. I’m certain that’s what their T&Cs say.

                                • AwkwardLookMonkeyPuppet@lemmy.world
                                  2 years

                                  Oh then in that case, legally, you’re right.

                          • slazer2au@lemmy.world
                            2 years

                            If you want to have real fun, replace all your comments with EICAR test strings.

                              • Blaze@lemmy.blahaj.zone
                                2 years

                                That’s quite a good idea

                                • 🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 🏆@yiffit.netEnglish
                                  2 years

                                  I’m gonna use Ipsum Lorem.

                                    • Icalasari@fedia.io
                                      2 years

                                      Nah, put in jailbreaks to dump its data. See if you can make its LLMs have a seizure

                                        • Pandantic [they/them]@midwest.social
                                          2 years

                                          Please do you have some handy?

                                    • Lvxferre [he/him]@mander.xyz
                                      2 years

                                      A few highlights that I’d like to make about this tool and its usage. Note: on a prescriptive level I’m focusing on moral matters, not legal ones.

                                      This tool allows you to edit your content. You might have allowed other people and Reddit Inc. to use it, but it’s still yours. And you should be free to do whatever you want with your content, even if it inconveniences others. And people expecting you to give up your moral rights for the sake of their own benefit, frankly, are simply entitled.

                                      Another user here compared this with vandalism; I don’t think that the comparison is good, given that vandalism targets someone else’s property.

                                      I also think that people in general are focusing too much on the short-term consequences of using this tool, and too little on the long-term ones. Here comes some bullet-point hell:

                                      • SEO “improvements” already caught up with the “add «reddit» to search queries!” trick. It’s becoming less effective over time.
                                      • Reddit is accumulating huge amounts of noise, due to increased bot activity and decreased moderation. It’ll likely get worse over time.
                                      • Reddit is walling itself off more and more over time. Eventually this info will become unavailable to anyone who didn’t sell their soul to Greedy Pigboy, i.e. anyone who isn’t feeding that cesspool.
                                      • Every piece of content that you leave in that site is yet another piece of content “inviting” other users to register and stay there, dumping their content into that increasingly walled garden, where it won’t be available publicly. And while they’re free to do so if they so desire (it’s their content), you’re also free to not invite them.
                                      • There are alternatives to that enshittified platform, competing directly with it. (We’re in one, by the way.) We should encourage people to use those alternatives, not Reddit.

                                      Are you all getting the picture? You might be tempted to leave your content in Reddit for the sake of other people; even then, the pros of doing so are rather small, and there are cons not often mentioned.

                                      Regarding LLMs, frankly? I think that it’s mostly a neutral point. Sure, data hoarding bots will get your content from Reddit… but they’ll also get it if you post here in the Fediverse, on your blog, or elsewhere. The only way to avoid feeding those bots is to not speak “in the open”.

                                        • AtariDump@lemmy.world
                                          2 years

                                          • nucleative@lemmy.world
                                            2 years

                                            Has anyone recently checked the Reddit ToS?

                                            It’s possible that by clicking that submit button, a perpetual worldwide license was granted that included any purpose Reddit deemed worthy.

                                            That could actually include every single version of every comment. Your first post, your ninja edit to correct your spellings, your edit update, and finally your plugin’s update that wipes out your comment. All of this could be data Reddit can provide to LLM researchers.

                                            • fine_sandy_bottom@discuss.tchncs.de
                                              2 years

                                              I think the most important point is that it’s completely ineffective for thwarting LLMs. They will be trained using the original data.

                                              Also, if any significant portion of users nuked their comment history it would be trivial for reddit to block the user and undo the edits.

                                                • Lvxferre [he/him]@mander.xyz
                                                  2 years

                                                  Also, if any significant portion of users nuked their comment history it would be trivial for reddit to block the user and undo the edits.

                                                  It would be trivial from a procedural standpoint, but not from a social one. It would be really bad for Reddit’s reputation: “this site doesn’t allow you to remove your content from it”. Problematic especially in Europe.

                                                    • fine_sandy_bottom@discuss.tchncs.de
                                                      2 years

                                                      No one cares about their reputation.

                                                        • Lvxferre [he/him]@mander.xyz
                                                          2 years

                                                          No one cares about their reputation.

                                                          This is blatantly false, as advertisers pulling out of Twitter shows. Something similar happened on Reddit a few years ago.

                                                          They do care about brand reputation. Don’t lie (or worse, assume) that they don’t.

                                                            • fine_sandy_bottom@discuss.tchncs.de
                                                              2 years

                                                              Nonsense. What happened with the 3rd party apps thing? Mods were staging strikes, resigning, protesting. Pretty much worst possible case for brand rep.

                                                              They just held their ground, users continued, advertisers didn’t/don’t care.

                                                              Don’t labour under the illusion that some kind of people power exists.

                                                              For every 1 user that cares about this there are 100s of thousands that just plain don’t care.

                                                    • Makeshift@sh.itjust.works
                                                      2 years

                                                      Making info on Reddit useless to real humans is the main reason I need to set aside time to do this.

                                                      I really don’t care if AI trains off of what I’ve said. I do care that greedy greedy Steve Huffman killed 3rd party apps for it.

                                                      If Reddit’s use for searching obscure stuff goes away, there goes the biggest draw of the site. Get people going elsewhere. Like here!

                                                        • spidermanchild@sh.itjust.works
                                                          2 years

                                                          I don’t have anything useful to add other than Steve Huffman is a greedy pig boy.

                                                        • Grobmobularb@lemmy.world
                                                          2 years

                                                          Fuck Reddit.

                                                            • nehal3m@sh.itjust.works
                                                              2 years

                                                              Fuck /u/spez

                                                            • hperrin@lemmy.world
                                                              2 years

                                                              That’s probably not going to be useful. Reddit keeps your original comment text.

                                                                • tehciolo@lemm.ee
                                                                  2 years

                                                                  I think you missed the part where you were strongly advised “not” to use copyrighted text.

                                                                  The point is not to get rid of the original text. It’s to “poison” the training data.

                                                                    • Everythingispenguins@lemmy.world
                                                                      2 years

                                                                      Are porn scripts copyrighted?

                                                                      • FaceDeer@fedia.io
                                                                        2 years

                                                                        If the AI trainers have the original text then “poisoning” the live site’s content isn’t going to do anything at all.

                                                                        You can’t touch the original text. It’s already been archived.

                                                                          • tehciolo@lemm.ee
                                                                            2 years

                                                                            If they scrape the updated comments again and ingest copyrighted text, you are poisoning the data.

                                                                              • FaceDeer@fedia.io
                                                                                2 years

                                                                                That’s my point. They won’t.

                                                                                And even if they did, it’s unclear that copyright has anything to say about AI training anyway.

                                                                                  • InternetPerson@lemmings.world
                                                                                    2 years

                                                                                    NYT is currently suing over copyright infringement.

                                                                                    https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

                                                                                    it’s unclear that copyright has anything to say about AI training anyway

                                                                                    Although lawmakers worldwide were asleep while AI advanced and therefore failed to pass some important laws, they are catching up. Europe recently passed its first AI Act. As far as I’ve seen, it also states that companies must disclose a detailed summary of their training data.

                                                                                    https://www.ml6.eu/blogpost/ai-models-compliance-eu-ai-act

                                                                                      • FaceDeer@fedia.io
                                                                                        2 years

                                                                                        You can sue about anything you want in the United States, it remains to be seen whether the courts will side with them. I think it’s unlikely they’ll get much of a win out of it.

                                                                                        A law that requires disclosing a summary of training data isn’t going to stop anyone from using that training data.

                                                                              • Th4tGuyII@kbin.social
                                                                                2 years

                                                                                Yeah, this is what I was thinking. We all heard about people being unable to delete comments, or Reddit keeping comments even after account deletions, back during the first migration. So what stops them from holding onto comment history, and what stops them from using it to teach LLMs to discern poisoned data from real data, as @pixxelkick said?

                                                                                • pixxelkick@lemmy.world
                                                                                  2 years

                                                                                  Yeah, in fact you’re giving the LLM additional data to learn what poisoned data looks like so it can avoid it better, as they can clearly see the before vs. after.

                                                                                    • InternetPerson@lemmings.world
                                                                                      2 years

                                                                                      It is necessary to employ a method which enables the training procedure to distinguish copyrighted material. In the “dumbest” case, some humans will have to label it.

                                                                                      Just because you’ve edited a comment doesn’t mean that it can be seen as “oh, this is under copyright now”.

                                                                                      I’m not saying it’s technically impossible. On the contrary, it very much is possible. It’s just more work. This drives development costs up and can give some form of satisfaction to angry ex-reddit users like me. However, those costs will be peanuts for giants like Google / Alphabet.

                                                                                  • pHr34kY@lemmy.world
                                                                                    2 years

                                                                                    do not choose something copyrighted.

                                                                                    Is that with a “nudge, nudge, wink, wink”? It would be such a shame if the whole project were jeopardised by such things.

                                                                                    • brygphilomena@lemmy.world
                                                                                      2 years

                                                                                      This only affects scrapers. If reddit is selling the data, they will just sell the unedited version from their database.

                                                                                      This is ineffective and deleting or editing reddit comments has always been a circle jerk to make yourself feel good that you are “hurting” reddit in some way.

                                                                                      • AliasAKA@lemmy.world
                                                                                          2 years

                                                                                          While this is true, I also kind of doubt that Reddit isn’t just one mistake away from accidentally deleting an old db and losing the historical data. So it may in fact mess up their ability to sell the data.

                                                                                        Also, potential GDPR violations etc. if you’re in the EU.

                                                                                            • brygphilomena@lemmy.world
                                                                                              2 years

                                                                                            If they were that close, they wouldn’t run a site that relies solely on safeguarding that data. I cannot imagine they don’t know how to handle and back up data.

                                                                                            As for the GDPR, the data sold to an AI company for LLMs is probably anonymized. Or they have a database that contains no account information, only the posts. From a cursory read of the GDPR, your personal data is your account, not necessarily your posts. If the posts are no longer associated with an account, they are fair game for reddit.

                                                                                              Ironically, deleting the accounts might make it easier for reddit to use the data.

                                                                                            • johannesvanderwhales@lemmy.world
                                                                                              2 years

                                                                                              And really just hurts people who are searching for actual human answers to questions later.

                                                                                              • Illecors@lemmy.cafe
                                                                                                  2 years

                                                                                                  It also hurts reddit. Fewer useful lookups on reddit - fewer visits to reddit.

                                                                                                  • Railing5132@lemmy.world
                                                                                                    2 years

                                                                                                  There was a time when there were many sites on the internet; hundreds, thousands even. And someone could search for content on topics they were interested in and find discussions in forums. I hope the internet becomes that again and sites like reddit burn to the ground, their servers salted to never grow again.

                                                                                                    The world recovered from the burning of Alexandria, and it would recover from the death of reddit. And from the rumbling of their new ad injection schemes, the sooner the better.

                                                                                                      • Miguel/ミゲル@pleroma.miguelcr.me
                                                                                                        2 years

                                                                                                        I hope the internet becomes that again and sites like reddit burn to the ground, their servers salted to never grow again.

                                                                                                        Based!!!

                                                                                                  • InternetPerson@lemmings.world
                                                                                                    2 years

                                                                                                  I think I have about 4000 comments on reddit. I stopped using reddit last summer when they pushed their fucking API changes; I’ve been on Lemmy since and never looked back. However, I still have the account, because sometimes I had really nice conversations, which I would like to look up once in a while, or to pick up something I wanted to keep for another time, basically like a bookmark. I’m also one of the people who sometimes write a lot; walls of text as a product of a lot of effort I put in. It would be sad to see it all go away. Then again, fuck reddirt and its management.

                                                                                                    Is there a tool to back up my comments (or also the corresponding threads)? After that I’ll gladly use the tool provided by luddite.

                                                                                                    • ikidd@lemmy.world
                                                                                                        2 years

                                                                                                        You can request your data from Reddit and they’ll send you a CSV file of all your activity. Takes a couple weeks though.

                                                                                                        • Ice@lemmy.world
                                                                                                          2 years

                                                                                                          You can request to download your data from reddit, and they’ll provide it to you. I did that and made my comments available on github.

                                                                                                          • MaximilianKohler@lemmy.world
                                                                                                              2 years

                                                                                                              I did that and made my comments available on github

                                                                                                              How? I’ve been looking for a way to host my data elsewhere.

                                                                                                              I found this website https://www.rareddit.com, but I’m not sure how to do that, and I contacted the author and didn’t get a response.

                                                                                                                • Ice@lemmy.world
                                                                                                                  2 years

                                                                                                                Instructions for downloading data are here:

                                                                                                                  https://support.reddithelp.com/hc/en-us/articles/360043048352-How-do-I-request-a-copy-of-my-Reddit-data-and-information

                                                                                                                  Submit form here:

                                                                                                                  https://www.reddit.com/settings/data-request

                                                                                                                then host the data wherever you like (preferably somewhere it will show up when searched)

                                                                                                                  Then replace every comment/post with instructions on how to find that data.

                                                                                                                  Example of redacted post:

                                                                                                                  https://www.reddit.com/r/paradoxplaza/comments/126ka7a/paradox_wants_to_shut_down_development_studios_in/

                                                                                                                  Results from search:

                                                                                                                  https://duckduckgo.com/?q=reddit-u-iceblade02+github

                                                                                                                  Destination:

                                                                                                                  https://github.com/Iceblade02/reddit-u-iceblade02?tab=readme-ov-file#reddit-u-iceblade02

                                                                                                                  • MaximilianKohler@lemmy.world
                                                                                                                      2 years

                                                                                                                    The destination part is the issue. That GitHub link works very poorly. The rareddit example is much better.

                                                                                                                        • Ice@lemmy.world
                                                                                                                          2 years

                                                                                                                          The rareddit example is much better.

                                                                                                                          I’ll admit rareddit looks nicer and is more convenient for the user - but it doesn’t seem like an option, since (as you said) the author isn’t responding.

                                                                                                                          My data is off reddit (most important part) and findable (bonus).

                                                                                                              • EdibleFriend@lemmy.world (deleted by creator)
                                                                                                                  2 years

                                                                                                                  Lots of stuff like this already exists and has been proven useless. A guy here on lemmy was a big answer type on some tech support sub. He used one of the account scrubbers to nuke his account before he deleted. Went to look again a few weeks later and all his top comment answers had been restored.

                                                                                                                They haven’t bothered with most people because they simply aren’t useful for making the place look attractive, but no matter what you do, your comments are stored and will be sold off to the AI companies.

                                                                                                                  • AwkwardLookMonkeyPuppet@lemmy.world
                                                                                                                      2 years

                                                                                                                      I’m pretty sure that violates GDPR.

                                                                                                                  • 🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 🏆@yiffit.net
                                                                                                                      2 years

                                                                                                                      Sucks it only works with the desktop version of Firefox.

                                                                                                                      How fast is it, anyway? I was on Reddit for 11 years and commented with the same frequency I do here. I have so, so much to edit.

                                                                                                                      • WhatAmLemmy@lemmy.world
                                                                                                                          2 years

                                                                                                                          I believe you can only edit the last 1000 or so comments from your profile. Anything older than that doesn’t display.

                                                                                                                        • bquintb@midwest.socialEnglish
                                                                                                                          2 years

                                                                                                                          Shit, I already deleted my account.

                                                                                                                          • Sam_Bass@lemmy.world
                                                                                                                            2 years

                                                                                                                            They could fall out of a 30-story window for all I care.

                                                                                                                            • MehBlah@lemmy.world
                                                                                                                              2 years

                                                                                                                              My comments are not your product. I don’t need or want the whole thing.

                                                                                                                              Reddit@lemmy.world

                                                                                                                              reddit@lemmy.world


                                                                                                                              News and Discussions about Reddit

                                                                                                                              Welcome to !reddit. This is a community for all news and discussions about Reddit.

                                                                                                                              The rules for posting and commenting, besides the rules defined here for lemmy.world, are as follows:

                                                                                                                              Rules


                                                                                                                              Rule 1- No brigading.

                                                                                                                              **You may not encourage brigading any communities or subreddits in any way.**

                                                                                                                              YSKs are about self-improvement on how to do things.



                                                                                                                              Rule 2- No illegal or NSFW or gore content.

                                                                                                                              **No illegal or NSFW or gore content.**



                                                                                                                              Rule 3- Do not seek mental, medical and professional help here.

                                                                                                                              Do not seek mental, medical and professional help here. Breaking this rule will not get you or your post removed, but it will put you at risk, and possibly in danger.



                                                                                                                              Rule 4- No self promotion or upvote-farming of any kind.

                                                                                                                              That’s it.



                                                                                                                              Rule 5- No baiting or sealioning or promoting an agenda.

                                                                                                                              Posts and comments which, instead of being of an innocuous nature, are specifically intended (based on reports and in the opinion of our crack moderation team) to bait users into ideological wars on charged political topics will be removed and the authors warned - or banned - depending on severity.



                                                                                                                              Rule 6- Regarding META posts.

                                                                                                                              Provided it is about the community itself, you may post non-Reddit posts using the [META] tag in your post title.



                                                                                                                              Rule 7- You can't harass or disturb other members.

                                                                                                                              If you vocally harass or discriminate against any individual member, you will be removed.

                                                                                                                              Likewise, if you are a member, sympathiser, or resemblant of a movement that is known largely to hate, mock, discriminate against, and/or want to take the lives of a group of people, and you have been provably vocal about your hate, you will be banned on sight.



                                                                                                                              Rule 8- All comments should try to stay relevant to their parent content.

                                                                                                                              Rule 9- Reposts from other platforms are not allowed.

                                                                                                                              Let everyone have their own content.



                                                                                                                              Rule 10- The majority of bots aren't allowed to participate here. This includes using AI responses and summaries.
                                                                                                                              Visibility: Public

                                                                                                                              This community is visible to everyone.

                                                                                                                              • 165 users / Day
                                                                                                                              • 160 users / Week
                                                                                                                              • 160 users / Month
                                                                                                                              • 6.55K users / 6 months
                                                                                                                              • 1.44K posts
                                                                                                                              • 53.8K comments
                                                                                                                              • 1 local subscriber
                                                                                                                              • 23.1K subscribers