As I use copilot to write software, I have a hard time seeing how it’ll get better than it already is. The fundamental problem of all machine learning is that the training data has to be good enough to solve the problem. So the problems I run into make sense, like:
1. Copilot can’t read my mind and figure out what I’m trying to do.
2. I’m working on an uncommon problem where the typical solutions don’t work.
3. Copilot is unable to tell when it doesn’t “know” the answer, because of course it’s just simulating communication and doesn’t really know anything.
2 and 3 could be alleviated, but probably not solved completely with more and better data or engineering changes - but obviously AI developers started by training the models on the most useful data and strategies that they think work best. 1 seems fundamentally unsolvable.
I think there could be some more advances in finding more and better use cases, but I’m a pessimist when it comes to any serious advances in the underlying technology.
Not copilot, but I run into a fourth problem:
4. The LLM gets hung up on insisting that a newer feature of the language I’m using is wrong and keeps focusing on “fixing” it, even though it has access to the newest correct specifications where the feature is explicitly defined and explained.
I’ve also run into this when trying to program in Rust. It just says that the newest features don’t exist and keeps rolling back to an unsupported library.
Oh god yes, ran into this asking for a shell.nix file with a handful of tricky dependencies. It kept trying to do this insanely complicated temporary pull and build from git instead of just a 6 line file asking for the right packages.
Yeah, once you have to question its answer, it’s all over. It got stuck and gave you the next best answer in its weights, which was absolutely wrong.
You can always restart the convo, re-insert the code and say what’s wrong in a slightly different way and hope the random noise generator leads it down a better path :)
I’m doing some stuff with translation now, and I’m finding you can restart the session, run the same prompt and get better or worse versions of a translation. After a few runs, you can take all the output and ask it to rank each translation on correctness and critique them. I’m still not completely happy with the output, but it does seem that sometimes, if you MUST get AI to answer the question, there can be value in making it answer it across more than one session.
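For anyone who wants to script the restart-and-rank routine described above, here is a rough sketch in Python. It is only an illustration under assumptions: `ask` is a hypothetical stand-in for "open a fresh session and send one prompt" (wire it to whatever tool you actually use), and the helper names and the wording of the ranking prompt are made up for this example.

```python
from typing import Callable, List

# Hypothetical stand-in for one fresh chat session:
# takes a prompt, returns the reply text. Not a real API.
AskFn = Callable[[str], str]


def collect_candidates(ask: AskFn, prompt: str, runs: int = 3) -> List[str]:
    """Send the same prompt to `runs` independent sessions, dropping exact repeats."""
    seen: List[str] = []
    for _ in range(runs):
        reply = ask(prompt).strip()
        if reply and reply not in seen:
            seen.append(reply)
    return seen


def build_ranking_prompt(source_text: str, candidates: List[str]) -> str:
    """Build the follow-up prompt that asks for a ranking and a critique."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(candidates, start=1))
    return (
        "Rank these translations of the source text by correctness, "
        "best first, and critique each one.\n\n"
        f"Source:\n{source_text}\n\nCandidates:\n{numbered}"
    )
```

You would call `collect_candidates` with your own session function, then send the string from `build_ranking_prompt` in yet another fresh session. The ranking is still just model output, so it needs the same checking as the translations themselves.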
I completely understand where you’re coming from, and I absolutely agree with you, genAI is copyright infringement on a weapons-grade scale. With that said, though, in my opinion, I don’t know if calling people parasites like this will really convince people, or change anything. I don’t want to tone police you, if you want to tell people to get fucked, then go ahead, but I think being a bit more sympathetic to your fellow programmers and actually trying to help them see things from our perspective might actually change some minds. Just something to think about. I don’t have all the answers, feel free to ignore me. Much love!
You are right. My apologies, and my congratulations for finding the correct “tone” to respond to me ;) The thing is, I am absolutely fed up with especially the bullshit about snake oil vendors selling LLMs as “AI”, and I am much more fed up with corporations on a large scale getting away with - since it’s for profit - what I guess must already be called theft of intellectual property.
When people then use said LLMs to “develop software”, I’m kind of convinced they are about as gone mentally as the MAGA cult and sometimes I just want to vent. However, I chose the word parasite for a reason, because it’s a parasitic way of working: they use the work of other people, which for more specific algorithms, an LLM will reproduce more or less verbatim, while causing harm to such people by basically copy-pasting such code while omitting the license statement - thereby releasing such code (if open source) into the “wild” with an illegally(*) modified license.
(*) Illegal, of course, only in countries whose legal system respects copyright and license texts in the first place.
On top of that, considering the damage done to the environment by the insane energy consumption for little to no gain, people should not be using LLMs at all. Not even outside coding. This is just another way to contribute to missing our climate goals by a wide margin. Wasting energy like this - basically because people are too lazy to think for themselves - actually gets people killed due to extreme weather events.
So yeah, you have a valid point, but also, I am fed up with the egocentric bullshit world that social media has created and that has culminated in what will soon be a totalitarian regime in the country that once brought peace to Europe by defeating the Nazis and doing a PROPER reeducation of the people. Hooray for going off on a tangent…
Ahh right, so when I use copilot to autocomplete the creation of more tests in exactly the same style of the tests I manually created with my own conscious thought, you’re saying that it’s really just copying what someone else wrote? If you really believe that, then you clearly don’t understand how LLMs work.
I know both LLM mechanisms better than you, it would appear, and my point is not so weak that I would have to fabricate a strawman that I then claim is what you said, to proceed to argue the strawman.
Using LLMs trained on other people’s source code is parasitic behaviour and violates copyrights and licenses.
Look, I recognize that it’s possible for LLMs to produce code that is literally someone else’s copyrighted code. However, the way I use copilot is almost exclusively to autocomplete my thoughts. Like, I write enough code until it guesses what I was about to write next. If that happens to be open source code that someone else has written, then it is complete coincidence that I thought of writing that code. Not all thoughts are original.
Further, whether I should be at fault for LLM vendors who may be breaking copyright law, is like trying to make a case for me being at fault for murder because I drive a car when car manufacturers lobby to the effect that people die more.
Try writing comments
“This code is giving me a return value of X instead of Y”
“Ah the reason you’re having trouble is because you initialized this list with brackets instead of new().”
“How would a syntax error give me an incorrect return”
“You’re right, thanks for correcting me!”
“Ok so like… The problem though.”
So you use other people’s open source code without crediting the authors or respecting their license conditions? Good for you, parasite.
Very frequently, yes. As well as closed source code and intellectual property of all kinds. Anyone who tells you otherwise is a liar.
Ah, I guess I’ll have to question why I am lying to myself then. Don’t be a douchebag. Don’t use open source without respecting copyrights & licenses. The authors are already providing their work for free. Don’t shit on that legacy.
Programmers don’t have the luxury of using inferior toolsets.
That statement is as dumb as it is nonsensical.
Agreed, and I am also 100% opposed to SW patents. No matter what I wrote, if someone came up with the same idea on their own, and finds out about my implementation later, I absolutely do not expect them to credit me. In the use case you describe, I do not see a problem of using other people’s work in a license breaking way. I do however see a waste of time - you have to triple check everything an LLM spits out - and energy (ref: MS trying to buy / restart a nuclear reactor to power their LLM hardware).
If you drive a car on “autopilot” and get someone killed, you are absolutely at fault for murder. Not in the legal sense, because fuck capitalism, but absolutely in the moral sense. Also, there’s legal precedent in a different example: https://www.findlaw.com/legalblogs/criminal-defense/can-you-get-arrested-for-buying-stolen-goods/
If you unknowingly buy stolen (fenced) goods, if found out, you will have to return them to the rightful owner without getting your money back - that you would then have to try and get back from the vendor.
In the case of license agreements, you would still be a party to a license violation. To see what that means for a well-recognizable piece of code, consider the following thought experiment:
Assume someone trained the LLM on some source code Disney uses for whatever. Your code gets autocompleted with that and you publish it, and Disney finds out about it. Do you honestly think that the evil motherfuckers at Disney would stop at anything short of having your head served on a silver platter?