By Artem Dinaburg and Peter Goodman

(Would you get up and throw it away?)
[sing to the tune of The Beatles – With A Little Help From My Friends]

Here’s a riddle: when new GPUs are constantly being produced, product cycles are ~18-24 months long, and each cycle doubles GPU power (per Huang’s Law), what happens to 10-year-old server GPUs? We’ve asked around and no one can answer; we do know that they get kicked out of Google Cloud and Microsoft Azure (but not AWS), and they’re useless for machine learning, with so many new and exponentially more powerful versions available.

Surely these older GPUs—which are still racked, installed, and functional, with their capital costs already paid—aren’t just going to be thrown in the dump… are they?

Please don’t do that! Here at Trail of Bits, we want to use old GPUs—even those past their official end of life—to solve interesting computer security and program analysis problems. If you’re planning to dispose of a rack of old GPUs, don’t! We’d love to chat about extending the useful life of your capital investment.

How to put old GPUs to use

Below are some of the ideas we’ve been working on and would like to pursue further.

Fuzzing embedded platforms. GPUs are a natural fit for fuzzing, since the workload is embarrassingly parallel and there are natural workarounds for divergence issues. GPU fuzzing is most effective in the embedded space, since one needs to write an emulator anyway; it makes sense, then, to write a fast emulator instead of a slow one. Our prototype GPU fuzzer shows that the concept is sound, but it has limitations that make it difficult to use for real-world fuzzing. We would like to fix this and are working on some ideas (avoiding static translation, applying performance lessons from our fast DBT tools, etc.) to make emulator creation practical.
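
To make the shape of the idea concrete, here is a minimal CUDA sketch (not our prototype fuzzer): every GPU thread mutates its own input stream, runs a toy stand-in target, and records coverage in a shared bitmap via atomics. The target, mutator, and bitmap size are all illustrative placeholders; a real harness would run an emulated firmware image in place of the stand-in.

```cuda
// A minimal sketch of the per-thread fuzzing model. The "target" below is a
// toy stand-in; a real harness would execute an emulated embedded image.
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

#define MAP_WORDS 2048          // coverage bitmap: 2048 * 32 = 65536 edges

// Toy target: a branchy checksum whose intermediate states we treat as "edges".
__device__ void run_target(uint32_t input, uint32_t *coverage) {
    uint32_t state = input;
    for (int i = 0; i < 16; ++i) {
        uint32_t edge = (state ^ (state >> 7)) % (MAP_WORDS * 32);
        atomicOr(&coverage[edge / 32], 1u << (edge % 32));
        state = (state * 0x9E3779B1u) + i;
        if (state & 1) state ^= 0xDEADBEEFu;   // data-dependent branch
    }
}

__global__ void fuzz_kernel(uint32_t seed, uint32_t *coverage, int iters) {
    // Each thread derives its own input stream from the global seed.
    uint32_t input = seed ^ (blockIdx.x * blockDim.x + threadIdx.x) * 0x85EBCA6Bu;
    for (int i = 0; i < iters; ++i) {
        run_target(input, coverage);
        input = input * 1664525u + 1013904223u;  // cheap per-thread mutator
    }
}

int main() {
    uint32_t *d_cov;
    cudaMalloc(&d_cov, MAP_WORDS * sizeof(uint32_t));
    cudaMemset(d_cov, 0, MAP_WORDS * sizeof(uint32_t));

    fuzz_kernel<<<256, 256>>>(0x12345678u, d_cov, 1000);   // 65,536 parallel instances
    cudaDeviceSynchronize();

    uint32_t h_cov[MAP_WORDS];
    cudaMemcpy(h_cov, d_cov, sizeof(h_cov), cudaMemcpyDeviceToHost);

    int edges = 0;
    for (int i = 0; i < MAP_WORDS; ++i) edges += __builtin_popcount(h_cov[i]);
    printf("edges hit: %d\n", edges);
    cudaFree(d_cov);
    return 0;
}
```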

Stochastic optimization. Stochastic optimizers, like Stanford’s STOKE, search through a large set of potential machine instructions and look for novel, non-obvious transformations that improve program performance. A key bottleneck in this approach is search throughput, which we believe GPUs could increase dramatically.
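
As a rough illustration of why this search maps well to GPUs, the sketch below scores thousands of randomly proposed candidate programs per kernel launch against a small test suite. The three-op toy ISA, the reference "spec" function, and the Hamming-distance cost are all stand-ins; a real port would interpret actual machine code and wrap this scoring step in a proper MCMC accept/reject loop.

```cuda
// A sketch of GPU-parallel candidate scoring for STOKE-style search.
// Each thread builds one random candidate program and reports its cost.
#include <cstdio>
#include <cstdint>
#include <vector>
#include <algorithm>
#include <cuda_runtime.h>

#define PROG_LEN 8
#define NUM_TESTS 64

struct Insn { int op; uint32_t imm; };   // toy ISA: 0=add, 1=xor, 2=shl

__device__ uint32_t execute(const Insn *prog, uint32_t x) {
    for (int i = 0; i < PROG_LEN; ++i) {
        switch (prog[i].op) {
            case 0: x += prog[i].imm; break;
            case 1: x ^= prog[i].imm; break;
            case 2: x <<= (prog[i].imm & 31); break;
        }
    }
    return x;
}

// Reference function the search tries to match (the "spec").
__device__ uint32_t spec(uint32_t x) { return (x * 3u) ^ 0xFFu; }

__global__ void score_candidates(uint64_t seed, uint64_t *costs) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Derive a candidate program from a per-thread RNG stream (proposal step).
    uint64_t rng = seed ^ (tid * 0x9E3779B97F4A7C15ull);
    Insn prog[PROG_LEN];
    for (int i = 0; i < PROG_LEN; ++i) {
        rng = rng * 6364136223846793005ull + 1442695040888963407ull;
        prog[i].op  = (int)((rng >> 33) % 3);
        prog[i].imm = (uint32_t)(rng >> 16);
    }

    // Cost = how far this candidate is from the spec on the test inputs.
    uint64_t cost = 0;
    for (uint32_t t = 0; t < NUM_TESTS; ++t) {
        uint32_t input = t * 2654435761u;
        cost += __popc(execute(prog, input) ^ spec(input));  // Hamming distance
    }
    costs[tid] = cost;
}

int main() {
    const int n = 256 * 256;
    uint64_t *d_costs;
    cudaMalloc(&d_costs, n * sizeof(uint64_t));
    score_candidates<<<256, 256>>>(42, d_costs);
    cudaDeviceSynchronize();

    std::vector<uint64_t> h(n);
    cudaMemcpy(h.data(), d_costs, n * sizeof(uint64_t), cudaMemcpyDeviceToHost);
    printf("best cost this round: %llu\n",
           (unsigned long long)*std::min_element(h.begin(), h.end()));
    cudaFree(d_costs);
    return 0;
}
```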

SMT solving. SMT has numerous uses in optimization and security, but is resistant to parallelism at the algorithm level. Two specific instances of the SMT problem can benefit from simple and effective GPU acceleration. The first is floating point: prior research has used brute-force search on CPUs to solve floating-point SMT, an approach we’d like to extend with GPU acceleration. Second, GPUs can brute-force-search traditional integer SMT theories that resist standard algorithms. GPU-based search would run in parallel with other approaches and be a strict improvement over the current state of the art.
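
The floating-point case is easy to sketch because float32 has only 2^32 bit patterns, so a GPU can simply test every one. The toy constraint below (x > 0 and x + 1.0f == x) is an illustrative stand-in for a query handed off from a real solver; everything else is ordinary exhaustive search.

```cuda
// A minimal brute-force float32 "SMT" sketch: enumerate every bit pattern,
// test the constraint, and report the first satisfying witness found.
#include <cstdio>
#include <cstdint>
#include <cstring>
#include <cuda_runtime.h>

__device__ bool constraint(float x) {
    // Satisfied once x is large enough that adding 1.0f rounds back to x.
    return x > 0.0f && x + 1.0f == x;
}

__global__ void search(uint32_t base, uint32_t *witness, int *found) {
    uint32_t bits = base + blockIdx.x * blockDim.x + threadIdx.x;
    float x = __uint_as_float(bits);    // reinterpret the bit pattern as a float
    if (constraint(x)) {
        if (atomicExch(found, 1) == 0)  // first thread to find a model records it
            *witness = bits;
    }
}

int main() {
    uint32_t *d_witness; int *d_found;
    cudaMalloc(&d_witness, sizeof(uint32_t));
    cudaMalloc(&d_found, sizeof(int));
    cudaMemset(d_found, 0, sizeof(int));

    // Sweep the full 32-bit space in chunks of 2^26 bit patterns per launch.
    const uint32_t chunk = 1u << 26;
    int found = 0;
    for (uint64_t base = 0; base < (1ull << 32) && !found; base += chunk) {
        search<<<chunk / 256, 256>>>((uint32_t)base, d_witness, d_found);
        cudaMemcpy(&found, d_found, sizeof(int), cudaMemcpyDeviceToHost);
    }

    if (found) {
        uint32_t bits = 0;
        cudaMemcpy(&bits, d_witness, sizeof(uint32_t), cudaMemcpyDeviceToHost);
        float x; memcpy(&x, &bits, sizeof(x));
        printf("SAT: x = %g (bits 0x%08x)\n", x, bits);
    } else {
        printf("UNSAT over all float32 values\n");
    }
    cudaFree(d_witness); cudaFree(d_found);
    return 0;
}
```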

Reachability queries. Another key primitive of program analysis is the reachability query; that is, given a (very large) program, can I reach line X from line Y, and if so, what are the path(s)? Answering such queries typically takes O(n³) time and is frequently a bottleneck in real program analysis. We believe that we can use GPU computation to make even complex reachability queries more practical.
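
To show the data-parallel shape, the sketch below relaxes every edge of a tiny hard-coded graph in parallel until a fixpoint and answers a single can-X-reach-Y query. A real engine would use CSR storage, frontier queues, and path reconstruction (and handle the context-sensitive cases that drive the cubic bound), but the kernel structure is the same.

```cuda
// A minimal GPU reachability sketch: parallel edge relaxation to a fixpoint.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void relax_edges(const int *src, const int *dst, int num_edges,
                            int *reached, int *changed) {
    int e = blockIdx.x * blockDim.x + threadIdx.x;
    if (e >= num_edges) return;
    // If the edge's source is reachable and its destination is not yet
    // marked, mark it and request another iteration.
    if (reached[src[e]] && !reached[dst[e]]) {
        reached[dst[e]] = 1;
        *changed = 1;
    }
}

int main() {
    // Toy graph: 0->1, 1->2, 2->4, 3->4 (node 3 is unreachable from 0).
    const int h_src[] = {0, 1, 2, 3};
    const int h_dst[] = {1, 2, 4, 4};
    const int num_edges = 4, num_nodes = 5, START = 0, TARGET = 4;

    int *d_src, *d_dst, *d_reached, *d_changed;
    cudaMalloc(&d_src, sizeof(h_src));
    cudaMalloc(&d_dst, sizeof(h_dst));
    cudaMalloc(&d_reached, num_nodes * sizeof(int));
    cudaMalloc(&d_changed, sizeof(int));
    cudaMemcpy(d_src, h_src, sizeof(h_src), cudaMemcpyHostToDevice);
    cudaMemcpy(d_dst, h_dst, sizeof(h_dst), cudaMemcpyHostToDevice);
    cudaMemset(d_reached, 0, num_nodes * sizeof(int));
    const int one = 1;
    cudaMemcpy(&d_reached[START], &one, sizeof(int), cudaMemcpyHostToDevice);

    // Iterate edge relaxation until no node changes (fixpoint).
    int changed = 1;
    while (changed) {
        cudaMemset(d_changed, 0, sizeof(int));
        relax_edges<<<(num_edges + 255) / 256, 256>>>(d_src, d_dst, num_edges,
                                                      d_reached, d_changed);
        cudaMemcpy(&changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost);
    }

    int reachable = 0;
    cudaMemcpy(&reachable, &d_reached[TARGET], sizeof(int),
               cudaMemcpyDeviceToHost);
    printf("node %d %s reachable from node %d\n",
           TARGET, reachable ? "is" : "is NOT", START);
    cudaFree(d_src); cudaFree(d_dst); cudaFree(d_reached); cudaFree(d_changed);
    return 0;
}
```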

Datalog acceleration. Datalog has found new life as a language for static analysis of large programs via tools like Souffle. Recent research has shown promise in accelerating Datalog operations via GPUs, which should enable better, more scalable static analysis tools.
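
The core observation is that a recursive rule such as path(x, z) :- edge(x, y), path(y, z) is a relational join applied until a fixpoint, and joins are highly parallel. The GPU Datalog work joins sorted or hashed tuple tables; the sketch below deliberately simplifies the relations to dense boolean matrices just to show the data-parallel shape of one evaluation step.

```cuda
// A toy sketch of one Datalog rule evaluated on the GPU to a fixpoint:
//   path(x, z) :- edge(x, y), path(y, z).
// Relations are dense boolean matrices here purely for illustration.
#include <cstdio>
#include <utility>
#include <cuda_runtime.h>

#define N 6   // number of constants in the domain (toy size)

// One evaluation step: path'(x, z) |= edge(x, y) & path(y, z).
__global__ void apply_rule(const char *edge, const char *path,
                           char *next_path, int *changed) {
    int x = blockIdx.x, z = threadIdx.x;
    char val = path[x * N + z];
    for (int y = 0; y < N; ++y)
        if (edge[x * N + y] && path[y * N + z]) { val = 1; break; }
    if (val && !path[x * N + z]) *changed = 1;
    next_path[x * N + z] = val;
}

int main() {
    // edge relation: a chain 0->1->2->3->4->5
    char h_edge[N * N] = {0};
    for (int i = 0; i + 1 < N; ++i) h_edge[i * N + (i + 1)] = 1;

    char *d_edge, *d_path, *d_next; int *d_changed;
    cudaMalloc(&d_edge, N * N); cudaMalloc(&d_path, N * N);
    cudaMalloc(&d_next, N * N); cudaMalloc(&d_changed, sizeof(int));
    cudaMemcpy(d_edge, h_edge, N * N, cudaMemcpyHostToDevice);
    cudaMemcpy(d_path, h_edge, N * N, cudaMemcpyHostToDevice); // path(x,y) :- edge(x,y).

    int changed = 1;
    while (changed) {                       // iterate to the least fixpoint
        cudaMemset(d_changed, 0, sizeof(int));
        apply_rule<<<N, N>>>(d_edge, d_path, d_next, d_changed);
        cudaMemcpy(&changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost);
        std::swap(d_path, d_next);
    }

    char h_path[N * N];
    cudaMemcpy(h_path, d_path, N * N, cudaMemcpyDeviceToHost);
    printf("path(0, %d) = %d\n", N - 1, (int)h_path[N - 1]);
    cudaFree(d_edge); cudaFree(d_path); cudaFree(d_next); cudaFree(d_changed);
    return 0;
}
```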

API-level translation. This is not a use of GPUs, but it is related to GPU programming: we believe that we can use MLIR and Trail of Bits’ VAST to transparently compile code across API layers. That is, source code would stay on one API (e.g., CUDA), but during compilation, the compiler would use MLIR dialect translations to transform the program from CUDA semantics to OpenCL semantics. We’d like to create a prototype to see if this kind of compilation is feasible.
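
To make the target of such a translation concrete, the annotated CUDA program below pairs common device-side constructs with the OpenCL forms a dialect-to-dialect rewrite would have to emit (shown as comments). This is only a hand-written illustration of the mapping; the actual work is performing it automatically at the IR level and also translating the host-side runtime calls.

```cuda
// Hand-annotated illustration of the CUDA -> OpenCL constructs a translation
// would need to rewrite; the OpenCL equivalents appear in comments.
#include <cstdio>
#include <cuda_runtime.h>

__global__                                   // OpenCL: __kernel
void saxpy(int n, float a,
           const float *x,                   // OpenCL: __global const float *x
           float *y) {                       // OpenCL: __global float *y
    int i = blockIdx.x * blockDim.x          // OpenCL: int i = get_global_id(0);
          + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];

    __syncthreads();                         // OpenCL: barrier(CLK_LOCAL_MEM_FENCE)
    // (shown only to illustrate the mapping; saxpy itself needs no barrier)
}

int main() {
    const int n = 1024;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float)); // OpenCL: clCreateBuffer(...)
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y); // OpenCL: clEnqueueNDRangeKernel(...)
    cudaDeviceSynchronize();                        // OpenCL: clFinish(queue)

    printf("y[0] = %f\n", y[0]);             // expect 3*1 + 2 = 5
    cudaFree(x); cudaFree(y);
    return 0;
}
```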

Help us save old GPUs!

We’ve been thinking about these problems for a while, and would like to write some practical proof-of-concept software to solve them. To do that, we are seeking research funding and access to spare GPU capacity. Importantly, we do not need access to the latest and greatest GPUs; hardware that will soon be end-of-lifed or that is no longer viable for AI/ML applications suits us just fine. If you’d like to help, let us know! We have a history of collaborating with universities on similar research challenges and would be eager to continue such partnerships.

Article Link: What would you do with that old GPU? | Trail of Bits Blog
