An AI Security Engineer fixing your code issues
© 2024 Almanax
In this post, we compare two tools for smart contract vulnerability detection: Slither and ALMX-1.
Using the Code4rena bug bounty on the Beebots.sol contract as our test case, we explore the strengths and weaknesses of each tool. We consider the approved findings from the Code4rena competition as the ground truth and use them to benchmark each tool's performance.
Static analysis tools (which examine programs without executing them), like Slither, have been instrumental in Solidity development, providing developers with rule-based detection of common vulnerabilities. However, these tools operate within predefined scopes and struggle with false positives and functional vulnerabilities. The ALMX-1 model brings a new dimension to smart contract security by combining pattern recognition with contextual understanding, changing the industry's approach to security. ALMX-1 improves:
The Code4rena findings highlight several issues in the Beebots.sol contract: weaknesses in randomness generation, signature malleability, and incorrect implementation of ERC-721 Enumerable functions. These findings provide a benchmark against which we assess Slither and ALMX-1.
The two tools take significantly different approaches.
To improve the experience of reviewing the model's findings, the Almanax web interface hides findings marked "Low Certainty" by default, prioritizing "High" and "Medium" findings. Users who want to dive deeper into the larger pool of predictions can toggle them back on for closer inspection:
The ground truth flags a weak randomness generation issue ([M-01] Bad Randomness) related to the function `randomIndex()`. Specifically, the values passed to `keccak256` are accessible to any miner (`nonce`, `msg.sender`, `block.difficulty`, `block.timestamp`).
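To make the attack surface concrete, the sketch below models the on-chain computation off chain. It is a hypothetical illustration, not the Beebots.sol code: Solidity's `keccak256` is stood in for by Python's `sha3_256` (different padding, same predictability argument), and the function name `predicted_index` is ours. The point is that every input to the hash is public, so any miner or observer can compute the "random" index before the transaction lands.

```python
import hashlib

def predicted_index(nonce: int, sender: bytes, difficulty: int,
                    timestamp: int, total_size: int) -> int:
    """Off-chain model of the contract's randomIndex() derivation.

    sha3_256 stands in for keccak256 here; the predictability argument is
    identical because every input below is visible on chain or chosen by
    the miner.
    """
    packed = (nonce.to_bytes(32, "big") + sender +
              difficulty.to_bytes(32, "big") + timestamp.to_bytes(32, "big"))
    return int.from_bytes(hashlib.sha3_256(packed).digest(), "big") % total_size

# The contract and an attacker observing the same public values derive
# the same index -- there is no entropy a miner cannot see or influence.
contract_side = predicted_index(7, b"\x11" * 20, 1000, 1_700_000_000, 10_000)
attacker_side = predicted_index(7, b"\x11" * 20, 1000, 1_700_000_000, 10_000)
assert contract_side == attacker_side
```

A miner can additionally nudge `block.timestamp` (and historically `block.difficulty`) to steer the result toward a favorable index.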
Slither
Slither’s weak PRNG detector correctly calls out the bad practice with a “HIGH” severity status:
INFO:Detectors:
Beebots.randomIndex() (Beebots.sol#315-342) uses a weak PRNG: "index = uint256(keccak256(bytes)(abi.encodePacked(nonce,msg.sender,block.difficulty,block.timestamp))) % totalSize (Beebots.sol#320)"
Reference: https://github.com/crytic/slither/wiki/Detector-Documentation#weak-PRNG
ALMX-1
ALMX-1 also captures the vulnerability, assigning severity “Medium” and explaining the root of the problem in clear language:
Overall, ALMX-1's approach is more comprehensive, offering not just detection but context and guidance, which is more useful for developers and security teams who need not only to find an issue but to fix it.
The Code4rena reviewers point out that the `verify` function is susceptible to signature malleability due to its reliance on the EVM's `ecrecover`, which allows replay attacks. This is flagged as a high-severity finding ([H-01]).
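The arithmetic behind the finding can be shown in a few lines. For any valid ECDSA signature `(v, r, s)` over secp256k1, the pair `(r, N - s)` (with `v` flipped between 27 and 28) also verifies, where `N` is the curve's group order, so raw `ecrecover` accepts two distinct signatures for the same message. The standard mitigation, used for example by OpenZeppelin's ECDSA library, is to reject any `s` in the upper half of the range. The helper names below are ours, for illustration:

```python
# Group order of secp256k1, the curve behind Ethereum's ecrecover.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def malleate(s: int) -> int:
    """If (v, r, s) is a valid signature, (v ^ 1, r, N - s) verifies too."""
    return N - s

def is_low_s(s: int) -> bool:
    """Low-s check: the mitigation rejects the upper-half twin outright."""
    return s <= N // 2

s = 0x539  # any valid s value; chosen arbitrarily for the demo
s_twin = malleate(s)

assert malleate(s_twin) == s          # malleation is an involution
assert is_low_s(s) != is_low_s(s_twin)  # exactly one twin passes the check
```

Because exactly one of the two twins is low-s, enforcing `is_low_s` makes each message-key pair map to a unique accepted signature, closing the replay vector.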
Slither
Slither does not flag any problem with the `verify` function.
ALMX-1
ALMX-1 flags the signature malleability problem:
However, the human auditors note that the problem is mitigated in the caller function, making it a risk only if other logic in the contract changes. The model missed this mitigation, which highlights the need for contextual oversight to distinguish actual vulnerabilities from mitigated risks.
Because the ALMX-1 model does not capture this detail, the finding could be considered a false positive, depending on how strict the classification is. We've been working on a new model, set for release this month, with built-in contextual oversight capable of detecting and dismissing issues like this.
Several issues are reported against the `tokenByIndex()` function:
Slither
Slither did not flag any issues with `tokenByIndex()`. While Slither can check for ERC token conformity, it does not detect logical errors in the implementation of token functions, such as the boundary conditions at issue here.
ALMX-1
ALMX-1 captured [H-00], correctly identifying the spec mismatch in the implementation. However, it missed [H-04] and [L-02], capturing only one of the high-severity issues in this category. While this is an improvement over Slither's zero findings, it underscores the need for further refinement in capturing all relevant vulnerabilities.
While this is an extremely hard problem to solve (even top human experts miss things), we’ve been obtaining promising results with the new models and we have a good understanding of how to significantly improve detection performance.
Conclusion
Slither is capable of detecting a broad range of issues, primarily low-severity ones, but its reliance on predefined vulnerability classes limits its coverage to machine-auditable bugs, which are estimated to account for only 20% of exploitable vulnerabilities.
In contrast, ALMX-1 combines breadth and depth, identifying vulnerabilities beyond the scope of static analyzers, including functional and business logic flaws.
ALMX-1 stands out with its clear, plain-language explanations, which make its findings easier to understand and act on, saving developers time and effort during reviews. While both tools can produce false positives, Slither's tendency to flag a large number of irrelevant issues (its false-positive rate can reach 95% for detectable vulnerabilities) can be a major obstacle to efficiency.
The structured, user-friendly output of ALMX-1, which hides low-certainty findings by default, stands in contrast to Slither's comprehensive but unfiltered results, which can be difficult for less experienced developers to navigate.
We’re running a beta with some of the top security experts in the world.
Contact us here to get access to the ALMX-1 model.