Solidity compiler medium-high risk vulnerability: Analysis of head overflow in ABIv2-Reencoding

Eocene | Security
May 12, 2023


## Overview

This article analyzes a vulnerability in the Solidity compiler (0.5.8 <= version < 0.8.16) that arises during ABIv2 re-encoding because fixed-length `uint` and `bytes32` arrays are handled incorrectly, and proposes solutions and avoidance measures.

## Vulnerability details

The ABI encoding format is the standard way to encode parameters when a user or a contract calls a contract function. For details, please refer to Solidity’s official description of ABI encoding.

During contract development, data read from the `calldata` passed in by users or other contracts often has to be forwarded or emitted again. Because EVM opcodes operate on `memory`, `stack` and `storage`, whenever an operation requires such data to be ABI-encoded, Solidity re-encodes the `calldata` values in ABI format, in the new parameter order, and stores the result in `memory`.

This process is logically sound on its own, but in combination with Solidity’s cleanup mechanism, an oversight in the compiler code introduces a vulnerability.

According to the ABI encoding rules, after the function selector is removed, the ABI-encoded data consists of two parts: head and tail. A fixed-length `uint` or `bytes32` array is stored entirely in the head. Solidity’s cleanup mechanism in memory zeroes the memory slot that follows the one just written, so that dirty data cannot affect later use of that memory. In addition, when Solidity ABI-encodes a group of parameters, it encodes them from left to right.

To make the vulnerability easier to explore, consider the following contract:

    pragma solidity 0.8.14; // the affected version used in this article

    contract Eocene {
        event VerifyABI(bytes[], uint[2]);
        function verifyABI(bytes[] calldata a, uint[2] calldata b) public {
            // Emitting the event ABI-encodes a and b before they are logged on chain
            emit VerifyABI(a, b);
        }
    }

The `verifyABI` function in contract `Eocene` simply emits its two parameters, the variable-length `bytes[] a` and the fixed-length `uint[2] b`.

Note that emitting `VerifyABI` also triggers ABI encoding: the parameters `a` and `b` are encoded into ABI format before being stored on chain.

We compiled the contract with Solidity v0.8.14, deployed it through Remix, and called `verifyABI(['0xaaaaaa','0xbbbbbb'],[0x11111,0x22222])`.

First, let’s take a look at the correct encoding of `verifyABI(['0xaaaaaa','0xbbbbbb'],[0x11111,0x22222])`:

    0x52cd1a9c // bytes4(keccak256("verifyABI(bytes[],uint256[2])"))
    0000000000000000000000000000000000000000000000000000000000000060 // offset of a
    0000000000000000000000000000000000000000000000000000000000011111 // b[0]
    0000000000000000000000000000000000000000000000000000000000022222 // b[1]
    0000000000000000000000000000000000000000000000000000000000000002 // length of a
    0000000000000000000000000000000000000000000000000000000000000040 // offset of a[0]
    0000000000000000000000000000000000000000000000000000000000000080 // offset of a[1]
    0000000000000000000000000000000000000000000000000000000000000003 // length of a[0]
    aaaaaa0000000000000000000000000000000000000000000000000000000000 // a[0]
    0000000000000000000000000000000000000000000000000000000000000003 // length of a[1]
    bbbbbb0000000000000000000000000000000000000000000000000000000000 // a[1]
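The layout above can be reproduced by hand. The following is a minimal Python sketch, not the compiler's implementation, that encodes `(bytes[] a, uint[2] b)` according to the head/tail rules. The helper names (`word`, `encode_bytes`, `encode_bytes_array`, `encode_args`) are illustrative assumptions, and the function selector is omitted.

```python
def word(n):
    """Encode an unsigned integer as one 32-byte big-endian ABI word."""
    return n.to_bytes(32, "big")


def encode_bytes(data):
    """Encode one `bytes` value: length word, then data right-padded to 32 bytes."""
    padding = b"\x00" * (-len(data) % 32)
    return word(len(data)) + data + padding


def encode_bytes_array(items):
    """Encode `bytes[]`: element count, one offset per element, then the elements."""
    encoded_items = [encode_bytes(item) for item in items]
    offsets = []
    offset = 32 * len(items)  # offsets are measured from the word after the count
    for enc in encoded_items:
        offsets.append(word(offset))
        offset += len(enc)
    return word(len(items)) + b"".join(offsets) + b"".join(encoded_items)


def encode_args(a, b):
    """Encode (bytes[] a, uint[2] b) without the selector: head first, then tail."""
    head_size = 32 + 32 * len(b)  # one offset word for a, plus b stored inline
    head = word(head_size) + b"".join(word(x) for x in b)
    return head + encode_bytes_array(a)


encoded = encode_args(
    [bytes.fromhex("aaaaaa"), bytes.fromhex("bbbbbb")],
    [0x11111, 0x22222],
)
# Split into 32-byte words so the result can be compared line by line.
words = [encoded[i:i + 32].hex() for i in range(0, len(encoded), 32)]
for w in words:
    print(w)
```

The head occupies the first three words (the offset of `a`, then `b[0]` and `b[1]`), so the tail of `a` starts directly after `b[1]`. That adjacency is what the rest of the analysis depends on.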

If the Solidity compiler worked correctly, the data recorded on chain by the event for parameters `a` and `b` would be identical to what we sent. Let’s call the contract and check the log on chain. If you want to compare it yourself, you can check out this TX.

After a successful call, the `VerifyABI()` event is recorded as follows:

    0000000000000000000000000000000000000000000000000000000000000060 // offset of a
    0000000000000000000000000000000000000000000000000000000000011111 // b[0]
    0000000000000000000000000000000000000000000000000000000000022222 // b[1]
    0000000000000000000000000000000000000000000000000000000000000000 // length of a: overwritten with 0 (expected 2)
    0000000000000000000000000000000000000000000000000000000000000040 // offset of a[0]
    0000000000000000000000000000000000000000000000000000000000000080 // offset of a[1]
    0000000000000000000000000000000000000000000000000000000000000003 // length of a[0]
    aaaaaa0000000000000000000000000000000000000000000000000000000000 // a[0]
    0000000000000000000000000000000000000000000000000000000000000003 // length of a[1]
    bbbbbb0000000000000000000000000000000000000000000000000000000000 // a[1]

Surprisingly, the word immediately after `b[1]`, which stores the length of parameter `a`, has been incorrectly zeroed out.
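To see what the zeroed length means for anyone consuming the log, the sketch below decodes the `bytes[]` parameter from both versions of the data. It is a simplified, hypothetical decoder written for this example (`decode_bytes_array` and `build_log_data` are not library functions), and it performs no bounds checking.

```python
def read_word(data, pos):
    """Read the 32-byte big-endian word that starts at byte position pos."""
    return int.from_bytes(data[pos:pos + 32], "big")


def decode_bytes_array(data, head_pos=0):
    """Decode a `bytes[]` whose offset word sits at head_pos in the head."""
    start = read_word(data, head_pos)  # where the array's tail begins
    count = read_word(data, start)     # number of elements
    base = start + 32                  # element offsets are relative to this
    items = []
    for i in range(count):
        item_pos = base + read_word(data, base + 32 * i)
        size = read_word(data, item_pos)
        items.append(data[item_pos + 32:item_pos + 32 + size])
    return items


def build_log_data(length_of_a):
    """Rebuild the ten words shown above, with a chosen value for the length of a."""
    leading = [0x60, 0x11111, 0x22222, length_of_a, 0x40, 0x80, 3]
    data = b"".join(w.to_bytes(32, "big") for w in leading)
    data += bytes.fromhex("aaaaaa").ljust(32, b"\x00")
    data += (3).to_bytes(32, "big")
    data += bytes.fromhex("bbbbbb").ljust(32, b"\x00")
    return data


expected = decode_bytes_array(build_log_data(2))  # what was sent
logged = decode_bytes_array(build_log_data(0))    # what was recorded

print(expected)
print(logged)
```

With the length word zeroed, the decoder reports `a` as an empty array even though the element data is still present in the tail.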

Why did this happen?

As mentioned earlier, when Solidity ABI-encodes a series of parameters, it processes them from left to right. The encoding of `a, b` proceeds as follows:

  1. Solidity first encodes `a`. According to the encoding rules, the offset of `a` is placed in the head, and the length and contents of `a` are stored in the tail.
  2. Solidity then processes `b`. Because `b` has type `uint[2]`, its values are stored directly in the head. However, due to Solidity’s cleanup mechanism, after `b[1]` is written to memory, the memory word immediately following it (which holds the length of `a`) is set to 0.
  3. The ABI encoding ends, and the incorrectly encoded data is stored on chain. This is vulnerability SOL-2022-6.

At the source-code level, the faulty logic is easy to see. When Solidity copies a fixed-length `bytes32` or `uint` array from calldata to memory, it always sets the memory word immediately following the copied data to 0. Because ABI encoding has a head and a tail, and parameters are encoded from left to right, this zeroing can overwrite tail data that has already been written.

The relevant Solidity compiler code is as follows.
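The three steps above can be modelled in a few lines. This is a toy simulation of the memory writes described in this article, not code taken from the compiler; `reencode_with_bug` is a hypothetical name, and the tail of `a` is supplied pre-encoded to keep the sketch short.

```python
WORD = 32


def word(n):
    """Encode an unsigned integer as one 32-byte big-endian word."""
    return n.to_bytes(WORD, "big")


def reencode_with_bug(a_tail, b):
    """Model the buggy re-encoding of (bytes[] a, uint[2] b) into memory."""
    head_size = WORD + WORD * len(b)
    mem = bytearray(head_size + len(a_tail))

    # Step 1: encode a. Its offset goes into the head, its content into the tail.
    mem[0:WORD] = word(head_size)
    mem[head_size:head_size + len(a_tail)] = a_tail

    # Step 2: copy b into the head ...
    pos = WORD
    for value in b:
        mem[pos:pos + WORD] = word(value)
        pos += WORD
    # ... then "clean up" the word right after b. That word is the first
    # word of the tail, which holds the length of a.
    mem[pos:pos + WORD] = bytes(WORD)

    # Step 3: encoding ends and the corrupted buffer is what gets logged.
    return bytes(mem)


a_tail = b"".join([
    word(2), word(0x40), word(0x80),
    word(3), bytes.fromhex("aaaaaa").ljust(WORD, b"\x00"),
    word(3), bytes.fromhex("bbbbbb").ljust(WORD, b"\x00"),
])
result = reencode_with_bug(a_tail, [0x11111, 0x22222])
length_of_a = int.from_bytes(result[96:128], "big")
print(length_of_a)
```

Only the single word after `b[1]` is affected in this model; the element offsets and contents of `a` that follow it stay intact, which matches the logged data shown earlier.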

When the source data is located in `calldata` and the source type is a byte array, a string, or an array whose base type is `uint` or `bytes32`, encoding goes through `ABIFunctions::abiEncodingFunctionCalldataArrayWithoutCleanup()`.

Inside that function, `fromArrayType.isDynamicallySized()` is first used to determine whether the source data is a fixed-length array.

Only fixed-length arrays meet the vulnerability triggering conditions.

The result of `isByteArrayOrString()` is passed to `YulUtilFunctions::copyToMemoryFunction()`, which uses it to decide whether to clean up the position after the copied data once the `calldatacopy` operation completes.

Taken together, these constraints mean that only fixed-length `uint` or `bytes32` arrays in calldata can trigger the vulnerability when copied to memory. This is where the triggering conditions come from.

Because ABI encoding always processes parameters from left to right, exploiting the vulnerability requires two things: a dynamic-length value, whose data is stored in the tail, must appear before the fixed-length `uint` or `bytes32` array, and the fixed-length `uint` or `bytes32` array must be the last of the parameters to be encoded.

The reasons are as follows.

- If the fixed-length array is not the last parameter to be encoded, zeroing the next memory position has no effect, because the next encoded parameter overwrites that position.

- If no earlier parameter stores data in the tail, zeroing the next memory position does not matter, because ABI encoding does not use that position.
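These two conditions can be turned into a rough screening check over a list of parameter types. The sketch below is a simplified heuristic, not an audit tool: `may_trigger_head_overflow` is a hypothetical helper, it works on type strings only, it assumes the values come from calldata and that ABI coder v2 is in use, it does not look inside structs, and accepting nested fixed-length arrays such as `uint[2][3]` is an assumption that goes beyond what this article states.

```python
import re

# A fixed-length array whose base type is uint (uint256) or bytes32.
# Accepting several fixed dimensions (nested arrays) is an assumption.
STATIC_ARRAY = re.compile(r"^(uint(256)?|bytes32)(\[\d+\])+$")


def is_dynamic(type_name):
    """Rough test for types whose data lives in the tail of the encoding."""
    return type_name in ("bytes", "string") or "[]" in type_name


def may_trigger_head_overflow(param_types):
    """True if the last type is a static uint/bytes32 array and an earlier one is dynamic."""
    if not param_types:
        return False
    *earlier, last = param_types
    last_is_static_array = STATIC_ARRAY.match(last) is not None
    return last_is_static_array and any(is_dynamic(t) for t in earlier)


print(may_trigger_head_overflow(["bytes[]", "uint[2]"]))  # the example from this article
print(may_trigger_head_overflow(["uint[2]", "bytes[]"]))  # same types, reordered
```

A positive result only flags a signature for closer review; whether the contract is actually affected also depends on the compiler version and on the data being re-encoded directly from calldata.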

In addition, note that every implicit or explicit ABI encoding operation, and every tuple (a group of data) that matches this format, is affected by the vulnerability. The operations involved are:

- `event`
- `error`
- `abi.encode*`
- `returns` (return values of a function)
- `struct` (user-defined structs)
- all external calls
## Solution

  1. When the contract code contains operations affected by the vulnerability, ensure that the last parameter is not a fixed-length `uint` or `bytes32` array.
  2. Use a Solidity compiler that is not affected by the vulnerability (0.8.16 or later).
  3. Have professional security personnel conduct a security audit of the contracts.
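For the second point, the affected range given in the overview (0.5.8 <= version < 0.8.16) is easy to check mechanically. The following is a small sketch; `is_affected` and `parse_version` are hypothetical helpers, and the parser assumes plain `major.minor.patch` strings with an optional leading `v` and an optional `+commit...` suffix.

```python
def parse_version(version):
    """Turn a string such as 'v0.8.14' or '0.8.14+commit.80d49f37' into (0, 8, 14)."""
    core = version.lstrip("v").split("+")[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version):
    """True if the compiler version lies in the affected range [0.5.8, 0.8.16)."""
    return (0, 5, 8) <= parse_version(version) < (0, 8, 16)


for v in ["0.5.7", "0.5.8", "v0.8.14", "0.8.16"]:
    print(v, is_affected(v))
```

Comparing tuples rather than strings avoids ordering mistakes such as treating `0.8.9` as newer than `0.8.16`.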

## About us

At Eocene Research, we provide the insights of intentions and security behind everything you know or don’t know of blockchain, and empower every individual and organization to answer complex questions we hadn’t even dreamed of back then.

