Earlier this year, we had a case where we were given a disk image from a FortiAnalyzer box.
We were asked to extract details about an intrusion into the customer’s network from the logs, and since the logs comprised hundreds of gigabytes, it was faster to ship the entire disk to us.
In order to get to the actual log contents, we were faced with two options:
1. Attempt to boot the image as a VM and try to use the “official” export functionality.
2. Directly work with the raw log files on disk.
If you know us, you can guess that we went for option 2.
The issues we saw with option 1 were that it’s unclear how the software would behave when taken out of its usual environment (perhaps it would refuse to even start properly), and that the export features of such products tend to be excruciatingly slow and clunky, since large amounts of data would have to travel through some web service and into your browser.
So let’s see what surprises Fortinet has in store for us with their on-disk format…
Fortinet logging basics
Fortinet firewall products write multiple kinds of logs.
In our case, the two we most commonly saw were elog (event log) and tlog (traffic log) files.
Event logs mostly contain administrative events, while traffic logs have the actually interesting data such as connection information and firewall policy decisions.
The filenames contain a timestamp and the files are usually compressed, yielding names such as tlog.1701234567.log.gz (or .log.zst).
They can be found under the path /public/Logs/<FG serial number>/root/archive/<unix timestamp>/ (note: this might be specific to FortiAnalyzer).
gunzip/unzstd do quick work of the outer compression layer.
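To illustrate this first step, here’s a minimal Python sketch of stripping the outer layer. The serial-number directory is a placeholder following the path above, and the zstandard package is a third-party dependency only needed for .zst files:

```python
import gzip
from pathlib import Path

# Hypothetical base path following the layout described above;
# "FG-SERIAL" stands in for the actual FortiGate serial number.
ARCHIVE = Path("/public/Logs/FG-SERIAL/root/archive")

def strip_outer_layer(path: Path) -> bytes:
    """Remove the outer gzip/zstd compression from an elog/tlog file."""
    if path.suffix == ".gz":
        with gzip.open(path, "rb") as f:
            return f.read()
    if path.suffix == ".zst":
        import zstandard  # third-party: pip install zstandard
        with path.open("rb") as f:
            return zstandard.ZstdDecompressor().stream_reader(f).read()
    return path.read_bytes()

for logfile in sorted(ARCHIVE.glob("*/tlog.*.log.*")):
    raw = strip_outer_layer(logfile)
    print(logfile.name, len(raw), raw[:4].hex())  # e.g. 'eccfd040'
```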
To our dismay, we were not greeted by text when opening a decompressed file – at least… not exactly:
[Image: Decompressed tlog file]
As can be seen, this is a binary format containing some readable text, but the text doesn’t look quite right.
This indicates a second (lighter) layer of compression.
In the past, Fortinet offered a Java-based tool called lz4_reader that enabled you to transform binary logs into readable text-based logs.
Unfortunately, the log format has changed over the years, and the tool is no longer compatible with the modern formats.
We were not able to find an updated version of the tool, and we also couldn’t find technical information about the binary format.
However, the mention of LZ4 (a fast compression algorithm) piqued our curiosity, and we started experimenting with it.
Indeed, LZ4 decompressed the text correctly once we found the right start and end offsets – a good start.
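For illustration, here’s a minimal sketch of that experiment using the third-party Python lz4 package; start, comp_len and dec_len stand in for the offsets and lengths we initially had to guess (and later learned to read from the record header):

```python
import lz4.block  # third-party: pip install lz4

def try_lz4(raw: bytes, start: int, comp_len: int, dec_len: int) -> str:
    """Attempt to LZ4-decompress a suspected compressed span."""
    chunk = raw[start:start + comp_len]
    # Raw LZ4 block data carries no size header, so the expected
    # decompressed size must be supplied explicitly.
    text = lz4.block.decompress(chunk, uncompressed_size=dec_len)
    return text.decode("utf-8", errors="replace")
```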
The only question is: how are we supposed to parse the overall binary format into meaningful records? We can’t just guess at probable boundaries for the compressed log lines.
Down the rabbit hole
The good thing is that we didn’t only have the data partition containing the logs, but also the system partition.
That means we could “simply” reverse engineer the code that reads/writes the logs.
The application itself resided in an initial ramdisk called rootfs.gz.
Attempting to use gunzip on such files will fail; you must use unmkinitramfs to extract it.
Looking around the root file system, one thing stood out: there are around 200 shared objects on the system.
Many of them are common open source libraries, but around 40-50 are proprietary Fortinet libraries, suggesting that Fortinet moved most of its functionality into shared libraries.
[Image: With dozens of libraries to choose from, where does one begin?]
We do have some starting points that may help find the code we’re looking for:
- Filenames containing elog./tlog. – perhaps they’re referenced near the logic that works with the files.
- Binary records start with EC CF D0 40, though initially it was unclear whether, or which part of, this was a magic number.
- Some library names contain the word log.
Grepping for tlog. yields results in libfazcore_sysbase.so and libcmfplugin.so.
The string in the sysbase library is actually tlog.log, so not an exact fit for our case.
However, looking at the code referencing the string and its surrounding context, we saw several interesting function calls with a logbase prefix, such as logbase_init right after a call to fopen.
Further down the call chain there were more references such as logbase_fread and logbase_get_logtype.
These functions are defined in another shared object, which is unsurprisingly called liblogbase.so.
That library became the focus for the rest of our research.
The logbase_init function turned out to be pretty boring since it only does basic structure initialization.
logbase_init_log, on the other hand, is exactly what we were looking for.
It has some very conspicuous pointer dereferences where the first byte is checked against 0xEC, followed by a check whether the next byte plus 50 is less than or equal to 1.
Since this is an unsigned 8-bit comparison, it effectively requires that byte to be 0xCE or 0xCF.
This condition is a perfect fit for the beginning of our binary records, EC CF!
It also answers our previous question about the magic number, which we now know to be two bytes long.
The function also has checks and logic for a bunch of other magic numbers, indicating that there are more supported log formats.
Here’s an overview of magics and format names we found:
| Magic bytes   | Name               |
|---------------|--------------------|
| EC CE / EC CF | llog v5            |
| 01 AA         | tlc log            |
| 01 BB         | sied log           |
| EF / FE       | clog v3            |
| FF 01         | clog v1 (chimera?) |
Each format has its own function table; the function at index 3 appears to be a sort of initializer that parses (parts of) the first record.
Studying it lets one see the boundaries of most of the values that make up a record and where variable-sized data is located.
The purpose of each field is not directly evident for the most part – figuring that out would require a more comprehensive analysis.
The logs we had at hand consisted mostly of llog v5 records, plus some tlc records here and there, so those two are described in the following sections.
Note that a single log file may very well contain different record types.
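Because of that, a reader has to dispatch on each record’s magic. Here’s a minimal sketch of such a loop, building on detect_format from above and on the per-format parsers sketched in the next two sections:

```python
def iter_records(raw: bytes):
    """Walk a decompressed log file, yielding (format, record) pairs."""
    pos = 0
    while pos < len(raw):
        fmt = detect_format(raw[pos:pos + 2])
        if fmt == "llog v5":
            record, pos = parse_llog_v5(raw, pos)  # sketched below
        elif fmt == "tlc log":
            record, pos = parse_tlc(raw, pos)      # sketched below
        else:
            raise ValueError(f"unsupported record type {fmt!r} at {pos:#x}")
        yield fmt, record
```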
We followed a clean-room approach where I reverse engineered the code and wrote specifications, and my colleague Backi implemented the parsing logic without having seen Fortinet’s proprietary code.
llog v5 format
llog v5 is a pretty straightforward format made up of a header followed by some variable-length data and strings.
| Offset | Type               | Description                                                                  |
|--------|--------------------|------------------------------------------------------------------------------|
| 0      | Bytes EC CF        | Header magic                                                                 |
| 2      | Byte               | Some flag, checked for & 4 in one place                                      |
| 3      | Byte               | ? unused                                                                     |
| 4      | Byte               | ? unused, 0?                                                                 |
| 5      | Byte               | Length of devid string                                                       |
| 6      | Byte               | Length of devname string                                                     |
| 7      | Byte               | Length of vdom string                                                        |
| 8      | Short (big endian) | Entry count                                                                  |
| 10     | Short (big endian) | Compressed length                                                            |
| 12     | Short (big endian) | Decompressed length                                                          |
| 14     | Int (big endian)   | Unix timestamp                                                               |
| 18     | ASCII              | String data: devid, devname, vdom – with lengths stated above, respectively  |
After the string data, there is a variable-length section containing Entry count * 2 bytes.
This is essentially an array of shorts, where each element points into the decompressed text, which can contain multiple “lines”.
If the flag at offset 2 has the 0x4 bit set, this section is twice as large, i.e., there are two arrays; we’re not sure what the purpose of the extra data is.
After the array(s) comes the main attraction: the LZ4-compressed log data of length Compressed length.
Finally, there is some sort of info string containing logver, idseq and itime, prefixed with a little-endian short specifying its length.
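Putting the header table and the trailing sections together, here’s a minimal Python parser sketch. It uses the third-party lz4 package; treating the array entries as line start offsets is our interpretation:

```python
import struct
import lz4.block  # third-party: pip install lz4

def parse_llog_v5(buf: bytes, pos: int = 0):
    """Parse one llog v5 record at buf[pos:]; returns (record, next position)."""
    (magic, flag, _unused1, _unused2, devid_len, devname_len, vdom_len,
     entry_count, comp_len, dec_len, timestamp) = struct.unpack_from(
        ">2s6BHHHI", buf, pos)
    if magic not in (b"\xec\xce", b"\xec\xcf"):
        raise ValueError(f"bad llog magic at {pos:#x}")
    p = pos + 18

    def take(n: int) -> bytes:
        nonlocal p
        data = buf[p:p + n]
        p += n
        return data

    devid = take(devid_len).decode()
    devname = take(devname_len).decode()
    vdom = take(vdom_len).decode()
    # Offset array into the decompressed text; the section is doubled
    # when flag & 4 is set (purpose of the second array unknown).
    offsets = struct.unpack_from(f">{entry_count}H", buf, p)
    take((2 if flag & 4 else 1) * entry_count * 2)
    text = lz4.block.decompress(take(comp_len), uncompressed_size=dec_len)
    # Trailing info string (logver, idseq, itime), u16le length prefix.
    (info_len,) = struct.unpack("<H", take(2))
    info = take(info_len).decode()
    # Assumption: each short in the offset array marks the start of a line.
    lines = [text[a:b].decode("utf-8", "replace")
             for a, b in zip(offsets, (*offsets[1:], dec_len))]
    return {"time": timestamp, "devid": devid, "devname": devname,
            "vdom": vdom, "lines": lines, "info": info}, p
```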
TLC format
This format is handled very differently from llog v5.
Its parser follows a declarative approach, meaning the fields to be parsed are specified in code as a struct, or more precisely, an array of structs.
Each struct describes a single field consisting of a name, an offset where to place it in the C struct that will be returned by the parser, and other metadata.
The parser (or perhaps this family of formats) appears to be called cmsg internally, and the logic for it is implemented in libstdext.so.
The header in this case is just 8 bytes: the 01 AA magic, an unknown big-endian short, and a big-endian int specifying the total record length (including the header). After the header, the data for the individual fields follows immediately.
Each field starts with a type byte.
The high nibble (type >> 4) specifies the data type and length of the field.
The low nibble appears to be a bit field: 2 seems to be used for ASCII strings and 2|4 for binary data, while type & 1 has a special meaning that changes processing (we have not encountered it yet).
The next byte is the field id describing the semantic meaning of this field (see below).
After these two bytes, the data that follows depends on the type; a parsing sketch follows the field table below.
Types
| Data type | Meaning                                  |
|-----------|------------------------------------------|
| 0         | Byte array prefixed with int8 length     |
| 1         | Byte array prefixed with int16be length  |
| 2         | Byte array prefixed with int32be length  |
| 3         | int8                                     |
| 4         | int16be                                  |
| 5         | int32be                                  |
| 6         | int64be                                  |
| 7         | int128be                                 |
TLC fields
| Id | Name                           |
|----|--------------------------------|
| 1  | devid                          |
| 2  | devname                        |
| 3  | vdom                           |
| 4  | devtype                        |
| 5  | logtype                        |
| 6  | tmzone                         |
| 7  | fazid                          |
| 8  | srcip                          |
| 9  | unused?                        |
| 10 | unused?                        |
| 11 | num-logs                       |
| 12 | unzip-len                      |
| 13 | incr-zip                       |
| 14 | unzip-len-p                    |
| 15 | prefix                         |
| 16 | zbuf (LZ4-compressed log data) |
| 17 | logs                           |
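Combining the header layout with the two tables, here’s a minimal sketch of a TLC record parser. The handling of the low-nibble flags is our best guess, and incremental compression (incr-zip) is not handled:

```python
import struct

FIELD_NAMES = {1: "devid", 2: "devname", 3: "vdom", 4: "devtype",
               5: "logtype", 6: "tmzone", 7: "fazid", 8: "srcip",
               11: "num-logs", 12: "unzip-len", 13: "incr-zip",
               14: "unzip-len-p", 15: "prefix", 16: "zbuf", 17: "logs"}

# Types 0-2: length-prefixed byte arrays; types 3-6: big-endian integers.
LENGTH_PREFIX = {0: "B", 1: ">H", 2: ">I"}
INTEGER = {3: "B", 4: ">H", 5: ">I", 6: ">Q"}

def parse_tlc(buf: bytes, pos: int = 0):
    """Parse one TLC record at buf[pos:]; returns (fields, next position)."""
    magic, _unknown, total_len = struct.unpack_from(">HHI", buf, pos)
    if magic != 0x01AA:
        raise ValueError(f"bad TLC magic at {pos:#x}")
    end = pos + total_len  # total length includes the 8-byte header
    p = pos + 8
    fields = {}
    while p < end:
        type_byte, field_id = buf[p], buf[p + 1]
        p += 2
        data_type = type_byte >> 4
        if data_type in LENGTH_PREFIX:
            fmt = LENGTH_PREFIX[data_type]
            (length,) = struct.unpack_from(fmt, buf, p)
            p += struct.calcsize(fmt)
            value = buf[p:p + length]
            p += length
            if type_byte & 0x0F == 0x2:  # low nibble 2: ASCII string
                value = value.decode("ascii", "replace")
        elif data_type in INTEGER:
            fmt = INTEGER[data_type]
            (value,) = struct.unpack_from(fmt, buf, p)
            p += struct.calcsize(fmt)
        elif data_type == 7:  # int128be has no native struct code
            value = int.from_bytes(buf[p:p + 16], "big")
            p += 16
        else:
            raise ValueError(f"unknown data type {data_type}")
        fields[FIELD_NAMES.get(field_id, f"field-{field_id}")] = value
    return fields, end
```

The zbuf payload can then presumably be run through the same LZ4 block decompression as for llog v5, with unzip-len as the expected output size.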
Conclusion
We were able to uncover the specifics of the new Fortinet logging formats, which enabled us to process them in our custom log parsing pipeline.
The tooling we wrote to transform the binary logs into textual logs has been published on GitHub.
There are more formats you could potentially encounter in the wild that we haven’t covered in detail here, but perhaps our groundwork on where to find what will help you.
Article Link: https://cyber.wtf/2024/08/30/parsing-fortinet-binary-firewall-logs/