Charles Engelke's Blog

October 17, 2014

Parsing BER and DER encoded ASN.1 Objects

Filed under: Uncategorized — Charles Engelke @ 10:18 am

[Updated October 21, 2014: Fixed error about how an indefinite length is encoded. It is not encoded as 0x00. It is encoded as 0x80. Text and code have been corrected.]

The question that likely comes to mind is, “Why?” What do BER, DER, and ASN.1 have to do with web crypto?

Some of the most dominant implementations of public key cryptography, along with important parts of the Web Cryptography API, use BER and DER encoded ASN.1 objects to serialize and deserialize keys, documents, and other important data structures. So parsing this format is a step along the way to handling those cryptographic objects. This post will create a JavaScript parser for these objects. It’s only a means to an end, and you don’t need to know all the gory details here in order to use the parser in upcoming posts. So feel free to skip this and wait for the next article about working with X.509 certificates.

Overview

ASN.1 is Abstract Syntax Notation One, a method for describing data objects. It was created in the 1980s and, as the name says, it is abstract. To actually store or transfer an ASN.1 object, it must be encoded in a standard way, and the encodings discussed here are binary. Distinguished Encoding Rules (DER) is the encoding used for X.509, and is an unambiguous subset of Basic Encoding Rules (BER). Those rules are specified in ITU-T Recommendation X.690, but I recommend reading the Wikipedia article instead of the specification, at least first.

Since DER is a subset of BER, any valid DER encoded object is also a valid BER one, so we only need to know BER to handle both encodings. To interpret a BER encoded object, you go byte by byte, figuring out the characteristics of the object as you go:

  • The object will be in one of four classes: 0, 1, 2, or 3. 0 is the Universal Class, which defines a standard set of tags for objects. Other classes define other meanings of the tags, which can be application or context specific.
  • Every object has a tag that describes the kind of object it is. Tags are non-negative integers, and there’s no upper limit on how big they can be. Universal class tags include 1 (BOOLEAN), 2 (INTEGER), and so on.
  • Objects can be structured, meaning their value includes other BER encoded objects, or primitive. Marking this may seem redundant since the tag type implies structure or not, but that’s only true for universal class tags that have globally agreed meaning.
  • The contents of the object are a series of bytes. In most cases, the length of that series is given explicitly, just before the contents.

Parsing an Object

Here are the rules for getting the BER encoded information out of the byte array:

  • The first two bits of the first byte are the object’s class.
  • The third bit of the first byte tells whether it is primitive (0) or structured (1).
  • The remaining 5 bits of the first byte are the object’s tag, unless they are all 1. In that case, the next one or more bytes give the tag value. Take each byte until you encounter one with a leading 0 bit instead of a leading 1 bit. Drop the first bit from each byte, and concatenate the remaining bits. Interpret that result as an integer. (Don’t worry, examples will follow.)
  • The next byte starts defining the length of the contents. If it is 0x80, the length is unknown, and the contents immediately follow, trailed by two 0 bytes in a row. If the value of that byte is between 0 and 127 (inclusive) then that value is the length. Otherwise, the value is 128 more than the number of following bytes that contain the length, which is interpreted as a big-endian integer. (Again, examples will follow.)
  • If the length is known, that many of the following bytes are the contents. Otherwise (the 0x80 case) the length is unknown until you reach two zero bytes in a row. The contents in that case are everything up to, but not including, those two zero bytes.
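As a rough sketch of the first three rules, the class, the primitive/structured flag, and the low tag bits can all be pulled out of the first byte with bit masks. The name describeFirstByte is made up for this illustration and is not part of the parser built later in this post.

```javascript
// Illustrative helper (hypothetical name): split the first byte of a
// BER encoded object into its three fields.
function describeFirstByte(firstByte) {
    "use strict";
    return {
        cls:        (firstByte & 0xc0) >>> 6,     // first two bits
        structured: (firstByte & 0x20) === 0x20,  // third bit
        lowTag:     firstByte & 0x1f              // last five bits; 31 means the tag continues in later bytes
    };
}

console.log(describeFirstByte(0x30)); // cls 0, structured true, lowTag 16
console.log(describeFirstByte(0xdf)); // cls 3, structured false, lowTag 31
```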

Examples

Knowing the rules is good, but examples help a lot. Here are four that cover the major variations. Each one is shown in hex.

04 05 12 34 56 78 90

The first byte is 00000100 in binary. The first two bits are 00, so the class is 0 (Universal). The third bit is 0, so this is primitive. The last five bits are 00100. They’re not all 1, so they are the value of the tag: 4 in decimal form. The length is described starting in the second byte 05. That’s between 0 and 128, so that’s the length of the contents, and the contents are the next 5 bytes: 12 34 56 78 90.

30 82 02 10 04 01 56 … (for many more bytes)

The first byte is 00110000 in binary. The first two bits are 00 so the class is again 0. The third bit is 1, so it is structured. The last five bits are 10000, so the tag is 16 decimal. The next byte is 82 hex, which is 130 decimal, which is 128 + 2, so the following 2 bytes give the length. They are 02 10, which is interpreted in “big-endian” format as 2*256 + 16 = 528. The next 528 bytes, starting with 04 01 56, contain the contents.
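The length arithmetic in this example can be checked with a few lines of JavaScript. This is only a sketch; bigEndianToNumber is a name invented here for illustration.

```javascript
// Illustrative only: combine length bytes into one big-endian integer.
function bigEndianToNumber(bytes) {
    "use strict";
    var value = 0;
    for (var i = 0; i < bytes.length; i++) {
        value = value * 256 + bytes[i];
    }
    return value;
}

var firstLengthByte = 0x82;
var lengthByteCount = firstLengthByte - 128;   // how many bytes hold the length
console.log(lengthByteCount);                  // 2
console.log(bigEndianToNumber([0x02, 0x10]));  // 528
```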

df 82 02 05 12 34 56 78 90

The first byte is 11011111 in binary. The first two bits are 11, so this is class 3 – Private. The next bit is a 0, so this is primitive. The remaining five bits are all 1, so the actual tag starts in the second byte. The second byte has a leading one, and the third byte does not, so the tag is constructed by taking those two bytes (10000010 00000010 in binary), dropping their leading bits to get the fourteen bits 00000100000010, and interpreting this as a binary number. Thus, the tag is 258 decimal. The next byte is 05, which is less than 128, so that is the actual length of the contents. The next 5 bytes (12 34 56 78 90) are the contents.
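The same tag calculation, written out as a small sketch. The function name decodeHighTag is hypothetical, and it assumes it is handed only the bytes that follow the first byte.

```javascript
// Illustrative only: decode a multi-byte tag from the bytes that follow
// a first byte whose last five bits are all 1.
function decodeHighTag(tagBytes) {
    "use strict";
    var tag = 0;
    for (var i = 0; i < tagBytes.length; i++) {
        tag = tag * 128 + (tagBytes[i] & 0x7f);  // drop the leading bit, keep the other seven
        if (tagBytes[i] < 0x80) {
            break;                                // a leading 0 bit marks the last tag byte
        }
    }
    return tag;
}

console.log(decodeHighTag([0x82, 0x02])); // 258
```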

30 80 04 03 56 78 90 00 00

The first byte, 30, is one we’ve seen before. It is universal class, structured, with tag 16. The next byte is 80, so the length is unknown at first. The contents are all the following bytes, up to (but not including) the first two sequential zero bytes. So the contents are 04 03 56 78 90, and we can figure out from the contents that the length is 5.
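Finding that length amounts to a simple scan, sketched below with an invented function name, indefiniteContentLength. Note that this mirrors the simple rule used in this post: it stops at the first pair of zero bytes, so it could stop too early if the contents themselves happened to contain two zero bytes in a row.

```javascript
// Illustrative only: measure indefinite-length contents by scanning for
// the first pair of zero bytes, starting where the contents begin.
function indefiniteContentLength(bytes, start) {
    "use strict";
    var length = 0;
    while (start + length + 1 < bytes.length) {
        if (bytes[start + length] === 0 && bytes[start + length + 1] === 0) {
            return length;
        }
        length += 1;
    }
    return -1; // no terminating pair of zero bytes was found
}

var example = [0x30, 0x80, 0x04, 0x03, 0x56, 0x78, 0x90, 0x00, 0x00];
console.log(indefiniteContentLength(example, 2)); // 5
```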

Those examples pretty much cover everything, so we can start on code.

berToJavaScript

This function will take a BER (or DER) encoded byte array (Uint8Array) and return a JavaScript object with fields cls (an integer, named cls because class is a reserved word), tag (integer), structured (boolean), and contents (Uint8Array). There won’t be a length field because the contents object automatically has a length property. But there will be a byteLength field, which tells how many bytes long the entire object is. It will also have a field called raw, which is the BER/DER encoded source data, included for ease of debugging.

Note: this code is absolutely not production ready! Given bad data it may try to read past the end of the byteArray it is given. In JavaScript that yields undefined values rather than an error, so the result can be garbage, or the loop that looks for the end of indefinite length contents may never finish. Even good data may cause a problem if it includes numbers too big for JavaScript to represent exactly as a Number. A lot more data checking will need to be added to make this robust. The point here is just to get something good enough to explore things like X.509 certificates, and perhaps be a starting point for production code.
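One possible first step toward that robustness, sketched here under the assumption that every byte read would be routed through a single helper, is to check the index before reading. The helper name byteAt is hypothetical and is not used by the code in this post.

```javascript
// Hypothetical helper, not used by the parser below: read one byte,
// throwing a descriptive error if the index is outside the array.
function byteAt(byteArray, index) {
    "use strict";
    if (index < 0 || index >= byteArray.length) {
        throw new RangeError("BER data ends before byte " + index);
    }
    return byteArray[index];
}

var sample = new Uint8Array([0x04, 0x01]);  // claims one content byte but has none
console.log(byteAt(sample, 1));             // 1
try {
    byteAt(sample, 2);
} catch (e) {
    console.log(e.message);                 // BER data ends before byte 2
}
```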

The code keeps track of the position of the next byte to examine, and extracts the fields sequentially. It’s pretty straightforward:

function berToJavaScript(byteArray) {
    "use strict";
    var result = {};
    var position = 0;

    result.cls              = getClass();
    result.structured       = getStructured();
    result.tag              = getTag();
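    var indefinite          = (byteArray[position] === 0x80); // Length byte 0x80 marks an indefinite length
    var length              = getLength(); // As encoded; getLength returns 0 for the indefinite form

    if (indefinite) {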
        length = 0;
        while (byteArray[position + length] !== 0 || byteArray[position + length + 1] !== 0) {
            length += 1;
        }
        result.byteLength   = position + length + 2;
        result.contents     = byteArray.subarray(position, position + length);
    } else {
        result.byteLength   = position + length;
        result.contents     = byteArray.subarray(position, result.byteLength);
    }

    result.raw              = byteArray.subarray(0, result.byteLength); // May not be the whole input array
    return result;

    // Define the "get" functions here
}

The indefinite length possibility really adds some ugliness here, and cannot happen for DER encoded objects, so you can leave it out if that is all you need.

Each get function is pretty straightforward. These functions have to be nested inside berToJavaScript because they use the shared variables byteArray and position.

    function getClass() {
        var cls = (byteArray[position] & 0xc0) / 64;
        // Consumes no bytes
        return cls;
    }

Dividing by 64 is the same as shifting right six bits.
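A quick way to convince yourself of that equivalence is to compute the class both ways. The two function names here are made up for the comparison.

```javascript
// Two equivalent ways to extract the class bits (illustrative names).
function classByDivide(firstByte) {
    "use strict";
    return (firstByte & 0xc0) / 64;
}

function classByShift(firstByte) {
    "use strict";
    return (firstByte & 0xc0) >>> 6;
}

console.log(classByDivide(0xdf), classByShift(0xdf)); // 3 3
console.log(classByDivide(0x30), classByShift(0x30)); // 0 0
```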

    function getStructured() {
        var structured = ((byteArray[0] & 0x20) === 0x20);
        // Consumes no bytes
        return structured;
    }

Getting the tag is trickier:

    function getTag() {
        var tag = byteArray[0] & 0x1f;
        position += 1;
        if (tag === 0x1f) {
            tag = 0;
            while (byteArray[position] >= 0x80) {
                tag = tag * 128 + byteArray[position] - 0x80;
                position += 1;
            }
            tag = tag * 128 + byteArray[position]; // Last tag byte has a leading 0 bit, so there is nothing to subtract
            position += 1;
        }
        return tag;
    }

Getting the length is similar to, but not exactly the same as, getting the tag:

    function getLength() {
        var length = 0;

        if (byteArray[position] < 0x80) {
            length = byteArray[position];
            position += 1;
        } else {
            var numberOfDigits = byteArray[position] & 0x7f;
            position += 1;
            length = 0;
            for (var i=0; i<numberOfDigits; i++) {
                length = length * 256 + byteArray[position];
                position += 1;
            }
        }
        return length;
    }

Checking It Out

With the get functions inserted inside berToJavaScript it should be possible to check the function out with the examples above, using a browser’s JavaScript console. Paste the assembled code (found at the bottom of this post) into the console to define the function, then create and try to parse each byte array:

var test1 = new Uint8Array([0x04, 0x05, 0x12, 0x34, 0x56, 0x78, 0x90]);
berToJavaScript(test1);

That yields the right result:

{cls: 0, structured: false, tag: 4, byteLength: 7, contents: Uint8Array[5]…}

The second test object needs to be a much longer byte array: 532 bytes at least (528 bytes of contents plus four bytes before the contents). So we create a big byte array initialized to all zero, then fill in the first several bytes:

var test2 = new Uint8Array(532);
test2.set([0x30, 0x82, 0x02, 0x10, 0x04, 0x01, 0x56]); // Fill in the first several bytes, leave rest 0
berToJavaScript(test2);

Result is {cls: 0, structured: true, tag: 16, byteLength: 532, contents: Uint8Array[528]…}.

How about a big tag number?

var test3 = new Uint8Array([0xdf, 0x82, 0x02, 0x05, 0x12, 0x34, 0x56, 0x78, 0x90]);
berToJavaScript(test3);

Gives {cls: 3, structured: false, tag: 258, byteLength: 9, contents: Uint8Array[5]…}.

The last test has an indeterminate length:

var test4 = new Uint8Array([0x30, 0x80, 0x04, 0x03, 0x56, 0x78, 0x90, 0x00, 0x00]);
berToJavaScript(test4);

Yields {cls: 0, structured: true, tag: 16, byteLength: 9, contents: Uint8Array[5]…}.
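Typing out each byte with a 0x prefix gets tedious. A small convenience helper, sketched here with the hypothetical name hexToBytes, can build a test array straight from the hex strings used in the examples. It does no validation of its input.

```javascript
// Hypothetical convenience helper: turn a string like "04 05 12" into a
// Uint8Array. Assumes space-separated, well-formed hex byte values.
function hexToBytes(hex) {
    "use strict";
    var parts = hex.trim().split(/\s+/);
    var bytes = new Uint8Array(parts.length);
    for (var i = 0; i < parts.length; i++) {
        bytes[i] = parseInt(parts[i], 16);
    }
    return bytes;
}

var bytes1 = hexToBytes("04 05 12 34 56 78 90");
console.log(bytes1.length); // 7
console.log(bytes1[6]);     // 144, which is 0x90
```

The resulting array can be passed to berToJavaScript just like the hand-built ones above.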

These are hardly exhaustive tests, but give confidence that the parser should be good enough for further exploration. That’s going to happen in the next post, which will extract a public key from an X.509 certificate and verify the signature on a certificate.

Assembled Function

If you want to try this out yourself, here’s the code from above put together for ease of cutting and pasting:

function berToJavaScript(byteArray) {
    "use strict";
    var result = {};
    var position = 0;

    result.cls              = getClass();
    result.structured       = getStructured();
    result.tag              = getTag();
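    var indefinite          = (byteArray[position] === 0x80); // Length byte 0x80 marks an indefinite length
    var length              = getLength(); // As encoded; getLength returns 0 for the indefinite form

    if (indefinite) {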
        length = 0;
        while (byteArray[position + length] !== 0 || byteArray[position + length + 1] !== 0) {
            length += 1;
        }
        result.byteLength   = position + length + 2;
        result.contents     = byteArray.subarray(position, position + length);
    } else {
        result.byteLength   = position + length;
        result.contents     = byteArray.subarray(position, result.byteLength);
    }

    result.raw              = byteArray.subarray(0, result.byteLength); // May not be the whole input array
    return result;

    function getClass() {
        var cls = (byteArray[position] & 0xc0) / 64;
        // Consumes no bytes
        return cls;
    }

    function getStructured() {
        var structured = ((byteArray[0] & 0x20) === 0x20);
        // Consumes no bytes
        return structured;
    }

    function getTag() {
        var tag = byteArray[0] & 0x1f;
        position += 1;
        if (tag === 0x1f) {
            tag = 0;
            while (byteArray[position] >= 0x80) {
                tag = tag * 128 + byteArray[position] - 0x80;
                position += 1;
            }
            tag = tag * 128 + byteArray[position]; // Last tag byte has a leading 0 bit, so there is nothing to subtract
            position += 1;
        }
        return tag;
    }

    function getLength() {
        var length = 0;

        if (byteArray[position] < 0x80) {
            length = byteArray[position];
            position += 1;
        } else {
            var numberOfDigits = byteArray[position] & 0x7f;
            position += 1;
            length = 0;
            for (var i=0; i<numberOfDigits; i++) {
                length = length * 256 + byteArray[position];
                position += 1;
            }
        }
        return length;
    }
}

2 Comments

  1. […] For the CERTIFICATE type, that is a base-64 encoding of the certificate in ASN.1 DER format. The last post went into a lot of detail about this encoding, and included a JavaScript function, berToJavaScript, […]

    Pingback by Web Crypto and X.509 Certificates | Charles Engelke's Blog — October 21, 2014 @ 1:56 pm

  2. Well explained 🙂

    Comment by Raghu — January 14, 2015 @ 5:47 am

