October 21, 2014

Web Crypto and X.509 Certificates

Filed under: Uncategorized — Charles Engelke @ 1:55 pm
Tags: asn.1, ber, der, webcrypto, x.509

If you are going to use cryptography in the browser, there’s a good chance you will want to deal with X.509 certificates. This post is going to get started by using the Web Cryptography API to do two operations on certificates:

Import a public key from an X.509 certificate
Verify the certificate authority (CA) signature on an X.509 certificate

To keep it simple, the example used will be a root CA certificate because those are self-signed. That means we won’t need to get the public key out of one certificate (the CA’s) and use that to verify the signature on another certificate. Instead, we use the same certificate for both. But the concepts and code in this post would be the same for a non-root certificate.

Note: really validating a certificate signature generally requires validating the signatures on a chain of certificates, not just one: the certificate in question, the certificate used to sign it, the certificate used to sign that one, and so on, until you reach the self-signed root certificate. The root certificate has to be verified “out of band” because the chain has to end sometime. Again, this uses the same ideas as here, just repeated as often as needed.

The example will reference the latest current draft of the Web Cryptography API, and RFC 5280, the IETF definition of X.509 certificates. The self-signed root certificate we will use is the VeriSign Universal Root Certification Authority certificate, because VeriSign is a major CA that intends to begin using this as the basis of issued certificates and this certificate should be available on almost any PC with a web browser. Just in case it isn’t, you can get a copy directly from the CA.

Instead of creating a test web page with sample code, all the examples in this post will be made from the browser’s developer console. Google Chrome 38 was used to write it, but it should work in any browser that supports the Web Cryptography API.

Certificate Formats

If you look at the downloaded certificate in a text editor, you should see text starting with:

-----BEGIN CERTIFICATE-----
MIIEuTCCA6GgAwIBAgIQQBrEZCGzEyEDDrvkEhrFHTANBgkqhkiG9w0BAQsFADCB

and ending with:

7M2CYfE45k+XmCpajQ==
-----END CERTIFICATE-----

This is known as PEM format (for Privacy-enhanced Electronic Mail). PEM files can have one or more cryptographic objects of various types in them, delineated with BEGIN and END lines. In between the lines is a base 64 encoded version of the object in a closer-to-native format. For the CERTIFICATE type, that is a base-64 encoding of the certificate in ASN.1 DER format. The last post went into a lot of detail about this encoding, and included a JavaScript function, berToJavaScript, to convert from that binary format to a JavaScript object.

So the first step is going to be base-64 decoding the certificate into a byte array (Uint8Array) for further processing. Get the base-64 encoded part of the PEM file (the stuff between the BEGIN and END lines) into a JavaScript string encoded. It’s easier if you first put it in a text editor and concatenate it into one line. Then just do a simple variable assignment:

var encoded = "MIIEuTCCA6 (rest of string here) XmCpajQ==";

Decode that string using the standard window.atob function, and then copy the bytes in the resulting binary string to a byte array:

var decoded = window.atob(encoded);
var der = new Uint8Array(decoded.length);
for (var i=0; i<decoded.length; i++) {
    der[i] = decoded.charCodeAt(i);
}

Parsing BER/DER

Next comes the hard part: parsing the encoded object. The last post has a function called berToJavaScript that takes a byte array and returns a JavaScript object describing the BER or DER encoded object at the start of the byte array. Copy and paste that function in the browser’s JavaScript console. It returns a JavaScript object with the following fields:

cls – the ASN.1 class of the object. An object’s class controls how the tag field (below) is interpreted. Objects of class 0 (Universal class) have standard tags, others depend on the application or context. The possible values are integers 0, 1, 2, or 3.
tag – the ASN.1 object type. The standard tags can be found in this Wikipedia article. The values are non-negative integers.
structured – a boolean specifying whether this is a structured object or not. Structured objects’ values are themselves encoded ASN.1 objects, one after another. Other objects are called primitive, and interpreting their values depends on the tag and class.
byteLength – the number of bytes the entire encoded object takes.
contents – a byte array containing the object’s value.
raw – the original byte array being parsed.

So we can parse the DER encoded object with:

var parsedCert = berToJavaScript(der);

The returned value is:

{
    cls: 0,
    tag: 16,
    structured: true,
    byteLength: 1213,
    contents: Uint8Array[1209]..., // Actual values skipped here
    raw: Uint8Array[1213]...       // Again, values are skipped
}

So this is Universal class (0), which means the tag value of 16 has a standard meaning: SEQUENCE or SEQUENCE OF. It’s structured (which makes sense for a sequence). Actually parsing the certificate enough to extract the public key and verify the signature on it requires going deeper into the contents.

Parsing an X.509 Certificate

RFC 5280 defines the logical structure of the certificate. It starts with the basic certificate fields:

   Certificate  ::=  SEQUENCE  {
        tbsCertificate       TBSCertificate,
        signatureAlgorithm   AlgorithmIdentifier,
        signatureValue       BIT STRING  }

Based on our berToJavaScript result for the entire data structure, we can see we are on the right track. It had tag 16 of universal class, which matches the SEQUENCE shown here. Here’s a simple parser for a certificate:

function parseCertificate(byteArray) {
    var asn1 = berToJavaScript(byteArray);
    if (asn1.cls !== 0 || asn1.tag !== 16 || !asn1.structured) {
        throw new Error("This can't be an X.509 certificate. Wrong data type.");
    }

    var cert = {asn1: asn1};  // Include the raw parser result for debugging
    var pieces = berListToJavaScript(asn1.contents);
    if (pieces.length !== 3) {
        throw new Error("Certificate contains more than the three specified children.");
    }

    cert.tbsCertificate     = parseTBSCertificate(pieces[0]);
    cert.signatureAlgorithm = parseSignatureAlgorithm(pieces[1]);
    cert.signatureValue     = parseSignatureValue(pieces[2]);

    return cert;
}

The berListToJavaScript function reference above is pretty simple: start parsing at the beginning of the array, then at the first byte following the first result, and so on, until the byte array is consumed, returning an array containing each object:

function berListToJavaScript(byteArray) {
    var result = new Array();
    var nextPosition = 0;
    while (nextPosition < byteArray.length) {
        var nextPiece = berToJavaScript(byteArray.subarray(nextPosition));
        result.push(nextPiece);
        nextPosition += nextPiece.byteLength;
    }
    return result;
}

Now we have to write three parsers, one for each logical part. Since we’ve already converted the encoded objects to JavaScript objects, we will use them as input to the parsers. We will do that in reverse order, because that puts the simplest parsers first. The RFC says that the SignatureValue is a BIT STRING, so…

function parseSignatureValue(asn1) {
    if (asn1.cls !== 0 || asn1.tag !== 3 || asn1.structured) {
        throw new Error("Bad signature value. Not a BIT STRING.");
    }
    var sig = {asn1: asn1};   // Useful for debugging
    sig.bits = berBitStringValue(asn1.contents);
    return sig;
}

BIT STRING is a standard type that is pretty simple to parse. The contents consist of an initial byte giving the number of bits to ignore, then a byte array containing all the bits. For example, the bit string 10110 is five bits long, so any byte containing it has three more bits than the actual bit string. So this is encoded in hex as 03 b0. That means ignore the last three bits in the byte string b0 (which is 10110000 in binary).

function berBitStringValue(byteArray) {
    return {
        unusedBits: byteArray[0],
        bytes: byteArray.subarray(1)
    };
}

What about the SignatureAlgorithm? The RFC says it is an AlgorithmIdentifier, which is:

   AlgorithmIdentifier  ::=  SEQUENCE  {
        algorithm               OBJECT IDENTIFIER,
        parameters              ANY DEFINED BY algorithm OPTIONAL  }

Which leads to:

var parseSignatureAlgorithm = parseAlgorithmIdentifier;
function parseAlgorithmIdentifier(asn1) {
    if (asn1.cls !== 0 || asn1.tag !== 16 || !asn1.structured) {
        throw new Error("Bad algorithm identifier. Not a SEQUENCE.");
    }
    var alg = {asn1: asn1};
    var pieces = berListToJavaScript(asn1.contents);
    if (pieces.length > 2) {
        throw new Error("Bad algorithm identifier. Contains too many child objects.");
    }
    var encodedAlgorithm = pieces[0];
    if (encodedAlgorithm.cls !== 0 || encodedAlgorithm.tag !== 6 || encodedAlgorithm.structured) {
        throw new Error("Bad algorithm identifier. Does not begin with an OBJECT IDENTIFIER.");
    }
    alg.algorithm = berObjectIdentifierValue(encodedAlgorithm.contents);
    if (pieces.length === 2) {
        alg.parameters = {asn1: pieces[1]}; // Don't need this now, so not parsing it
    } else {
        alg.parameters = null;  // It is optional
    }
    return alg;
}

The parameters vary according to the algorithm. The RSA algorithms we will be working with don’t use parameters, so this skips really parsing them fully.

Object Identifiers (OIDs) are a standard type. Their values are essentially sequences of non-negative integers representing different kinds of objects. A common way of writing them is as a list of integers with periods between them. The first two integers are taken from the first byte: the integer division of the first byte by 40, and the remainder of that division. A common first byte is 2a (hex) which is decimal 42, giving initial two integers as 1.2.

The remaining integers are represented as lists of bytes, where all the bytes except the last ones have leading 1 bits. The leading bits are dropped and the remaining bits interpreted as a binary integer. A common hex value is 86 f7 0d. The first two bytes have leading 1 bits and the third does not, so we first convert the hex to binary (10000110 11110111 00001101), then drop the leading bits (0000110 1110111 0001101) and converting that to decimal (113549).

The list of integers is interpreted as a hierarchical tree. The first integer is the master organization for the OID, which can assign the second integer values to member organizations, and so on. For example, the OID for an RSA signature with SHA-1 (a very common one) is 1.2.840.113549.1.1.5. How do you figure these out? Similar to the way DNS lookups work. The leading 1 means this is controlled by the International Standards Organization (ISO), so look up how ISO assigns the next integer. That 2 means ISO has assigned it to a member body. The 840 is the United States, which assigned 113549 to RSA Data Security, Inc. (RSADSI). RSA created the various standards we are using, they assigned 1 to their PKCS family of specifications, which assigned 1 to the PKCS-1 specification, which assigned 5 to the algorithm RSA encryption with SHA-1 hashing.

Or you can just Google 1.2.840.113549.1.1.5 to find a helpful web site that interprets these for you. Based on what we’ve already seen in the API, we will only be supporting that option and 1.2.840.113549.1.1.11 (RSA with SHA-256) for now.

Here’s the code for getting OBJECT IDENTIFIERs:

function berObjectIdentifierValue(byteArray) {
    var oid = Math.floor(byteArray[0] / 40) + "." + byteArray[0] % 40;
    var position = 1;
    while(position < byteArray.length) {
        var nextInteger = 0;
        while (byteArray[position] >= 0x80) {
            nextInteger = nextInteger * 0x80 + (byteArray[position] & 0x7f);
            position += 1;
        }
        nextInteger = nextInteger * 0x80 + byteArray[position];
        position += 1;
        oid += "." + nextInteger;
    }
    return oid;
}

Now for the biggest piece, the TBSCertificate. The RFC gives the definition:

   TBSCertificate  ::=  SEQUENCE  {
        version         [0]  EXPLICIT Version DEFAULT v1,
        serialNumber         CertificateSerialNumber,
        signature            AlgorithmIdentifier,
        issuer               Name,
        validity             Validity,
        subject              Name,
        subjectPublicKeyInfo SubjectPublicKeyInfo,
        issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
        subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,
        extensions      [3]  EXPLICIT Extensions OPTIONAL
        }

We can continue down the path as before, defining parsers for each piece, but right now we only care about the subjectPublicKeyInfo (because we want to import that into a CryptoKey object to verify a signature) and signature (which is actually the algorithm used for the signature, which we need to know to verify a signature). So we won’t extend the parsing of any of the other pieces here:

function parseTBSCertificate(asn1) {
    if (asn1.cls !== 0 || asn1.tag !== 16 || !asn1.structured) {
        throw new Error("This can't be a TBSCertificate. Wrong data type.");
    }
    var tbs = {asn1: asn1};  // Include the raw parser result for debugging
    var pieces = berListToJavaScript(asn1.contents);
    if (pieces.length < 7) {
        throw new Error("Bad TBS Certificate. There are fewer than the seven required children.");
    }
    tbs.version = pieces[0];
    tbs.serialNumber = pieces[1];
    tbs.signature = parseAlgorithmIdentifier(pieces[2]);
    tbs.issuer = pieces[3];
    tbs.validity = pieces[4];
    tbs.subject = pieces[5];
    tbs.subjectPublicKeyInfo = parseSubjectPublicKeyInfo(pieces[6]);
    return tbs;  // Ignore optional fields for now
}

We’re almost there. Parsing the SubjectPublicKeyInfo is easy with what’s already been written. The RFC says its structure is:

   SubjectPublicKeyInfo  ::=  SEQUENCE  {
        algorithm            AlgorithmIdentifier,
        subjectPublicKey     BIT STRING  }

So the parser is just:

function parseSubjectPublicKeyInfo(asn1) {
    if (asn1.cls !== 0 || asn1.tag !== 16 || !asn1.structured) {
        throw new Error("Bad SPKI. Not a SEQUENCE.");
    }
    var spki = {asn1: asn1};
    var pieces = berListToJavaScript(asn1.contents);
    if (pieces.length !== 2) {
        throw new Error("Bad SubjectPublicKeyInfo. Wrong number of child objects.");
    }
    spki.algorithm = parseAlgorithmIdentifier(pieces[0]);
    spki.bits = berBitStringValue(pieces[1].contents);
    return spki;
}

Okay, the parser is putting in enough pieces to do the cryptography we want. We parse the certificate with:

var certificate = parseCertificate(der);

If all the code above is right, and was correctly pasted into the console, this should return an object representing the certificate.

Cryptography

The first thing we need to do is get the public key into a CryptoKey object. After all this build-up, it turns out to be quite easy.

var publicKey;

var alg = certificate.tbsCertificate.signature.algorithm;
if (alg !== "1.2.840.113549.1.1.5" && alg !== "1.2.840.113549.1.1.11") {
   throw new Error("Signature algorithm " + alg + " is not supported yet.");
}

var hashName = "SHA-1";
if (alg === "1.2.840.113549.1.1.11") {
    hashName = "SHA-256";
}

window.crypto.subtle.importKey(
    'spki',
    certificate.tbsCertificate.subjectPublicKeyInfo.asn1.raw,
    {name: "RSASSA-PKCS1-v1_5", hash: {name: hashName}},
    true,
    ["verify"]
).
then(function(key) {
    publicKey = key;
}).
catch(function(err) {
    alert("Import failed: " + err.message);
});

Run this in the JavaScript console and then examine publicKey. It should show as a CryptoKey.

Verifying the signature is just as easy:

window.crypto.subtle.verify(
    {name: "RSASSA-PKCS1-v1_5", hash: {name: hashName}},
    publicKey,
    certificate.signatureValue.bits.bytes,
    certificate.tbsCertificate.asn1.raw
).
then(function(verified) {
    if (verified) {
        alert("The certificate is properly self-signed.");
    } else {
        alert("The self-signed certificate's signature is not valid.");
    }
}).
catch(function(err) {
    alert("Error verifying signature: " + err.message);
});

And that is that.

Comments (2)

October 17, 2014

Parsing BER and DER encoded ASN.1 Objects

Filed under: Uncategorized — Charles Engelke @ 10:18 am
Tags: asn.1, ber, der, webcrypto, x.509

[Updated October 21, 2014: Fixed error about how an indefinite length is encoded. It is not encoded as 0x00. It is encoded as 0x80. Text and code have been corrected.]

The question that likely comes to mind is, “Why?” What do BER, DER, and ASN.1 have to do with web crypto?

Some of the most dominant implementations of public key cryptography, along with important parts of the Web Cryptography API, use BER and DER encoded ASN.1 objects to serialize and deserialize keys, documents, and other important data structures. So parsing this format is a step along the way to handling those cryptographic objects. This post will create a JavaScript parser for these objects. It’s only a means to an end, and you don’t need to know all the gory details here in order to use the parser in upcoming posts. So feel free to skip this and wait for the next article about working with X.509 certificates.

Overview

ASN.1 is Abstract Syntax Notation 1, a method for describing data objects. It was created in the 1980s, is a binary format, and as the name says, it is abstract. To actually store or transfer an ASN.1 object, it must be encoded in a standard way. Distinguished Encoding Rules (DER) is the encoding used for X.509, and is an unambiguous subset of Basic Encoding Rules (BER). Those rules are specified in ITU standard X.690, but I recommend reading the Wikipedia article instead of the specification, at least first.

Since DER is a subset of BER, any valid DER encoded object is also a valid BER one, so we only need to know BER to handle both encodings. To interpret a BER encoded object, you go byte by byte, figuring out the characteristics of the object as you go:

The object will be in one of four classes: 0, 1, 2, or 3. 0 is the Universal Class, which defines a standard set of tags for objects. Other classes define other meanings of the tags, which can be application or context specific.
Every object has a tag that describes the kind of object it is. Tags are non-negative integers, and there’s no upper limit on how big they can be. Universal class tags include 1 (BOOLEAN), 2 (INTEGER), and so on.
Objects can be structured, meaning their value includes other BER encoded objects, or primitive. Marking this may seem redundant since the tag type implies structure or not, but that’s only true for universal class tags that have globally agreed meaning.
The contents of the object is a series of bytes. In most cases, the length of that sequence is explicitly specified preceding the content.

Parsing an Object

Here are the rules for getting the BER encoded information out of the byte array:

The first two bits of the first byte are the object’s class.
The third bit of the first byte tells whether it is primitive (0) or structured (1).
The remaining 5 bits of the first byte are the object’s tag, unless they are all 1. In that case, the next one or more bytes give the tag value. Take each byte until you encounter one with a leading 0 bit instead of a leading 1 bit. Drop the first bit from each byte, and concatenate the remaining bits. Interpret that result as an integer. (Don’t worry, examples will follow.)
The next byte starts defining the length of the contents. If it is 0x80, the length is unknown, and the contents immediately follow, trailed by two 0 bytes in a row. If the value of the first byte is between 0 and 128 (exclusive) then that value is the length. Otherwise, the value is 128 more than the number of bytes containing the length, which is interpreted as a big-endian integer. (Again, examples will follow.)
The remaining length bytes are the contents if the length is non-zero. Otherwise the length is unknown until you reach two zero bytes in a row. The contents in that case are everything up to, but not including, those two zero bytes.

Examples

Knowing the rules is good, but examples help a lot. Here are four that cover the major variations.Each one is shown in hex.

04 05 12 34 56 78 90

The first byte is 00000100 in binary. The first two bits are 00, so the class is 0 (Universal). The third bit is 0, so this is primitive. The last five bits are 00100. They’re not all 1, so they are the value of the tag: 4 in decimal form. The length is described starting in the second byte 05. That’s between 0 and 128, so that’s the length of the contents, and the contents are the next 5 bytes: 12 34 56 78 90.

30 82 02 10 04 01 56 … (for many more bytes)

The first byte is 00110000 in binary. The first two bits are 00 so the class is again 0. The third bit is 1, so it is structured. The last five bits are 10000, so the tag is 16 decimal. The next byte is 82 hex, which is 130 decimal, which is 128 + 2, the the following 2 bytes give the length. They are 02 10, which is interpreted in “big-endian” format as 2*256 + 16 = 528. The next 528 bytes, starting with 04 01 56, contain the contents.

df 82 02 05 12 34 56 78 90

The first byte is 11011111 in binary. The first two bits are 11, so this is class 3 – Private. The next bit is a 0, so this is primitive. The remaining five bits are all 1, so the actual tag starts in the second byte. The second byte has a leading one, and the third byte does not, so the tag is constructed by taking those two bytes (10000010 00000010 in binary), dropping their leading bits to get the fourteen bits 00000100000010, and interpreting this as a binary number. Thus, the tag is 258 decimal. The next byte is 05, which is less than 128, so that is the actual length of the contents. The next 5 bytes (12 34 56 78 90) are the contents.

30 80 04 03 56 78 90 00 00

The first byte, 30, is one we’ve seen before. It is universal class, structured, with tag 16. The next byte is 80, so the length is unknown at first. The contents are all the following bytes, up to (but not including) the first two sequential zero bytes. So the contents are 04 03 56 78 90, and we can figure out from the contents that the length is 5.

Those examples pretty much cover everything, so we can start on code.

berToJavaScript

This function will take a BER (or DER) encoded byte array (Uint8Array) and return a JavaScript object with fields cls (for class, which is a reserved word, an integer value), tag (integer), structured (boolean), and contents (Uint8Array). There won’t be a length field because the contents object automatically has a length property. But there will be a byteLength field, which tells how many bytes long the entire object it. It will also have a field called raw, which is the BER/DER encoded source data, included for ease of debugging.

Note: this code is absolutely not production ready! Given bad data it may try to overrun the byteArray it is given, causing an exception. Even good data may cause a problem if it includes numbers too big for JavaScript to represent exactly as a Number. A lot more data checking as it goes will need to be added to make this robust. The point here is just to get something good enough to explore things like X.509 certificates, and perhaps be a starting point for production code.

The code keeps track of the position of the next byte to examine, and extracts the fields sequentially. It’s pretty straightforward:

function berToJavaScript(byteArray) {
    "use strict";
    var result = {};
    var position = 0;

    result.cls              = getClass();
    result.structured       = getStructured();
    result.tag              = getTag();
    var length              = getLength(); // As encoded, which may be special value 0

    if (length === 0x80) {
        length = 0;
        while (byteArray[position + length] !== 0 || byteArray[position + length + 1] !== 0) {
            length += 1;
        }
        result.byteLength   = position + length + 2;
        result.contents     = byteArray.subarray(position, position + length);
    } else {
        result.byteLength   = position + length;
        result.contents     = byteArray.subarray(position, result.byteLength);
    }

    result.raw              = byteArray.subarray(0, result.byteLength); // May not be the whole input array
    return result;

    // Define the "get" functions here
}

The zero length byte possibility really adds some ugliness here, and cannot happen for DER encoded objects, so you can leave it out if that is all you need.

Each get function is pretty straightforward. These functions have to be nested inside berToJavaScript because they use the shared varables byteArray and position.

    function getClass() {
        var cls = (byteArray[position] & 0xc0) / 64;
        // Consumes no bytes
        return cls;
    }

Dividing by 64 is the same as shifting right six bits.

    function getStructured() {
        var structured = ((byteArray[0] & 0x20) === 0x20);
        // Consumes no bytes
        return structured;
    }

Getting the tag is trickier:

    function getTag() {
        var tag = byteArray[0] & 0x1f;
        position += 1;
        if (tag === 0x1f) {
            tag = 0;
            while (byteArray[position] >= 0x80) {
                tag = tag * 128 + byteArray[position] - 0x80;
                position += 1;
            }
            tag = tag * 128 + byteArray[position] - 0x80;
            position += 1;
        }
        return tag;
    }

Getting the length is similar to, but not exactly the same, as getting the tag:

    function getLength() {
        var length = 0;

        if (byteArray[position] < 0x80) {
            length = byteArray[position];
            position += 1;
        } else {
            var numberOfDigits = byteArray[position] & 0x7f;
            position += 1;
            length = 0;
            for (var i=0; i<numberOfDigits; i++) {
                length = length * 256 + byteArray[position];
                position += 1;
            }
        }
        return length;
    }

Checking It Out

With the get functions inserted inside berToJavaScript it should be possible to check the function out with the examples above, using a browser’s JavaScript console. Paste the assembled code (found at the bottom of this post) into the console to define the function, then create and try to parse each byte array:

var test1 = new Uint8Array([0x04, 0x05, 0x12, 0x34, 0x56, 0x78, 0x90]);
berToJavaScript(test1);

That yields the right result:

{cls: 0, structured: false, tag: 4, byteLength: 7, contents: Uint8Array[5]…}

The second test object needs to be a much longer byte array: 532 bytes at least (528 byte contents plus four bytes before the contents). So we create a bit byte array initialized to all zero, then fill in the first several bytes:

var test2 = new Uint8Array(532);
test2.set([0x30, 0x82, 0x02, 0x10, 0x04, 0x01, 0x56]); // Fill in the first several bytes, leave rest 0
berToJavaScript(test2);

Result is {cls: 0, structured: true, tag: 16, byteLength: 532, contents: Uint8Array[528]…}.

How about a big tag number?

var test3 = new Uint8Array([0xdf, 0x82, 0x02, 0x05, 0x12, 0x34, 0x56, 0x78, 0x90]);
berToJavaScript(test3);

Gives {cls: 3, structured: false, tag: 130, byteLength: 9, contents: Uint8Array[5]…}.

The last test has an indeterminate length:

var test4 = new Uint8Array([0x30, 0x00, 0x04, 0x03, 0x56, 0x78, 0x90, 0x00, 0x00]);
berToJavaScript(test4);

Yields {cls: 0, structured: true, tag: 16, byteLength: 9, contents: Uint8Array[5]…}.

These are hardly exhaustive tests, but give confidence that the parser should be good enough for further exploration. That’s going to happen in the next post, which will extract a public key from an X.509 certificate and verify the signature on a certificate.

Assembled Function

If you want to try this out yourself, here’s the code from above put together for ease of cutting and pasting:

function berToJavaScript(byteArray) {
    "use strict";
    var result = {};
    var position = 0;

    result.cls              = getClass();
    result.structured       = getStructured();
    result.tag              = getTag();
    var length              = getLength(); // As encoded, which may be special value 0

    if (length === 0x80) {
        length = 0;
        while (byteArray[position + length] !== 0 || byteArray[position + length + 1] !== 0) {
            length += 1;
        }
        result.byteLength   = position + length + 2;
        result.contents     = byteArray.subarray(position, position + length);
    } else {
        result.byteLength   = position + length;
        result.contents     = byteArray.subarray(position, result.byteLength);
    }

    result.raw              = byteArray.subarray(0, result.byteLength); // May not be the whole input array
    return result;

    function getClass() {
        var cls = (byteArray[position] & 0xc0) / 64;
        // Consumes no bytes
        return cls;
    }

    function getStructured() {
        var structured = ((byteArray[0] & 0x20) === 0x20);
        // Consumes no bytes
        return structured;
    }

    function getTag() {
        var tag = byteArray[0] & 0x1f;
        position += 1;
        if (tag === 0x1f) {
            tag = 0;
            while (byteArray[position] >= 0x80) {
                tag = tag * 128 + byteArray[position] - 0x80;
                position += 1;
            }
            tag = tag * 128 + byteArray[position] - 0x80;
            position += 1;
        }
        return tag;
    }

    function getLength() {
        var length = 0;

        if (byteArray[position] < 0x80) {
            length = byteArray[position];
            position += 1;
        } else {
            var numberOfDigits = byteArray[position] & 0x7f;
            position += 1;
            length = 0;
            for (var i=0; i<numberOfDigits; i++) {
                length = length * 256 + byteArray[position];
                position += 1;
            }
        }
        return length;
    }
}