theomn.com

Writing a client for a web API is a trivial task in many languages. In fact, as a new student of Python, you might even build such a thing soon after a "hello world".

If you are new to Rust, you may immediately run into a few topics which I think are non-obvious, and perhaps difficult to navigate from a cold start.

This series of posts aims to break the ice by working through a toy example using the public Marvel Comics API. Along the way, I'll touch on some of the popular library choices and how they apply to the problem at hand.

Some of the Rust topics we'll cover in this series include:

Working with a "split" crate project (multiple binaries with a shared library).
Interior Mutability.
Using generics to reduce code duplication.
Making async HTTP requests with hyper, futures, and tokio.
JSON parsing with serde.

In this post of the series, we'll be focusing on project setup and interior mutability.

If you want to skip all the rationale and explanation, you can head over to GitHub and check out marvel-explorer, which is where I prototyped a bunch of the code we'll be talking about.

The Mission ◈

Marvel's Infinity War is currently showing in theaters and has been heralded as the most ambitious cross-over event in history. For this article, our mission will be to use the Marvel Comics API to answer the question:

Which was the first cross-over event to feature two particular characters?

To accomplish this, we can break the problem space down into a few areas. The tasks on the critical path are:

A way to see how characters are identified in the remote system (so we can ask for additional information about them).
A way to get a list of events featuring a given character.

In addition to these tasks, we need to implement some sort of system for safely building the API URLs in a way which satisfies the authorization requirements set by the service.

This "URL building" aspect is the first we'll tackle as we get underway.

Project Setup ◈

Following the marvel-explorer prototype, built as a companion to this article, we'll start by creating a new "lib" cargo project.

$ cargo new --lib marvel-explorer

The problem space we've defined could call for more than one program, each sharing some common code. Cargo accommodates this nicely: while a package can only have one library target, it can still have many binary targets.

In your Cargo.toml you can define extra sections for each binary target, mapping a name to a "main" source file (which must contain a main() function). For example:

[[bin]]
name = "foo"
path = "src/any/old/path/foo.rs"

[[bin]]
name = "bar"
path = "extras/bin/bar/main.rs"

In this case, we've defined two programs named foo and bar. Note that the name of the source file is inconsequential, and that the sources are not even required to be under your project's src/ directory.

This is the explicit way to define binary targets, but since this is a common practice, there is a more convenient implicit way, following a simple convention: any Rust source file matching src/bin/NAME.rs produces a binary target called NAME.

Regardless of how you define binaries in your project, each one can pull in our lib with extern crate to get access to it.

While we're ready to share code across multiple binaries, all the code we'll write today will be in src/lib.rs (part of our shared lib target).

Now, on to the coding!

Building API URLs ◈

In the simplest case, building URLs could be done with basic string formatting, but to do this safely I tend to use the url crate. I say safely because your URLs must be percent-encoded where required, and when you accept user input to build them, there's no way to know what will get fed in. It's nice to have a library take care of the encoding concerns for you!
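To make the encoding concern concrete, here's a minimal std-only sketch. The percent_encode, naive_query, and encoded_query functions are hypothetical names made up for this illustration. This is not how the url crate is implemented, and the crate's exact output can differ (it writes spaces in query pairs as +, for instance).

```rust
/// Percent-encode `input` so it can be embedded in a URL query string.
/// Hand-rolled for illustration only; in the real project the `url`
/// crate takes care of this.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            // RFC 3986 "unreserved" characters pass through untouched.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else becomes %XX, one escape per UTF-8 byte.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Naive formatting: whatever the user typed lands in the URL verbatim.
fn naive_query(name: &str) -> String {
    format!("characters?name={}", name)
}

/// Same URL shape, but the user-supplied value is encoded first.
fn encoded_query(name: &str) -> String {
    format!("characters?name={}", percent_encode(name))
}
```

With naive formatting, a name like Cloak & Dagger puts a raw & into the query string, which a server would read as the start of a new parameter. The encoded version produces characters?name=Cloak%20%26%20Dagger instead.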

As it happens, the Marvel Comics API is not the simplest case. The service uses a "shared secret" authorization scheme where each request must include the following query string parameters:

apikey, your public key in clear text.
ts, a value which varies from request to request (Marvel recommends using a timestamp for this).
hash, the md5 hash of ts + your private key + apikey.

By sending all this information, Marvel can verify each request by repeating the hashing process on their end to confirm that we both have the same private key string. If the private key portion differs, the resulting hashes would not match!
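The scheme is easier to see in code. Below is a rough sketch of assembling the three parameters, where auth_params is a hypothetical helper and the digest argument is a stand-in for a real MD5 function (std has none, so the caller supplies it). The only things taken from the description above are the parameter names and the order of the hashed values.

```rust
/// Build the three authorization parameters described above.
/// `digest` is a hypothetical stand-in for a real MD5 function that
/// returns the hex digest of its input.
fn auth_params<F>(
    ts: &str,
    private_key: &str,
    public_key: &str,
    digest: F,
) -> Vec<(String, String)>
where
    F: Fn(&str) -> String,
{
    // Order matters: ts, then the private key, then the public key.
    let to_hash = format!("{}{}{}", ts, private_key, public_key);
    vec![
        ("ts".to_string(), ts.to_string()),
        ("apikey".to_string(), public_key.to_string()),
        // The private key only ever leaves our machine inside the hash.
        ("hash".to_string(), digest(&to_hash)),
    ]
}
```

Notice that the private key never appears as a parameter of its own. It only contributes to the hash, which is what lets the server confirm we hold the same secret without it being sent in clear text.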

To satisfy all these requirements we also bring in the rust-crypto crate.

Finally, while the url crate is nice for building URLs programmatically, the url::Url struct it provides is actually not compatible with the HTTP client provided by hyper. In order to make our requests, we'll have to convert our final url::Url into a hyper::Uri. As such, we need to bring in hyper to do this.

extern crate crypto;
extern crate hyper;
extern crate url;

use std::cell::RefCell;
use std::time::{SystemTime, UNIX_EPOCH};
use crypto::digest::Digest;
use crypto::md5::Md5;
use hyper::Uri;
use url::Url;

struct UriMaker {
    /// Our Marvel API *public* key
    key: String,
    /// Our Marvel API *private* key
    secret: String,
    /// The prefix of every url we'll be producing.
    api_base: String,
    /// Our md5 hasher, used to generate our `hash` query
    /// string parameter.
    hasher: RefCell<Md5>,
}

The above code lays out the general types we'll be working with for this problem. This struct has been modeled to hold all the pieces we'll need to do the work we've outlined. Basic Strings are used to store the public and private keys, as well as the common prefix for all the URLs we want to build.

The hasher field is a little more exotic in comparison. First, a little background on the hasher itself.

The Md5 hasher we get from the crypto crate has an API whereby you feed in bytes of data, potentially via multiple calls, then at some point later you request the hash digest, which is the string of characters the hasher has computed for the supplied input. For this to work, the inputs fed into the hasher are buffered internally, which in Rust requires mutable access to the data structure.
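To show the shape of that "feed, feed, then ask for the digest" API, here's a toy streaming hasher. ToyHasher is a hypothetical type that uses the FNV-1a algorithm rather than MD5, keeps a running state instead of a real block buffer, and is in no way cryptographically secure. Its method names only loosely mirror the ones we'll call on Md5 later.

```rust
/// Toy streaming hasher (64-bit FNV-1a). This is NOT the `crypto`
/// crate's `Md5`; it only imitates the feed-then-digest shape.
struct ToyHasher {
    state: u64,
}

const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

impl ToyHasher {
    fn new() -> ToyHasher {
        ToyHasher { state: FNV_OFFSET }
    }

    /// Feeding data changes the internal state, hence `&mut self`.
    fn input_str(&mut self, data: &str) {
        for byte in data.bytes() {
            self.state ^= u64::from(byte);
            self.state = self.state.wrapping_mul(FNV_PRIME);
        }
    }

    /// Render the current state as a hex string.
    fn result_str(&self) -> String {
        format!("{:016x}", self.state)
    }

    /// Forget everything fed in so far so the hasher can be reused.
    fn reset(&mut self) {
        self.state = FNV_OFFSET;
    }
}

/// Feed several pieces in, one call per piece, then ask for the digest.
fn hash_in_pieces(parts: &[&str]) -> String {
    // The binding must be `mut`, or the `input_str` calls won't compile.
    let mut hasher = ToyHasher::new();
    for part in parts {
        hasher.input_str(part);
    }
    hasher.result_str()
}
```

Feeding "1", "abcd", and "1234" in three calls gives the same digest as feeding "1abcd1234" in one, which is the property that lets us hash ts, the private key, and the public key without concatenating them first.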

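In Rust, the mutability of objects is managed by something called the borrow checker. The borrow checker acts like a compile-time lock: at any moment a value can have either one mutable reference or any number of shared ones, ensuring that only one part of the program can modify an object at a time.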

When I first started working with Rust, I was confused by the borrow checker barking at me because of my unintentional double-borrow attempts, and how I had to keep marking things as mutable with the mut keyword even though I felt like I shouldn't have to.

A big gap in my initial mental model for how the borrow checker works was around how the mutable state of an object is all or nothing. If you need to call a method on an object which modifies some private field, the entire object needs to be marked as mutable. This can be a big hassle for the callers of your code since it means they will need to be aware of which operations require the object to be mut. Additionally, this ripples all the way up the ownership chain: any object that owns a field of such a type must itself be mutable for that field to do its job.
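Here's a small sketch of that ripple effect. Tally and Report are hypothetical types invented for the illustration; the point is only to show where the mut annotations are forced on us.

```rust
struct Tally {
    count: u32,
}

impl Tally {
    /// Modifies a private field, so it has to take `&mut self`.
    fn bump(&mut self) -> u32 {
        self.count += 1;
        self.count
    }
}

struct Report {
    tally: Tally,
}

impl Report {
    /// `Report` owns a `Tally` and calls `bump`, so this method is
    /// forced to take `&mut self` as well.
    fn record(&mut self) -> u32 {
        self.tally.bump()
    }
}

fn record_twice() -> u32 {
    // The caller's binding has to be `mut` too. Dropping the `mut`
    // here is a compile error.
    let mut report = Report { tally: Tally { count: 0 } };
    report.record();
    report.record()
}
```

One tiny counter at the bottom ended up dictating the signatures of everything above it, all the way out to the caller's let binding.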

This is where RefCell can help! RefCell can be employed to limit the scope of a mutability change such that the pivot from immutable to mutable is not seen further up the chain of ownership. The term for this is managing interior mutability.

As we implement methods for the UriMaker struct, we'll see how RefCell works in practice.

impl UriMaker {
    /// convenience method to initialize a new `UriMaker`.
    pub fn new(
        key: String,
        secret: String,
        api_base: String
    ) -> UriMaker {
        UriMaker {
            key,
            secret,
            api_base,
            hasher: RefCell::new(Md5::new()),
        }
    }

    /// Produces an md5 digest hash for ts + private key + public key
    fn get_hash(&self, ts: &str) -> String {
        // The `RefCell` lets us get a mutable reference to the
        // object within while not having to flag the whole `UriMaker`
        // as mutable.
        let mut hasher = self.hasher.borrow_mut();
        hasher.reset();
        hasher.input_str(ts);
        hasher.input_str(&self.secret);
        hasher.input_str(&self.key);
        hasher.result_str()
    }

    /// Convert from a `url::Url` to a `hyper::Uri`.
    fn url_to_uri(url: &url::Url) -> Uri {
        url.as_str().parse().unwrap()
    }

    /// Append a path to the api root, and set the authorization
    /// query string params.
    fn build_url(
        &self,
        path: &str
    ) -> Result<Url, url::ParseError> {
        let ts = {
            let since_the_epoch =
                SystemTime::now().duration_since(UNIX_EPOCH).unwrap();
            let ms = since_the_epoch.as_secs() * 1000
                + since_the_epoch.subsec_nanos() as u64 / 1_000_000;
            format!("{}", ms)
        };
        let hash = &self.get_hash(&ts);
        let mut url = Url::parse(&self.api_base)?.join(path)?;

        url.query_pairs_mut()
            .append_pair("ts", &ts)
            .append_pair("hash", hash)
            .append_pair("apikey", &self.key);
        Ok(url)
    }
    // ... snip ...
}

In the above code, we're defining methods in an impl block for UriMaker. We've attached a public UriMaker::new() function which simply returns a new instance, handling the creation of the hasher for the caller. Notice that all the fields are private: there's currently no need for access to these from the outside.

In addition to UriMaker::new(), we've also added a few private helpers, which will be used by the methods that make up our public interface.

In the get_hash() method, we are able to get a mutable reference to the hasher thanks to the RefCell wrapper. If we didn't have RefCell::borrow_mut() to manage the mutability for us, get_hash() would instead need to be defined with &mut self in the parameter list, which in turn would require whatever owned the UriMaker to be mutable as well.

The UriMaker::url_to_uri() function is just a little helper to convert from type to type, going by way of the URL's string form. We'll be calling it to finalize the return values of our public methods right before they return.
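If the bare .parse() in that helper looks a bit magical: str::parse() works for any type implementing the FromStr trait, and the compiler picks the implementation from the expected return type. Here's a sketch of the same pattern using a made-up Endpoint type (hypothetical, and far simpler than a real URI parser).

```rust
use std::str::FromStr;

/// Hypothetical target type, standing in for `hyper::Uri`.
#[derive(Debug, PartialEq)]
struct Endpoint {
    host: String,
    path: String,
}

impl FromStr for Endpoint {
    type Err = String;

    /// Split "host/path" at the first slash.
    fn from_str(s: &str) -> Result<Endpoint, String> {
        match s.find('/') {
            Some(i) => Ok(Endpoint {
                host: s[..i].to_string(),
                path: s[i..].to_string(),
            }),
            None => Err(format!("no path found in {:?}", s)),
        }
    }
}

/// The return type tells `parse()` which `FromStr` impl to use.
/// Like `url_to_uri()`, this panics if the text doesn't parse.
fn to_endpoint(text: &str) -> Endpoint {
    text.parse().unwrap()
}

/// A non-panicking variant hands the error back to the caller.
fn try_to_endpoint(text: &str) -> Result<Endpoint, String> {
    text.parse()
}
```

The unwrap() is a reasonable shortcut when the input is known to be a well-formed URL already, but returning the Result is the safer choice if that assumption might not hold.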

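The build_url() method simply assembles all the parts. We take the prefix + path + standard query string and package it all up in a url::Url which the caller can modify further as needed. The common case for further modification is to simply add additional query string parameters (as we'll see in the public methods below). One caveat: Url::join() resolves the path relative to the base, so api_base should end with a trailing slash; otherwise its last path segment gets replaced rather than extended.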

Moving on, we'll flesh out the public interface for this type.

impl UriMaker {
    // ... snip ... continued from the previous code sample

    /// Lookup character data by name (exact match).
    pub fn character_by_name_exact(&self, name: &str) -> Uri {
        let mut url = self.build_url("characters").unwrap();
        url.query_pairs_mut().append_pair("name", name);
        Self::url_to_uri(&url)
    }

    /// Lookup character data by name (using a "starts with" match).
    pub fn character_by_name(&self, name_starts_with: &str) -> Uri {
        let mut url = self.build_url("characters").unwrap();
        url.query_pairs_mut()
            .append_pair("nameStartsWith", name_starts_with);
        Self::url_to_uri(&url)
    }

    /// Get all the events for a given character.
    pub fn character_events(&self, character_id: i32) -> Uri {
        let mut url = self.build_url(
                &format!("characters/{}/events", character_id)
            ).unwrap();
        url.query_pairs_mut()
            // 100 is currently the largest limit we can set.
            .append_pair("limit", "100");
        Self::url_to_uri(&url)
    }
}

These public methods offer a simple interface to turn questions like, "which characters match this name prefix?" or, "which events were this character featured in?" into fully-fledged hyper::Uri instances we can pass directly to hyper to fetch some answers.

Wrapping Up ◈

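In this post, we looked at how to arrange a cargo project for sharing code across multiple binary targets, and used interior mutability (via RefCell) to build authorized API URLs without requiring a mutable UriMaker.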

In the next post, we'll introduce the serde and futures crates, then build a new struct called MarvelClient which will house all the hyper HTTP client tech we'll need to actually make requests.