theomn.com

Writing a Web API Client in Rust (Part 2)


Continuing on from the previous post in the series (Part 1), this post will focus on making async HTTP requests, and JSON parsing with serde.

Much of this post will be spent introducing concepts and building foundational tech that will be put to use in the next post.

If you want to skip all the rationale and explanation, you can head over to GitHub and check out marvel-explorer which is where I prototyped a bunch of the code we'll be talking about.

Before we finally get down to business, know that a tremendously ergonomics-enhancing language feature known as impl Trait, introduced in Rust 1.26 along with many other great changes, has a huge impact on the topic of returning futures.

Returning futures from functions was made a little ugly by some aspects of the type system, prior to impl Trait.

I'll be showing an example of how this would be done prior to 1.26, and then show how the code would be refactored to leverage impl Trait instead.
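As a quick, std-only illustration of what impl Trait buys us (using Iterator as a stand-in for Future, since the shape of the problem is the same for any unnameable return type):

```rust
// Before 1.26: an unnameable return type (a closure, or a future built
// from combinators) had to be boxed behind a trait object.
fn evens_boxed(limit: u32) -> Box<dyn Iterator<Item = u32>> {
    Box::new((0..limit).filter(|n| n % 2 == 0))
}

// With `impl Trait` (1.26+): no Box and no heap allocation; the concrete
// type stays hidden from the caller but known to the compiler.
fn evens(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}

fn main() {
    assert_eq!(evens_boxed(6).collect::<Vec<_>>(), vec![0, 2, 4]);
    assert_eq!(evens(6).collect::<Vec<_>>(), vec![0, 2, 4]);
}
```

(The `dyn` keyword shown here is required by newer editions of Rust; 2018-era code often omitted it.)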

Significant changes to Rust's async story are also coming to the language (later this year?), which will likely impact the design of the crates we'll employ. Those changes are further out, though, potentially warranting a follow-up post when the new features are stable.

In addition to being aware of the Rust version being targeted, it's worth remembering that many of the crates being used here are pre-1.0, meaning their APIs can be highly volatile with each release.

I'd like to draw attention to the versions of the futures, hyper, and tokio crates.

futures = "0.1.0"
hyper = "0.11.25"
tokio-core = "0.1.17"

The futures crate has a newer release available (0.2), which is not compatible with the 0.1 API currently targeted by this specific version of hyper.

See the dependencies section of the Cargo.toml in the marvel-explorer repo for a full listing.

Building our API Client 

extern crate futures;
extern crate hyper;
extern crate hyper_tls;
extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_json;
extern crate tokio_core;

use std::cell::RefCell;
use std::io;

use futures::{Future, Stream};
use hyper::Client;
use hyper_tls::HttpsConnector;
use hyper::client::HttpConnector;
use serde_json::Value as JsValue;
use tokio_core::reactor::Core;

/// type alias for a custom hyper client, configured for HTTPS
/// instead of HTTP.
type HttpsClient = Client<HttpsConnector<HttpConnector>, hyper::Body>;

/// The top level interface for interacting with the remote service.
pub struct MarvelClient {
    /// The `UriMaker` we built in Part 1 of the series.
    uri_maker: UriMaker,
    /// tokio "core" to run our requests in.
    core: RefCell<Core>,
    /// hyper http client to build requests with.
    http: HttpsClient,
}

The above code shows the external crates we're going to be depending on for this next chunk of work, as well as the shape of the MarvelClient struct we'll be building on top of.

Since this struct has a handful of moving pieces, next up we will build a new() method to handle the initialization for us.

Initializing the Client 

impl MarvelClient {
    pub fn new(key: String, secret: String) -> MarvelClient {
        let uri_maker = UriMaker::new(
            key,
            secret,
            "https://gateway.marvel.com:443/v1/public/".to_owned(),
        );

        let core = Core::new().unwrap();

        let http = {
            let handle = core.handle();
            let connector = HttpsConnector::new(4, &handle).unwrap();

            Client::configure()
                .connector(connector)
                .build(&handle)
        };

        MarvelClient {
            uri_maker,
            core: RefCell::new(core),
            http,
        }
    }
    // ... snip ...
}

When creating a new client instance, we'll ask the caller to supply the public and private keys; the new() method takes care of the rest.

Inside, we pass the keys along to a new UriMaker (the struct we defined in the previous post), as well as supplying the prefix for the URIs we will build.

Although the Marvel Comics API seems to offer access over both HTTP and HTTPS, generally speaking, HTTPS should always be preferred. Hyper works with HTTP out of the box, but to get HTTPS working, we actually need the help of another crate.

To enable hyper to make HTTPS requests, hyper-tls is one such option, though seemingly there are others. hyper-tls provides an HttpsConnector struct which we can hand off to the hyper Client via its builder interface. Since most of the pieces required for the construction of the Client are only needed by the client itself, we can build the whole thing within a block to hide the internals (the connector, and core handle) from prying eyes.

The core is a piece of the tokio landscape, and worth noting. The core is essentially an "event loop", a term you may recognize from the async APIs provided by other languages. When we hand the core a future and ask it to run, it drives that "task" (and any others spawned onto it) to completion. Tasks here are lightweight units of work scheduled on the event loop, not OS threads.

The futures crate pairs with tokio-core, and provides the primitives for building the "tasks" we submit to the core. Building individual futures for execution in the core is the simplest case, but the API allows for the more advanced usage of chaining futures together as a graph of tasks.

Next, we'll add a method that will run a basic HTTP request (as a future) and parse the response body as JSON. Since this method will return a futures::Future, we will be able to chain it to extract more meaningful values from the resulting JSON response.

A Brief Intro to Serde 

When working with JSON data in Rust, a popular library to use is serde.

serde, named for a combination of ser(ialize) and de(serialize), is a data transformation library with various supported formats (provided by additional crates).

In the initial code sample of this post, you can see three serde-related crates in use:

extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_json;

The serde crate is the core library which provides traits and types that the various format-providing crates are built on top of.

The serde_derive crate is used to auto-generate implementations of the Serialize and Deserialize traits for a given struct. This is a nice convenience, since the JSON representations of many primitive Rust types are well known.

Finally, the serde_json crate adds in the format-specific logic to allow serde to parse and produce JSON data.

Parsing JSON with Serde 

Given a piece of JSON:

{
  "id": 1009664,
  "name": "Thor",
  "description": "As the Norse God of thunder and ligh..."
}

we can define a struct representation which, via the derive attribute, will get an auto-generated implementation of the Deserialize trait.

#[derive(Debug, Deserialize)]
pub struct Character {
    pub id: i32,
    pub name: String,
    pub description: String,
}

So long as the types listed for each field are compatible with the types found for the corresponding keys in the JSON object, we can parse this JSON and unpack it into an instance of the struct. This is normally done by using one of the many serde_json::from_* methods.

In our case, the data we receive from our HTTP requests will be a slice of bytes. In Rust this is represented as &[u8], so we can use serde_json::from_slice() to parse these responses.

#[test]
fn test_parse_bytes() {
    // .as_bytes() converts `str` to `&[u8]`
    let raw_json: &[u8] = r##"{
      "id": 1009664,
      "name": "Thor",
      "description": "As the Norse God of thunder and ligh..."
    }"##.as_bytes();
    
    // the type ascription here tells serde what kind of 
    // struct to unpack the data into.
    let character: Character = 
        serde_json::from_slice(raw_json).unwrap();
        
    assert_eq!(character.id, 1009664);
    assert_eq!(character.name.as_str(), "Thor");
    assert_eq!(
        character.description.as_str(), 
        "As the Norse God of thunder and ligh..."
    );
}
In the above example the type hint of `Character` tells `serde` how to collect the data it's trying to parse. If the raw JSON does not conform to this shape, the `Result` returned from `serde_json::from_slice()` will hold a `serde_json::Error`.

Since we're going to be making different requests resulting in differently shaped JSON structures, it might be nice to not specialize right away.

For this, serde has an intermediate representation of the parsed data structure known as serde_json::Value. This generic Value allows for indexing, so you can do some basic traversal through the JSON data to retrieve a single value rather than building many intermediate structs just to reach a specific branch of the tree. You can also unpack a Value into a struct with serde_json::from_value(), just as you would when parsing string or byte JSON data.

Doing the initial JSON parsing (as a Value), then specializing later, can be especially helpful if you want to have overlapping "views" of the same data. For the purpose of our project, doing an initial Value parse will help us to fail fast if the response body is not even valid JSON and give us a nice generic type we can use in our signatures for the futures we'll return when making HTTP requests to the Web API.

#[test]
fn test_parse_bytes_as_value() {
    let raw_json: &[u8] = r##"{
      "id": 1009664,
      "name": "Thor",
      "description": "As the Norse God of thunder and ligh..."
    }"##.as_bytes();
    
    // serde can produce an intermediate representation 
    // as a `serde_json::Value`.
    let value: serde_json::Value =
        serde_json::from_slice(raw_json).unwrap();
        
    assert_eq!(value["id"], 1009664_i32);
    assert_eq!(value["name"], "Thor");
    assert_eq!(
        value["description"], 
        "As the Norse God of thunder and ligh..."
    );
}

Next, we'll put all this to use by fetching some JSON via HTTP with hyper.

A Brief Intro to Futures 

We'll be using hyper to make our HTTP requests. hyper uses another crate called futures to allow for async execution of these requests.

futures in Rust are not that different from promises in JavaScript, or futures in Scala. The main idea is that you have a task that executes in a non-blocking fashion.

Futures can succeed or fail, just like a Result. As such, they are defined with two associated types. In the next code sample, you'll see the return type of the fn get_json() method defined as Future<Item = JsValue, Error = io::Error>, meaning if the future succeeds we'll end up with a serde_json::Value, and otherwise we'll get a std::io::Error.

hyper expects the error types of the futures it works with to conform to certain bounds, and its docs hint that std::io::Error is a convenient path to satisfying them, since hyper handles the conversion between std::io::Error and its own Error type. In light of this, we can use a small helper function like the following:

use std::io;

fn to_io_error<E>(err: E) -> io::Error
where
    E: Into<Box<std::error::Error + Send + Sync>>,
{
    // We can create a new IO Error with an ErrorKind of "other", then
    // pass in the actual error as data inside the wrapper type.
    io::Error::new(io::ErrorKind::Other, err)
}

This helper will allow us to effectively wrap any Error type within a std::io::Error, thus conforming to hyper's error handling demands.
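To see the helper in action without any network code, here's a std-only sketch; a ParseIntError stands in for the serde_json::Error we'll actually be wrapping, and the `dyn` keyword (required by newer editions of Rust) is added to the bound:

```rust
use std::io;

fn to_io_error<E>(err: E) -> io::Error
where
    E: Into<Box<dyn std::error::Error + Send + Sync>>,
{
    io::Error::new(io::ErrorKind::Other, err)
}

fn main() {
    // A `ParseIntError` stands in for any error we might need to wrap.
    let parse_err = "not a number".parse::<i32>().unwrap_err();
    let wrapped = to_io_error(parse_err);

    // The wrapper reports the "other" kind...
    assert_eq!(wrapped.kind(), io::ErrorKind::Other);
    // ...while the original error is preserved inside it.
    assert!(wrapped.get_ref().is_some());
}
```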

In addition to succeeding and failing, futures can be combined in interesting ways. You can chain futures together using the and_then() combinator meaning each step in the chain will execute after the previous has completed.

Additionally, you can execute several futures in parallel and then operate on the respective returns when they are all complete. In this way, we can define complex graphs of async execution, all working towards producing a final result.
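Since futures succeed or fail just like a Result, the chaining shape is easy to preview synchronously: Result::and_then has the same signature pattern as the futures 0.1 and_then combinator.

```rust
fn main() {
    // `Result::and_then` mirrors the futures `and_then` combinator:
    // each step runs only if the previous one succeeded.
    let chained = "21".parse::<i32>()
        .and_then(|n| "2".parse::<i32>().map(|m| n * m));
    assert_eq!(chained, Ok(42));

    // A failure early in the chain short-circuits the later steps.
    let failed = "twenty-one".parse::<i32>()
        .and_then(|n| "2".parse::<i32>().map(|m| n * m));
    assert!(failed.is_err());
}
```

The difference, of course, is that a future's chain describes work to run later and non-blockingly, rather than running immediately.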

Making Requests 

Since hyper uses futures for making HTTP requests, and we'll ultimately be hitting various API endpoints while expecting JSON responses in each case, I opted to structure my client code around doing this initial step in one function that returns a future of a JSON value. Other functions can then be written on top of it, chaining transformations of the value as needed.

impl MarvelClient {
    // ... snip ... continued from the previous code sample

    /// Given a uri to access, this generates a future json 
    /// value (to be executed by a core later).
    fn get_json(
        &self,
        uri: hyper::Uri
    ) -> Box<Future<Item = JsValue, Error = io::Error>> {
        let f = self.http
            .get(uri)
            .and_then(|res| {
                res.body().concat2().and_then(move |body| {
                    let value: JsValue = serde_json::from_slice(&body)
                        // Wrap the `serde_json::Error` in a 
                        // `std::io::Error` (when needed).
                        .map_err(to_io_error)?;
                    Ok(value)
                })
            })
            .map_err(to_io_error);
        Box::new(f)
    }
    // ... snip ...
}

The above code sample defines our new private function, get_json(), which accepts a hyper::Uri to make a request to, then builds that request as a future.

It is worth noting that calling this function will not cause any network connections to be made. The future does not start execution until scheduled in a tokio core. The return of this function is simply planned work to be executed later.
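This laziness mirrors closures (and iterators) in Rust: constructing one describes work without performing any of it. A std-only analogy, with a closure standing in for the future and a plain call standing in for the core:

```rust
fn main() {
    let mut executed = false;

    // Creating the closure describes the work but performs none of it,
    // just like building a future with `get_json()`.
    let mut task = || {
        executed = true; // pretend this is the network request
        42
    };

    // Only "running" the task (for a future: handing it to the core)
    // actually does the work.
    let result = task();
    assert!(executed);
    assert_eq!(result, 42);
}
```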

As described in the tokio docs on returning futures, we are returning a Box of our future type, which is what I'd consider the most straightforward approach in Rust versions prior to 1.26.

For Rust 1.26 and above, impl Trait offers a simplified syntax which does not require us to put the future in a Box, making the code both faster and more memory efficient.

You can see the simpler, more efficient version in the impl-trait branch of the marvel-explorer repo (diff).

Wrapping Up 

In this post, we created a new struct, MarvelClient, which will be the way our binaries will ultimately interact with the Marvel Comics API.

It's got a foundational method for fetching data from various API endpoints, get_json(), but we don't yet have anything to call it.

In the next post, we'll call get_json() from some new methods, chaining futures together to compute a result.