theomn.com

Continuing on from the previous post in the series (Part 2), this post will be focusing on making async HTTP requests, generics, and JSON parsing with serde.

This time we'll be assembling all the pieces we've built thus far, and calling them from some new binary targets.

Finally, we will use the API client library to answer the question we originally set out to answer:

Which was the first cross-over event to feature two particular characters?

Once again:

If you want to skip all the rationale and explanation, you can head over to GitHub and check out marvel-explorer which is where I prototyped a bunch of the code we'll be talking about.

Exploring Marvel's Character Data ◈

The problem we're trying to solve will require a couple different programs to help us explore the Marvel API.

The first program will be for searching for characters by name so we can pick out specific entities (or records). For this, we can leverage the helper function, UriMaker#character_by_name(), which we wrote in Part 1. The function was written to build a URL we can use to fetch a list of character entities that have a name starting with a given prefix.

This program is important for the exploration process since otherwise, we'd be stabbing in the dark looking for exact matches. At least now we can cast a wide net and see what comes back.

Much of the work for this program will happen in src/lib.rs, but will also include some argument and presentation handling over in src/bin/character-search.rs.

Here's an example of the usage:

$ cargo run --bin character-search thor
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/character-search thor`
+---------+----------------------------------+
| ID      | Name                             |
+---------+----------------------------------+
| 1009664 | Thor                             |
| 1017576 | Thor (Goddess of Thunder)        |
| 1017106 | Thor (MAA)                       |
| 1017315 | Thor (Marvel Heroes)             |
| 1017328 | Thor (Marvel War of Heroes)      |
| 1017302 | Thor (Marvel: Avengers Alliance) |
| 1011025 | Thor (Ultimate)                  |
| 1010820 | Thor Girl                        |
+---------+----------------------------------+

The actual program also includes a third column with a truncated description which has been cut out for readability's sake.

Modeling Character Responses ◈

The documentation for the Marvel Comics API shares a fairly detailed blueprint for what the various responses are shaped like.

A simplified version of the responses for "character" data might be:

{
  "data": {
    "results": [
      {"id": 1, "name": "...", "description": "..."},
      {"id": 2, "name": "...", "description": "..."},
      {"id": 3, "name": "...", "description": "..."},
      {"id": 4, "name": "...", "description": "..."}
    ]
  }
}

Many fields have been omitted at each level of the JSON data. We don't need them to complete our task. Likewise, serde does not require us to exhaustively model the data we will receive; we only have to model the fields we want to extract.

We can write some structs to mirror the anticipated types of the fields (just as before in Part 2).

#[derive(Debug, Deserialize)]
pub struct Character {
    pub id: i32,
    pub name: String,
    pub description: String,
}

#[derive(Debug, Deserialize)]
struct CharacterDataWrapper {
    pub data: CharacterDataContainer,
}

#[derive(Debug, Deserialize)]
struct CharacterDataContainer {
    pub results: Vec<Character>,
}

If you look closely at the spec in the API docs, you may notice that the intermediate fields, data and results, are actually optional.

If we were to update our model to account for this, we could have our structs use Option<T> for each field. Still, this might be pedantic and overly verbose for our use case. We have modeled the data we need, and anything less will effectively mean we can't offer a meaningful return value.

Another way to think about it is, if at any point one of the fields turns out to be null or missing, the end result (for us) is the same. We cannot proceed any further, so the outcome can only be failure. By modeling our needs we can skip the formality of carefully interrogating each Option to see if it is Some or None. Instead, we can simply ask serde if the data matches our expectations. If fields are missing, serde will give us an Err when we try to unpack the data into our struct.
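
To make the trade-off concrete, here is a rough sketch of the alternative, where every level is wrapped in Option. The Loose* type names and the results_of helper are invented for this illustration and are not part of marvel-explorer. The serde derives are left off so the snippet stands on its own.

```rust
#[derive(Debug, PartialEq)]
struct Character {
    id: i32,
    name: String,
}

// Hypothetical "everything is optional" versions of the wrapper types.
struct LooseWrapper {
    data: Option<LooseContainer>,
}

struct LooseContainer {
    results: Option<Vec<Character>>,
}

// Every level has to be interrogated before we can reach the list.
fn results_of(wrapper: LooseWrapper) -> Option<Vec<Character>> {
    wrapper.data.and_then(|container| container.results)
}

fn main() {
    let complete = LooseWrapper {
        data: Some(LooseContainer {
            results: Some(vec![Character { id: 1, name: "Thor".to_owned() }]),
        }),
    };
    let missing = LooseWrapper { data: None };

    println!("{:?}", results_of(complete));
    println!("{:?}", results_of(missing));
}
```

Either way a missing level ends in "no results", which is why the stricter structs lose us nothing here.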

Fetching the Character List ◈

To build our program, we'll write a function that can accept a name prefix. The pieces we built earlier in the series, Parts 1 and 2, are finally put to work for this.

In src/lib.rs we add on to MarvelClient.

impl MarvelClient {
    // ... snip ... continued from the work done in Part 2

    pub fn search_characters(
        &self, 
        name_prefix: &str
    ) -> Result<Vec<Character>, io::Error> {
        let uri = self.uri_maker.character_by_name(name_prefix);
        let work = self.get_json(uri).and_then(|value| {
            let wrapper: CharacterDataWrapper =
                serde_json::from_value(value).map_err(to_io_error)?;

            Ok(wrapper.data.results)
        });
        self.core.borrow_mut().run(work)
    }
}

Our new method puts all those pieces together. We use:

UriMaker to build our Uri.
MarvelClient#get_json() to prepare a Future that will run our request.
Future#and_then() to transform the return from that Future.
our helper to_io_error() to convert any potential serde_json::Error to std::io::Error so as to be compatible with hyper's internal error handling.
our core to schedule the Future work, and block until it completes.
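
As a reminder of what the error conversion step looks like in isolation, here is a minimal sketch using only the standard library. The body of to_io_error is an assumption about how such a helper could be written (the real one lives in the Part 2 code), and parse_id is a made-up stand-in for the serde_json decoding step.

```rust
use std::error::Error;
use std::io;

// Assumed shape of the helper: wrap any error in an io::Error.
fn to_io_error<E>(err: E) -> io::Error
where
    E: Into<Box<dyn Error + Send + Sync>>,
{
    io::Error::new(io::ErrorKind::Other, err)
}

// Stand-in for a fallible decoding step (the post uses serde_json here).
fn parse_id(raw: &str) -> Result<i32, io::Error> {
    // map_err swaps the error type so `?` can return an io::Error.
    let id = raw.trim().parse::<i32>().map_err(to_io_error)?;
    Ok(id)
}

fn main() {
    println!("{:?}", parse_id("1009664"));
    println!("{:?}", parse_id("thor"));
}
```

The pattern is the same one used in search_characters(): map_err changes the error type, then ? returns early on failure.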

Writing the character-search binary ◈

With this method written, we can build our command line program in src/bin/character-search.rs

extern crate marvel_explorer;
#[macro_use]
extern crate prettytable;

use marvel_explorer::MarvelClient;
use prettytable::Table;
use prettytable::format;
use std::env;

fn main() {
    // read our auth info from environment vars
    let key = env::var("MARVEL_KEY").unwrap();
    let secret = env::var("MARVEL_SECRET_KEY").unwrap();
    
    // create an instance of our client.
    let client = MarvelClient::new(key, secret);
    
    // read a hero name to search for from the 
    // arguments to our program.
    let name = env::args().nth(1).expect("name");

    match client.search_characters(&name) {
        Err(e) => eprintln!("{:?}", e),
        Ok(results) => {
            // Create the table
            let mut table = Table::new();
            table.set_format(
                *format::consts::FORMAT_NO_LINESEP_WITH_TITLE);
            // Set the header row
            table.set_titles(row!["ID", "Name"]);

            for character in &results {
                table.add_row(row![character.id, character.name]);
            }
            table.printstd();
        }
    };
}

Most of the work has already been done for us in our library code.

The binary program brings in our library and pulls keys for the Marvel API from environment variables, which need to be set in the shell before running the program.

See the "Get a Key" section of the Marvel Comics API site for how to get your own keys.

It also reads from the arguments list to capture the name we want to search for.

Once all the inputs for our request have been collected, we can make our request by calling our new method.

Since the method returns a Result, and we're at the top of our call stack, we finally use it to present the outcome to the user. We use match to inform the user of either the success or failure of our request.

In the case of an error, we simply print to stderr some debug information about the error itself by using the debug format token ("{:?}"). In a more complete program, we might take further steps to provide better details or recommendations to the user in this section of the program.
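
As one possible direction for those "further steps", a small helper could translate the error into advice for the person at the terminal. This is only a sketch: describe_failure is a hypothetical name, and the error kinds it matches on are assumptions, since which kinds our client actually produces depends on hyper and the helpers from Part 2.

```rust
use std::io;

// Hypothetical helper: map an error to a message with a next step.
// The error kinds matched here are assumptions for illustration.
fn describe_failure(err: &io::Error) -> String {
    match err.kind() {
        io::ErrorKind::TimedOut => {
            "The request timed out. Check your connection and try again.".to_owned()
        }
        io::ErrorKind::InvalidData => {
            "The response did not have the expected shape.".to_owned()
        }
        _ => format!("Request failed: {}", err),
    }
}

fn main() {
    let err = io::Error::new(io::ErrorKind::Other, "connection refused");
    // Print to stderr, as the binaries in this post do.
    eprintln!("{}", describe_failure(&err));
}
```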

In the case of success, we format the data and print it out. I found a crate to help me with this. Using prettytable-rs, we can set some table headers and insert rows, one per matching entity, then print the result.

The output looks like the example up at the start of the post.

Exploring Marvel's Event Data ◈

In Part 1 we wrote a method for building URLs to fetch a list of events associated with a specific character id.

We will now use this to build a binary in src/bin/character-events.rs

Here is some example output using the top id from our earlier search for "thor":

$ cargo run --bin character-events 1009664
   Compiling marvel-explorer v0.1.0 (file:///home/owen/projects/marvel-explorer)
    Finished dev [unoptimized + debuginfo] target(s) in 4.39 secs
     Running `target/debug/character-events 1009664`
+-----+-----------------------+---------------------+
| ID  | Title                 | Date                |
+-----+-----------------------+---------------------+
| 116 | Acts of Vengeance!    | 1989-12-10 00:00:00 |
| 233 | Atlantis Attacks      | 1989-01-01 00:00:00 |
... snip ...
| 273 | Siege                 | 2009-12-06 00:00:00 |
| 60  | World War Hulks       | 2007-07-07 00:00:00 |
+-----+-----------------------+---------------------+

Again, in the actual program, there is a 4th column with a truncated description which is omitted for readability.

Modeling Event Responses ◈

Just as with the character data, we begin by building some structs to represent the data.

#[derive(Clone, Debug, Deserialize, Eq, Hash, PartialEq)]
pub struct Event {
    pub id: i32,
    pub title: String,
    pub start: Option<String>,
    pub description: String,
}

#[derive(Debug, Deserialize)]
struct EventDataWrapper {
    pub data: EventDataContainer,
}

#[derive(Debug, Deserialize)]
struct EventDataContainer {
    pub results: Vec<Event>,
}

This time, we will be just a bit more permissive. We'll define the Event.start field as Option<String>. This is to say, if there are any events in the response without dates associated with them, we don't want to reject the whole data set.

Instead, we can sort the entities without a date to the bottom of our list, or filter them out completely when searching for the earliest event.
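
Here is a small sketch of both options using plain std. The Event struct is trimmed down for the example, the helper names are made up, and the record with id 999 is invented to stand in for an event with no start date. One thing to keep in mind is that Option orders None before Some, so sorting on start alone would put undated events first.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Event {
    id: i32,
    title: String,
    start: Option<String>,
}

// Sort by start date, pushing undated events to the end.
// The key is (is_none, start): `false` sorts before `true`,
// so dated events come first, ordered by their date string.
fn sort_by_start(events: &mut [Event]) {
    events.sort_by_key(|event| (event.start.is_none(), event.start.clone()));
}

// Drop undated events entirely.
fn dated_only(events: Vec<Event>) -> Vec<Event> {
    events.into_iter().filter(|event| event.start.is_some()).collect()
}

fn sample_events() -> Vec<Event> {
    vec![
        Event {
            id: 116,
            title: "Acts of Vengeance!".to_owned(),
            start: Some("1989-12-10 00:00:00".to_owned()),
        },
        // Made-up record standing in for an event with no start date.
        Event { id: 999, title: "Undated Event".to_owned(), start: None },
        Event {
            id: 233,
            title: "Atlantis Attacks".to_owned(),
            start: Some("1989-01-01 00:00:00".to_owned()),
        },
    ]
}

fn main() {
    let mut events = sample_events();
    sort_by_start(&mut events);
    for event in &events {
        println!("{} {} {:?}", event.id, event.title, event.start);
    }
    println!("{} dated events", dated_only(events).len());
}
```

Comparing the dates as strings works here because the API's "YYYY-MM-DD HH:MM:SS" format sorts the same way lexically as it does chronologically.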

You may notice there is an alarming amount of overlap between the EventDataContainer, and the CharacterDataContainer we wrote earlier in the post. In fact, if these two types could be unified somehow, then the EventDataWrapper and CharacterDataWrapper types would also be virtually identical.

This is where the Rust generics system can really shine! Let's take a quick detour to reduce this duplication.

Refactoring the Model with Generics ◈

When we talk about generics we are talking about types that are framed in terms of other types.

We've already seen this in practice with Option, where we have a single Option type that can be reused for any number of other types by filling in the type parameter in angle brackets.

Option<String> represents either a String or None. Option<i32> represents either an i32 or None. The generics system allows us to parameterize a type as an input so we don't have to have separate concrete types for each variant. This is what makes it possible to not need an OptionalString type, for example.

To leverage this in our program, we can define some new types to replace the ones we originally wrote.

#[derive(Debug, Deserialize)]
struct DataWrapper<T> {
    pub data: DataContainer<T>,
}

#[derive(Debug, Deserialize)]
struct DataContainer<T> {
    pub results: Vec<T>,
}

In this case, T represents an unknown type to be decided later by the caller.

To trace through, imagine you have a DataWrapper<Character>. The compiler makes the replacement for T all the way through these type definitions. By using this signature in our code, we are telling the compiler that we have a struct with a data field pointing to a DataContainer<Character>, which has a results field pointing to a Vec<Character>.

Likewise, by using a DataWrapper<Event> we follow the same path, ending with a results field holding a Vec<Event>. We just let the compiler stamp out equivalent types to the ones we wrote manually earlier! To sweeten the deal, we can continue to use these generic structs for new types later on, if we wanted to explore other areas of the Marvel data.

The generics system is really awesome and can be a great help to reduce repetition in your code. Be on the lookout for places where generics can help.
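
To see the "stamping out" in action without any network or JSON involved, here is a std-only sketch. The structs are trimmed-down copies of the ones above with the Deserialize derive removed, and unwrap_results is a made-up helper for the example.

```rust
#[derive(Debug)]
struct DataWrapper<T> {
    data: DataContainer<T>,
}

#[derive(Debug)]
struct DataContainer<T> {
    results: Vec<T>,
}

#[derive(Debug, PartialEq)]
struct Character {
    id: i32,
    name: String,
}

#[derive(Debug, PartialEq)]
struct Event {
    id: i32,
    title: String,
}

// Written once, usable for any T the caller picks.
fn unwrap_results<T>(wrapper: DataWrapper<T>) -> Vec<T> {
    wrapper.data.results
}

fn main() {
    let characters = DataWrapper {
        data: DataContainer {
            results: vec![Character { id: 1009664, name: "Thor".to_owned() }],
        },
    };
    let events = DataWrapper {
        data: DataContainer {
            results: vec![Event { id: 273, title: "Siege".to_owned() }],
        },
    };

    // The compiler infers T = Character and T = Event respectively.
    println!("{:?}", unwrap_results(characters));
    println!("{:?}", unwrap_results(events));
}
```

In the real code the Deserialize derive stays on the generic structs. As far as I know, serde's derive adds the bound that T itself must be deserializable, so nothing extra needs to be written by hand.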

Fetching the Event List ◈

With our new generic data model, we can now implement a new fetcher method to get us a list of events for a given character id.

impl MarvelClient {
    // ... snip ...
    pub fn events_by_character(
        &self, 
        character_id: i32
    ) -> Result<Vec<Event>, io::Error> {
        let uri = self.uri_maker.character_events(character_id);
        let work = self.get_json(uri).and_then(|value| {
            let wrapper: DataWrapper<Event> = 
                serde_json::from_value(value).map_err(to_io_error)?;
            Ok(wrapper.data.results)
        });
        self.core.borrow_mut().run(work)
    }
}

The work being done in this function is pretty much the same as the last. We're gathering the inputs needed to make our request, building the request, transforming the data by attaching a closure to the Future returned by get_json(), and finally scheduling the work on our tokio core, and blocking to await the final result.

Writing the character-events binary ◈

Just like last time, we gather up authorization info from the environment, collect an argument from the args list, and present the outcome to the user.

extern crate marvel_explorer;
#[macro_use]
extern crate prettytable;

use marvel_explorer::MarvelClient;
use prettytable::Table;
use prettytable::format;
use std::env;

fn main() {
    let key = env::var("MARVEL_KEY").unwrap();
    let secret = env::var("MARVEL_SECRET_KEY").unwrap();
    let client = MarvelClient::new(key, secret);
    let id: i32 = env::args()
        .nth(1)
        .expect("character_id")
        .parse()
        .expect("parse character_id");

    match client.events_by_character(id) {
        Err(e) => eprintln!("{:?}", e),
        Ok(results) => {
            let mut table = Table::new();
            table.set_format(
                *format::consts::FORMAT_NO_LINESEP_WITH_TITLE);
            table.set_titles(row!["ID", "Title", "Date"]);
            for event in &results {
                // fall back to empty string if an event
                // has no start date.
                let start = match event.start {
                    Some(ref s) => s.as_str(),
                    None => "",
                };
                table.add_row(row![event.id, event.title, start]);
            }
            table.printstd();
        }
    };
}

This program is very similar to the last, except for:

parsing the first argument as an i32 instead of leaving it as String.
normalizing the value of start so it is always a &str (even when the value is None).

Beyond these minor points, everything should look familiar.
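
If you'd rather not panic on bad input, both of those steps can be pulled out into small functions. This is a sketch with made-up helper names (parse_character_id and start_or_empty), not code from marvel-explorer.

```rust
// Hypothetical helper: parse the first CLI argument as a character id,
// returning a readable message instead of panicking.
fn parse_character_id(arg: Option<String>) -> Result<i32, String> {
    let raw = arg.ok_or_else(|| "missing character_id argument".to_owned())?;
    raw.parse::<i32>()
        .map_err(|err| format!("invalid character_id `{}`: {}", raw, err))
}

// Borrow the start date as a &str, or fall back to "" when absent.
fn start_or_empty(start: &Option<String>) -> &str {
    start.as_deref().unwrap_or("")
}

fn main() {
    match parse_character_id(std::env::args().nth(1)) {
        Ok(id) => println!("looking up events for {}", id),
        Err(message) => eprintln!("{}", message),
    }

    let dated = Some("1989-12-10 00:00:00".to_owned());
    let undated: Option<String> = None;
    println!("[{}] [{}]", start_or_empty(&dated), start_or_empty(&undated));
}
```

Option::as_deref() turns an &Option<String> into an Option<&str>, which does the same job as the match in the listing above.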

Putting it all together ◈

So far, we've been able to fetch a list of characters matching a given name prefix and a list of events for a given character id. Technically we have enough information to answer the question, "which event was the first to feature two particular characters?"

Still, it would be nice if we didn't have to run our two programs multiple times to spit out lists of events that we'd have to find matches in, then sort by date.

Seems like this is the sort of thing a computer should be able to do for us neatly. Let's see if we can manage to automate all that.

Calculating the Earliest Event ◈

To calculate the earliest event that two characters participated in, we can follow a process like:

Look up the id for each character name so we can get a list of events.
Fetch the event list for each character.
Pack the event lists into a HashSet so we can intersect the two lists.
Sort the intersection by date to get the earliest result.

impl MarvelClient {
    // ... snip ...
    pub fn earliest_event_match(
        &self,
        name1: &str,
        name2: &str,
    ) -> Result<Option<Event>, io::Error> {
        let name_to_event_set = |name: String| {
            let id_lookup = self.uri_maker
                .character_by_name_exact(&name);
            self.get_json(id_lookup)
                .and_then(move |characters_resp| {
                    let wrapper: DataWrapper<Character> =
                        serde_json::from_value(characters_resp)
                            .map_err(to_io_error)?;
                    match wrapper.data.results.first() {
                        Some(character) => Ok(character.id),
                        None => Err(io::Error::new(
                            io::ErrorKind::Other,
                            format!("Character `{}` Not Found", name),
                        )),
                    }
                })
                .and_then(|id| {
                    let uri = self.uri_maker.character_events(id);
                    // Return a future from a future.
                    // The next `.and_then()` receives the resolved
                    // value of this.
                    self.get_json(uri)
                })
                .and_then(|events_resp| {
                    let wrapper: DataWrapper<Event> =
                        serde_json::from_value(events_resp)
                            .map_err(to_io_error)?;
                    let result_set: HashSet<Event> = 
                        wrapper.data.results.into_iter().collect();
                    Ok(result_set)
                })
        };

        // build up a graph of futures to compute the final value.
        let work = name_to_event_set(name1.to_owned())
            .join(name_to_event_set(name2.to_owned()))
            .and_then(|(events1, events2)| {
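                // `None` sorts before `Some`, so skip undated events
                // before looking for the earliest start.
                let maybe_event: Option<Event> = events1
                    .intersection(&events2)
                    .filter(|x| x.start.is_some())
                    .min_by_key(|x| &x.start)
                    .cloned();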
                Ok(maybe_event)
            });

        self.core.borrow_mut().run(work)
    }
}

If I'm completely honest, this design is greatly influenced by the Rust borrow checker. If I were stronger with the language, I might have figured out a way to rewrite the name_to_event_set closure as a method on MarvelClient, which I would prefer, but I suppose there are some complicated ownership aspects to this. Likely this sort of thing will be improved with the async API changes coming to Rust later this year.

This method, as written, is a bit of a mouthful, so let's step through it.

Early in the method, we define a new closure called name_to_event_set. This represents the work we will be doing for each character name.

At the start of this section, we laid out some bullet points for the steps required to compute our final result. This closure effectively handles all but the final point (comparing the events for each character). The closure will:

Translate a character name into an id.
Use that id to fetch event list JSON data.
Convert the data into a HashSet<Event> so we can easily find the intersection.

Once we finish defining this closure (which accounts for the bulk of this method), we can build a complete graph of the work to be done.

The .join() method on Future allows us to transform two separate futures into a single future that will resolve once each of the individual futures resolves. By using this, we can run our closure twice, in parallel, once for each character name. Since the return of a .join() is yet another future, we can chain a call to .and_then() to perform a final task that consumes the final values from each call to name_to_event_set.

This diagram shows how these tasks might be scheduled on our tokio core.

[Figure: flow of futures to compute result]

Each box inside the blue groupings roughly corresponds to the 3 .and_then() calls chained to the initial call to get_json(). Each of these steps echoes work we did previously so I won't go into detail on them.

The final step performs the intersection on the two event sets, and uses the .min_by_key() method to either find None or Some(event) with the lowest start field.

You may have noticed the definition for the Event struct had a long list of traits derived for it, most of which were not required for the other types we defined.

The Eq, Hash, and PartialEq traits are what allow Event to be stored in a HashSet, while Clone lets us copy the matching event out of the intersection.
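
The final step can be tried out on its own, with no futures involved. Below is a std-only sketch: the Event struct is trimmed down, earliest_shared and event are made-up helper names, and the way the sample events are split between the two sets is invented for the example. Because Option orders None before Some, the sketch filters out undated events before taking the minimum.

```rust
use std::collections::HashSet;

// Eq + Hash (plus PartialEq) are required to live in a HashSet.
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
struct Event {
    id: i32,
    title: String,
    start: Option<String>,
}

// Earliest event present in both sets, ignoring undated events.
fn earliest_shared(a: &HashSet<Event>, b: &HashSet<Event>) -> Option<Event> {
    a.intersection(b)
        .filter(|event| event.start.is_some())
        .min_by_key(|event| event.start.clone())
        .cloned()
}

// Small constructor to keep the sample data readable.
fn event(id: i32, title: &str, start: &str) -> Event {
    Event { id, title: title.to_owned(), start: Some(start.to_owned()) }
}

fn main() {
    let first: HashSet<Event> = vec![
        event(116, "Acts of Vengeance!", "1989-12-10 00:00:00"),
        event(233, "Atlantis Attacks", "1989-01-01 00:00:00"),
        event(273, "Siege", "2009-12-06 00:00:00"),
    ]
    .into_iter()
    .collect();

    let second: HashSet<Event> = vec![
        event(233, "Atlantis Attacks", "1989-01-01 00:00:00"),
        event(273, "Siege", "2009-12-06 00:00:00"),
        event(60, "World War Hulks", "2007-07-07 00:00:00"),
    ]
    .into_iter()
    .collect();

    println!("{:?}", earliest_shared(&first, &second));
}
```

Two events count as "the same" only if every field matches, since the derived Eq and Hash look at the whole struct.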

Writing the first-event binary ◈

Here's src/bin/first-event.rs

extern crate marvel_explorer;
use marvel_explorer::MarvelClient;
use std::env;

fn main() {
    let key = env::var("MARVEL_KEY").unwrap();
    let secret = env::var("MARVEL_SECRET_KEY").unwrap();
    let client = MarvelClient::new(key, secret);
    let name1: String = env::args().nth(1).unwrap();
    let name2: String = env::args().nth(2).unwrap();
    match client.earliest_event_match(&name1, &name2) {
        Err(e) => eprintln!("{:?}", e),
        Ok(maybe_event) => {
            println!("{:?}", maybe_event);
        }
    };
}

Here's the final output:

$ cargo run --bin first-event Deadpool Nightcrawler
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/first-event Deadpool Nightcrawler`
Some(Event { id: 318, title: "Dark Reign", start: Some("2008-12-01 00:00:00"), description: "Norman Osborn came out the hero of Secret Invasion, and now the former Green Goblin has been handed control of the Marvel Universe. With his Cabal and the Dark Avengers at his side, can anything stop this long time villain from reshaping the world in his own image? And what has become of the heroes?" })

Not as pretty as the other table-based outputs, but it does the job. Since we're using the debug format token, we can see if the two characters have a match or none at all since the printed output will show one of Some(... blob of data ...), None, or some kind of error.

Wrapping Up ◈

This has been a guided tour of working with an assortment of libraries to build a small client for the Marvel Comics API.

While the code may not be production-ready, it shows how quickly a somewhat trivial task can send a developer into dealing with topics that wouldn't require attention when working in another language.

Notably, our interior mutability handling is not thread-safe, so concurrent access to UriMaker in our implementation for MarvelClient#earliest_event_match() could result in a panic! at runtime. I suggest you check out Ricardo Martins' Interior Mutability in Rust, part 2: thread safety for an in-depth guide on thread-safe alternatives to RefCell.

At the very least, I hope this series helps to "break the ice," highlighting topics for further reading.