Working with data

Overview

LocusZoom.js aims to provide reusable and highly customizable visualizations. Towards this goal, a separation of concerns is enforced between data adapters (data) and data layers (presentation).

Your first plot: defining how to retrieve data

All data retrieval is performed by adapters: special objects whose job is to fetch the information required to render a plot. A major strength of LocusZoom.js is that it can connect several kinds of annotation from different places into a single view: the act of organizing data requests together is managed by an object called LocusZoom.DataSources.

Below is an example that defines how to retrieve the data for a “classic” LocusZoom plot, in which GWAS, LD, and recombination rate are overlaid on a scatter plot, with genes and gnomAD constraint information on another track below. In total, five REST API endpoints are used to create this plot: four standard datasets, and one user-provided summary statistics file.

const apiBase = 'https://portaldev.sph.umich.edu/api/v1/';
const data_sources = new LocusZoom.DataSources()
    .add('assoc', ['AssociationLZ', {url: apiBase + 'statistic/single/', params: { source: 45, id_field: 'variant' }}])
    .add('ld', ['LDServer', { url: 'https://portaldev.sph.umich.edu/ld/', source: '1000G', population: 'ALL', build: 'GRCh37' }])
    .add('recomb', ['RecombLZ', { url: apiBase + 'annotation/recomb/results/', build: 'GRCh37' }])
    .add('gene', ['GeneLZ', { url: apiBase + 'annotation/genes/', build: 'GRCh37' }])
    .add('constraint', ['GeneConstraintLZ', { url: 'https://gnomad.broadinstitute.org/api/', build: 'GRCh37' }]);

Of course, defining datasets is only half of the process; see the Getting Started Guide for how to define rendering instructions (layout) and combine these pieces together to create the LocusZoom plot.

Understanding the example

In the example above, a new data source is added via a line of code such as the following:

data_sources.add('assoc', ['AssociationLZ', {url: apiBase + 'statistic/single/', params: { source: 45, id_field: 'variant' }}]);

A lot is going on in this line!

The importance of genome build

You may notice that in the example above, many of the datasets specify build: 'GRCh37'. For “standard” datasets that are widely used (LD, genes, recombination, and GWAS catalog), the UMich APIs will automatically try to fetch the most up-to-date list of genes and GWAS catalog entries for the specified genome build. We currently support builds GRCh37 and GRCh38. Be sure to use the genome build that matches your dataset.

We periodically update our API server. If you think new information is missing, please let us know.

What should the data look like?

In theory, LocusZoom.js can display whatever data it is given: each layout can specify which fields should be used for the x and y axes.

In practice, it is much more convenient to use pre-existing layouts that solve a common problem well out of the box: the set of options needed to control point size, shape, color, and labels is rather verbose, and highly custom behaviors entail a degree of complexity that is not always beginner friendly. For basic LocusZoom.js visualizations, our default layouts assume that you use the field names and format conventions defined in the UM PortalDev API docs. This is the quickest way to get started.

Most users will only need to implement their own way of retrieving GWAS summary statistics; the other annotations are standard datasets and can be freely used from our public API. For complex plots (like annotations of new data), see our example gallery.

How data gets to the plot

If you are building a custom tool for exploring data, it is common to show the same data in several ways (eg, a LocusZoom plot next to a table of results). The user will have a better experience if the two widgets are synchronized to always show the same data, which raises a question: which widget is responsible for making the API request?

In LocusZoom.js, the user is allowed to change the information shown via mouse interaction (drag or zoom to change region, change LD calculations by clicking a button, etc.). This means that LocusZoom must always be able to ask for the data it needs, and initiate a new request to the repository if the required data is not available locally: a pull approach. This contrasts with static plotting tools like matplotlib or Excel, which render whatever data they are given initially (a push approach).

The act of contacting an external data repository, and fetching the information needed, is coordinated by Adapters. It is possible to share data with other widgets on the page via event callbacks, so that those widgets retrieve the newest data whenever the plot is updated (see subscribeToData in the guide to interactivity for details).

Not every web page requires an API

LocusZoom.js is designed to work well with REST APIs, but you do not need to create an entire web server just to render a single interactive plot. As long as the inputs can be transformed into a recognized format, they should work with the plot.

Some examples of other data retrieval mechanisms used in the wild are:

Example: Loading data from static JSON files

One way to make a LocusZoom plot quickly is to load the data for your region in a static file, formatted as JSON objects to look like the payload from our standard REST API. The key concept below is that instead of a server, the URL points to the static file. This demonstration is subject to the limits described above, but it can be a way to get started.

const data_sources = new LocusZoom.DataSources()
    .add("assoc", ["AssociationLZ", {url: "assoc_10_114550452-115067678.json", params: {source: null}}])
    .add("ld", ["LDLZ", { url: "ld_10_114550452-115067678.json" }])
    .add("gene", ["GeneLZ", { url: "genes_10_114550452-115067678.json" }])
    .add("recomb", ["RecombLZ", { url: "recomb_10_114550452-115067678.json" }])
    .add("constraint", ["GeneConstraintLZ", {  url: "constraint_10_114550452-115067678.json" }]);

Mix and match

Each data adapter in the chain is largely independent, and it is entirely normal to mix data from several sources: for example, GWAS data from a tabix file alongside genes data from the UMich API server.

If a single data layer needs to combine two kinds of data (eg association and LD), you will achieve the best results if the sources have some common assumptions about data format. Adapters are highly modular, but because they do not enforce a specific contract of field names or payload structure, you are responsible for ensuring that the resulting data works with the assumptions of your layout.
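One way to catch mismatched assumptions early is to check the combined records before they reach the plot. The snippet below is a minimal sketch, not part of LocusZoom.js: the helper findMissingFields and the field names (variant, position, log_pvalue) are assumptions chosen for illustration, and the sample records are made-up values.

```javascript
// Hypothetical helper (not a LocusZoom.js API): report which of the fields
// a layout expects are absent from at least one record.
function findMissingFields(records, requiredFields) {
    const missing = new Set();
    for (const record of records) {
        for (const field of requiredFields) {
            if (!(field in record)) {
                missing.add(field);
            }
        }
    }
    return [...missing];
}

// Made-up sample records: the second one lacks `log_pvalue`.
const sampleRecords = [
    { variant: '10:114758349_C/T', position: 114758349, log_pvalue: 12.3 },
    { variant: '10:114754071_T/C', position: 114754071 },
];

const missingFields = findMissingFields(sampleRecords, ['variant', 'position', 'log_pvalue']);
console.log(missingFields);
```

A check like this could be run once on a sample payload while developing a custom adapter, rather than on every request.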

What if my data doesn’t fit the expected format?

The built-in adapters are designed to work with a specific set of known REST APIs and fetch data over the web, but we provide mechanisms to customize every aspect of the data retrieval process, including how to construct the query sent to the server and how to modify the fields returned. See the guidance on “custom adapters” below.

In general, the more similar your field names are to those used in premade layouts, the easier it will be to get started with common tasks. Certain features require additional assumptions about field format, and these sorts of differences may cause behavioral (instead of cosmetic) issues. For example:

If the only difference is field names, you can customize the layout to tell it where to find the required information (see the guide to layouts and rendering for details). Transformation functions (like neglog10) can then be used to ensure that custom data is formatted in a way suitable for rendering and plotting.
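To illustrate what a transformation function does, here is a minimal stand-alone sketch of a negative log10 conversion. It is an illustrative stand-in, not the library's own neglog10 implementation; the function name negLog10 and its handling of invalid input (returning null) are assumptions.

```javascript
// Illustrative stand-in for a transformation function. This is NOT the
// implementation shipped with LocusZoom.js.
function negLog10(pvalue) {
    // NaN, zero, negative, and non-numeric inputs have no usable -log10 value.
    if (typeof pvalue !== 'number' || !(pvalue > 0)) {
        return null;
    }
    return -Math.log10(pvalue);
}

// Made-up p-values, converted for display on a -log10 scale.
const pvalues = [0.05, 1e-8, 0];
const transformed = pvalues.map(negLog10);
console.log(transformed);
```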

Creating your own custom adapter

Re-using code via subclasses

Most custom sites will only need to change very small things to work with their data. For example, if your REST API uses the same payload format as the UM PortalDev API, but a different way of constructing queries, you can change just one function and define a new data adapter:

const AssociationLZ = LocusZoom.Adapters.get('AssociationLZ');
class CustomAssociation extends AssociationLZ {
    getURL(state, chain, fields) {
        // The inputs to the function can be used to influence what query is constructed. Eg, the current view region is stored in `plot.state`.
        const {chr, start, end} = state;
        // Fetch the region of interest from a hypothetical REST API that uses query parameters to define the region query, for a given study URL such as `data.example/gwas/<id>/?chr=_&start=_&end=_`
        return `${this.url}/${this.params.study_id}/?chr=${encodeURIComponent(chr)}&start=${encodeURIComponent(start)}&end=${encodeURIComponent(end)}`;
    }
}
// A custom adapter should be added to the registry before using it
LocusZoom.Adapters.add('CustomAssociation', CustomAssociation);

// From there, it can be used anywhere throughout LocusZoom, in the same way as any built-in adapter
data_sources.add('mystudy', ['CustomAssociation', {url: 'https://data.example/gwas', params: { study_id: 42 }}]);

In the above example, an HTTP GET request will be sent to the server every time that new data is requested. If further control is required (like sending a POST request with a custom body), you may need to override additional methods such as fetchRequest. See below for more information, then consult the detailed developer documentation.
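As a rough sketch of the POST case, the helper below builds the options object that the standard fetch function accepts. The helper name buildPostRequest and the shape of the request body are assumptions about a hypothetical server; the comment at the end suggests how such a helper might be used inside an overridden fetchRequest, and should be checked against the developer documentation for your version.

```javascript
// Hypothetical helper: build `fetch` options for a server that expects the
// region of interest as a JSON body. The body shape is an assumption.
function buildPostRequest(state) {
    const { chr, start, end } = state;
    return {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ chr, start, end }),
    };
}

const postOptions = buildPostRequest({ chr: '10', start: 114550452, end: 115067678 });
console.log(postOptions.body);

// Inside a custom adapter, an overridden fetchRequest might then do something like:
//   return fetch(this.url, buildPostRequest(state)).then((resp) => resp.text());
```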

Common types of data retrieval that are most often customized:

What happens during a data request?

The adapter performs many functions related to data retrieval: constructing the query, caching to avoid unnecessary network traffic, and parsing the data into a transformed representation suitable for use in rendering.

Methods are provided to override all or part of the process, called in roughly the order below:

getData(state, fields, outnames, transformations)
    getRequest(state, chain, fields)
        getCacheKey(state, chain, fields)
        fetchRequest(state, chain, fields)
            getURL(state, chain, fields)
    parseResponse(resp, chain, fields, outnames, transformations)
        normalizeResponse(data)
        annotateData(data, chain)
        extractFields(data, fields, outnames, trans)
        combineChainBody(data, chain, fields, outnames)

The parameters passed to getData are as follows:

Step 1: Fetching data from a remote server

The first step of the process is to retrieve the data from an external location. getRequest is responsible for deciding whether the query can be satisfied by a previously cached request, and if not, sending the request to the server. At the conclusion of this step, we typically have a large unparsed string: eg REST APIs generally return JSON-formatted text, and tabix sources return lines of text for records in the region of interest.
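The caching decision can be sketched in isolation. The snippet below is a simplified illustration of the idea, not the library's code: makeCachedFetcher, the cache key format, and the fake fetchRegion callback are all assumptions. It remembers only the most recent request and reuses it when the same region is asked for again.

```javascript
// Hypothetical sketch of "reuse the previous request if the key matches".
function makeCachedFetcher(fetchRegion) {
    let cachedKey = null;
    let cachedResult = null;
    return function getRegion(state) {
        // Assumed key format: the region fully determines the query.
        const key = `${state.chr}_${state.start}_${state.end}`;
        if (key !== cachedKey) {
            cachedKey = key;
            cachedResult = fetchRegion(state); // a promise for unparsed text
        }
        return cachedResult;
    };
}

// Stand-in for a network call: counts requests and resolves to raw text.
let requestCount = 0;
const getRegion = makeCachedFetcher((state) => {
    requestCount += 1;
    return Promise.resolve(`{"region": "${state.chr}:${state.start}-${state.end}", "data": []}`);
});

const region = { chr: '10', start: 114550452, end: 115067678 };
getRegion(region);
getRegion(region);                        // same key: served from the cache
getRegion({ ...region, end: 115100000 }); // new key: triggers a new request
console.log(requestCount);
```

Because the sketch caches the promise itself, a failed request would also be remembered; a production version would likely clear the cache on error.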

Most custom data sources will focus on customizing two things:

Step 2: Formatting and parsing the data

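The parseResponse sequence handles the job of parsing the data. It can be used to convert many different API formats into a single standard form. There are four steps to the process, called in roughly the order listed in the outline above: normalizeResponse, annotateData, extractFields, and combineChainBody.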
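As an illustration of the kind of work done while parsing, the sketch below converts a column-oriented JSON payload into an array of per-record objects. The payload shape, the field names, and the helper name columnsToRecords are assumptions made for this example; a real server may return data in a different layout, and the sample values are made up.

```javascript
// Hypothetical helper: turn { field: [values...] } into [{ field: value }, ...].
function columnsToRecords(columns) {
    const fieldNames = Object.keys(columns);
    if (fieldNames.length === 0) {
        return [];
    }
    const rowCount = columns[fieldNames[0]].length;
    for (const name of fieldNames) {
        if (columns[name].length !== rowCount) {
            throw new Error(`Column "${name}" has a different length than the others`);
        }
    }
    const records = [];
    for (let i = 0; i < rowCount; i++) {
        const record = {};
        for (const name of fieldNames) {
            record[name] = columns[name][i];
        }
        records.push(record);
    }
    return records;
}

// Made-up response text, standing in for the unparsed string from step 1.
const rawText = '{"data": {"variant": ["10:114758349_C/T", "10:114754071_T/C"], "log_pvalue": [12.3, 4.5]}}';
const records = columnsToRecords(JSON.parse(rawText).data);
console.log(records);
```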

Working with the data “chain”

Each data layer is able to request data from multiple different sources. Internally, this process is referred to as the “chain” of linked data requested. LocusZoom.js assumes that every data layer is independent and decoupled: it follows that each data layer has its own chain of requests and its own parsing process.

This chain defines how to share information between different adapters. It consists of two key pieces:

Only chain.body is sent to the data layer. All other parts of the chain are discarded at the end of the data retrieval process.
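The idea of building up chain.body across adapters can be sketched with plain objects. This is a simplified illustration under stated assumptions, not the library's internal logic: the helper mergeLdIntoBody, the scratch key standing in for "other parts of the chain", the join on a variant field, and the rsquare field name are all hypothetical, and the values are made up.

```javascript
// Hypothetical helper: add an LD value to each association record by
// matching on a shared `variant` field. Unmatched records get null.
function mergeLdIntoBody(body, ldRecords) {
    const ldByVariant = new Map(ldRecords.map((ld) => [ld.variant, ld.rsquare]));
    return body.map((record) => ({
        ...record,
        rsquare: ldByVariant.has(record.variant) ? ldByVariant.get(record.variant) : null,
    }));
}

// A stand-in for the chain after the association adapter has run.
const chain = {
    scratch: { note: 'working information, discarded after retrieval' }, // hypothetical key
    body: [
        { variant: '10:114758349_C/T', log_pvalue: 12.3 },
        { variant: '10:114754071_T/C', log_pvalue: 4.5 },
    ],
};

// The next adapter in the chain (LD) extends the existing body.
const ldRecords = [{ variant: '10:114754071_T/C', rsquare: 0.8 }];
chain.body = mergeLdIntoBody(chain.body, ldRecords);

// Only the body would be handed to the data layer.
const dataForLayer = chain.body;
console.log(dataForLayer);
```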

See also