AR Cloud Guide

The central challenge of AR multiplayer and persistence is keeping a consistent coordinate system across devices and sessions. The AR Cloud makes this possible.

This guide first introduces the basic concepts behind coordinate systems and relocalization, then covers challenges and solutions, then details 6D.ai's implementation of the AR Cloud.

Base Concepts

What is a coordinate system?

In regular 3D games and applications, the world is an empty three-dimensional canvas where artists carefully position objects and characters with 3D coordinates (x, y, z). Those coordinates are all relative to a point called the origin, at coordinates (0, 0, 0), along three axes that are perpendicular to each other.

In such games, the position of the origin is arbitrary and mostly irrelevant. However, it comes with a meaningful orientation (e.g. X is right; Y is up; etc.) and a scale (e.g. one unit of coordinates is one meter). The combination of the origin's position, orientation, and scale makes a coordinate system.

Coordinate Systems in AR

In mobile AR frameworks such as ARKit and ARCore, the scale is always metric, but the position and orientation of the AR origin are arbitrarily set to the device's position and orientation in the world at the beginning of the app's AR session.

Since the device's position is different for every session, the AR coordinate system itself is always different. In other words, the AR coordinates of real-world locations or objects are different on every device and every time the app is launched.

That lack of consistency makes it difficult to pin a virtual object to the real world for multiple players or across sessions. If only they could all share the same coordinate system, objects could be persisted and exchanged across devices using no more than their pose (3 floats for the XYZ position, and 4 floats for the orientation quaternion).
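To make the "3 floats plus 4 floats" idea concrete, here is a minimal sketch of how a pose could be packed into 28 bytes for exchange between devices. The Pose type, the field order, and the little-endian 32-bit layout are assumptions made for illustration; they are not the 6D.ai wire format.

```python
import struct
from typing import NamedTuple


class Pose(NamedTuple):
    """Hypothetical pose record: XYZ position plus orientation quaternion."""
    x: float
    y: float
    z: float
    qx: float
    qy: float
    qz: float
    qw: float


# Assumed layout: little-endian, seven 32-bit floats (7 * 4 = 28 bytes).
_POSE_FORMAT = "<7f"


def pack_pose(pose: Pose) -> bytes:
    """Serialize a pose into a fixed-size byte string."""
    return struct.pack(_POSE_FORMAT, *pose)


def unpack_pose(data: bytes) -> Pose:
    """Rebuild a pose from bytes produced by pack_pose."""
    return Pose(*struct.unpack(_POSE_FORMAT, data))


# An object placed 1.5 m to the right and 2.25 m behind the origin,
# with the identity orientation (0, 0, 0, 1).
example = Pose(1.5, 0.0, -2.25, 0.0, 0.0, 0.0, 1.0)
payload = pack_pose(example)
```

Such a payload only means the same thing on two devices if both interpret it in the same coordinate system, which is the problem the rest of this guide addresses.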

That is a key issue that the 6D.ai SDK solves with the AR Cloud.

What is Tracking?

Mobile AR frameworks like ARKit and ARCore provide a functionality called Tracking, which calculates the frame-to-frame pose (position and orientation) of the device in the coordinate system of its session.

Tracking works by fusing computer vision and sensor data. It is very efficient and provides a pose for every frame (60 times a second), but it accumulates a small error over time, called drift. As the device pose drifts, virtual objects also drift from their original real-world position.

Tracking drift is particularly visible when closing a loop: meshed objects tend to overlap or get duplicated with a slight offset.

Some tracking drift is unavoidable because of limits in the precision and accuracy of the sensors, but drift also increases in situations where tracking is more difficult, e.g. fast or jerky motion, or dark or low-detail environments.

What is Relocalization?

The 6D.ai SDK provides a feature called Relocalization which, unlike tracking, uses a pre-existing scan of the world to find the absolute pose of the device in a shared coordinate system.

Relocalization is too expensive to perform at 60 frames per second, so it is only performed once. Upon successful relocalization, the conversion from session to shared coordinate system is calculated, allowing tracking to efficiently output converted poses for the shared coordinate system.
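The conversion mentioned above can be sketched with rigid transforms. Assume that, at the instant relocalization succeeds, the device pose is known both in session coordinates (from tracking) and in shared coordinates (from relocalization). The conversion is then the shared pose composed with the inverse of the session pose. All names below are hypothetical, and the SDK's internal representation is not documented here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
# A rigid transform (R, t) maps a point p to R * p + t.
Transform = Tuple[Matrix, Vector]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(a: Matrix, v: Vector) -> Vector:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def compose(outer: Transform, inner: Transform) -> Transform:
    """Return the transform that applies `inner` first, then `outer`."""
    r_out, t_out = outer
    r_in, t_in = inner
    rotated = mat_vec(r_out, t_in)
    return mat_mul(r_out, r_in), [rotated[i] + t_out[i] for i in range(3)]


def invert(tf: Transform) -> Transform:
    """Invert a rigid transform: the inverse rotation is the transpose."""
    r, t = tf
    r_t = [[r[j][i] for j in range(3)] for i in range(3)]
    return r_t, [-c for c in mat_vec(r_t, t)]


def yaw_pose(degrees: float, position: Vector) -> Transform:
    """Helper for the example: rotation about the up (Y) axis plus a position."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]], list(position)


def session_to_shared(device_in_session: Transform,
                      device_in_shared: Transform) -> Transform:
    """Conversion computed once, when relocalization succeeds."""
    return compose(device_in_shared, invert(device_in_session))


# Same physical device pose, expressed in both coordinate systems.
in_session = yaw_pose(30.0, [1.0, 0.0, 2.0])
in_shared = yaw_pose(120.0, [10.0, 0.0, -4.0])
conversion = session_to_shared(in_session, in_shared)

# Every later tracked pose can be converted cheaply, frame after frame.
converted = compose(conversion, in_session)
```

The expensive step (finding the device pose in shared coordinates) happens once; afterwards each frame costs only one transform composition.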

Relocalization requires location-specific data called a map, which can be thought of as an advanced 3D model, made of geometry and point clouds, with additional data to triangulate the position of the camera.

Maps are automatically generated in the background by the 6D.ai SDK, and can be saved to the AR Cloud for later reuse.

When sessions and devices successfully relocalize against the same map, they can exchange and load persisted virtual objects, by positioning them with the same shared coordinate system, ensuring they appear at the same real world location.

What is the AR Cloud?

The 6D.ai AR Cloud is a huge shared library of maps, accessible on connected devices through the 6D.ai SDK. The AR Cloud receives maps from user devices and redistributes them as efficiently as possible.

Besides map stewardship, the AR Cloud has two important roles: associating maps with a location ID, and stitching together maps that share the same location ID, a process called Mesh Fusion.

The unique location ID is an identifier for a physical location. The AR Cloud is designed so developers can assume that multiple devices and sessions relocalizing with the same location ID also share the same map and the same coordinate system, allowing them to interact together through multiplayer and persistence.

Challenges & Solutions

Getting the Right Map

Some mobile AR frameworks, such as ARKit and ARCore, expose map data as an opaque byte buffer that developers have to handle, often deferring the burden to users. The 6D.ai SDK instead takes on most of that work, leaving developers to choose only when to load (SixDegreesSDK_LoadFromARCloud()) and when to save (SixDegreesSDK_SaveToARCloud()).

Typically, apps should load at the very start of the session or, if no data is available, start building a map from scratch. In either case, apps should save towards the end of the session, if it is predictable, or at regular intervals in multiplayer scenarios, where other players may join at any moment and benefit from the map contributions of the present session.
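The load-then-save-periodically pattern can be sketched as follows. The load and save callables are stand-ins for SixDegreesSDK_LoadFromARCloud() and SixDegreesSDK_SaveToARCloud(); their real signatures and return values are not given in this guide, so the boolean result of load, the frame-based interval, and the run_session function itself are assumptions.

```python
from typing import Callable, List


def run_session(load: Callable[[], bool],
                save: Callable[[], None],
                frame_count: int,
                save_every: int) -> List[str]:
    """Hypothetical session loop: load once, save periodically and at the end."""
    log: List[str] = []

    # Load at the very start; fall back to building a fresh map.
    if load():
        log.append("map loaded, relocalizing")
    else:
        log.append("no map available, building from scratch")

    last_saved_frame = 0
    for frame in range(1, frame_count + 1):
        # ... per-frame tracking and rendering would happen here ...
        if frame % save_every == 0:
            save()
            last_saved_frame = frame
            log.append(f"saved at frame {frame}")

    # Final save, unless the last frame was already saved.
    if last_saved_frame != frame_count:
        save()
        log.append("saved at end of session")
    return log
```

Periodic saving matters mostly for multiplayer, where a late joiner loads whatever has been saved so far.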

It wouldn't be practical to send the entire map of the world to every device, or try and re-stitch the entire world every time a new bit of map is saved. For this reason, maps are kept in "buckets" indexed by a Location ID.

The AR Cloud is responsible for choosing a location ID from metadata provided by 6D.ai SDK requests, then picking the right bucket to save the map to or send it from. The location ID is available through the SixDegreesSDK_GetLocationID() API call, which returns an empty string until either saving or loading succeeds.

By default, the location ID is an 8-character geohash of the device's GPS position. At the Equator, such a geohash covers a rectangle of about 38x19 meters, a bit less than half of a football field. Since GPS accuracy is 5 meters at best (outdoors, in an open field, in clear weather), the location ID may be inaccurate in many situations, particularly when the device is near the boundary of a geohash cell.
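For reference, the standard geohash encoding can be sketched in a few lines. This is the public geohash algorithm, not code taken from the 6D.ai SDK, and the Earth circumference values used to estimate the cell size are approximations chosen for this example.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def cell_size_at_equator(precision: int = 8):
    """Approximate (width, height) in meters of a geohash cell at the Equator."""
    bits = 5 * precision
    lon_bits = (bits + 1) // 2
    lat_bits = bits // 2
    width = 40_075_017 / 2 ** lon_bits   # equatorial circumference, meters
    height = 20_003_931 / 2 ** lat_bits  # pole-to-pole meridian length, meters
    return width, height


location_id = geohash_encode(57.64911, 10.40744)  # widely used example point
width_m, height_m = cell_size_at_equator(8)       # about 38.2 by 19.1
```

Two devices a few meters apart can still fall on either side of a cell boundary and get different geohashes, which is one reason a GPS-derived location ID can be wrong.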

Indoors, GPS accuracy is worse, and an anonymized 6-character hash of the WiFi's SSID (name) is used instead, if available. The table below sums up location ID selection:

                         No WiFi                  WiFi SSID available
No GPS                   No save/load possible    WiFi SSID hash (6 hex chars)
GPS location available   GPS geohash (8 chars)    WiFi SSID hash (6 hex chars)

This implementation is temporary and will be improved over time to address the following limitations:

  • Two devices at the same location but on different WiFi networks, or one on WiFi and the other on cellular network, will not have the same location ID.
  • Conversely, two devices on WiFi networks with the same name in entirely different locations will have the same location ID.
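The selection table and its two limitations can be sketched as a small function. The actual anonymized hash used by the AR Cloud is not specified in this guide; truncated SHA-256 is an assumption made here only to obtain 6 hexadecimal characters, and both function names are hypothetical.

```python
import hashlib
from typing import Optional


def ssid_hash(ssid: str) -> str:
    """Assumed stand-in for the anonymized SSID hash: 6 hex characters."""
    return hashlib.sha256(ssid.encode("utf-8")).hexdigest()[:6]


def select_location_id(gps_geohash: Optional[str],
                       wifi_ssid: Optional[str]) -> Optional[str]:
    """Pick a location ID following the table above.

    WiFi takes precedence whenever an SSID is available; otherwise the
    8-character GPS geohash is used; with neither, no ID can be produced
    and saving or loading is not possible.
    """
    if wifi_ssid:
        return ssid_hash(wifi_ssid)
    if gps_geohash:
        return gps_geohash[:8]
    return None


# Limitation 1: same place, but only one device is on WiFi.
on_wifi = select_location_id("u4pruydq", "CoffeeShop")
on_cellular = select_location_id("u4pruydq", None)

# Limitation 2: same network name in two distant places.
shop_a = select_location_id("u4pruydq", "CoffeeShop")
shop_b = select_location_id("9q8yyk8y", "CoffeeShop")
```

In this sketch on_wifi and on_cellular differ even though the devices are side by side, while shop_a and shop_b collide despite being far apart.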

Mesh Fusion

In earlier versions, the AR Cloud would just override its own maps with the ones received from client apps, losing a lot of scanned data in the process.

With SDK v0.21.0, saving was made additive by upgrading the AR Cloud to run Mesh Fusion. Conceptually, Mesh Fusion ensures that AR Cloud maps are always growing and being refined by new data. Mesh Fusion does three things:

  • Realign inbound maps to remove most of the distortion incurred by tracking drift;
  • Fuse inbound maps with the location's existing maps;
  • Slice fused maps into regularly sized chunks for mesh loading.

Realigning and fusing saved maps may change their coordinate systems. Maps in the same location that do not connect live in separate clusters, until they grow enough to overlap. They are then merged and the smaller map is rotated and translated to the coordinate system of the bigger one.

There are important caveats to this behavior, from a developer standpoint:

  1. Coordinate systems may be different for players sharing the same location ID: for multiplayer, ensure that all players relocalize around the same spot, so they're part of the same cluster.
  2. Coordinate systems may change as Mesh Fusion realigns and straightens maps: for persistence, ensure that a thorough scan of the area precedes content placement. Sessions should always start by relocalizing and end by saving to make the map more robust.

As a rule, the more map data gets saved to the AR Cloud, the more reliable relocalization is for that location, ensuring consistent coordinate systems across devices and sessions.

AR Cloud Implementation Details

Map Realignment

On the device, maps are sliced into overlapping spheres of 5 meter radius. The 6D.ai tech assumes that tracking is robust enough to ensure the internal consistency of geometry within those spheres.

A scan of an L-shaped wall will show distortion and discontinuities as a consequence of tracking drift.

On the AR Cloud, Mesh Fusion uses the same algorithm as on-device relocalization to realign the overlapping spheres together, correcting for tracking drift.

Mesh Fusion finds the best possible overlap between the spheres to compensate for the drift.

It also includes the spheres making up the location's existing map, realigning and wiggling them until they all fit and overlap optimally, like a jigsaw puzzle.

The newly saved map can be moved and rotated to fit the existing map on the AR Cloud. It also helps better align the existing spheres of the existing map.

Those connected spheres form a cluster, or several clusters if the location was scanned in disjoint parts. The clusters are sliced into larger spheres (10-meter radius), corresponding to the size of the mesh being loaded upon successful relocalization.

The larger spheres are sized to hold a desirable mesh area to load after relocalization.
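The way connected spheres group into clusters can be illustrated with a small union-find sketch. It assumes, for simplicity, that two spheres connect whenever their centers are closer than twice the radius; the actual Mesh Fusion criterion relies on realigning scanned data and is certainly more involved. The function name is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def cluster_spheres(centers: Sequence[Point],
                    radius: float = 5.0) -> List[List[int]]:
    """Group sphere indices into clusters of transitively overlapping spheres."""
    parent = list(range(len(centers)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            # Simplified overlap test: the two spheres intersect.
            if math.dist(centers[i], centers[j]) < 2 * radius:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(centers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three spheres along a corridor, plus one scanned far away.
scan = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (16.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
clusters = cluster_spheres(scan)  # [[0, 1, 2], [3]]
```

The isolated sphere stays in its own cluster, with its own coordinate system, until later scans bridge the gap.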

Timing

Mesh Fusion takes from a few seconds, for a new map in an empty location, to about a minute in a large space. In large environments that were scanned repeatedly, the process can be significantly longer.

Although the output of Mesh Fusion is always the best map to relocalize against, there are situations where it makes sense not to wait for it to finish and instead relocalize against an unprocessed map coming from another device.

When loading from the AR Cloud while Mesh Fusion is still running, the AR Cloud packages any new map data being processed together with the previous output of Mesh Fusion, if any. This supports the common use case of a player loading immediately after another player saved.

In that case, the coordinate systems of both players will be consistent for multiplayer, but may be unsuitable for persistence, as the unprocessed data used for relocalization is about to be realigned and integrated into the location's larger map.

In this example, the 6D.ai SDK is being used for the first time ever at this location. Player 2 tries to load from the AR Cloud 3 times and gets 3 different results: no map data is available before Player 1 saves for the first time; then Player 1's map is served unprocessed until Mesh Fusion completes; after which the fused map is returned.

This issue becomes less visible over time, as repeated scans and saves cement the quality of the location's map.

In a similar example, the 6D.ai SDK was used before at this location. Player 4 tries to load from the AR Cloud 3 times and gets 3 different results: the old map is returned before Player 3 saves; then the old map is packaged with Player 3's unprocessed map until Mesh Fusion completes; after which a new fused map is returned.
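Both examples follow the same rule: a load returns the previous output of Mesh Fusion, if any, packaged with any saved data still being processed. The toy model below reproduces the two sequences. The CloudLocation class and its methods are hypothetical and only mimic the behavior described in this section.

```python
from typing import List, Optional


class CloudLocation:
    """Toy model of one location ID's bucket on the AR Cloud."""

    def __init__(self, fused: Optional[str] = None) -> None:
        self.fused = fused            # last output of Mesh Fusion, if any
        self.pending: List[str] = []  # saved maps still being processed

    def save(self, device_map: str) -> None:
        """A device saves; Mesh Fusion starts working on the new data."""
        self.pending.append(device_map)

    def finish_fusion(self) -> None:
        """Mesh Fusion completes: pending maps are merged into the fused map."""
        parts = ([self.fused] if self.fused else []) + self.pending
        self.fused = "fused(" + "+".join(parts) + ")"
        self.pending = []

    def load(self) -> List[str]:
        """What a loading device receives at this moment."""
        return ([self.fused] if self.fused else []) + list(self.pending)


# First example: the location has never been scanned before.
fresh = CloudLocation()
first = [fresh.load()]   # nothing to load yet
fresh.save("player1")
first.append(fresh.load())  # Player 1's unprocessed map
fresh.finish_fusion()
first.append(fresh.load())  # fused map

# Second example: an old fused map already exists.
known = CloudLocation(fused="old")
second = [known.load()]  # old map only
known.save("player3")
second.append(known.load())  # old map packaged with unprocessed data
known.finish_fusion()
second.append(known.load())  # new fused map
```

Only the last load in each sequence returns a result that is stable enough for persistence; the intermediate ones are intended for multiplayer.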