This document defines a set of JavaScript APIs that allow local media, including audio and video, to be requested from a platform.

This document is not complete. It is subject to major changes and, while early experimentation is encouraged, it is therefore not intended for implementation. The API is based on preliminary work done in the WHATWG.

Introduction

Access to multimedia streams (video, audio, or both) from local devices (video cameras, microphones, Web cams) can have a number of uses, such as real-time communication, recording, and surveillance.

This document defines the APIs used to get access to local devices that can generate multimedia stream data. This document also defines the MediaStream API by which JavaScript is able to manipulate the stream data or otherwise process it.

This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.

Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [[!WEBIDL]], as this specification uses that specification and terminology.

Terminology

HTML Terms:

The EventHandler interface represents a callback used for event handlers as defined in [[!HTML5]].

The concepts queue a task and fire a simple event are defined in [[!HTML5]].

The terms event handlers and event handler event types are defined in [[!HTML5]].

source

A source is the "thing" providing the media of a media stream track; the source is the broadcaster of the media itself. A source can be a physical webcam, microphone, local video or audio file from the user's hard drive, network resource, or static image.

Some sources have an identifier which must be unique to the application (un-guessable by another application) and persistent between application sessions (i.e., the identifier for a given source device/application pair must stay the same across sessions, but must not be guessable by another application). Sources that must have an identifier are camera and microphone sources; local file sources are not required to have an identifier. Source identifiers let the application save, identify the availability of, and directly request specific sources.
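
As a non-normative sketch (assuming the callback-based navigator.getUserMedia() entry point and the sourceId constraint referenced later in this document), an application might persist a camera's identifier and request that same camera in a later session:

  // Enumerate sources and remember the identifier of the first camera found.
  MediaStreamTrack.getSources(function (sourceInfos) {
    for (var i = 0; i < sourceInfos.length; i++) {
      if (sourceInfos[i].kind === "video") {
        localStorage.preferredCameraId = sourceInfos[i].sourceId;
        break;
      }
    }
  });

  // In a later session, prefer the remembered camera. The constraint is
  // optional, so another camera may be selected if this one is unavailable.
  navigator.getUserMedia(
    { video: { optional: [ { sourceId: localStorage.preferredCameraId } ] } },
    function (stream) { /* use the stream */ },
    function (error) { /* handle the error */ });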

Other than the identifier, no other bits of source identity are directly available to the application until the user agent connects a source to a track. Once a source has been "released" to the application (either via a permissions UI, pre-configured allow-list, or some other release mechanism) the application will be able to discover additional source-specific capabilities.

Sources do not have constraints -- tracks have constraints. When a source is connected to a track, it must conform to the constraints present on that track (or set of tracks).

Sources will be released (un-attached) from a track when the track is ended for any reason.

On the MediaStreamTrack object, sources are represented by a sourceType attribute. The behavior of APIs associated with the source's capabilities and state change depending on the source type.

Sources have capabilities and state. The capabilities and state are "owned" by the source and are common to any (multiple) tracks that happen to be using the same source (e.g., if two different track objects bound to the same source ask for the same capability or state information, they will get back the same answer).

State (Source State)

State refers to the immediate, current value of the source's (optionally constrained) capabilities. State is always read-only.

A source's state can change dynamically over time due to environmental conditions, sink configurations, or constraint changes. A source's state must always conform to the current set of mandatory constraints that all of the tracks it is bound to have defined, and should do its best to conform to the set of optional constraints specified.

A source's state is directly exposed to audio and video track objects through individual read-only attributes. These attributes share the same name as their corresponding capabilities and constraints.

Events are available that signal to the application that source state has changed.

A conforming user agent must support all the state names defined in this spec.
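
As a non-normative illustration, an application might read a track's current state via the states() method defined later in this document (the state names shown, such as width and facingMode, are examples whose availability depends on the source type):

  // Assumes "stream" is an existing MediaStream with at least one video track.
  var videoTrack = stream.getVideoTracks()[0];
  var states = videoTrack.states();
  console.log("Current width: " + states.width);
  console.log("Facing mode: " + states.facingMode);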

Capabilities

Source capabilities are the intrinsic "features" of a source object. For each source state, there is a corresponding capability that describes whether it is supported by the source and, if so, what the range of supported values is. Capabilities are expressed as either a series of states (for enumerated-type capabilities) or as a min/max range.

The values of the supported capabilities must be normalized to the ranges and enumerated types defined in this specification.

The capabilities API returns the same underlying per-source capabilities, regardless of any user-supplied constraints on the tracks that use the source (capabilities are independent of constraints).

Source capabilities are effectively constant. Applications should be able to depend on a specific source having the same capabilities for any session.

Constraints

Constraints are an optional feature for restricting the range of allowed variability on a source. Without provided constraints, implementations are free to select a source's state from the full range of its supported capabilities, and to adjust that state at any time for any reason.

Constraints may be optional or mandatory. Optional constraints are represented by an ordered list; mandatory constraints are an unordered set. The order of the optional constraints is from most important (at the head of the list) to least important (at the tail of the list).
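
For example, a constraint structure of the following shape (the constraint names are purely illustrative) contains one mandatory constraint and two optional constraints, listed from most to least important:

  var constraints = {
    mandatory: {
      width: { min: 640 }          // must be satisfied, or the track is over-constrained
    },
    optional: [
      { frameRate: { min: 30 } },  // most important optional constraint
      { facingMode: "user" }       // least important optional constraint
    ]
  };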

Constraints are stored on the track object, not the source. Each track can be optionally initialized with constraints, or constraints can be added afterward through the constraint APIs defined in this spec.

Applying track level constraints to a source is conditional based on the type of source. For example, read-only sources will ignore any specified constraints on the track.

It is possible for two tracks that share the same source to apply contradictory constraints. Under such contradictions, the implementation will mute both tracks and notify them that they are over-constrained.

Events are available that allow the application to know when constraints cannot be met by the user agent. These typically occur when the application applies constraints beyond the capability of a source, applies contradictory constraints, or, in some cases, when a source cannot sustain itself in an over-constrained scenario (overheating, etc.).

Constraints that are intended for video sources will be ignored by audio sources and vice-versa. Similarly, constraints that are not recognized will be preserved in the constraint structure, but ignored by the UA. This will allow future constraints to be defined in a backward compatible manner.

A correspondingly-named constraint exists for each corresponding source state name and capability name. In general, user agents will have more flexibility to optimize the media streaming experience the fewer constraints are applied.

MediaStreamTrack

A MediaStreamTrack object represents a media source in the user agent. Several MediaStreamTrack objects can represent the same media source, e.g., when the user chooses the same camera in the UI shown by two consecutive calls to getUserMedia().

Note that a web application can revoke all given permissions with MediaStreamTrack.stop().
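
A short non-normative sketch (assuming the callback-based navigator.getUserMedia() entry point) that obtains a stream, inspects its tracks, and later stops them, releasing their sources:

  navigator.getUserMedia({ video: true, audio: true }, function (stream) {
    var videoTrack = stream.getVideoTracks()[0];
    var audioTrack = stream.getAudioTracks()[0];
    console.log(videoTrack.kind + ": " + videoTrack.label);
    // Later, when the application is done with the media, stopping the
    // tracks releases their sources and revokes the granted permissions.
    videoTrack.stop();
    audioTrack.stop();
  }, function (error) {
    /* handle the error */
  });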

Interface Definition

readonly attribute DOMString kind

The MediaStreamTrack.kind attribute MUST return the string "audio" if the object represents an audio track or "video" if the object represents a video track.

readonly attribute DOMString id

Unless a MediaStreamTrack object is created as part of a special-purpose algorithm that specifies how the track id must be initialized, the user agent MUST generate a globally unique identifier string and initialize the object's id attribute to that string.

An example of an algorithm that specifies how the track id must be initialized is the algorithm to represent an incoming network component with a MediaStreamTrack object. [[!WEBRTC10]]

The MediaStreamTrack.id attribute MUST return the value to which it was initialized when the object was created.

readonly attribute DOMString label

User agents MAY label audio and video sources (e.g., "Internal microphone" or "External USB Webcam"). The MediaStreamTrack.label attribute MUST return the label of the object's corresponding track, if any. If the corresponding track has or had no label, the attribute MUST instead return the empty string.

Thus the kind and label attributes do not change value, even if the MediaStreamTrack object is disassociated from its corresponding track.

attribute boolean enabled

The MediaStreamTrack.enabled attribute, on getting, MUST return the last value to which it was set. On setting, it MUST be set to the new value, and then, if the MediaStreamTrack object is still associated with a track, MUST enable the track if the new value is true, and disable it otherwise.

Thus, after a MediaStreamTrack is disassociated from its track, its enabled attribute still changes value when set; it just doesn't do anything with that new value.
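
For instance, an application could implement a local "mute" control by toggling this attribute (a sketch assuming videoTrack is a MediaStreamTrack obtained elsewhere):

  // Setting enabled to false disables the track (the new value has no
  // further effect if the object is no longer associated with a track).
  videoTrack.enabled = false;

  // Re-enable the track later.
  videoTrack.enabled = true;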

readonly attribute boolean muted

The MediaStreamTrack.muted attribute MUST return true if the track is muted, and false otherwise.

attribute EventHandler onmute
This event handler, of type mute, MUST be supported by all objects implementing the MediaStreamTrack interface.
attribute EventHandler onunmute
This event handler, of type unmute, MUST be supported by all objects implementing the MediaStreamTrack interface.
readonly attribute boolean _readonly
If the track (audio or video) is backed by a read-only source such as a file, or the track source is a local microphone or camera, but is shared so that constraints applied to the track cannot modify the source's state, the readonly attribute MUST return the value true. Otherwise, it MUST return the value false.
readonly attribute boolean remote
If the track is sourced by an RTCPeerConnection, the remote attribute MUST return the value true. Otherwise, it MUST return the value false.
readonly attribute MediaStreamTrackState readyState

The readyState attribute represents the state of the track. It MUST return the value to which the user agent last set it.

attribute EventHandler onstarted
This event handler, of type started, MUST be supported by all objects implementing the MediaStreamTrack interface.
attribute EventHandler onended
This event handler, of type ended, MUST be supported by all objects implementing the MediaStreamTrack interface.
static void getSources(SourceInfoCallback resultCallback)

The static getSources() method collects authorized information for all available sources.
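
A minimal non-normative sketch of enumerating sources; the callback receives a sequence of SourceInfo objects as described in the Source Info section below:

  MediaStreamTrack.getSources(function (sourceInfos) {
    for (var i = 0; i < sourceInfos.length; i++) {
      var info = sourceInfos[i];
      // label is only filled in if the application is authorized to obtain
      // information about this source.
      console.log(info.kind + " source " + info.sourceId +
                  (info.label ? " (" + info.label + ")" : ""));
    }
  });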

MediaTrackConstraints? constraints()
Returns the complete constraints object associated with the track. If no mandatory constraints have been defined, the mandatory field will not be present (it will be undefined). If no optional constraints have been defined, the optional field will not be present (it will be undefined). If neither optional, nor mandatory constraints have been created, the value null is returned.
MediaSourceStates states()
Returns an object containing all the state variables associated with the allowed constraints.
(AllVideoCapabilities or AllAudioCapabilities) capabilities()

Returns a dictionary with all of the capabilities for the track type. If the track type is VideoStreamTrack, the AllVideoCapabilities dictionary is returned. If the track type is AudioStreamTrack, the AllAudioCapabilities dictionary is returned.

Given that implementations of various hardware may not exactly map to the same range, an implementation should make a reasonable attempt to translate and scale the hardware's setting onto the mapping provided by this specification. If this is not possible due to the user agent's inability to retrieve a given capability from a source, then for CapabilityRange-typed capabilities, the min and max fields will not be present on the returned dictionary, and the supported field will be false. For CapabilityList-typed capabilities, a suitable "notavailable" value will be the sole capability in the list.

An example of the user agent providing an alternative mapping: if a source supports a hypothetical fluxCapacitance state whose type is a CapabilityRange, and the state is defined in this specification to be the range from -10 (min) to 10 (max), but the source's hardware setting for fluxCapacitance only supports the values "off", "medium", and "full", then the user agent should map the range value of -10 to "off", 10 to "full", and 0 to "medium". A constraint imposing a strict value of 3 will cause the user agent to attempt to set the value of "medium" on the hardware, and to return a fluxCapacitance state of 0, the closest supported setting. No error event is raised in this scenario.

CapabilityList objects should order their enumerated values from minimum to maximum where it makes sense, or in the order defined by the enumerated type where applicable.

See the AllVideoCapabilities and AllAudioCapabilities dictionaries for details on the expected types for the various supported state names.
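
As a non-normative illustration, an application might inspect a range-typed video capability as follows (width is used here as an example of a CapabilityRange-typed capability):

  // Assumes videoTrack is a VideoStreamTrack.
  var caps = videoTrack.capabilities();
  if (caps.width && caps.width.supported) {
    console.log("Supported width range: " + caps.width.min + " to " + caps.width.max);
  } else {
    console.log("The width capability could not be retrieved from this source.");
  }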

void applyConstraints()
MediaTrackConstraints constraints
A new constraint structure to apply to this track.

If the track already has constraints, this method replaces them all with the provided constraint structure; otherwise, it applies the provided constraints to the track.
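
For example (the constraint names are illustrative), the following call replaces whatever constraints the track currently has:

  videoTrack.applyConstraints({
    mandatory: { width: { min: 640 }, height: { min: 480 } },
    optional: [ { frameRate: { min: 30 } } ]
  });
  // If the source cannot satisfy the mandatory constraints, the application
  // is notified via the overconstrained event (see onoverconstrained).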

attribute EventHandler onoverconstrained
This event handler, of type overconstrained, MUST be supported by all objects implementing the MediaStreamTrack interface.
MediaStreamTrack clone()

Clones the given MediaStreamTrack.

When the clone() method is invoked, the user agent MUST run the following steps:

  1. Let trackClone be a newly constructed MediaStreamTrack object.

  2. Initialize trackClone's id attribute to a newly generated value.

  3. Let trackClone inherit this track's underlying source, kind, label and enabled attributes.

  4. Return trackClone.
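
A brief non-normative usage sketch; the clone shares the original track's source but receives a newly generated id:

  var trackClone = videoTrack.clone();
  console.log(videoTrack.id === trackClone.id);       // false: the clone gets a new id
  console.log(videoTrack.label === trackClone.label); // true: the label is inherited
  trackClone.enabled = false;  // does not change the original object's enabled attribute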

void stop ()

When a MediaStreamTrack object's stop() method is invoked, the user agent MUST run the following steps:

  1. Let track be the current MediaStreamTrack object.

  2. If track has no source attached (sourceType is "none") or if the source is provided by an RTCPeerConnection, then abort these steps.

  3. Set track's readyState attribute to ended.

  4. Permanently stop the generation of data for track's source. If the data is being generated from a live source (e.g., a microphone or camera), then the user agent SHOULD remove any active "on-air" indicator for that source. If the data is being generated from a prerecorded source (e.g., a video file), any remaining content in the file is ignored.

    This will effectively end all other MediaStreamTrack objects sharing the same source as track.

The task source for the tasks queued for the stop() method is the DOM manipulation task source.
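
A short non-normative sketch of the observable effect of stop() on a track obtained from a local camera:

  videoTrack.stop();
  console.log(videoTrack.readyState);   // "ended"
  // Any other MediaStreamTrack objects sharing the same source are
  // effectively ended as well.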

new
The track type is new and has not been initialized (connected to a source of any kind). This state implies that the track's label will be the empty string.
live

The track is active (the track's underlying media source is making a best-effort attempt to provide data in real time).

The output of a track in the live state can be switched on and off with the enabled attribute.

ended

The track has ended (the track's underlying media source is no longer providing data, and will never provide more data for this track). Once a track enters this state, it never exits it.

For example, a video track in a MediaStream ends if the user unplugs the USB web camera that acts as the track's media source.

Track Source Types

none
This track has no source. This is the case when the track is in the "new" or "ended" readyState.
camera
A valid source type only for VideoStreamTracks. The source is a local video-producing camera source.
microphone
A valid source type only for AudioStreamTracks. The source is a local audio-producing microphone source.

Source Info

sequence<SourceInfo> sourceInfoList
A sequence of SourceInfo objects representing the result of a call to MediaStreamTrack.getSources().
DOMString sourceId
The unique id for this source as described in the MediaSourceStates dictionary.
DOMString kind
MUST be either "audio" or "video".
DOMString label
If the application is authorized to get info from this source, the label attribute will be filled in with exactly the same value as would have been returned from a call to getUserMedia() with a constraint specifying this source's sourceId.

Video Facing Mode Enum

user
The source is facing toward the user (a self-view camera).
environment
The source is facing away from the user (viewing the environment).
left
The source is facing to the left of the user.
right
The source is facing to the right of the user.

Isolated Media Streams

When either the "noaccess" or "peerIdentity" constraint is applied to a MediaStreamTrack, the track shall be isolated so that its content is not accessible to the content JS. An isolated media stream may be used for two purposes:

When the noaccess=true constraint applies to a track, that track may be added to any PeerConnection.

Open Issue: The editors worry that the above paragraph is just wrong. If the track can be added to a PeerConnection that is connected to another PeerConnection in the same application, the application could get access to the data. We suggest this should be changed from "may be added" to "may not be added". This will allow noaccess=true to be used for things like hair check dialogs.

When the peerIdentity=foo constraint applies to a track, then that track may be added only to PeerConnections with compatible peer identities as described in the WebRTC document.

Both the noaccess and peerIdentity constraints must be mandatory. Any use of them in the optional block must trigger an error.
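
For example (a non-normative sketch assuming the callback-based navigator.getUserMedia() entry point and the constraint spelling shown), a peerIdentity constraint would be supplied in the mandatory block:

  navigator.getUserMedia(
    {
      audio: { mandatory: { peerIdentity: "alice@example.com" } },
      video: { mandatory: { peerIdentity: "alice@example.com" } }
    },
    function (stream) {
      // The content JS cannot access the media data; the resulting tracks
      // may only be added to PeerConnections with compatible peer identities.
    },
    function (error) { /* handle the error */ });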