This document defines a set of JavaScript APIs that allow local media, including audio and video, to be requested from a platform.
This document is not complete. It is subject to major changes and, while early experimentation is encouraged, it is therefore not intended for implementation. The API is based on preliminary work done in the WHATWG.
Access to multimedia streams (video, audio, or both) from local devices (video cameras, microphones, Web cams) can have a number of uses, such as real-time communication, recording, and surveillance.
This document defines the APIs used to get access to local devices that can generate multimedia stream data. This document also defines the MediaStream API by which JavaScript is able to manipulate the stream data or otherwise process it.
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [[!WEBIDL]], as this specification uses that specification and terminology.
The EventHandler interface represents a callback used for event handlers as defined in [[!HTML5]]. The concepts queue a task and fire a simple event are defined in [[!HTML5]]. The terms event handlers and event handler event types are defined in [[!HTML5]].
A source is the "thing" providing the source of a media stream track. The source is the broadcaster of the media itself. A source can be a physical webcam, microphone, local video or audio file from the user's hard drive, network resource, or static image.
Some sources have an identifier which must be unique to the application (un-guessable by other applications) and persistent across application sessions (i.e., the identifier for a given source device must stay the same for a given application between sessions, but must not be guessable by another application). Camera and microphone sources must have an identifier; local file sources are not required to have one. Source identifiers let the application save, identify the availability of, and directly request specific sources.
Other than the identifier, no source identity information is directly available to the application until the user agent connects a source to a track. Once a source has been "released" to the application (via a permissions UI, a pre-configured allow-list, or some other release mechanism), the application will be able to discover additional source-specific capabilities.
Sources do not have constraints -- tracks have constraints. When a source is connected to a track, it must conform to the constraints present on that track (or set of tracks).
Sources will be released (un-attached) from a track when the track is ended for any reason.
On the MediaStreamTrack object, sources are represented by a sourceType attribute. The behavior of APIs associated with the source's capabilities and state change depending on the source type.

Sources have capabilities and state. The capabilities and state are "owned" by the source and are common to all tracks that happen to be using the same source (e.g., if two different track objects bound to the same source ask for the same capability or state information, they will get back the same answer).
State refers to the immediate, current value of the source's (optionally constrained) capabilities. State is always read-only.
A source's state can change dynamically over time due to environmental conditions, sink configurations, or constraint changes. A source's state must always conform to the current set of mandatory constraints that all of the tracks it is bound to have defined, and should do its best to conform to the set of optional constraints specified.
A source's state is directly exposed to audio and video track objects through individual read-only attributes. These attributes share the same name as their corresponding capabilities and constraints.
Events are available that signal to the application that source state has changed.
A conforming user-agent must support all the state names defined in this spec.
Source capabilities are the intrinsic "features" of a source object. For each source state, there is a corresponding capability that describes whether it is supported by the source and if so, what the range of supported values are. Capabilities are expressed as either a series of states (for enumerated-type capabilities) or as a min/max range.
The values of the supported capabilities must be normalized to the ranges and enumerated types defined in this specification.
Capabilities return the same underlying per-source capabilities, regardless of any user-supplied constraints present on the source (capabilities are independent of constraints).
Source capabilities are effectively constant. Applications should be able to depend on a specific source having the same capabilities for any session.
Constraints are an optional feature for restricting the range of allowed variability on a source. Without provided constraints, implementations are free to select a source's state from the full range of its supported capabilities, and to adjust that state at any time for any reason.
Constraints may be optional or mandatory. Optional constraints are represented by an ordered list, mandatory constraints are an unordered set. The order of the optional constraints is from most important (at the head of the list) to least important (at the tail of the list).
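The structure described above can be illustrated with a plain constraints object. This is a sketch only: the constraint names (width, frameRate) and the exact shape are illustrative, not normative.

```javascript
// Sketch of a constraint structure: mandatory is an unordered set of
// key/value pairs that must all be satisfied; optional is an ordered
// list, most important entry at the head. Constraint names here
// (width, frameRate) are illustrative assumptions.
function buildConstraints() {
  return {
    mandatory: {                      // unordered set: all must hold
      width: { min: 640 }
    },
    optional: [                       // ordered list, head first
      { frameRate: { min: 30 } },     // most important
      { width: { min: 1280 } }        // least important
    ]
  };
}
```

An implementation would try to satisfy every mandatory entry and then walk the optional list in order, dropping entries it cannot meet.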
Constraints are stored on the track object, not the source. Each track can be optionally initialized with constraints, or constraints can be added afterward through the constraint APIs defined in this spec.
Applying track level constraints to a source is conditional based on the type of source. For example, read-only sources will ignore any specified constraints on the track.
It is possible for two tracks that share the same source to apply contradictory constraints. Under such contradictions, the implementation will mute both tracks and notify them that they are over-constrained.
Events are available that allow the application to know when constraints cannot be met by the user agent. These typically occur when the application applies constraints beyond the capability of a source, applies contradictory constraints, or, in some cases, when a source cannot sustain itself in an over-constrained configuration (overheating, etc.).
Constraints that are intended for video sources will be ignored by audio sources and vice-versa. Similarly, constraints that are not recognized will be preserved in the constraint structure, but ignored by the UA. This will allow future constraints to be defined in a backward compatible manner.
A correspondingly-named constraint exists for each corresponding source state name and capability name. In general, user agents will have more flexibility to optimize the media streaming experience the fewer constraints are applied.
A MediaStreamTrack object represents a media source in the user agent. Several MediaStreamTrack objects can represent the same media source, e.g., when the user chooses the same camera in the UI shown by two consecutive calls to getUserMedia().

Note that a web application can revoke all given permissions with MediaStreamTrack.stop().
The MediaStreamTrack.kind attribute MUST return the string "audio" if the object represents an audio track or "video" if the object represents a video track.
Unless a MediaStreamTrack object is created as part of a special-purpose algorithm that specifies how the track id must be initialized, the user agent MUST generate a globally unique identifier string and initialize the object's id attribute to that string. An example of an algorithm that specifies how the track id must be initialized is the algorithm to represent an incoming network component with a MediaStreamTrack object. [[!WEBRTC10]]

The MediaStreamTrack.id attribute MUST return the value to which it was initialized when the object was created.
User agents MAY label audio and video sources (e.g., "Internal microphone" or "External USB Webcam"). The MediaStreamTrack.label attribute MUST return the label of the object's corresponding track, if any. If the corresponding track has or had no label, the attribute MUST instead return the empty string.

Thus the kind and label attributes do not change value, even if the MediaStreamTrack object is disassociated from its corresponding track.
The MediaStreamTrack.enabled attribute, on getting, MUST return the last value to which it was set. On setting, it MUST be set to the new value, and then, if the MediaStreamTrack object is still associated with a track, MUST enable the track if the new value is true, and disable it otherwise.

Thus, after a MediaStreamTrack is disassociated from its track, its enabled attribute still changes value when set; it just doesn't do anything with that new value.
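As a minimal sketch of the behavior above, setting enabled always records the new value, whether or not anything is still listening. The track objects here are stand-ins with the same attribute shape, not real MediaStreamTrack instances.

```javascript
// Sketch: toggle the enabled attribute across a set of track-like
// objects (plain objects standing in for MediaStreamTrack).
function setEnabled(tracks, value) {
  for (const track of tracks) {
    track.enabled = value;   // recorded even if the track is disassociated
  }
}

const tracks = [{ enabled: true }, { enabled: true }];
setEnabled(tracks, false);   // e.g., a "mute all" control
```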
The MediaStreamTrack.muted attribute MUST return true if the track is muted, and false otherwise.

The event handler event types mute and unmute MUST be supported by all objects implementing the MediaStreamTrack interface.

If the track's source is read-only, the readonly attribute MUST return the value true. Otherwise, it must return the value false.

If the track's source is provided by an RTCPeerConnection, the remote attribute MUST return the value true. Otherwise, it must return the value false.
The readyState attribute represents the state of the track. It MUST return the value to which the user agent last set it.

The event handler event types started and ended MUST be supported by all objects implementing the MediaStreamTrack interface.

The static getSources() method collects authorized information for all available sources.
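A sketch of how an application might use getSources(): the static method is named in this specification, but the SourceInfo field names used here (kind, id, label) are assumptions for illustration, and the filtering helper is hypothetical.

```javascript
// Hypothetical helper: pick the video-producing sources out of the
// SourceInfo array passed to the getSources() callback. Field names
// (kind, id, label) are assumed for illustration.
function cameraSources(sourceInfos) {
  return sourceInfos.filter(function (info) {
    return info.kind === "video";
  });
}

// In a browser this would be driven by the static method, roughly:
//   MediaStreamTrack.getSources(function (infos) {
//     var cams = cameraSources(infos);
//   });
const cams = cameraSources([
  { kind: "video", id: "a1", label: "Internal camera" },
  { kind: "audio", id: "b2", label: "Internal microphone" }
]);
```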
If no mandatory constraints have been defined, the mandatory field will not be present (it will be undefined). If no optional constraints have been defined, the optional field will not be present (it will be undefined). If neither optional nor mandatory constraints have been defined, the value null is returned.
Returns a dictionary with all of the capabilities for the track type. If the track type is VideoStreamTrack, the AllVideoCapabilities dictionary is returned. If the track type is AudioStreamTrack, the AllAudioCapabilities dictionary is returned.
Given that implementations of various hardware may not exactly map to the same range, an implementation should make a reasonable attempt to translate and scale the hardware's setting onto the mapping provided by this specification. If this is not possible due to the user agent's inability to retrieve a given capability from a source, then for CapabilityRange-typed capabilities, the min and max fields will not be present on the returned dictionary, and the supported field will be false. For CapabilityList-typed capabilities, a suitable "notavailable" value will be the sole capability in the list.
An example of the user agent providing an alternative mapping: if a source supports a hypothetical fluxCapacitance state whose type is a CapabilityRange, and the state is defined in this specification to be the range from -10 (min) to 10 (max), but the source's hardware setting for fluxCapacitance only supports the values "off", "medium", and "full", then the user agent should map the range value -10 to "off", 10 to "full", and 0 to "medium". Constraints imposing a strict value of 3 will cause the user agent to attempt to set the value "medium" on the hardware, and return a fluxCapacitance state of 0, the closest supported setting. No error event is raised in this scenario.
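The fluxCapacitance translation above can be sketched as a nearest-setting lookup. The three hardware settings and the -10..10 range come from the example in the text; the function itself is illustrative, not part of the API.

```javascript
// Sketch of the example's range-to-setting translation: map a value in
// [-10, 10] onto the nearest supported hardware setting, and report the
// range value that the chosen setting corresponds to as the state.
function mapToHardware(value) {
  const settings = [
    { state: -10, setting: "off" },
    { state: 0,   setting: "medium" },
    { state: 10,  setting: "full" }
  ];
  let best = settings[0];
  for (const s of settings) {
    if (Math.abs(s.state - value) < Math.abs(best.state - value)) best = s;
  }
  return best;   // e.g., a strict constraint of 3 lands on "medium", state 0
}
```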
CapabilityList objects should order their enumerated values from minimum to maximum where it makes sense, or in the order defined by the enumerated type where applicable.
See the AllVideoCapabilities and AllAudioCapabilities dictionaries for details on the expected types for the various supported state names.
If constraints already exist on the track, this API will replace them with the provided constraints. Otherwise, it will apply the newly provided constraints to the track.
The event handler event type overconstrained MUST be supported by all objects implementing the MediaStreamTrack interface.

Clones the given MediaStreamTrack.
When the clone() method is invoked, the user agent MUST run the following steps:

1. Let trackClone be a newly constructed MediaStreamTrack object.
2. Initialize trackClone's id attribute to a newly generated value.
3. Let trackClone inherit this track's underlying source, and its kind, label and enabled attributes.
4. Return trackClone.
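The steps above can be sketched on a plain track-like object. This is not an implementation of the interface; in particular, the counter-based id below is only a stand-in for the globally unique identifier the specification requires.

```javascript
// Sketch of the clone() algorithm on a track-like plain object.
// A real user agent must generate a globally unique id; this counter
// is illustrative only.
let nextId = 1;
function cloneTrack(track) {
  return {
    id: "track-" + nextId++,   // step 2: freshly generated id
    source: track.source,      // step 3: inherit the underlying source...
    kind: track.kind,          // ...and the kind, label,
    label: track.label,
    enabled: track.enabled     // and enabled attributes
  };                           // step 4: return trackClone
}

const original = { id: "track-0", source: "cam", kind: "video",
                   label: "Internal camera", enabled: true };
const copy = cloneTrack(original);
```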
When a MediaStreamTrack object's stop() method is invoked, the user agent MUST run the following steps:

1. Let track be the current MediaStreamTrack object.
2. If track has no source attached (sourceType is "none") or if the source is provided by an RTCPeerConnection, then abort these steps.
3. Set track's readyState attribute to ended.
4. Permanently stop the generation of data for track's source. If the data is being generated from a live source (e.g., a microphone or camera), then the user agent SHOULD remove any active "on-air" indicator for that source. If the data is being generated from a prerecorded source (e.g., a video file), any remaining content in the file is ignored. This will effectively end all other MediaStreamTrack objects sharing the same source as track.

The task source for the tasks queued for the stop() method is the DOM manipulation task source.
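The control flow of those steps can be sketched on track-like plain objects; the remote flag and the source.stopped field below are illustrative stand-ins, not spec attributes.

```javascript
// Sketch of the stop() steps: abort for detached ("none") or
// RTCPeerConnection-provided sources, otherwise mark the track ended
// and permanently stop its source. Plain objects stand in for tracks.
function stopTrack(track) {
  if (track.sourceType === "none" || track.remote) return; // step 2: abort
  track.readyState = "ended";                              // step 3
  if (track.source) track.source.stopped = true;           // step 4
}

const source = { stopped: false };
const live = { sourceType: "camera", remote: false,
               readyState: "live", source: source };
const detached = { sourceType: "none", remote: false, readyState: "live" };
stopTrack(live);      // ends the track and stops its source
stopTrack(detached);  // aborts without changing anything
```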
live: The track is active (the track's underlying media source is making a best-effort attempt to provide data in real time). The output of a track in the live state can be switched on and off with the enabled attribute.
ended: The track has ended (the track's underlying media source is no longer providing data, and will never provide more data for this track). Once a track enters this state, it never exits it. For example, a video track in a MediaStream ends if the user unplugs the USB web camera that acts as the track's media source.
"new"
or "ended"
readyState
.VideoStreamTrack
s. The source is a local
video-producing camera source.AudioStreamTrack
s. The source is a local
audio-producing microphone source.SourceInfo
objects representing
the result of a call to MediaStreamTrack.getSources()
.MediaSourceStates
dictionary.label
attribute will be filled in
with exactly the same value as would have been returned from
a call to getUserMedia()
with a
constraint specifying this
source's sourceId
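As described above, a saved sourceId can be fed back to getUserMedia() to request that specific source. The constraint shape below is a sketch under this draft's mandatory/optional model; the helper function and the example id are illustrative.

```javascript
// Sketch: build a getUserMedia() constraints object that requests a
// specific previously-saved source by its sourceId. The exact
// structure is illustrative, not normative.
function constraintsForSource(savedSourceId) {
  return {
    video: {
      mandatory: { sourceId: savedSourceId }   // request this exact source
    }
  };
}

// Usage in a browser (illustrative):
//   navigator.getUserMedia(constraintsForSource(id), onStream, onError);
const c = constraintsForSource("abc123");
```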
When either the "noaccess" or "peerIdentity" constraint is applied to a MediaStreamTrack, the track shall be isolated so that its content is not accessible to the content JS. An isolated media stream may be used for two purposes:
Displayed in an appropriate element (e.g., a video or audio element). The element MUST have a unique origin so that it is inaccessible to the content JS. This is the same security mechanism as is used with an ordinary audio or video element that has a src= property from a separate origin.
Used as the argument to addStream() for a PeerConnection, subject to the restrictions detailed in the WebRTC document.
When the noaccess=true constraint applies to a track, that track may be added to any PeerConnection.
Open Issue: The editors worry that the above paragraph is just wrong. If the track can be added to a PeerConnection that is connected to another PeerConnection in the same application, the application could get access to the data. We suggest this should be changed from "may be added" to "may not be added". This would allow noaccess=true to be used for things like hair-check dialogs.
When the peerIdentity=foo constraint applies to a track, then that track may be added only to PeerConnections with compatible peer identities as described in the WebRTC document.
Both the noaccess and peerIdentity constraints must be mandatory. Any use of them in the optional block must trigger an error.
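The mandatory-only rule above can be sketched as a validation step over a constraint set. The function name and the use of a thrown Error are assumptions for illustration; the specification only requires that such use trigger an error.

```javascript
// Sketch: reject constraint sets that place "noaccess" or
// "peerIdentity" in the optional list, since both must be mandatory.
// Hypothetical helper; the spec only requires that an error occur.
function validateIsolationConstraints(constraints) {
  const isolating = ["noaccess", "peerIdentity"];
  for (const entry of constraints.optional || []) {
    for (const name of isolating) {
      if (name in entry) {
        throw new Error(name + " must be a mandatory constraint");
      }
    }
  }
  return true;
}

const ok = validateIsolationConstraints({
  mandatory: { peerIdentity: "alice@example.com" },
  optional: []
});
```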