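The application can either send each candidate to the remote party as soon as it is discovered (Trickle ICE), or wait for the ICE gathering phase to complete and then send all candidates at once.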

(1) creates the PeerConnection object; (2) marks the channel as started; (3) invokes the doCall() JavaScript function.

This channel will be used to exchange text data with the Joiner in a peer-to-peer fashion:

A connection response to the create or join request is returned:

The problem with softmax is that the gradient depends on a summation across all classes.
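To make the dependence on all classes concrete, here is the gradient of the log-softmax in the usual skip-gram notation. The symbols are assumptions, not taken from the text: v_c is the center-word vector, u_w is the output vector of word w, o is the observed context word, and V is the vocabulary.

```latex
\log p(o \mid c) = u_o^\top v_c - \log \sum_{w \in V} \exp\!\left(u_w^\top v_c\right)

\frac{\partial \log p(o \mid c)}{\partial v_c}
  = u_o - \sum_{w \in V} p(w \mid c)\, u_w
```

The second term sums over every word in V, so each training sample costs work proportional to the vocabulary size.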

The first term tries to maximize the probability of occurrence for actual words that lie in the context window, i.e., words that co-occur. The second term iterates over some random words j that do not lie in the window and minimizes their probability of co-occurrence.
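One common way to write an objective with these two terms is shown below. The notation is an assumption made for illustration: D is the set of observed (center, context) pairs, D' is the set of randomly sampled pairs, v_c is the center-word vector, u_w and u_j are output vectors, and σ is the logistic sigmoid.

```latex
J = \sum_{(c, w) \in D} \log \sigma\!\left(u_w^\top v_c\right)
  + \sum_{(c, j) \in D'} \log \sigma\!\left(-u_j^\top v_c\right)
```

Maximizing J pushes σ(u_w^T v_c) toward 1 for words inside the window and σ(u_j^T v_c) toward 0 for the sampled words j.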

we don’t try to predict the probability for juice to be a nearby word, i.e., P(juice | orange); instead, we try to predict whether (orange, juice) are nearby words or not by calculating P(1 | <orange, juice>).
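A minimal sketch of that binary prediction, assuming the probability is modeled as the sigmoid of a dot product between two word vectors. The function name `pair_probability`, the vector size, and the random vectors are hypothetical stand-ins for trained embeddings.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, maps any real score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def pair_probability(center_vec, context_vec):
    """P(1 | <center, context>): probability that the two words are neighbors."""
    return sigmoid(np.dot(center_vec, context_vec))

# Hypothetical 8-dimensional vectors standing in for "orange" and "juice".
rng = np.random.default_rng(0)
orange = rng.normal(scale=0.1, size=8)
juice = rng.normal(scale=0.1, size=8)

p = pair_probability(orange, juice)
print(f"P(1 | <orange, juice>) = {p:.4f}")
```

This is a single logistic output per pair, so no normalization over the vocabulary is needed.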

probability that our training sample words are neighbors or not.

allows us to only modify a small percentage of the weights, rather than all of them for each training sample.

we try to reduce the number of weights updated for each training sample.

for each training sample, we are performing an expensive operation to calculate the probability for words whose weights might not be updated at all, or be updated so marginally that it is not worth the extra overhead

the calculation of the final probabilities using the softmax is quite an expensive operation

sparse updates

only the weights corresponding to the target word might get a significant update.

for less frequent words and decrease the probability for more frequent words.

equation, and the one which performed best was to raise the word counts to the 3/4 power: P(w_i) = f(w_i)^{3/4} / \sum_j f(w_j)^{3/4}, where f(w) is the count of word w.
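The effect of the 3/4 power can be checked on a toy example. The sketch below assumes the sampling probability of a word is its count raised to 0.75, normalized over the vocabulary; the function name and the word counts are made up for illustration.

```python
from collections import Counter

def negative_sampling_distribution(counts, power=0.75):
    """Normalize count**power so the values form a probability distribution."""
    weighted = {word: count ** power for word, count in counts.items()}
    total = sum(weighted.values())
    return {word: value / total for word, value in weighted.items()}

# Hypothetical corpus counts.
counts = Counter({"the": 1000, "orange": 50, "juice": 10})

raw_total = sum(counts.values())
raw = {word: count / raw_total for word, count in counts.items()}
smoothed = negative_sampling_distribution(counts)

for word in counts:
    print(f"{word:>6}: raw={raw[word]:.4f}  smoothed={smoothed[word]:.4f}")
```

Compared with the raw unigram frequencies, the smoothed distribution gives the frequent word "the" a lower probability and the rare word "juice" a higher one.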

randomly select just a small number of “negative” words (let’s say 5) to update the weights for

Negative sampling addresses this by having each training sample only modify a small percentage of the weights
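A rough sketch of one such update, assuming plain SGD on the logistic loss with one positive word and five negatives. The matrix names `W_in` and `W_out`, the sizes, the learning rate, and the hand-picked word indices are all assumptions for illustration, not the original implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_step(W_in, W_out, center, context, negatives, lr=0.05):
    """One SGD step that touches 1 row of W_in and len(negatives) + 1 rows of W_out."""
    targets = np.array([context] + list(negatives))
    labels = np.zeros(len(targets))
    labels[0] = 1.0                       # only the true context word is a positive
    v = W_in[center].copy()
    u = W_out[targets]                    # fancy indexing returns a copy
    errors = sigmoid(u @ v) - labels      # gradient of the loss w.r.t. each score
    W_in[center] -= lr * (errors @ u)
    # subtract.at also handles repeated indices among the targets correctly
    np.subtract.at(W_out, targets, lr * np.outer(errors, v))
    return targets

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 20                # small hypothetical sizes
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))
before_in, before_out = W_in.copy(), W_out.copy()

negative_sampling_step(W_in, W_out, center=1, context=2,
                       negatives=[10, 20, 30, 40, 50])

# Rows whose weights actually changed in this step.
changed_out = np.flatnonzero(np.any(W_out != before_out, axis=1))
changed_in = np.flatnonzero(np.any(W_in != before_in, axis=1))
print(changed_out.tolist(), changed_in.tolist())
```

Only the six output rows (one positive, five negatives) and the single input row for the center word change; every other weight is left untouched.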

Subsampling frequent words to decrease the number of training examples.
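One way to sketch subsampling is to keep each occurrence of a word with probability sqrt(t / f), capped at 1, where f is the word's relative corpus frequency and t is a small threshold. This follows the discard rule 1 - sqrt(t / f) from the word2vec paper; released implementations use a slightly different formula, and the function names, the threshold value, and the toy frequencies below are assumptions.

```python
import math
import random

def keep_probability(freq, t=1e-5):
    """Probability of keeping one occurrence of a word with relative frequency `freq`."""
    return min(1.0, math.sqrt(t / freq))

def subsample(tokens, freqs, t=1e-5, seed=0):
    """Randomly drop occurrences of frequent words; rare words are always kept."""
    rng = random.Random(seed)
    return [w for w in tokens if rng.random() < keep_probability(freqs[w], t)]

# Hypothetical relative frequencies: "the" is very common, "orange" is rare.
freqs = {"the": 0.05, "orange": 1e-6}
tokens = ["the"] * 1000 + ["orange"] * 5

kept = subsample(tokens, freqs)
print(kept.count("the"), kept.count("orange"))
```

Most occurrences of "the" are dropped while every occurrence of "orange" survives, which shrinks the number of training examples without losing rare words.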

= 3 million weights each!