The first term maximizes the probability of the actual words that lie in the context window, i.e. the words that really co-occur. The second term iterates over some random words j that do not lie in the window and minimizes their probability of co-occurrence.
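To make the two terms concrete, here is a minimal sketch in plain Python. It assumes the objective has the usual skip-gram negative-sampling form, log σ(u·v) for the true pair plus a sum of log σ(-u·v_j) over the sampled words j. The function names (`sigmoid`, `dot`, `negative_sampling_loss`) and the toy vectors are illustrative assumptions, not taken from any particular library.

```python
import math


def sigmoid(x):
    """Logistic function, mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(center, context, negatives):
    """Loss for one (center, context) pair and a list of sampled negative vectors.

    Assumed form: -[ log s(center . context) + sum_j log s(-center . negative_j) ]
    """
    # First term: high when the word that really is in the window scores high.
    positive_term = math.log(sigmoid(dot(center, context)))
    # Second term: high when each random word j outside the window scores low.
    negative_term = sum(math.log(sigmoid(-dot(center, neg))) for neg in negatives)
    # Negate the sum, since training minimizes the loss.
    return -(positive_term + negative_term)


# Toy example: the context vector points the same way as the center vector,
# and the sampled negative points the opposite way, so the loss is small.
center = [1.0, 0.0]
good = negative_sampling_loss(center, [2.0, 0.0], [[-2.0, 0.0]])
# Swapping the roles makes both terms unhappy, so the loss is larger.
bad = negative_sampling_loss(center, [-2.0, 0.0], [[2.0, 0.0]])
print(good < bad)
```

As a sanity check on the sketch: if every vector is all zeros, each score is 0, each sigmoid is 0.5, and the loss for k negatives is (k + 1) · log 2.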