Stanford’s research paper, titled “Deep Visual-Semantic Alignments for Generating Image Descriptions,” explains how the visual content of photographs can be translated into written descriptions.
“We present a model that generates free-form natural language descriptions of image regions,” the paper’s abstract states. “Our model leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between text and visual data.”
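The alignment idea behind those “inter-modal correspondences” can be illustrated with a minimal sketch: the paper scores an image–sentence pair by letting each word match the image region it fits best in a shared embedding space, then summing those best matches over the sentence. The NumPy snippet below is a rough illustration of that scoring step, not the authors’ code; in the actual model the region vectors come from a convolutional network over detected regions and the word vectors from a recurrent network, whereas here random vectors stand in for learned embeddings.

```python
import numpy as np

def image_sentence_score(region_vecs, word_vecs):
    """Score how well a sentence matches an image: each word is aligned
    to its most similar image region (dot-product similarity), and those
    best-match similarities are summed over the sentence.

    region_vecs: (num_regions, d) array of image-region embeddings
    word_vecs:   (num_words, d) array of word embeddings in the same space
    """
    # similarity of every word to every region: shape (num_words, num_regions)
    sims = word_vecs @ region_vecs.T
    # each word contributes its best-matching region; sum over all words
    return sims.max(axis=1).sum()

# Toy example with random vectors standing in for learned embeddings.
rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 8))    # 5 detected image regions, 8-dim embeddings
sentence = rng.normal(size=(7, 8))   # 7 words embedded in the same 8-dim space
print(image_sentence_score(regions, sentence))
```

In training, scores like this are computed for matching and mismatching image–sentence pairs so the model learns to rank true descriptions above unrelated ones; generation then works in the other direction, producing text for the regions it has learned to align.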