Several types of named entities (specifically organizations and companies) are tagged with _typeGroup : "socialTag" rather than _typeGroup : "entities". Members of the "socialTag" group are linked to URLs instead of being given an exact position in the text:
_typeGroup : "socialTag"
id : "http://d.opencalais.com/..."
socialTag : "http://d.opencalais.com/..."
forenduserdisplay : "true"
name : "Goodwill Industries"
importance : "1"
originalValue : "Goodwill Industries"
Because this output format specifies no offsets, there is no way to map the extracted entity back to its position in the text.
Do you happen to know if there is a way to get offsets for such entities?
Answer by Tomasz Adamusiak · Nov 19, 2016 at 06:41 PM
There's no offset because social, topic, and industry tags describe what the input document is about as a whole rather than identifying specific entities in text.
From the API user guide:
A Social Tag is an association of the submitted text to related Wikipedia categories, or articles. Social tags attempt to emulate how a person would tag a specific piece of content.
For example, if you submit a story about President Barack Obama and a piece of legislation, at least one reasonable tag would be “U.S. Legislation.” A story about the relative merits of BMWs, Ferraris, and Porsches would probably be tagged with “sports cars,” “luxury makes,” “auto racing,” and “motorsport.” The story about the Apple Watch Launch generated the following social tags: IOS, Smartwatches, Wearable Computers, Human-computer interaction, Ubiquitous computing, Consumer electronics, Apple Inc., Wearable Technology, and Apple system on a chip.
The SocialTag function does not identify individual items within the text, but rather attempts to provide common sense tags for the piece of content as a whole. Social tags are derived from the Wikipedia folksonomy.
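One way to see this in practice is to split a parsed response by whether each record carries position information. The sketch below assumes the response has already been loaded with json.loads into a dict keyed by record URI, and that entity records expose an "instances" list whose items have "offset" and "length" fields; treat that layout as an assumption to check against your own responses. The helper name split_by_offsets and the entity record in the sample are hypothetical; the socialTag record is the one quoted in the question.

```python
from typing import Any


def split_by_offsets(response: dict[str, Any]) -> tuple[list[tuple[str, int, int]], list[str]]:
    """Split records into (located, unlocated).

    located:   (name, offset, length) for every instance of records that carry
               an "instances" list (ASSUMED layout for entity records).
    unlocated: names of typed records with no instances, e.g. social tags.
    """
    located: list[tuple[str, int, int]] = []
    unlocated: list[str] = []
    for uri, record in response.items():
        # Skip anything that is not a typed record (e.g. document metadata).
        if not isinstance(record, dict) or "_typeGroup" not in record:
            continue
        name = record.get("name", uri)
        instances = record.get("instances") or []
        if instances:
            for inst in instances:
                located.append((name, inst["offset"], inst["length"]))
        else:
            # Social, topic, and industry tags describe the whole document,
            # so they end up here with no position in the text.
            unlocated.append(name)
    return located, unlocated


# The socialTag record from the question (URIs elided as in the original),
# plus a HYPOTHETICAL entity record showing the assumed "instances" layout.
sample_response = {
    "http://d.opencalais.com/...": {
        "_typeGroup": "socialTag",
        "id": "http://d.opencalais.com/...",
        "socialTag": "http://d.opencalais.com/...",
        "forenduserdisplay": "true",
        "name": "Goodwill Industries",
        "importance": "1",
        "originalValue": "Goodwill Industries",
    },
    "hypothetical-entity-uri": {
        "_typeGroup": "entities",
        "_type": "Organization",
        "name": "Goodwill",
        "instances": [{"exact": "Goodwill", "offset": 40, "length": 8}],
    },
}

if __name__ == "__main__":
    located, unlocated = split_by_offsets(sample_response)
    print("with offsets:", located)
    print("without offsets:", unlocated)
```

Anything that lands in the second list has no position to recover from the response itself; only records under "entities" (and other instance-bearing groups) can be mapped back to the text.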
Answer by tetiana.myronivska · Nov 24, 2016 at 02:00 AM
@Tomasz Adamusiak, here is an example. For the input sentence "We want you to know why your support of Goodwill is so important . ", OpenCalais detects "Goodwill" as follows:
http://d.opencalais.com/dochash-1/bc75a003-4d8d-3215-b5ed-881cb2dfac96/SocialTag/1
  _typeGroup : "socialTag"
  id : "http://d.opencalais.com/dochash-1/bc75a003-4d8d-3215-b5ed-881cb2dfac96/SocialTag/1"
  socialTag : "http://d.opencalais.com/genericHasher-1/501ba8d5-c75c-3e13-bdfd-4a76ac225f73"
  forenduserdisplay : "true"
  name : "Goodwill Industries"
  importance : "1"
  originalValue : "Goodwill Industries"
This is the only mention of "Goodwill" I get in the JSON response.
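If an approximate position is still needed, one workaround is to search the input text for the tag's name yourself. This is a heuristic, not an OpenCalais feature: the hypothetical helper candidate_spans below tries the full name first, then drops trailing words one at a time ("Goodwill Industries" becomes "Goodwill"), and returns whole-word, case-insensitive matches for the longest variant that occurs.

```python
import re


def candidate_spans(text: str, tag_name: str) -> list[tuple[int, int, str]]:
    """Heuristically locate a social tag's name in the input text.

    Tries the full tag name, then shorter leading-word prefixes, and returns
    (offset, length, matched_text) for every match of the longest variant
    that occurs. Returns an empty list if no variant matches.
    """
    words = tag_name.split()
    for n in range(len(words), 0, -1):
        variant = " ".join(words[:n])
        # (?<!\w) and (?!\w) act as word boundaries even when the variant
        # starts or ends with punctuation, e.g. "Inc."
        pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
        matches = [
            (m.start(), m.end() - m.start(), m.group(0))
            for m in re.finditer(pattern, text, flags=re.IGNORECASE)
        ]
        if matches:
            return matches
    return []


if __name__ == "__main__":
    sentence = "We want you to know why your support of Goodwill is so important . "
    print(candidate_spans(sentence, "Goodwill Industries"))  # [(40, 8, 'Goodwill')]
```

Expect both false positives and misses: a tag such as "Apple Inc." can fall back to a bare "Apple", and many social tags, like "Consumer electronics" in the Apple Watch example above, never appear in the text at all.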