Using EVN Tagger
This document describes TopBraid EVN Tagger, a web-based tool
for linking controlled vocabulary terms to content. Content
resources are tagged, or annotated, through a visual user
interface that displays the context for both the content and the
vocabulary. The result is a set of metadata properties that
establish a named relationship between the content and
vocabulary, and vice-versa. For example, a resource
representing a news story can be linked to vocabulary topics of
Election and Weather through a property named "has subject,"
stating that a given news story has topics of Election and
Weather.
These relationships, called tags, can be used to enrich search, browsing, and other applications by managing metadata on content-to-vocabulary
relationships. The role of EVN Tagger is to make it easy to create and manage these relationships.
EVN Tagger can also be used to create mappings between two vocabularies. To do this, select one vocabulary as the Content Graph
and another vocabulary as the Concept Vocabulary. You can then
use tag properties to build a "crosswalk" between the two vocabularies.
The output of EVN Tagger is a Tag Graph, which consists of a set of triples of the form:
{ <content> <tag-property> <vocabulary-term> }
This means that a content resource has been tagged by the property with a vocabulary term.
This data is stored in a graph as a set of tag triples that can be imported into the content or tag set, imported into other data, referenced
as linked data to establish connections between contents and terms, or used by applications.
Introduction
EVN Tagger lets you assign terms from a controlled vocabulary (typically, a taxonomy or a thesaurus) to content resources with a
specific relationship. This is referred to as tagging, or annotation, and a collection of such assignments is a tag set. For example, you might
need to tag a news story about a sports team being sold so it can be found by a search engine or appear in a list. The news story is represented by an RDF
resource that references the story. By creating a relationship named mainSubject between the story and the Business concept and a relationship
secondarySubject with the Sports concept, the story is tagged by these relationships and can be used to find other data. For example, a query for all
items tagged with a secondarySubject of Sports will find the story.
In EVN Tagger terms, Business and Sports are the controlled vocabulary terms used to tag the news story, and mainSubject and secondarySubject
are the relationships, or tag properties, of the news story. These relationships are saved in the Tag Graph, which consists of metadata about
the graphs used as content and vocabulary and triples representing the tags. For example, the triples in the tag graph from the example above would have the general form:
{ <story> <mainSubject> <Business> .
<story> <secondarySubject> <Sports> .
}
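As a rough illustration of how such a query works, the sketch below models the Tag Graph as a plain Python set of (content, property, term) tuples. The short names stand in for full URIs, and the helper tagged_with is hypothetical; it is not part of EVN Tagger, and a real Tag Graph would more likely be queried with SPARQL.

```python
# Hypothetical short identifiers; a real Tag Graph stores full URIs.
tag_graph = {
    ("story", "mainSubject", "Business"),
    ("story", "secondarySubject", "Sports"),
}

def tagged_with(graph, tag_property, term):
    """Return the content resources linked to `term` by `tag_property`."""
    return sorted(s for (s, p, o) in graph if p == tag_property and o == term)

print(tagged_with(tag_graph, "secondarySubject", "Sports"))  # prints ['story']
```

Because the property is part of the match, the same story is not returned for a mainSubject of Sports.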
Tag triples are saved in a separate graph for flexibility. Tags can be maintained with EVN working copy change management.
Tags can be used in a variety of contexts, including federated SPARQL queries, using owl:imports to attach tags to the content or vocabulary
graph, or other contexts.
The choice of terms and properties to use when tagging is up to the administrator of the EVN Tagger installation.
The section Creating a tag set below describes how to identify the content to tag, the tag properties, and the controlled
vocabulary to use for a given tag set, and the section Adding and removing tags shows how to assign tags to content with
your choice of properties.
Creating a tag set
The first step is to have the EVN Administrator set up the content and property graph choices, as described in
Configuring content and property graphs. That section covers both the setup and the information that these graphs must contain for Tagger to display them properly.
Once this has been completed, select the Content Tag Sets tab at the EVN main screen and
click Create New Tag Set. (If it does not appear, contact your system administrator about getting
the necessary administrator privileges.) This invokes the Tag Set Wizard to configure your tag set. The first screen defines the following:
Label The name that will appear for this tag set in the list of tag sets on the EVN main screen.
Description A description of the tag set to help people understand what it's for.
Content Graph The list of content resources to tag. A single content graph might list a set of newspaper articles or a
collection of journal articles.
Tag Property Graph The list of relationships that you will choose from when tagging content resources with concept terms.
For example, when assigning the concept term Business to a news story, a given tag property graph might offer tag property choices such as
mainSubject and secondarySubject.
Concept Vocabulary The concepts used to tag the contents. This must be a SKOS vocabulary using
skos:broader to display the concept hierarchy. A news story vocabulary would include terms such as Business, Sports, and Weather,
while an academic journal vocabulary would typically have technical terms more specific to a particular academic field such as
medicine or mathematics.
Content Graph, Tag Property Graph, and Concept Vocabulary
are all drop-down fields that offer a fixed set of choices. Concept Vocabulary lists all vocabularies
displayed on the system, while the selections listed on the other two drop-downs are configured by your
EVN administrator as described in the section Configuring content and property graphs.
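To show how skos:broader links give rise to the concept hierarchy, here is a minimal sketch that derives an indented tree from broader statements. The "ex:" concept names and both helper functions are hypothetical, the data is assumed to be free of cycles, and nothing here reflects how Tagger itself builds its display.

```python
from collections import defaultdict

SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# Hypothetical vocabulary: each triple says "narrower concept has broader concept".
vocabulary = {
    ("ex:Football", SKOS_BROADER, "ex:Sports"),
    ("ex:Tennis", SKOS_BROADER, "ex:Sports"),
    ("ex:Markets", SKOS_BROADER, "ex:Business"),
}

def concept_tree(graph):
    """Return (top_concepts, children) derived from skos:broader links."""
    children = defaultdict(list)
    has_broader = set()
    for s, p, o in graph:
        if p == SKOS_BROADER:
            children[o].append(s)
            has_broader.add(s)
    # Top concepts are parents that have no broader concept of their own.
    tops = sorted(set(children) - has_broader)
    return tops, {parent: sorted(kids) for parent, kids in children.items()}

def render(graph):
    """Return the hierarchy as indented lines, one concept per line."""
    tops, children = concept_tree(graph)
    lines = []

    def walk(concept, depth):
        lines.append("  " * depth + concept)
        for child in children.get(concept, []):
            walk(child, depth + 1)

    for top in tops:
        walk(top, 0)
    return lines

print("\n".join(render(vocabulary)))
```

A concept with no skos:broader link in either direction would not appear in this sketch, which is one reason a vocabulary needs those links for a usable hierarchy.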
Note that there are four graphs defined in a Tag Set:
- Content graph: The data to be tagged.
- Tag Property Graph: A graph of properties used for tagging.
- Concept Vocabulary: The controlled vocabulary used for tagging.
- Tag Graph: Graph containing all of the {<content> <property> <vocabulary-term>} triples.
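The relationship between these four graphs can be sketched as a small data structure. This is only an illustrative model: the TagSet class, its methods, and the urn:example graph names are assumptions made for the example, not an EVN Tagger API.

```python
from dataclasses import dataclass, field

@dataclass
class TagSet:
    """Illustrative model of a tag set; not an EVN Tagger API."""
    label: str
    content_graph: str        # the data to be tagged
    tag_property_graph: str   # properties available for tagging
    concept_vocabulary: str   # controlled vocabulary supplying the terms
    tag_graph: set = field(default_factory=set)  # (content, property, term) triples

    def add_tag(self, content, tag_property, term):
        self.tag_graph.add((content, tag_property, term))

    def remove_tag(self, content, tag_property, term):
        # discard() does nothing if the tag is not present
        self.tag_graph.discard((content, tag_property, term))

news_tags = TagSet(
    label="December article subjects",
    content_graph="urn:example:news-stories",          # hypothetical graph names
    tag_property_graph="urn:example:tag-properties",
    concept_vocabulary="urn:example:news-topics",
)
news_tags.add_tag("story", "mainSubject", "Business")
news_tags.add_tag("story", "secondarySubject", "Sports")
news_tags.remove_tag("story", "secondarySubject", "Sports")
print(news_tags.tag_graph)  # prints {('story', 'mainSubject', 'Business')}
```

Only the tag graph changes as tags are added and removed; the other three graphs are referenced, not modified.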
After you set these values and click the Next button, the second configuration screen lets you
customize the Tagger interface for this tag set with the following fields:
Default Tag Property If most tagging will be done with the same property, selecting it here can make the tagging go more quickly.
Tag Properties This lists all properties in the Tag Property Graph, as selected on the previous screen. Only properties with an
rdfs:range value of skos:Concept will appear in this list. Only checked properties and the default tag property will appear as choices to users tagging
with this content tag set.
Root Content Type The class/subclass structure of an RDFS or OWL model will appear here. One class can be chosen as the
root class that will appear in EVN Tagger. Expand the tree to find the content type that you want to serve as the root and select it.
Once you have finished configuring your new tag set, it will appear as a link on the Content Tag Sets tab.
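The two filtering rules described for the Tag Properties field can be sketched as follows: first keep only properties whose rdfs:range is skos:Concept, then offer users the checked ones plus the default. The "ex:" property names and both function names are hypothetical and serve only to illustrate the rules.

```python
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical Tag Property Graph contents.
tag_property_graph = {
    ("ex:mainSubject", RDFS_RANGE, SKOS_CONCEPT),
    ("ex:secondarySubject", RDFS_RANGE, SKOS_CONCEPT),
    ("ex:relatedTopic", RDFS_RANGE, SKOS_CONCEPT),
    ("ex:wordCount", RDFS_RANGE, XSD_INTEGER),  # not a concept-valued property
}

def listed_properties(graph):
    """Properties that would appear in the wizard: rdfs:range is skos:Concept."""
    return sorted(s for (s, p, o) in graph if p == RDFS_RANGE and o == SKOS_CONCEPT)

def offered_properties(graph, checked, default=None):
    """Properties shown to taggers: the checked ones plus the default."""
    return [p for p in listed_properties(graph) if p in checked or p == default]

print(listed_properties(tag_property_graph))
# prints ['ex:mainSubject', 'ex:relatedTopic', 'ex:secondarySubject']
print(offered_properties(tag_property_graph,
                         checked={"ex:secondarySubject"},
                         default="ex:mainSubject"))
# prints ['ex:mainSubject', 'ex:secondarySubject']
```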
Deleting a tag set
On the Manage tab of a Tag Set's administration screen, users with Manager privileges for that tag set will see Delete this Content
Tag Set at the bottom. After you select it, a dialog box asks you to confirm the deletion. This operation cannot be undone.
Managing tag sets
To work with a given tag set, click its name on the Content Tag Sets tab of the
main EVN screen. (If it is not displayed as a hypertext link, your user ID has not been granted access to this particular tag set.)
This displays the Content Tag Set management screen:
For managing your tag set as a resource, the tabs on the tag set's main screen offer the same features that the equivalent tabs offer when managing a vocabulary, as described in the section The EVN home screen of the EVN User Guide. For example, the User Roles tab lets you assign levels of access to the tag set, and the General tab has a Keywords feature that lets you assign keywords that you can use to categorize content tag sets into groups.
The Manage tab has two additional features for working with tag sets that are not available on other EVN Manager screens:
Add Tag Property Graphs lets you add additional sets of properties to use when tagging. You will select from the choice made available by your system administrator as described at Configuring content and property graphs.
Select Tag Properties lets you modify, by checking or unchecking checkboxes, the list of which properties from the property graphs should be available on the Tagger drop-down property list.
Tagging your documents
From a content tag set's main screen, click Edit Production Copy to edit it. (You can also work with a temporary copy of the production tag set known as a Working Copy; see User Roles and Workflow
for more on the use of working copies.)
Whether you are editing a working copy or a production tag set, the interface is the same. The following screen shot shows EVN Tagger
with a Tag Set named "December article subjects":
The following steps are a typical workflow to tag content:
- Choose a class where the content instances are located. Choosing a class contextualizes the search. If you do not know the class of the content resource, choose the root class and search from there.
- Set the search criteria and select the "Search" button. In this example, no search criteria were selected, so all instances of the selected class are displayed. Adding search criteria will narrow the search to the specified property values.
- Select an instance in the Matching Instances window. The data for the selected instance will appear in the middle pane, which is named after the label of the chosen instance.
- Navigate to a term in the Concept Hierarchy. Any term in any level of the hierarchy can be chosen. When chosen, the term's data will appear in the named pane beneath Concept Hierarchy.
- Choose the tag property from the drop-down list.
- Click the green '+' button to add the tag. This adds the tag triple and displays the tag in the movable Current Tags widget. To remove a tag, click the red 'x' next to it.
In this example
the content, whose label is "One year on. Egyptians mark...", has just been tagged with the vocabulary term "demonstration" on the property "Subject".
The screen shot also shows a previous tag stating that the content item is related to the vocabulary term "Africa" by the property "Subject".
Informally, the tags are saying that the article "One year on. Egyptians mark..." has two subjects, 'Africa' and 'demonstration'.
The Tagger screen
The Tagger main screen has six panes. Note the color-coding of the pane tabs, with the content on the left and vocabulary on the right. The color coding is displayed in a key in the EVN Tagger header.
Content Types The upper-left of the Tagger screen shows the hierarchy of content being tagged, because content resources are often grouped according to their class of content. Selecting a node on the hierarchy lets you limit the content resources displayed on the Matching Instances list.
Search Instances of Content The search form under the Content Types pane lets you narrow down the content resources you want displayed in the Matching Instances pane based on search criteria that you enter on this form. If you don't enter any criteria and just click the Search button, all of the content resources associated with the selected node in the Content Types hierarchy will be displayed in the Matching Instances pane.
Matching Instances Use this list in the lower-left to identify the content resource to tag; selecting one displays metadata about it in the Content Properties pane where you can then tag it.
Instance Form The pane down the middle of the screen displays data from the item selected from Matching Instances in a property-value form.
Current Tags Widget Displays tags currently assigned to the content resource. The subject of a tag triple is the instance in the form behind the Current Tags. The property is selected from the drop-down list inside the Current Tags
widget and in the top line of an assigned tag. The object of a tag triple is the vocabulary instance shown in the Concept Instance pane or the hyperlink in an assigned tag.
Concept Hierarchy The pane in the upper-right shows the concepts available for tagging content. These are typically arranged in a hierarchical taxonomy and maintained by TopBraid Enterprise Vocabulary Net (EVN).
Concept Instance This pane in the lower-right displays the instance data from the Concept Hierarchy pane in a property-value form.
Tagger panes can be resized by dragging the separators between them. Double-click the dark strip in the middle of a separator to minimize the pane next to it. (After doing so, the separator will appear at the edge of the screen; click the dark strip to restore that pane.)
Adding and removing tags
When a particular content resource is selected, the Current Tags list on the Content Properties pane shows any tags currently assigned to it and each tag's relationship to that content resource. To add a new tag:
- Select the content resource that you want to tag in the Matching Instances pane and the concept to tag it with in the Concept Hierarchy pane.
- At the bottom of the Current Tags list, a drop-down list lets you select the tag property that describes the relationship you want to define between the content resource and the concept that you have selected. If the currently displayed property is what you need, you can skip to the next step.
- Click the "Add selected concept as tag" button to add a new tag associating the concept with the content resource using the selected tag property.
In the screen shot above, we can see that the content resource with a title beginning "One year on" has just been tagged with a Subject value of "demonstration."
To remove a tag, click the "Delete this tag" button on the tag.
User roles and workflow
A content tag set's creator has Manager privileges, and may assign Manager, Editor, or Viewer privileges to other users for that particular tag set. The capabilities of these roles, and the steps for assigning them, are the same in Tagger as they are in the vocabulary editor; see Capabilities and assignment of user roles: Viewer, Editor, and Manager in the EVN documentation for further details.
Another EVN feature that is available in Tagger is the use of Working Copies. Instead of editing the Production Content Tag Set directly, you can create a separate temporary copy known as a Working Copy. (You can actually create as many working copies as you like.) Managers of a given Working Copy can assign separate Manager, Editor, and Viewer roles to that working copy's users. See Vocabulary change management: working with working copies for further information.
Exporting and importing tag set data
On the Export tab of the administrative screens for production and working copies of your Content Tag Set, the Turtle, N-Triple, and RDF/XML choices let you export the data stored in your tag set using one of these formats. See Exporting your vocabulary as RDF in the EVN documentation for further background on how to save this data in a disk file.
On the Import tab of a Content Tag Set's production copy or working copy, the Import RDF File link lets you import RDF from a disk file. It leads you to a screen that prompts you for the name, location, and format (for example, Turtle or RDF/XML) of the file storing the triples to import into your tag set graph.
Configuring content and property graphs
This section describes how an EVN administrator adds choices to the Content Graph and Tag Property Graph lists that appear when a Tagger user creates a new tag set. It also provides advice on modeling of the graphs that makes the tagging of content resources easier.
To configure the graph choices, first pick Server Administration from the main EVN page. On the "TopBraid Enterprise Vocabulary Net — Server Administration" page that appears, select EVN Configuration Parameters. This leads to the configuration screen:
The configuration screen has three sections:
EVN parameters is the section where an administrator stores information about back-end storage for EVN. The parameters include the EVN Tagger license number provided when you installed EVN Tagger. See Configuring a relational database manager to store your EVN data for details on the remaining parameters in this section.
Tagger Content Graphs lists graphs available to display as content graphs in Tagger. Check the ones that you want to appear on the Content Graph drop-down list that Tagger displays when you create a new tag set.
Tagger Properties Graphs lists graphs available to display on the Tag Property Graph drop-down list when a Tagger user creates a new Content Tag Set. Check the ones that you want to appear there. Only properties with an rdfs:range of skos:Concept will appear in the list of properties in the wizard.
There are a few notes to keep in mind when setting up graphs for use by Tagger:
For the Content Types hierarchy to display properly, the root classes must include assertions that they are subclasses of owl:Thing. When a Tagger user selects a class from this hierarchy in the upper-left of the main Tagger screen, as shown in step 1 of Tagging Your Documents, the search panel shown in step 2 of that section will search instances of that class of content to determine which instance titles to display in the lower-left of the screen, as shown in step 3.
The Tagger interface displays titles of content resources in the content graph using the rdfs:label property, or any subproperty of rdfs:label. If the title (label) for the resources uses a property other than rdfs:label, such as dc:title, define that property (here, dc:title) to be a subproperty of rdfs:label. The title will then display properly in Tagger.
For Tagger Properties to display in the Current Tags widget of the Tagger interface, they need an rdfs:range of skos:Concept, because the process of tagging is assigning concepts stored using the SKOS standard to content resources.
Ensuring these conditions may require slight customization of the graphs that you're using; for standard graphs obtained from third parties, this is usually achieved most easily by creating a new graph, importing the standard one, adding customizations to this new one, and then selecting that one on this configuration screen.
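The label rule in the notes above can be illustrated with a small sketch: a title stored with dc:title becomes visible as a label once the customized graph declares dc:title to be a subproperty of rdfs:label. The resource URI, the story title, and both helper functions are hypothetical; the code only mimics the lookup logic and is not how Tagger is implemented.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

STORY = "http://example.org/story1"  # hypothetical content resource

# Standard graph as obtained from a third party: titles use dc:title only.
standard_graph = {(STORY, DC_TITLE, "Team sold to new owners")}

# Customized graph: the standard triples plus one added subproperty assertion.
customized_graph = standard_graph | {(DC_TITLE, RDFS_SUBPROPERTY_OF, RDFS_LABEL)}

def label_properties(graph):
    """rdfs:label plus every property declared (transitively) as its subproperty."""
    found = {RDFS_LABEL}
    frontier = [RDFS_LABEL]
    while frontier:
        parent = frontier.pop()
        for s, p, o in graph:
            if p == RDFS_SUBPROPERTY_OF and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def display_title(graph, resource):
    """Return a label value for `resource`, or None if no label property is set."""
    props = label_properties(graph)
    for s, p, o in sorted(graph):
        if s == resource and p in props:
            return o
    return None

print(display_title(standard_graph, STORY))    # prints None
print(display_title(customized_graph, STORY))  # prints Team sold to new owners
```

Keeping the added assertion in a separate graph that imports the standard one, as suggested above, leaves the third-party data untouched.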
Using tag set data
When you use Tagger to tag content with a concept, the tag is stored in the tag set as a triple, which is a statement expressed using the W3C standard RDF. RDF uses URIs to represent resources such as content resources, tag properties, and concepts.
For example, if the URI associated with the news story "'Gangnam Style' becomes most watched YouTube video ever" is http://en.wikinews.org/w/index.php?&oldid=1711859, and you use Tagger to tag it as having a Dublin Core subject of "dance" from the IPTC set of news codes, the triple created by Tagger is:
{ <http://en.wikinews.org/w/index.php?&oldid=1711859>
<http://purl.org/dc/elements/1.1/subject>
<http://cv.iptc.org/newscodes/subjectcode/01006000> }
Having the data stored using this standard lets you use it in a variety of applications such as TopBraid EVN and other applications that support the RDF standard. To access a given tag set labeled "my tag set" for use in applications, the URI identifying the tag set itself will be urn:x-evn-master:my_tag_set.
The triples can also be exported in the Turtle, N-Triple, and RDF/XML serializations of the RDF data model using those choices on the Export tab of the Content Tag Set administrative screen. See Exporting and importing tag set data for more information.
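For orientation, the example triple above would look roughly like this in the N-Triples serialization, where each URI is wrapped in angle brackets and the statement ends with a period. The helper to_ntriples_line is a hypothetical sketch that handles only URI-valued triples; it is not the exporter used by EVN.

```python
def to_ntriples_line(subject, predicate, obj):
    """Serialize one triple of URIs as a single N-Triples statement."""
    return f"<{subject}> <{predicate}> <{obj}> ."

line = to_ntriples_line(
    "http://en.wikinews.org/w/index.php?&oldid=1711859",
    "http://purl.org/dc/elements/1.1/subject",
    "http://cv.iptc.org/newscodes/subjectcode/01006000",
)
print(line)
```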