- 1 Overview
- 2 Configuration - Essentials
- 2.1 Setting up the integration – Globally
- 2.2 Setting up the integration – In the UTA
- 2.3 Viewing Similar Content
- 2.4 Coming Soon (November 2019 upgrade)
Overview
Our partner, DAMVAD Analytics, has designed a Similarity Identification integration. This integration can be used to identify plagiarism and to find and evaluate similar applications. To use this integration, you will need a token from DAMVAD Analytics. The integration uses the Doc2Vec algorithm (Open pdf on Doc2Vec) to find similarities.
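Doc2Vec represents each document as a numeric vector, and vectors like these are commonly compared using cosine similarity. How DAMVAD Analytics scores similarity internally is not described on this page, so the following is only a minimal sketch under that assumption. The function name and the sample vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional document vectors (real embeddings are much longer).
application_a = [0.2, 0.7, 0.1]
application_b = [0.25, 0.65, 0.05]

score = cosine_similarity(application_a, application_b)
print(f"similarity: {score:.0%}")
```

A score close to 1 means the two vectors point in nearly the same direction, which is the sense in which two applications would be considered similar in this sketch.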
Configuration - Essentials
Similarity Identification is only available for Universal Tracking Application Level 1 records. This integration compares the contents of custom fields of the type Text Multiple Lines within the same Universal Tracking Application. As part of the setup, you will need to get a token from DAMVAD Analytics, which can be acquired at similarityid.damvad.io.
Setting up the integration – Globally
- Contact DAMVAD Analytics at similarityid.damvad.io to create an account, pay the fee, and obtain an activation code.
- In your instance go to Menu Icon > Global Settings > Integrations tab > Integration Key Management > Click the New Integration button (looks like a plus sign).
- For Key Type select Damvad.
- For DAMVAD Analytics token, enter the 32-character token DAMVAD Analytics has provided you.
- Click Save.
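Before pasting the token into the Integration Key Management screen, it can help to confirm that nothing was truncated or padded with whitespace when it was copied. The sketch below checks only the length stated in the steps above. The allowed character set is not documented on this page, so it is deliberately not validated, and the function name is hypothetical.

```python
EXPECTED_TOKEN_LENGTH = 32  # length stated in the setup steps

def looks_like_damvad_token(token: str) -> bool:
    """Check only the length; the token's character set is an unknown here."""
    return len(token.strip()) == EXPECTED_TOKEN_LENGTH

# A made-up placeholder value, not a real token.
placeholder = "x" * 32
print(looks_like_damvad_token(placeholder))
print(looks_like_damvad_token("too-short"))
```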
Setting up the integration – In the UTA
- Go to the Universal Tracking Application you wish to use the integration with, for example Menu Icon > Submission Manager.
- Click the Configuration Settings button in the action bar (looks like a gear).
- On the Connectivity tab, in the Service Settings section, toggle on Enable Similarity Identification by DAMVAD Analytics.
- You will need to define the content that will be sent to DAMVAD Analytics as a training set for analysis. The training set consists of all multiple line text fields from the types and statuses you choose in a single UTA. The training set content is used to “teach” the similarity identification engine about your data.
- Select the Similarity Threshold. For example, if you choose 80, any applications that are at least 80% similar will be identified as similar.
- Select the roles you want to have access to this feature. For example, internal staff should be able to see similar applications, but applicants should not.
- Select the types and statuses of applications you would like to include in the training set. For example, you may only want to see similar applications that are of the type international grants and in the status of approved.
- Click Save.
- Click the Send Training Set button, which will now be enabled. This will send the training set you defined to DAMVAD Analytics. You will see the message “Training set data will be sent, please check back later.”
- Click OK to hide the message.
Note: Analysis of the data can take a few hours, and you can only send one training set at a time. Once the analysis is complete, the message on this page saying “Analyzing training set data, please check back later” will be replaced with a message saying “Training set last sent [date]”. All system administrators will also receive an email notification when the training set is ready.
Once the above is complete, you can do a full data refresh by updating the training set. Note that each time you update the training set, you will need to wait a few hours for the new data set to be analyzed.
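To make the training set definition concrete, the selection described in this section can be pictured as a filter over Level 1 records: keep the records whose type and status were chosen, and collect their multiple line text fields. The sketch below is illustrative only. The `Application` class, its attribute names, and the sample records are hypothetical and do not reflect how the platform stores or transmits data.

```python
from dataclasses import dataclass, field

@dataclass
class Application:
    record_id: int
    type_name: str
    status: str
    # Maps a field name to the text of a "Text Multiple Lines" custom field.
    text_fields: dict = field(default_factory=dict)

def build_training_set(applications, types, statuses):
    """Return {record_id: text_fields} for records matching a chosen type and status."""
    return {
        app.record_id: dict(app.text_fields)
        for app in applications
        if app.type_name in types and app.status in statuses
    }

applications = [
    Application(1, "International Grants", "Approved", {"Summary": "Clean water project"}),
    Application(2, "International Grants", "Draft", {"Summary": "Work in progress"}),
    Application(3, "Local Grants", "Approved", {"Summary": "Community garden"}),
]

training_set = build_training_set(
    applications, types={"International Grants"}, statuses={"Approved"}
)
print(training_set)
```

Only the first record matches both the chosen type and the chosen status, so it is the only one included in this example.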
Viewing Similar Content
Once the DAMVAD Analytics Similarity Identification integration has been set up and a training set has been sent and analyzed, you can begin comparing the content of applications. By default, applications with a similarity of 80% or higher will be identified, but you can change this threshold when you send the training set.
- Go to the desired Universal Tracking Application, for example Menu Icon > Submission Manager.
- Edit a Level 1 record and click the Compare Content tab in the left navigation.
- If other applications have a similarity score at or above the threshold you set (for example, 80% or higher), they will be listed, ranked from most to least similar. Click the Compare Content button on the desired application to evaluate it against the current application.
You will now see the current application’s multiple line text fields and the selected similar application’s multiple line text fields side by side for easy comparison.
Lines that are more than 80% similar will appear hyperlinked. Click the desired hyperlinked line to see that line’s number and similarity score. This also highlights the corresponding comparison section.
Click the Compare Section button for a multiple line text field to see multiple line numbers with their similarity scores.
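The listing behaviour described in this section (show only applications at or above the threshold, most similar first) amounts to a filter followed by a descending sort. Here is a minimal sketch, assuming similarity scores are available as percentages keyed by record ID. The function name and the sample scores are hypothetical.

```python
def similar_applications(scores, threshold=80):
    """Return (record_id, score) pairs at or above the threshold, most similar first."""
    matches = [(rid, s) for rid, s in scores.items() if s >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Hypothetical similarity percentages of other applications against the current one.
scores = {101: 92.5, 102: 64.0, 103: 80.0, 104: 85.1}

ranked = similar_applications(scores, threshold=80)
for record_id, score in ranked:
    print(record_id, score)
```

Raising the threshold shortens the list, and lowering it surfaces more loosely related applications.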
Coming Soon (November 2019 upgrade)
In the November 2019 upgrade, we plan to introduce the following enhancements to this feature:
New Settings Page
The settings for this feature are moving to their own page. You will still access the settings via Menu Icon > Desired UTA > Configuration Settings > Connectivity tab > Scroll down to Service Settings. If Enable Similarity Identification is toggled on, you will now see a link called Similarity Identification Settings. This new link will take you to the new settings page.
See the count of similar applications in a list view
Previously you had to click into a record such as an application to see if there were any similar applications. After the upgrade you will be able to see the number of similar applications in the list view.
To set this up:
- Create a Text Box - Number custom field in the desired UTA to hold the count.
- On the Permissions & Availability tab of the new custom field, Deny Modification of the field.
- Go to Menu Icon > Desired UTA > Configuration Settings > Connectivity tab > Similarity Identification Settings.
- For Similarity Count Field select the custom field you created to hold the count.
- Send the training set. Once the training set has finished processing, the count of similar applications will be updated.
- Navigate to the desired UTA and add the new field to your desired List View.
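The value stored in the count field can be thought of as the number of other applications whose similarity to a given record meets the threshold. The sketch below illustrates that idea only. It assumes similarity is symmetric and available as pairwise percentages, which this page does not state, and the function name and data are hypothetical.

```python
from collections import Counter

def similar_counts(pair_scores, threshold=80):
    """Count, per record, how many other records score at or above the threshold."""
    counts = Counter()
    for (a, b), score in pair_scores.items():
        if score >= threshold:
            # Assumption: similarity is symmetric, so both records get credit.
            counts[a] += 1
            counts[b] += 1
    return counts

# Hypothetical pairwise similarity percentages between records 1, 2 and 3.
pair_scores = {(1, 2): 91.0, (1, 3): 83.5, (2, 3): 40.0}

counts = similar_counts(pair_scores)
for record_id in (1, 2, 3):
    print(record_id, counts[record_id])
```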
Exclude multiple line text fields from the training set
Previously, all multiple line text fields for the selected types and statuses were sent for processing. Now you can choose to exclude specific fields from the data set you send. This gives you greater control over what text is sent for analysis.
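Conceptually, excluding fields means dropping the named fields from each record before its text is sent. Here is a minimal sketch, assuming each record's text is held as a mapping of field name to content. The function name and the field names are hypothetical.

```python
def exclude_fields(text_fields, excluded):
    """Return a copy of text_fields without the excluded field names."""
    return {name: text for name, text in text_fields.items() if name not in excluded}

# Hypothetical multiple line text fields on one application.
record = {
    "Project Summary": "Clean water access for rural communities.",
    "Budget Notes": "Figures pending board review.",
    "Internal Comments": "Follow up with the applicant in March.",
}

to_send = exclude_fields(record, excluded={"Internal Comments"})
print(sorted(to_send))
```

The original record is left unchanged, so the excluded text remains on the application and is simply not part of what is sent.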
See full application
Previously, when you compared an application to a similar application, you could only see the fields sent in the training set. Now we are adding the ability to open the full application. This way you can see all the fields on the similar application that you normally have access to, such as status and year of submission.