Re: Tool to Analyze Text for Possible Snippets
Hi,
I am looking at 8 different Word documents. The end game for these documents is to import them into my HAT (RoboHelp 2015) and maintain them in HTML. No problem - I know how to do all that.
What I want to pick your brains about is how to determine the frequency of the duplicated text. I know there is duplicate text across the documents because I took the 8 Word documents, inserted each into a single Word document, stripped out the graphics, and sorted the paragraphs.
I ended up with 280 sentences.
Sure, I can visually scan the list and find a sentence like this - "Create and confirm a 4-digit Citrix PIN." - and see that it exists twice. I know I could paste the list of 280 sentences into Excel and remove the rows that are duplicated - that's NOT what I'm looking for.
Instead, I'm looking for something close to this site: https://www.online-utility.org/text/analyzer.jsp, BUT I want to know how many times a sentence exists. For example, I pasted in the 280 sentences and the site came back with this information:
Some top phrases containing 8 words (without punctuation marks)    Occurrences
configure secure hub configure secure hub configure secure         4
However, the source text behind that result is the following:
Configure Secure Hub
Configure Secure Hub
Configure Secure Hub
Configure Secure Hub
Configure Secure Hub
Configure Secure Hub
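The "4" is consistent with an 8-word sliding window. This is only a sketch of how such an analyzer might count, assuming it lowercases the text, ignores line breaks, and counts every run of 8 consecutive words (the function name ngram_counts is mine, not the site's):

```python
from collections import Counter

def ngram_counts(words, n):
    """Count every run of n consecutive words."""
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

# The six identical lines from the sorted list, joined into one stream of 18 words.
lines = ["Configure Secure Hub"] * 6
words = " ".join(lines).lower().split()

counts = ngram_counts(words, 8)
phrase = "configure secure hub configure secure hub configure secure"
print(counts[phrase])  # prints 4
```

Six 3-word lines give 18 words, and an 8-word window that starts on "configure" fits at only four positions (words 1, 4, 7, and 10), so the phrase count is 4 even though the line appears 6 times.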
So what I want to do is paste in the 280 sentences and get a report telling me that "Configure Secure Hub" appears 6 times in the list.
Have you found an easy way to do this?
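One possible approach, offered as a minimal sketch rather than a finished tool: treat each line of the pasted list as one sentence and count exact matches with Python's collections.Counter. The function name count_repeats and the sample line "Open Secure Hub." are made up for illustration; the other two sentences come from the list above.

```python
from collections import Counter

def count_repeats(text):
    """Return (sentence, count) pairs for lines that occur more than once."""
    # One sentence per line; blank lines and surrounding spaces are ignored.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    counts = Counter(lines)
    # most_common() orders the sentences from most to least frequent.
    return [(sentence, n) for sentence, n in counts.most_common() if n > 1]

sample = "\n".join(
    ["Configure Secure Hub"] * 6
    + ["Create and confirm a 4-digit Citrix PIN."] * 2
    + ["Open Secure Hub."]  # invented line that occurs only once
)

report = count_repeats(sample)
for sentence, n in report:
    print(f"{n}\t{sentence}")
```

This matches lines exactly, so "Configure Secure Hub" and "Configure secure hub." would be counted as two different sentences. Normalizing case or trailing punctuation before counting is a judgment call that depends on how strict the snippets need to be.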
The next step, after I figure out how to get the list of duplicated text, is to generate .hts files (snippet files that RoboHelp recognizes). The plan is to analyze the text outside of RoboHelp, create the .hts files, import the snippets into RoboHelp, and then run find-and-replace actions to replace "Configure Secure Hub" with a reference to the snippet that stores that text. I know how to create the snippet file using a DOS command, "Copy [template.hts file] [name of snippet file]", but I have yet to figure out how to get the actual text INTO the snippet without manually pasting it in. That comes later, though, after I figure out how to analyze the text automatically and learn that "Configure Secure Hub" is repeated 6 times in the 280 sentences.
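The copy-the-template step could be scripted so that the text lands in the file at the same time. The sketch below rests on several assumptions that have not been checked against RoboHelp: that template.hts is a text file whose body is HTML, that it can be saved as UTF-8, and that a placeholder such as {{SNIPPET_TEXT}} has been typed into the template by hand where the snippet text belongs. The placeholder and the function names are hypothetical.

```python
import html
import re
from pathlib import Path

PLACEHOLDER = "{{SNIPPET_TEXT}}"  # hypothetical marker added to template.hts by hand

def slugify(text, max_len=40):
    """Turn a sentence into a safe file name, e.g. 'Configure_Secure_Hub'."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")
    return slug[:max_len] or "snippet"

def write_snippets(sentences, template_path, out_dir):
    """Write one .hts file per sentence by filling the template's placeholder."""
    template = Path(template_path).read_text(encoding="utf-8")
    if PLACEHOLDER not in template:
        raise ValueError(f"{template_path} does not contain {PLACEHOLDER}")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for sentence in sentences:
        target = out_dir / f"{slugify(sentence)}.hts"
        # Escape &, < and > on the assumption that the snippet body is HTML.
        body = template.replace(PLACEHOLDER, html.escape(sentence))
        target.write_text(body, encoding="utf-8")
        written.append(target)
    return written
```

A call such as write_snippets(["Configure Secure Hub"], "template.hts", "snippets") would then produce snippets/Configure_Secure_Hub.hts. Two sentences that reduce to the same file name would overwrite each other, so the names are worth checking before importing anything into RoboHelp.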
References:
Tool to Analyze Text for Possible Snippets: From: Paul Hanson