How to remove all special HTML characters from a file using Regex Tagger?
Thread poster: Michael Beijer
Michael Beijer United Kingdom Local time: 05:24 ProZ.com member since 2009 Dutch => English + ...
Jul 14, 2011
Can someone tell me how to remove all special HTML characters from an Excel or text file imported into memoQ using the new Regex Tagger?
I want to get rid of stuff like this before trying to extract terms from the text to create a glossary:
Thanks!
Michael Grant Japan Local time: 13:24 Japanese => English
Remove? Or replace?
Jul 15, 2011
I have more of a question than an answer for you, but I am wondering whether it would be better to replace the HTML special characters with their equivalents, rather than simply remove them...?
For example, if you simply remove the entity that encodes ’ from the text you quoted, and go from this:
Apple’s Friend-Aggregator is a full-feature ...
to this:
Apples Friend-Aggregator is a full-feature ...
it will change the meaning of the term _Apple_...correct?
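To make the difference concrete, here is a small sketch in Python rather than memoQ (memoQ's own regex flavor may differ in details). It assumes the apostrophe was stored as the numeric reference `&#8217;`, which is a guess; the removal pattern is only an illustration, not memoQ's bundled configuration.

```python
import html
import re

# Sample text with the apostrophe encoded as a numeric character reference.
# Which entity the original file actually used is an assumption.
sample = "Apple&#8217;s Friend-Aggregator is a full-feature ..."

# Option 1: strip the entity entirely, which silently drops the apostrophe.
removed = re.sub(r"&#?\w+;", "", sample)

# Option 2: replace the entity with the character it stands for.
replaced = html.unescape(sample)

print(removed)
print(replaced)
```

Removing gives "Apples Friend-Aggregator", while decoding keeps "Apple’s", so for term extraction decoding first is probably the safer route.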
In any case, take a look at the memoQ help article here:
One possible regex to match special characters might be: &#?[^;]+; which would match character references with or without a # sign, containing digits and/or letters. However, I do not have memoQ, so I cannot test this (sorry!).
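Since the pattern could not be tested in memoQ, here is a rough check in Python instead. It assumes the intended pattern is `&#?[^;]+;` (an ampersand, an optional # sign, then anything up to the next semicolon). The stricter `STRICT` variant and the `strip_entities` helper are hypothetical names added for comparison, and memoQ's regex engine may behave differently in details.

```python
import re

# Pattern from the post, assuming it starts with "&" and an optional "#".
LOOSE = re.compile(r"&#?[^;]+;")

# Hypothetical stricter variant: only decimal, hex, or named references.
STRICT = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def strip_entities(text, pattern=STRICT):
    """Remove every substring matched by the given pattern."""
    return pattern.sub("", text)


sample = "Q&A: fish &amp; chips; caf&#233; &#x2019;"

# The loose pattern starts at the bare "&" in "Q&A" and runs to the next ";".
print(LOOSE.findall(sample))

# The strict pattern only picks up well-formed references.
print(STRICT.findall(sample))

print(strip_entities(sample))
```

The loose pattern can swallow ordinary text whenever a stray ampersand appears before a semicolon, so a tighter character class may be worth trying if the source text contains things like "Q&A".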
Gergely Vandor Hungary Local time: 06:24 English => Hungarian
memoQ has built-in support for this
Aug 25, 2011
Hello All,
For the regex tagger, memoQ contains a "bundled" configuration called "Tags and entities". You can import the Excel file, and then run the regex tagger from the Format menu with this configuration.
Or you can even create a cascading filter (filter chain), where the regex tagger is chained after the Excel filter with this configuration.
best regards, Gergely