#1
Strange characters in documents edited in Japan
I have a Word macro that regularises units as typed in by the user. I apply the macro once I receive a document for further editing.
For example, 123°C, 123 °C, 123C, etc. all become 123 °C after I run the macro. The macro is basically a set of search-and-replace instructions using wildcards. However, I have a document edited by Japanese colleagues in which my macro does not find the ° symbol. If I print the AscW value of the character my Japanese colleagues used, it is reported as 40 rather than 176, and 40 is normally the left parenthesis character, "(". A similar problem occurs where Tab characters appear as a square symbol; in this case the square symbol has an AscW value of 12288. I've attached a small document that includes text with these strange characters. I'd really appreciate some help, as trying to resolve this issue is driving me bonkers.
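A small diagnostic along the lines described above makes it easier to see exactly which characters are involved. This is a minimal sketch; `CharCodeReport` and `ReportSelectionCodes` are hypothetical names. As the replies below explain, a protected symbol will still read as 40 here until it has been unlocked.

```vba
' Minimal diagnostic sketch (hypothetical names): lists the Unicode code
' point of every character in a string so look-alike characters stand out.
Function CharCodeReport(ByVal s As String) As String
    Dim i As Long, cp As Long, out As String
    For i = 1 To Len(s)
        cp = AscW(Mid$(s, i, 1))
        ' AscW returns a signed Integer, so code points above &H7FFF come back negative
        If cp < 0 Then cp = cp + 65536
        out = out & "U+" & Right$("000" & Hex$(cp), 4) & " (" & cp & ")" & vbLf
    Next i
    CharCodeReport = out
End Function

' Prints the report for the current selection to the Immediate window (Ctrl+G)
Sub ReportSelectionCodes()
    Debug.Print CharCodeReport(Selection.Text)
End Sub
```

Select the suspect text and run ReportSelectionCodes; one of the square "tab" boxes described above should show up as U+3000 (12288).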
#2
I have had similar issues with documents that came from other writers. The only solution I found was to copy the offending character and use Find and Replace one at a time.
#3
Hi Boatwrenchv8
Thanks for the update. I've tried the copy, search and replace approach, but it doesn't work for me. In addition, I'm looking for a way to do this programmatically so that I can process a whole directory of Word documents at a time.
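Processing a whole directory is usually done by looping over the files with Dir and opening each one in turn. The sketch below assumes a hypothetical folder path and a hypothetical `RegulariseUnits` routine standing in for the existing unit macro; its two wildcard patterns only illustrate the 123°C and 123C cases and are not the original macro's rules. Try it on copies first, since each document is saved in place.

```vba
' Sketch only. FOLDER_PATH, ProcessFolder, RegulariseUnits and
' ReplaceAllWild are hypothetical names, not part of the original macro.
Const FOLDER_PATH As String = "C:\Docs\ToProcess\"   ' hypothetical; keep the trailing backslash

Sub ProcessFolder()
    Dim fName As String, doc As Document
    Application.ScreenUpdating = False
    fName = Dir(FOLDER_PATH & "*.doc*")          ' first .doc/.docx/.docm file
    Do While fName <> ""
        Set doc = Documents.Open(FileName:=FOLDER_PATH & fName, _
                                 AddToRecentFiles:=False, Visible:=False)
        RegulariseUnits doc
        doc.Close SaveChanges:=wdSaveChanges
        fName = Dir()                            ' next matching file
    Loop
    Application.ScreenUpdating = True
End Sub

' Hypothetical stand-in for the existing unit macro
Sub RegulariseUnits(ByVal doc As Document)
    Dim deg As String
    deg = ChrW(176)                              ' U+00B0 DEGREE SIGN
    ReplaceAllWild doc, "([0-9])" & deg & "C", "\1 " & deg & "C"   ' 123°C -> 123 °C
    ReplaceAllWild doc, "([0-9])C>", "\1 " & deg & "C"             ' 123C  -> 123 °C
End Sub

' Runs one wildcard Replace All over the document's main text
Sub ReplaceAllWild(ByVal doc As Document, ByVal findText As String, ByVal replText As String)
    With doc.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replText
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Keeping each Find/Replace in one helper means any extra steps (such as unlocking symbols or converting the square boxes) can be added to RegulariseUnits without touching the folder loop.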
#4
Bump. Surely someone has encountered a similar problem.
#5
I suspect the issue is that your editors have used a similar-looking character from a symbol font, probably via Insert > Symbol. Although AscW usually returns an accurate Unicode value for a character, Word can't tell which characters are in some symbol fonts, and some may not even have a representation in Unicode. So Word assigns them code points from &HF000 upward, in Unicode's Private Use Area, which programs can use for their own purposes. Another problem is that Word protects symbols that users have inserted from a symbol font via Insert > Symbol against changes, so they don't get messed up if you change the font or style. The end result is that Word won't readily tell you which font was used, and AscW will report a code of 40, i.e. "(".
The following macro will unlock all such characters so you can get at them via Find/Replace. Code:
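Sub SymbolsUnprotect()
Dim StrFnt As String, CharVal As Long
With ActiveDocument.Range
  With .Find
    ' Match any character in Word's protected symbol range (&HF020-&HF0FF)
    .Text = "[" & ChrW(61472) & "-" & ChrW(61695) & "]"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found
    ' The Insert Symbol dialog reports on the current selection,
    ' so select the found character before reading it
    .Select
    With Dialogs(wdDialogInsertSymbol)
      StrFnt = .Font
      CharVal = .CharNum
    End With
    ' Re-insert the same character as ordinary (unprotected) text
    .Font.Name = StrFnt
    .Text = ChrW(CharVal)
    .Collapse wdCollapseEnd
    .Find.Execute
  Loop
End With
End Sub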
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
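Once a symbol has been unlocked it is an ordinary private-use character that Find can match, so a plain Replace All can turn it into a real degree sign. A minimal sketch, assuming the "degree" came from the Symbol font, where the degree sign sits at &HB0, so that it would unlock to U+F0B0 (61616); check the actual value with a character-code dump before relying on it. The procedure name and the replacement font name are hypothetical.

```vba
' Sketch: run after SymbolsUnprotect. Assumes the unlocked "degree"
' character is U+F0B0 (Symbol font position &HB0); verify this first.
Sub SymbolDegreeToUnicode()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ChrW(61616)                          ' U+F0B0, assumed unlocked Symbol degree
        .Replacement.Text = ChrW(176)                ' U+00B0 DEGREE SIGN
        .Replacement.Font.Name = "Times New Roman"   ' hypothetical: use your body font
        .Format = True                               ' needed so the replacement font applies
        .MatchWildcards = False
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

After this, the existing wildcard rules that look for ChrW(176) should see the character like any other degree sign.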
#6
Hi macropod
I tried your code above and it correctly caught the symbol posing as a degree sign. However, it did not catch the square symbol that should be acting as a tab character. I single-stepped through your macro (F8) and the Do While .Find.Found loop was executed only once. Can you offer any further advice on the square symbols (which should be tab characters) in the document I uploaded?
#7
The square boxes that appear in place of the tabs aren't even special symbols and don't need the macro - you can replace those by simply copying/pasting one into the Find textbox...
#8
Hi Macropod
The problem is that I want to implement the find and replace as part of a macro that processes a number of documents. The square symbol has the AscW value 12288 (hex 3000). If I search and replace using a .Find.Text string of "^u12288", I replace not only the square symbol but also every space character in the document:

With ActiveDocument.StoryRanges(wdMainTextStory).Find
  .ClearFormatting
  .Replacement.ClearFormatting
  .Text = "^u12288"
  .Replacement.Text = vbTab
  .Wrap = wdFindContinue
  .Execute Replace:=wdReplaceAll
End With

If I follow your advice of copying and pasting into a manual search and replace, the same thing happens: all space characters are converted to tab characters.

Finally, would you be kind enough to explain the following line in your macro?

.Text = "[" & ChrW(61472) & "-" & ChrW(61695) & "]"

What are ChrW(61472) and ChrW(61695), and do I need to replace the - with my character, e.g. ^u12288? I'm really grateful for your input so far.
#9
The simple solution for a Find/Replace macro to change the boxes to tabs is to specify:
.Text = ChrW(12288)
.Replacement.Text = "^t"

The line:

.Text = "[" & ChrW(61472) & "-" & ChrW(61695) & "]"

in the code I posted tells Word to look for any character with a value in the range 61472 to 61695. You shouldn't modify that part of the code, since all it does is find every locked symbol in Word's protected code-page area and unlock it so you can access it with whatever other code you're using.
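For reference, the decimal values used in this thread convert to hexadecimal as follows (U+3000 is the Unicode IDEOGRAPHIC SPACE, the full-width space used in Japanese text, and U+00B0 is the ordinary degree sign):

$$
\begin{aligned}
61472 &= 15\cdot 16^3 + 0\cdot 16^2 + 2\cdot 16 + 0 = \texttt{F020}_{16}\\
61695 &= 15\cdot 16^3 + 0\cdot 16^2 + 15\cdot 16 + 15 = \texttt{F0FF}_{16}\\
12288 &= 3\cdot 16^3 = \texttt{3000}_{16}\\
176 &= 11\cdot 16 + 0 = \texttt{B0}_{16}
\end{aligned}
$$

So the wildcard range covers the 224 code points U+F020 to U+F0FF, i.e. positions &H20 to &HFF of a symbol font shifted up by &HF000, while 12288 lies well outside it. That is consistent with the unlocking loop running only once on the test document: it finds the protected degree symbol but not the square boxes.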
#10
Hi Macropod
Apologies for being stupid. I should have recognised the regex-style character range. My only excuse is that I didn't recognise the two ChrW values, so I just assumed they were something special in Word. Your suggestion of using ChrW(12288) was one of the methods I tried once I had worked out the character's value. As I reported above, all that happens is that every space character in the document is also replaced by a tab. The specific code I used is below. Code:
With ActiveDocument.StoryRanges(wdMainTextStory).Find
  .ClearFormatting
  .Replacement.ClearFormatting
  .Text = ChrW(12288)
  .Replacement.Text = "^t"
  .Wrap = wdFindContinue
  .Execute Replace:=wdReplaceAll
End With

Last edited by macropod; 10-08-2013 at 02:01 AM. Reason: Added code tags & formatting
#11
With your code on my Win 7 system, using both Word 2003 and Word 2010, only the 'tab' boxes get replaced - the spaces are left alone.
#12
I'm using Word 2013 (via Microsoft Office Small Business Premium via online subscription) on Windows 7 64 bit. The document is being edited in compatibility mode (Word 97-2003).
This *IS* fun, isn't it? ;-)
#13
I suggest you try repairing the Office installation (via Programs & Features > Microsoft Office > Change in the Windows Control Panel), then re-starting Windows to see if the problem's fixed.
#14
I've completed the repair and rebooted. Unfortunately the result is the same: the square symbols and every space character are replaced by a tab character.
#15
Maybe it's a bug in Word 2013. I suggest trying on another PC, preferably one not running Word 2013 (though if it works on another PC running Word 2013, we'll know it's an installation issue with your PC).
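One possible explanation for the Word 2013 behaviour, not confirmed anywhere in this thread: U+3000 is the full-width counterpart of an ordinary space, and when East Asian language support is enabled Word's Find can treat full-width and half-width forms as equivalent, so a search for ChrW(12288) may also hit every ordinary space. The sketch below repeats the replace with Find.MatchByte set to True, which asks Word to distinguish full-width from half-width characters. The procedure name is hypothetical and this has not been tested against the attached document.

```vba
' Sketch of a possible fix (unconfirmed): stop Find from treating the
' full-width U+3000 as equivalent to an ordinary half-width space.
Sub IdeographicSpaceToTab()
    With ActiveDocument.StoryRanges(wdMainTextStory).Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ChrW(12288)            ' U+3000 IDEOGRAPHIC SPACE (the square "tab" box)
        .Replacement.Text = "^t"       ' tab
        .MatchWildcards = False
        .MatchByte = True              ' distinguish full-width from half-width forms
        .Forward = True
        .Wrap = wdFindContinue
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

If this helps, the same .MatchByte = True line could be added to every Find in a batch macro that handles documents from Japanese editors.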