#1
Save as TXT: How to detect whether Unicode encoding is required?
Hello, in Notepad, if the wrong encoding is chosen when saving, a warning appears saying that another encoding must be chosen to avoid losing certain characters. In MS Word, how can we detect whether, for example, Unicode is required when exporting to text? Thanks for the help.
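Notepad's warning comes down to trying to encode the text in the chosen code page and checking whether any character fails. As a language-neutral illustration outside Word (a Python sketch, not something Word or Notepad exposes), assuming the ANSI code page is Windows-1252 (`cp1252`, typical for Western-European Windows; other systems use other code pages) and using the hypothetical helper name `needs_unicode`:

```python
def needs_unicode(text: str, codepage: str = "cp1252") -> bool:
    """Return True if text cannot be encoded losslessly in the given code page.

    Assumes cp1252 stands in for the system's ANSI code page.
    """
    try:
        # Strict mode raises on the first character the code page lacks
        text.encode(codepage)
    except UnicodeEncodeError:
        return True
    return False


if __name__ == "__main__":
    print(needs_unicode("Caf\u00e9 au lait"))  # é exists in cp1252
    print(needs_unicode("100\u202fkm"))        # U+202F does not
```

The first call prints `False` and the second `True`; the check never needs to know about "Unicode" as such, only whether the target code page can hold every character.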
#2
I believe the following function will solve your problem:
Code:
    Function CanSaveDocAsTxt() As Boolean
        ' Returns True only if every character passes the Insert Symbol test
        Dim i As Long   ' Long, not Integer: Integer overflows past 32,767 characters
        With ActiveDocument
            For i = 1 To .Characters.Count
                .Characters(i).Select
                'If AscW(.Characters(i).Text) > 127 Then Exit Function
                ' Read the Font setting of the Insert Symbol dialog for the selection
                With Dialogs(wdDialogInsertSymbol)
                    If .Font <> "(normal text)" Then Exit Function
                End With
            Next i
        End With
        CanSaveDocAsTxt = True
    End Function
Last edited by macropod; 11-20-2018 at 01:55 PM. Reason: Added code tags to restore formatting
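The commented-out `AscW(...) > 127` line flags every character above 127, including ones such as É that the ANSI code page can store. A Python sketch of a per-character scan that tests encodability instead of a fixed threshold, and without selecting anything; it assumes Windows-1252 as the ANSI code page, and `unencodable_chars` is a hypothetical name:

```python
def unencodable_chars(text: str, codepage: str = "cp1252") -> list[tuple[int, str]]:
    """List (1-based position, character) pairs the code page cannot represent."""
    found = []
    for pos, ch in enumerate(text, start=1):
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            # This character would be lost when saving in the code page
            found.append((pos, ch))
    return found


if __name__ == "__main__":
    print(unencodable_chars("A\u00c9\u202fB"))
```

Here only the narrow no-break space at position 3 is reported; É passes. Positions are Python code-point positions and need not match Word's own character indexing.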
#3
|
||||
|
||||
VBorNotVB: When posting code, please use the code tags, indicated by the # button on the posting menu. Without them, your code loses much of whatever structure it had. See your edited post.
Looping through every character in a document is slow and unnecessary, and excruciatingly so when every character is also selected. In any event, testing whether characters fall in the ASCII range 0-127 says nothing about whether they're Unicode; they might just as well be characters in the extended 128-255 range! Instead, try something based on:
Code:
    Sub Demo()
    Application.ScreenUpdating = False
    Dim i As Long
    With ActiveDocument
        ' Temporarily delete every character matched by the range ^1-^255
        With .Range.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = "[^1-^255]"
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchWildcards = True
            .Execute Replace:=wdReplaceAll
        End With
        ' Count what is left, less any inline shapes
        i = Len(.Range.Text) - .InlineShapes.Count
        ' Restore the deleted text
        .Undo
    End With
    ' The final paragraph mark cannot be deleted, hence the test for more than 1
    If i > 1 Then
        MsgBox "Document contains Unicode characters."
    Else
        MsgBox "Document contains only ASCII characters."
    End If
    Application.ScreenUpdating = True
    End Sub
__________________
Cheers, Paul Edstein [Fmr MS MVP - Word]
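Reading `[^1-^255]` as a test of whether any character code exceeds 255 (an assumption; Word's handling of `^nnn` codes inside a wildcard range may follow the ANSI code page instead), that threshold is close to, but not the same as, "fits the ANSI code page". A Python sketch assuming Windows-1252: the euro sign is U+20AC, above 255, yet is byte 0x80 in cp1252, while U+0081 is below 256 but undefined in cp1252. The function names are hypothetical:

```python
def exceeds_255(text: str) -> bool:
    """True if any character's code point is above 255 (the threshold idea)."""
    return any(ord(ch) > 255 for ch in text)


def fits_cp1252(text: str) -> bool:
    """True if every character can be encoded in Windows-1252."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # É, euro sign, an undefined cp1252 slot, narrow no-break space
    for sample in ["\u00c9", "\u20ac", "\u0081", "\u202f"]:
        print(f"U+{ord(sample):04X}: above 255={exceeds_255(sample)}, "
              f"fits cp1252={fits_cp1252(sample)}")
```

The two tests agree on É and U+202F but disagree on the euro sign and U+0081, which is one way a threshold test can give false positives or false negatives.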
#4
Thank you very much, I will try this!
#5
Paul,
I created a blank document, added the paragraph "This is a test.", and ran your routine. It reported that the document has Unicode characters! I know my routine is slow, and using wdDialogInsertSymbol makes it slower still, but I believe it's the only surefire way to catch Unicode characters. Using "(normal text)" as the litmus test makes Word do the work for us; otherwise, symbol-font characters such as Wingdings can slip through when their code point is at or below 255. As for testing for ASCII characters 0-127: the first 128 characters of Unicode are ASCII, but VBA stores them as wide (16-bit) characters. Hence the use of AscW(), which reads the wide (UTF-16) character code rather than the ANSI code that Asc() returns.
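On the AscW point: in VBA, `AscW` returns an `Integer`, a signed 16-bit type, so UTF-16 code units from &H8000 upward come back negative, and a character beyond U+FFFF is stored as two surrogate units. A Python sketch emulating those values, with the hypothetical name `ascw_like`; it returns one value per code unit, whereas `AscW` returns only the first:

```python
def ascw_like(text: str) -> list[int]:
    """Emulate VBA AscW values: UTF-16 code units read as signed 16-bit integers."""
    data = text.encode("utf-16-le")
    # Split the little-endian bytes into unsigned 16-bit code units
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
    # Reinterpret each unit as signed, the way a VBA Integer would hold it
    return [u - 0x10000 if u >= 0x8000 else u for u in units]


if __name__ == "__main__":
    print(ascw_like("A"))            # plain ASCII
    print(ascw_like("\u8a9e"))       # a CJK character above U+8000
    print(ascw_like("\U0001F600"))   # an emoji: two surrogate units
```

Under this model, a test such as `AscW(c) > 127` would let U+8A9E (value -30050) through as though it were plain ASCII.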
#6
|
||||
|
||||
Quote:
Quote:
You still seem to be missing the point that characters in the 128-255 range are no different in that regard.
#7
Your code works really well and it's fast too. Well done!
#8
Paul, your code works fine, many thanks! But... it only works with small text samples. With an entire book, your code is very, very slow... and it doesn't work: there are false positives, where it says the document contains Unicode characters when it does not. To check that, we can use WinMerge to compare the text saved with and without Unicode encoding. And then... I found the solution! I just have to save as text with and without Unicode, reload both files, and compare the results: it is very fast. Thanks anyway for helping.
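The save-and-compare approach can be sketched outside Word as a lossy round trip: encode to the ANSI code page with replacement, decode again, and compare. A Python sketch assuming Windows-1252 and that unencodable characters become `?` (Word may substitute a different or best-fit character when saving; either way the reloaded text differs from the original). `round_trip_losses` is a hypothetical name:

```python
def round_trip_losses(text: str, codepage: str = "cp1252") -> list[tuple[int, str]]:
    """Return (1-based position, character) pairs changed by a lossy save and reload."""
    # "Save": unencodable characters are replaced with '?' (an assumed stand-in for Word)
    saved = text.encode(codepage, errors="replace")
    # "Reload": every byte produced above is defined in the code page, so this succeeds
    reloaded = saved.decode(codepage)
    # One character in, one byte out, one character back, so positions line up
    return [
        (pos, original)
        for pos, (original, back) in enumerate(zip(text, reloaded), start=1)
        if original != back
    ]


if __name__ == "__main__":
    print(round_trip_losses("Caf\u00e9 \u2013 100\u202fkm"))
```

A genuine `?` in the source survives the round trip unchanged, so it is not reported as a loss.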
P.S.: Your code reports, for example, the character É as a Unicode character. This is misleading, because É can in fact be saved using ANSI encoding. Unicode is really required for more exotic characters such as the narrow no-break space (NNBSP: Alt+8239, U+202F).
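To make the É versus NNBSP distinction concrete: É is U+00C9 and occupies a single byte in the ANSI code page, while U+202F (decimal 8239) has no ANSI equivalent. A Python sketch assuming Windows-1252; `describe` is a hypothetical helper:

```python
def describe(ch: str, codepage: str = "cp1252") -> str:
    """Report one character's code point and its byte in the code page, if any."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    cp = ord(ch)
    try:
        data = ch.encode(codepage)
    except UnicodeEncodeError:
        return f"U+{cp:04X} (decimal {cp}): not in {codepage}, Unicode required"
    return f"U+{cp:04X} (decimal {cp}): {codepage} byte 0x{data[0]:02X}"


if __name__ == "__main__":
    for ch in ["\u00c9", "\u202f"]:
        print(describe(ch))
```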
#9
|
||||
|
||||
What exactly do you expect with a large document? Complaining about the speed betrays a totally unrealistic expectation. As even VBorNotVB said, the code:
Quote:
Quote: