Microsoft Office Forums

Save as txt : How to detect if Unicode encoding is required?

#1 | 11-17-2018, 12:02 AM | Patrice Dargenton (Novice; Windows 10, Office 97-2003)

Hello. In Notepad, if the wrong encoding is chosen when saving, an alert warns that another encoding must be chosen to avoid losing certain characters. In MS Word, how can we detect whether Unicode is required when a text export is performed? Thanks for the help.

#2 | 11-20-2018, 09:15 AM | VBorNotVB (Novice; Mac OS X, Office 2016 for Mac)

I believe the following function will solve your problem:

Code:
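Function CanSaveDocAsTxt() As Boolean
    ' Long rather than Integer: a document can hold more than 32,767 characters.
    Dim i As Long
    With ActiveDocument
        For i = 1 To .Characters.Count
            .Characters(i).Select
            'If AscW(.Characters(i).Text) > 127 Then Exit Function
            ' Ask the Insert Symbol dialog which font the selected character belongs to.
            With Dialogs(wdDialogInsertSymbol)
                If .Font <> "(normal text)" Then Exit Function
            End With
        Next i
    End With
    ' Every character passed the test.
    CanSaveDocAsTxt = True
End Function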
It goes through all characters in the active document and returns True if every one of them is part of the (normal text) character set. Bear in mind that this character set includes ANSI characters (character code points between 128 and 255). If you want to limit your text file to characters in the ASCII range (0-127), uncomment the commented-out AscW statement above. This function does not check the document for shapes (images, etc.), but I assume you know which documents contain them!

Last edited by macropod; 11-20-2018 at 01:55 PM. Reason: Added code tags to restore formatting

#3 | 11-20-2018, 01:56 PM | macropod (Administrator; Windows 7 64bit, Office 2010 32bit)

VBorNotVB: When posting code, please use the code tags, indicated by the # button on the posting menu. Without them, your code loses much of whatever structure it had. See your edited post.

Looping through every character in a document is slow and unnecessary - excruciatingly so when every character is unnecessarily selected. In any event, testing whether characters fall in the ASCII range 0-127 says nothing about whether they're Unicode; they might also be characters in the extended (ANSI) range 128-255!

Instead, try something based on:
Code:
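Sub Demo()
    Application.ScreenUpdating = False
    Dim i As Long
    With ActiveDocument
        ' Delete every character with a code from 1 to 255 via a wildcard Find/Replace.
        With .Range.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = "[^1-^255]"
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchWildcards = True
            .Execute Replace:=wdReplaceAll
        End With
        ' Whatever remains, apart from inline shapes, is taken to lie outside the 1-255 range.
        i = Len(.Range.Text) - .InlineShapes.Count
        ' Restore the deleted text.
        .Undo
    End With
    ' The final paragraph mark always remains, hence the comparison with 1.
    If i > 1 Then
        MsgBox "Document contains Unicode characters."
    Else
        MsgBox "Document contains only ASCII characters."
    End If
    Application.ScreenUpdating = True
End Sub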
__________________
Cheers,
Paul Edstein
[Fmr MS MVP - Word]

#4 | 11-21-2018, 07:17 AM | Patrice Dargenton (Novice; Windows 10, Office 97-2003)

Thank you very much, I will try this!

#5 | 11-21-2018, 07:32 AM | VBorNotVB (Novice; Mac OS X, Office 2016 for Mac)

Paul,

I created a blank document, added the paragraph "This is a test.", and ran your routine. It reported it as having Unicode characters!

I know my routine is slow; worse yet, using wdDialogInsertSymbol makes it even slower!
But I believe it's the only surefire way to catch Unicode characters. Using "(normal text)" as the litmus test forces Word to do the work for us; otherwise, special characters such as Wingdings symbols can slip through if their character code point falls below 255.

As for testing for ASCII characters 0-127, the first 128 characters of Unicode are ASCII, but represented as wide characters. Hence the use of AscW(), which reads the wide (Unicode) character code rather than the ANSI code that Asc() returns.
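
A caveat worth keeping in mind with AscW(): it returns a signed 16-bit Integer, so code units from &H8000 upward come back as negative numbers, and a plain "> 127" comparison would not flag them. The following is a minimal sketch of the ASCII range test that works on a string instead of selecting each character. The function name IsPureAscii is hypothetical, and masking with &HFFFF& is a common VBA idiom rather than anything proposed in this thread.

```vba
Option Explicit

' Returns True when every UTF-16 code unit in s lies in the ASCII range 0-127.
' IsPureAscii is a hypothetical helper name, not part of Word or VBA.
Function IsPureAscii(ByVal s As String) As Boolean
    Dim i As Long
    Dim code As Long
    For i = 1 To Len(s)
        ' AscW returns a signed Integer; the mask maps -32768..-1 back to 32768..65535.
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        If code > 127 Then Exit Function   ' the result stays False
    Next i
    IsPureAscii = True
End Function
```

It could be called as IsPureAscii(ActiveDocument.Range.Text). Paragraph marks are Chr(13), so they count as ASCII here.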

#6 | 11-21-2018, 03:14 PM | macropod (Administrator; Windows 7 64bit, Office 2010 32bit)

Quote:
Originally Posted by VBorNotVB
I created a blank document, added the paragraph "This is a test.", and ran your routine. It reported it as having Unicode characters!

I forgot that a paragraph break would remain in the document. The 'If i > 0 Then' should have been 'If i > 1 Then'. Fixed.

Quote:
Originally Posted by VBorNotVB
I know my routine is slow; worse yet, using wdDialogInsertSymbol makes it even slower!
But I believe it's the only surefire way to catch Unicode characters.

My approach clearly demonstrates that is not so.

Quote:
Originally Posted by VBorNotVB
As for testing for ASCII characters 0-127, the first 128 characters of Unicode are ASCII, but represented as wide characters.

You still seem to be missing the point that extended (ANSI) characters 128-255 are no different in that regard.

#7 | 11-22-2018, 11:38 AM | VBorNotVB (Novice; Mac OS X, Office 2016 for Mac)

Your code works really well and it's fast too. Well done!

#8 | 11-23-2018, 01:49 AM | Patrice Dargenton (Novice; Windows 10, Office 97-2003)

Paul, your code works fine, many thanks! But it works only with small text samples. With an entire book, your code is very, very slow, and it doesn't work: there are false positives, reporting that the document contains Unicode characters when it does not. To check this, we can use WinMerge to compare the text saved with and without Unicode encoding. Then I found the solution: I just have to save as text with and without Unicode encoding, reload both files, and compare the results. It is very fast. Thanks anyway for helping.
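
The save, reload, and compare idea can also be tried in memory, without writing files. The sketch below swaps the file comparison for a StrConv round trip. It assumes Windows VBA, where StrConv with vbFromUnicode converts a string to the system's default ANSI code page and vbUnicode converts it back. A character that the code page cannot represent is altered on the way, so the round trip no longer matches the original. The function name NeedsUnicode is hypothetical, and the result depends on the code page of the machine running it.

```vba
Option Explicit

' Returns True when s cannot be stored losslessly in the system ANSI code page.
' NeedsUnicode is a hypothetical helper name, not part of Word or VBA.
Function NeedsUnicode(ByVal s As String) As Boolean
    Dim roundTrip As String
    ' Convert to ANSI bytes, then back to a VBA (UTF-16) string.
    roundTrip = StrConv(StrConv(s, vbFromUnicode), vbUnicode)
    ' Binary comparison, so the result does not depend on Option Compare.
    NeedsUnicode = (StrComp(s, roundTrip, vbBinaryCompare) <> 0)
End Function
```

It could be called as NeedsUnicode(ActiveDocument.Range.Text) before choosing the encoding for the text export.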

P.S.: Your code reports that, for example, the character É is a Unicode character. This is ambiguous, because it can still be saved using ANSI encoding. Unicode is really required only for exotic characters such as the narrow no-break space (NNBSP, Alt+8239, U+202F).
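
To see which characters actually force a Unicode encoding (a narrow no-break space rather than É, for instance), a per-character check can report their positions and code points. This is a sketch only. It assumes the target is the system ANSI code page on Windows, and that a character survives a StrConv round trip unchanged only when that code page can represent it. The function name ListNonAnsiChars and the maxHits parameter are hypothetical.

```vba
Option Explicit

' Lists up to maxHits characters of s that do not survive an ANSI round trip,
' one per line, as "position: U+XXXX". Hypothetical helper, not a Word API.
Function ListNonAnsiChars(ByVal s As String, Optional ByVal maxHits As Long = 20) As String
    Dim i As Long, hits As Long
    Dim ch As String, back As String
    Dim code As Long
    Dim report As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        ' Mask because AscW returns a signed 16-bit Integer.
        code = AscW(ch) And &HFFFF&
        ' Assumption: plain ASCII (0-127) is representable in every ANSI code page.
        If code > 127 Then
            back = StrConv(StrConv(ch, vbFromUnicode), vbUnicode)
            If StrComp(back, ch, vbBinaryCompare) <> 0 Then
                report = report & i & ": U+" & Right$("0000" & Hex$(code), 4) & vbCrLf
                hits = hits + 1
                If hits >= maxHits Then Exit For
            End If
        End If
    Next i
    ListNonAnsiChars = report
End Function
```

Calling Debug.Print ListNonAnsiChars(ActiveDocument.Range.Text) prints the findings to the Immediate window. Characters outside the Basic Multilingual Plane are reported as two separate surrogate code units.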

#9 | 11-23-2018, 02:08 PM | macropod (Administrator; Windows 7 64bit, Office 2010 32bit)

Quote:
Originally Posted by Patrice Dargenton
With an entire book, your code is very, very slow...

Exactly what do you expect with a large document? To complain about the speed betrays a totally unrealistic expectation. As even VBorNotVB said, the code:

Quote:
works really well and it's fast too

Quote:
Originally Posted by Patrice Dargenton
it doesn't work: there are false positives, reporting that the document contains Unicode characters when it does not.

Your code reports that, for example, the character É is a Unicode character. This is ambiguous, because it can still be saved using ANSI encoding.

Your fault-finding is rude! You can't reasonably expect a test for Unicode characters to say a given Unicode character isn't Unicode just because there's an ANSI equivalent to represent the same character (e.g. É). If you don't want to use a Unicode character when there's an ANSI equivalent, that's your responsibility, but don't go criticizing others when the solution is correct for the problem as described and any problems are due to your poor description of what you want to achieve.
All times are GMT -7.