Potential, harmless typo in ANSI-UTF.VDM



rejto
September 3rd, 2011, 05:52 PM
Gentlemen,
I assume that I have found a totally harmless typo in ANSI-UTF.VDM. Specifically, line 210 says that

B5 03 EE Greek small letter gamma

Actually, in the macro the character is replaced by the Vedit OEM character with hexadecimal value EE. I believe that the word "gamma" should be replaced by the word "epsilon".


To check that my assumption is correct, I visited the www.unicode.org/charts/PDF website and downloaded the file 0307.pdf. This file does indeed show the glyph of a "Greek small letter epsilon" with the code 03B5 under it. In other words, it is the same code as yours with the two bytes read in reverse order (B5 03 is 03B5 stored low byte first).

This change would also make the macro consistent with the Vedit {Misc, ASCII table ..} menu command. For the hexadecimal value EE, this table displays the same glyph as your line 210. In other words, it displays an "epsilon".

I also checked "Code page 437" on microsoft.msdn.com, and under Unicode 03B5 it also has an "epsilon".
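
For anyone who wants to repeat this check outside Vedit, here is a minimal Python sketch. It assumes that Python's built-in cp437 codec agrees with the Vedit OEM character set for this code point, which nothing in this thread confirms, and the function name describe_table_entry is made up for the illustration.

```python
import unicodedata

def describe_table_entry(utf16le_bytes: bytes) -> tuple[str, int, str]:
    """Return (code point, CP437 byte, Unicode name) for a 2-byte UTF-16LE value.

    Assumption: Python's "cp437" codec stands in for the Vedit OEM character set.
    """
    ch = utf16le_bytes.decode("utf-16-le")   # B5 03 is read low byte first
    oem_byte = ch.encode("cp437")[0]         # raises UnicodeEncodeError if not in CP437
    return f"U+{ord(ch):04X}", oem_byte, unicodedata.name(ch)

code_point, oem_byte, name = describe_table_entry(bytes([0xB5, 0x03]))
print(code_point, f"{oem_byte:02X}", name)
```

If the assumption holds, the printed name should be the epsilon, not the gamma.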

In any case, a big thank you for ANSI-UTF.VDM. I like it very much! In fact, I do prefer your table in this macro to the one of the Vedit menu command. My reason is that in your table the value of the Font_Charset parameter is explicit, while in the Vedit menu command it is not.


Incidentally, I would love to add the line Char Set = Vedit ANSI or Char Set = Vedit OEM, whichever is the case, to the table displayed by this Vedit menu command. However, I do not know how to do it. In fact, I do not know whether this is an internal Vedit command or whether Vedit executes it via a macro.

Thanks again, I have learned quite a bit from your macro.

-peter

ian binnie
September 4th, 2011, 03:30 AM
Christian and I generated this macro a few years ago, but the translation tables were produced by me.
I have a file dated 2004 which was the source of the table. This contains the same error.

The actual translations should be correct (they were generated programmatically) but the descriptions were added manually.
After all this time I can't remember the details, but I suspect a simple editing error, as U+03B3 Greek small letter gamma does not exist in CP 437.
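
One way to avoid hand-typed descriptions would be to generate them together with the byte values. The following is only a sketch of that idea in Python, not how the original table was produced. It assumes Python's cp437 codec and the Unicode character database as the sources, and the helper names in_cp437 and cp437_table_lines are hypothetical.

```python
import unicodedata

def in_cp437(ch: str) -> bool:
    """True if the character has a byte in Python's cp437 codec (assumed to match CP 437)."""
    try:
        ch.encode("cp437")
        return True
    except UnicodeEncodeError:
        return False

def cp437_table_lines() -> list[str]:
    """Build lines of the form 'lo hi oem description' for OEM bytes 80..FF."""
    lines = []
    for oem in range(0x80, 0x100):
        ch = bytes([oem]).decode("cp437")
        lo, hi = ch.encode("utf-16-le")          # all CP437 characters fit in two bytes
        # capitalize() lowercases everything after the first letter
        name = unicodedata.name(ch, "<unnamed>").capitalize()
        lines.append(f"{lo:02X} {hi:02X} {oem:02X} {name}")
    return lines

print(in_cp437("\u03b3"), in_cp437("\u03b5"))    # small gamma, small epsilon
print(cp437_table_lines()[0xEE - 0x80])
```

Because the description comes from the same lookup as the bytes, a gamma/epsilon mix-up of this kind cannot creep in by editing.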

mrvedit
September 12th, 2011, 12:18 PM
I don't fully understand this, but since Ian also thinks it should read "epsilon", I have changed the macro.

pal
September 13th, 2011, 06:03 AM
Are the macros ansi-utf.vdm and utf-ansi.vdm needed any more?
Utf-conv.vdm can replace them both.

But Utf-conv.vdm has the same error in the comment, as expected.

By the way, the menu still says "Unicode to ASCII" and "ASCII to Unicode".
It should be "Unicode to ANSI" and "ANSI to Unicode".
(And if you use utf-conv.vdm to implement these menu commands, you can remove the "(UTF-16)" part from the menu text.)

--
Pauli

chriz
September 16th, 2011, 09:32 AM
I only did some small tests but yes, there seems to be some redundancy in the {Edit, Translate} menu.
The three Unicode-related items should be reducible to one (the third one).

pal
September 17th, 2011, 08:27 AM
Or, in case someone prefers direct conversion, the first two items could just call the labels FROM_UTF and TO_UTF in utf-conv.vdm.
But these items are not really needed, since the macro usually can automatically detect the direction.
In any case, the old macros are not needed any more.

ian binnie
September 18th, 2011, 05:21 AM
I am not sure that vedit is not calling the entry points in utf-conv.vdm. It is not possible to determine directly, although it wouldn't be that difficult to work out (by modifying or deleting macros).

It is (almost) impossible to distinguish between UTF-8 and ANSI, so the macro cannot automatically detect the direction.
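
A small Python sketch of why this is ambiguous. It assumes "ANSI" means Windows-1252 (cp1252), which is only one of several possible ANSI code pages, and the helper decodes_as is made up for the example.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """True if the bytes decode without error in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# UTF-8 bytes for e-acute (C3 A9) are also two valid cp1252 characters,
# so validity alone cannot tell the two encodings apart.
utf8_sample = "\u00e9".encode("utf-8")
print(decodes_as(utf8_sample, "utf-8"), decodes_as(utf8_sample, "cp1252"))

# The reverse direction is the "almost": a lone cp1252 byte E9 is not valid UTF-8.
ansi_sample = "\u00e9".encode("cp1252")
print(decodes_as(ansi_sample, "utf-8"), decodes_as(ansi_sample, "cp1252"))
```

So a failed UTF-8 decode rules out UTF-8, but a successful one proves nothing, and pure 7-bit text is identical in both.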

In actual fact there are a lot of other options which cannot be called directly - they would need separate menu items.

I might add, that as author of these macros, I do not actually use the published versions.
I have my custom macros to convert UTF-8 <-> UTF-16LE and ANSI <-> UTF-16LE - these cover virtually all my needs, and do not ask silly questions every time I run them. Occasionally I need to run 2 macros in sequence e.g. UTF-8 <-> UTF-16LE <-> ANSI (or ASCII on the very rare occasions I need to handle OEM).

My approach to macros is much the same as C programming or UNIX shell programming.
I prefer functions which do one task, and link them together to perform more complex tasks.
This is more flexible and easier to debug.
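
To illustrate the idea in a language other than the Vedit macro language, here is a rough Python sketch of two single-purpose converters linked into a longer conversion. The function names are invented, and cp1252 is assumed as the ANSI code page; this is not a translation of my actual macros.

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """One task only: UTF-8 bytes to UTF-16LE bytes (no BOM handling)."""
    return data.decode("utf-8").encode("utf-16-le")

def utf16le_to_ansi(data: bytes, codepage: str = "cp1252") -> bytes:
    """One task only: UTF-16LE bytes to an ANSI code page (cp1252 is an assumption)."""
    return data.decode("utf-16-le").encode(codepage)

def chain(*steps):
    """Link converters so that the output of each step feeds the next one."""
    def run(data: bytes) -> bytes:
        for step in steps:
            data = step(data)
        return data
    return run

# UTF-8 -> UTF-16LE -> ANSI, built from the two small functions above
utf8_to_ansi = chain(utf8_to_utf16le, utf16le_to_ansi)
print(utf8_to_ansi("caf\u00e9".encode("utf-8")))
```

Each step can be tested on its own, and a failure in the chain points at exactly one small function.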

GreenviewData
September 22nd, 2011, 05:10 PM
I agree that this can all be done with just the utf-conv.vdm macro. However, like Ian, I don't like the FROM_UTF and TO_UTF labels as they still display dialog boxes.
I do like the UTF_ANSI and ANSI_UTF16 labels, which skip the dialog boxes.

Therefore, I plan to:

* Rename the "Unicode (UTF-16) to ASCII" function to "Unicode to ANSI" and implement it with CallF(122,"utf-conv","UTF_ANSI")

* Rename the "ASCII to Unicode (UTF-16)" function to "ANSI to Unicode (UTF-16LE)" and implement it with CallF(122,"utf-conv","ANSI_UTF16")

* Add another function "ANSI to Unicode (UTF-8)" and implement it with CallF(122,"utf-conv","ANSI_UTF8")

* Leave the "Unicode (UTF-16 and UTF-8)" menu item

From my brief tests, it appears the "UTF_ANSI" label can tell the difference between UTF-8 and UTF-16 and therefore only one function is needed in this direction.

It appears these three functions could handle most conversions, with the dialog item handling the rest.

If this sounds good, I will implement it ASAP.

ian binnie
September 23rd, 2011, 09:20 PM
Sounds OK.

I have some pathological test cases which can cause UTF-16 to be confused with UTF-8 or ANSI, but these are deliberately constructed, and unlikely to occur in practice.
In any event, most UTF files users will encounter have a BOM.
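
For reference, BOM detection itself is simple. Here is a minimal Python sketch, not the logic of utf-conv.vdm; the function name sniff_bom is hypothetical, and UTF-32 is deliberately ignored (its little-endian BOM begins with the same two bytes as the UTF-16LE one).

```python
import codecs

def sniff_bom(data: bytes):
    """Return a codec name based on a leading BOM, or None if there is no BOM.

    Assumption: only UTF-8 and UTF-16 files are expected; UTF-32 is not handled.
    """
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return None

print(sniff_bom(b"\xef\xbb\xbfabc"))
print(sniff_bom(b"\xff\xfea\x00"))
print(sniff_bom(b"abc"))
```

Only the files without a BOM need the guesswork discussed above.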

rejto
January 7th, 2014, 11:36 PM
Gentlemen,

Thank you for this interesting discussion.

I vaguely remember that I got interested in the UTF to ASCII conversion when I was editing a copy of my Win XP x64 Registry in Vedit. Somehow, I used Vedit to convert the UTF characters to ASCII characters. However, my Win XP x64 system crashed and my records were lost.

Finally, I replaced the crashed Win XP x64 system by a Win 7 x64 system. In short, I am looking for a safe way to edit my Win 7 x64 Registry.

Any suggestion would be appreciated.


-peter

rejto12
January 30th, 2016, 02:58 PM
Hello,

I do not know why it took me about five years to thank Ted for incorporating these suggestions into VEDIT. As you know, I am not a Vedit programmer, and so I will speak only for myself. So, let me copy my harmless typo report of five or so years ago:

I assume that I have found a totally harmless typo in ANSI-UTF.VDM. Specifically, line 210 says that

B5 03 EE Greek small letter gamma

Actually, in the macro the character is replaced by the Vedit OEM character with hexadecimal value EE. I believe that the word "gamma" should be replaced by the word "epsilon".

In short, that typo has been corrected. Specifically, the line

B5 03 EE Greek small letter epsilon

in the :CREATE_TABLE: subroutine part of the macro now agrees with the table on the unicode.org website.

Note that, as per Pauli's suggestion, the use of the original macros has been streamlined. Accordingly, the name of the new distribution macro is UTF-CONV.VDM.

Finally, a big thank you for the lovely table in the subroutine :CREATE_TABLE: . I should also say that during these five years I had completely forgotten about this technical subject, until a couple of days ago, when I needed to find out the ASCII code of U+2218. I posted this question in a separate thread, and Scott showed me how to find this specific code. So, I certainly do appreciate the effort needed to work with Unicode tables.


-peter