What is the ASCII number of the Unicode character U+2218?



rejto12
January 28th, 2016, 04:23 PM
Hello,

I would like to know the ASCII number of the Unicode character U+2218.

So, I opened Vedit, entered U+2218 and highlighted it.
Then I tried the menu command {Edit, Translate, Unicode to ASCII}, but it did not work for me.

I have a hunch that I am missing something simple.
Thanks as always.

-peter

Scott Lambert
January 28th, 2016, 04:50 PM
Hi Peter,

Don't think Vedit can help you with that.

Plan B:

1. Visit: https://en.wikipedia.org/wiki/List_of_Unicode_characters

2. In this case, scroll down to Mathematical Operators (about three-quarters of the way down).

3. On the line starting with U+221x, in the column labelled 8, the character looks like a small zero.

4. Open Vedit and go to Misc > ASCII Table.

5. The closest symbol seems to be ASCII 186 (to my eyes, anyway).

Plan B won't work in all cases, but worth a shot.
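Plan B can also be checked outside Vedit. Below is a minimal Python sketch, not a Vedit feature, which assumes that the two relevant code pages are the ones Python's built-in codecs call cp437 (OEM) and cp1252 (ANSI). It asks each code page whether U+2218 has a single-byte code at all; the helper name single_byte_code is made up for this illustration.

```python
import unicodedata
from typing import Optional

RING_OPERATOR = "\u2218"  # the character asked about in this thread

def single_byte_code(ch: str, codepage: str) -> Optional[int]:
    """Return the byte value of ch in the given code page, or None if it has no mapping."""
    try:
        encoded = ch.encode(codepage)  # strict mode raises on unmappable characters
    except UnicodeEncodeError:
        return None
    return encoded[0]

print(unicodedata.name(RING_OPERATOR))
for cp in ("cp437", "cp1252"):
    print(cp, "ring operator:", single_byte_code(RING_OPERATOR, cp))
    # For comparison, the degree sign does exist in both code pages.
    print(cp, "degree sign:", single_byte_code("\u00b0", cp))
```

A result of None means no byte in that code page maps to the ring operator, so anything found by visual inspection is a lookalike rather than the same character.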

Scott

rejto12
January 29th, 2016, 02:20 AM
Thanks Scott,

Yes, I believe that this is the answer I was looking for. That is to say, I believe that
U+2218 = ASCI(186).
Now let me backtrack.

How did you pull out the magic number 186? Could you please explain it to me?

I did a variation of your step 4: I asked Vedit to print out the character with ASCII number 186.

Here are the details:
I remembered the Vedit command Cur_Char, which gives the ASCII number of the character at the cursor position. However, my problem now is different.
The ASCII number is given, and I am trying to find the corresponding character. In mathematical terms, I need the inverse function/command of the Cur_Char function/command.
I remembered that Vedit does have the inverse command to Cur_Char, but I did not remember its name. I vaguely remembered that it had something to do with "dumping" and with characters, but I forgot the order. Then vedi.vdf came to the rescue. I searched that dictionary file for "dump", and sure enough I ended up with the Vedit command Dump_Char. Then I did the usual routine of escaping into Command Mode and launching the Vedit command
Dump_Char(186). Voilà, the little circle appeared on the screen.
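As an analogy only, the Cur_Char / Dump_Char pair behaves like encoding a character to a byte and decoding a byte back to a character in one code page. The Python sketch below assumes the ANSI code page is cp1252; the functions cur_char and dump_char are hypothetical stand-ins, not the Vedit commands themselves.

```python
CODEPAGE = "cp1252"  # assumption: the "ANSI" code page of a Western European Windows install

def dump_char(code: int) -> str:
    """Hypothetical analogue of Dump_Char: byte value -> character."""
    return bytes([code]).decode(CODEPAGE)

def cur_char(ch: str) -> int:
    """Hypothetical analogue of Cur_Char: character -> byte value."""
    return ch.encode(CODEPAGE)[0]

ch = dump_char(186)
print(repr(ch), cur_char(ch))
# The two functions are inverses for every byte the code page defines;
# cp1252 leaves a few bytes (e.g. 129) undefined, and dump_char raises on those.
```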

Now I have also answered my own question: how did you pull out the magic number 186? It is your step 5, the hard job of visually comparing the Vedit ASCII table with the table in your Wikipedia reference. Also, once I had the magic number 186, I could easily ask Vedit to check it for me.

I do hope that Christian will have a chance to read this email. I vaguely remember that on the old Bulletin Board Christian and Ian had an extensive discussion of Unicode versus ASCII. I also have a hunch that Ted incorporated a lot of their work into Vedit. In other words, I have a hunch that their code is somehow behind the Vedit menu command {Edit, Translate, Unicode to ASCII}. (... and of course, Pauli knows everything, in particular this.) Well, it is getting too late and I cannot check it now.

Thanks again for the visual inspection.






-peter

Scott Lambert
January 29th, 2016, 08:40 AM
Hi Peter,

"How did you pull out the magic number 186 ? Could you please expalin it to me."

Once I knew what the Unicode character looked like, I just went to the ASCII table in Vedit and looked down the columns for something similar. Fortunately the strange characters start at ASCII 128, so I only had to scan the right half of the table.
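Scanning the right half of the table (codes 128 to 255) can be approximated in Python by listing the Unicode name of each upper-half character and filtering on words a "small circle" lookalike might carry. This is a minimal sketch: the keyword list is a guess, not a rule, and cp437/cp1252 are Python's codec names for the OEM and ANSI code pages.

```python
import unicodedata

def upper_half(codepage: str) -> dict[int, str]:
    """Map byte values 128-255 to Unicode names, skipping bytes the code page leaves undefined."""
    table = {}
    for code in range(128, 256):
        try:
            ch = bytes([code]).decode(codepage)
        except UnicodeDecodeError:
            continue  # e.g. 0x81 is unassigned in cp1252
        table[code] = unicodedata.name(ch, "<unnamed>")
    return table

# Words that a "small circle" lookalike might carry in its Unicode name (a guess).
KEYWORDS = ("RING", "DEGREE", "ORDINAL", "BULLET", "DOT")

for cp in ("cp437", "cp1252"):
    for code, name in upper_half(cp).items():
        if any(word in name for word in KEYWORDS):
            print(cp, code, name)
```

The name filter only narrows the search; deciding what "looks similar" still takes a human eye.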

Scott

rejto12
January 30th, 2016, 01:54 AM
Hi Scott,

Thanks again for doing the "visual inspection" for me. As far as my original question is concerned, that settles it.

Now I would like to integrate it into Vedit. The problem is that this integration is a little over my head, and I need your help again.

So, let me give you some specifics.

I found the file UTF-ANSI.VDM in my HOME/macros directory. This is an old 2004 file by Christian and Ian. I have a hunch that this is the file that implements the Vedit menu command {Edit, Translate, Unicode to ASCII}. I also have a hunch that if you opened this file on your computer, you would see right away where your result

U+2218 = ASCI(186)

would fit in. (Hmmm; I used the notation ASCI(186), which is sloppy.)

Looking forward to hearing from you.


-peter

Scott Lambert
January 30th, 2016, 09:33 AM
Hi Peter,

That macro is over my head too, talks about code pages which I know nothing about.

"Now I would like to integrate it into Vedit. "

If you just want an easy way to enter the character why not a simple keystroke macro?

CTRL-SHFT-O [VISUAL EXIT]ins_char(186,nocr)

Scott

rejto12
January 30th, 2016, 10:12 AM
Hi Scott,

A big thank you for your keystroke macro.

I read on the web that I can enter Unicode characters in Windows by pressing the Alt key and typing the code. It sure makes me feel good that now I can also enter this particular character in Vedit.
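A commonly described Windows convention, stated here as an assumption since behaviour varies by application and keyboard settings, is that Alt plus a decimal number typed on the numeric keypad inserts the character at that position of the OEM code page, while a number with a leading zero uses the ANSI code page. The Python sketch below models only that assumption (alt_code_char is a hypothetical helper, not a Windows API) to show why Alt+186 and Alt+0186 give two different characters, neither of which is U+2218.

```python
def alt_code_char(digits: str, oem: str = "cp437", ansi: str = "cp1252") -> str:
    """Hypothetical model of Windows Alt+numpad entry for codes 0-255 (an assumption, not an API)."""
    value = int(digits)
    if not 0 <= value <= 255:
        raise ValueError("this sketch only models single-byte codes 0-255")
    # Assumed convention: a leading zero selects the ANSI code page, otherwise OEM.
    codepage = ansi if digits.startswith("0") else oem
    return bytes([value]).decode(codepage)

for digits in ("186", "0186", "248", "0176"):
    print(f"Alt+{digits} -> {alt_code_char(digits)!r}")
```

Under this model, the degree sign is reachable both as Alt+248 (OEM) and Alt+0176 (ANSI), which illustrates that the same character sits at different positions in the two code pages.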

Thanks for all your help.

-peter

rejto12
January 30th, 2016, 01:04 PM
Hi Scott,

I would like to share with you the URL which gives instructions on how to enter a Unicode character either in Microsoft Windows or in HTML.
For the special case of our Unicode character, it is:

http://www.fileformat.info/info/unicode/char/2218/index.htm

I tried to do it in Microsoft Windows and could not. If you have time, I would appreciate your help.

Of course, I am hoping that this procedure would lead to a .txt file that contains only 1 character, our Unicode character ∘. Then, opening this hypothetical file in Vedit and using the Cur_Char command would give us the ASCII code.
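One caveat about the hypothetical one-character file: if it is saved as UTF-8 (an assumption; some Windows programs save UTF-16 or add a byte-order mark instead), it contains three bytes, not one. An editor that reads the file as raw bytes would then presumably report the first of them rather than a single code for ∘. A minimal Python sketch:

```python
import os
import tempfile

RING_OPERATOR = "\u2218"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "ring.txt")
    # Assumption: the file is saved as UTF-8 without a byte-order mark.
    with open(path, "w", encoding="utf-8") as f:
        f.write(RING_OPERATOR)
    with open(path, "rb") as f:
        raw = f.read()

print(list(raw))             # three bytes, not one
print(raw.decode("cp1252"))  # how those bytes look when read as ANSI text
```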


Thanks, as always,

-peter

rejto12
January 30th, 2016, 09:10 PM
Scott,

Hi again.

I thought that I knew code pages, kind of... Well, what do I know about code pages? For me, a code page is a hidden parameter that distinguishes between ANSI and OEM fonts in Vedit. Furthermore, if I let winstall.vdm choose the default font, then I am going to end up with the "right" font. So, this is another hidden parameter that I can forget about. So, my definition of the "right" font hinges on the hidden parameter. Not quite. My definition of the "right" font is the one that displays the German words in Scribe correctly, so the font which is not right for me is the one that garbles the German words. Once again, as long as I choose the default options in winstall correctly, I can safely ignore the hidden parameters. I also have a hunch that if I used the forthcoming(?) French version of Scribe, I would no longer have the luxury of ignoring the hidden parameter issue.


So, I went back to the ANSI versus OEM font issue in the Vedit User Manual. This is what I learned:

======================================================================
Support non-English characters (0=Off, 1=ANSI, 2=On)
Determines whether the pattern matching codes “|A”, “|U” and “|V” will match
non-English letters in the extended character set with decimal value 128 - 255.

It also determines whether {EDIT, Convert, Upper/Lower/Switch case}
recognizes non-English letters.

0 Off. None of the extended characters (128-255) are recognized as letters.
This works best with English.

1 When an ANSI font is displayed, non-English letters in the extended ANSI
character set are recognized. When an OEM font is displayed, non-English
letters are not recognized.

2 Non-English letters are recognized, depending upon the font. When an
ANSI font is displayed, non-English letters in the extended ANSI character
set are recognized. For example, decimal value 252 which is an umlaut
“u” is treated as a lower case letter.
When an OEM font is displayed, non-English letters in the extended
IBM-PC (OEM) character set are recognized. For example, decimal value
129 which is an umlaut “u” is treated as a lower case letter.

Note:
(Technical) VEDIT queries Windows for information about ANSI non-English
letters. Therefore, if you notice any inconsistencies, be sure that Windows
has been set to the correct language (code page).
======================================================================

So, I used the {Misc, ASCII Table} Vedit user menu command and checked whether or not I have the lower case u with umlaut in position 252. Lo and behold, I do.
So, I have the right font!
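The manual's two examples (ü at 252 in ANSI, ü at 129 in OEM) and its remark about {EDIT, Convert, Upper/Lower/Switch case} can be reproduced with Python's cp1252 and cp437 codecs, used here as stand-ins for Vedit's ANSI and OEM tables. This sketch only illustrates that the same letter, and its upper-case partner, sit at different positions in each code page; it is not how Vedit implements case conversion.

```python
def case_partner(code: int, codepage: str) -> tuple[str, int]:
    """Decode a byte, upper-case it, and re-encode it in the same code page.

    Assumes the upper-case form is a single character (not true for German
    sharp s, which Python upper-cases to "SS").
    """
    ch = bytes([code]).decode(codepage)
    upper = ch.upper()
    return upper, upper.encode(codepage)[0]

print(case_partner(252, "cp1252"))  # ANSI: lower-case u with umlaut
print(case_partner(129, "cp437"))   # OEM: the same letter at a different position
```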

Now the fancy stuff, and here I am at a loss. I learned from the UTF-CONV.VDM macro that
"ANSI (Code Page 1252) or OEM (Code Page 437)"

Then I searched the web for code page 1252 and code page 437 and downloaded these two .pdf files from one of the Microsoft websites. Actually, these two .pdf files are beautiful. The display is nicer and bigger than Vedit's.

Now my disappointment. I could not find our Unicode character at position 186 (I believe) in either of these two code pages. I do hope that this is a simple oversight on my part. However, I did quite a few font experiments today and I am getting tired.

Thanks as always,

-peter

ian binnie
January 31st, 2016, 06:04 PM
Christian and I wrote the conversion macros. The basic principle is simple - it is just a table lookup (apart from the UTF-8). The complication is in the optimisation to make it faster.

No amount of fiddling is going to make Vedit display a character which is not in the current ANSI or OEM codepage. The original question (which is about mapping a Unicode code point to a single byte) has no answer; it is the equivalent of asking what integer corresponds to √-1.
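The "table lookup" principle can be illustrated in Python. This is a minimal sketch that builds its reverse tables from Python's own cp1252 and cp437 codecs, not from the tables in UTF-ANSI.VDM (which, per Ian, came from a Microsoft API). A code point that is missing from the table simply has no single-byte answer.

```python
def build_table(codepage: str) -> dict[str, int]:
    """Reverse lookup table: character -> byte value, for every byte the code page defines."""
    table = {}
    for code in range(256):
        try:
            table[bytes([code]).decode(codepage)] = code
        except UnicodeDecodeError:
            pass  # undefined byte in this code page
    return table

ANSI = build_table("cp1252")
OEM = build_table("cp437")

for ch in ("\u00fc", "\u2218"):  # u with umlaut, RING OPERATOR
    print(f"U+{ord(ch):04X}", "ANSI:", ANSI.get(ch), "OEM:", OEM.get(ch))
```

A dictionary gives constant-time lookups; a real conversion macro also has to handle the many characters with no entry, which is where choices like "closest equivalent" come in.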

rejto12
January 31st, 2016, 10:39 PM
Hi Ian,

A belated thank you for confirming that the Potential, Totally Harmless Typo that I found in your translation macro with Christian was indeed a typo.
More specifically, a belated thank you for the following comment of yours, from about five years ago:

The actual translations should be correct (they were generated programmatically) but the descriptions were added manually.
After all this time I can't remember the details, but suspect a simple editing error as U03B3 Greek small letter gamma does not exist in CP 437.

I also find your two present comments interesting.

1.: "The basic principle is simple - it is just a table lookup." Let me tell you why I find this interesting: I remember that many, many years ago, when Scott wrote his Add Column macro, Fritz was very enthusiastic about it. I shared Fritz's enthusiasm because this was the first spreadsheet command imported into Vedit, and I hoped that more imports would follow. Certainly the various lookup functions are standard spreadsheet functions. In short, I would appreciate it if you would import the simplest possible lookup function into Vedit.

2.: My apologies for being sloppy about asking my original question. I simply did not know how to map (extended) ASCII codes into unicodes and so, I was in no position to ask my question in terms of that map. In other words, I was in no position to ask my question in terms of the map that you are referring to in your email.

Therefore, let us restrict the codes to the ones displayed by the {Misc, ASCII Table} Vedit menu command, and let us assume that we have chosen the default font in winstall.vdm. Then there are two questions: a.) Does the displayed ASCII table contain a character whose Unicode code point is U+2218? b.) If such a character exists, what is its extended ASCII code?

Actually, Scott went on the unicode.org website and performed a visual lookup of the character with code U+2218. Then he opened up the ASCII table in Vedit and again performed a visual lookup of that character. Finally, he concluded from the position of that character that the extended ASCII code of that character is 186.

I have a hunch that your tables would answer two simple questions:
1.: What is the Unicode code point of the extended ASCII code 186 with reference to the ANSI table?
2.: What is the Unicode code point of the extended ASCII code 186 with reference to the OEM table?
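Both questions are single lookups in the opposite direction (byte to code point). A minimal Python sketch, assuming Python's cp1252 and cp437 codecs match the ANSI and OEM tables in question:

```python
import unicodedata

def describe(code: int, codepage: str) -> str:
    """Return 'U+XXXX NAME' for one byte value in the given code page."""
    ch = bytes([code]).decode(codepage)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print("ANSI (cp1252):", describe(186, "cp1252"))
print("OEM  (cp437): ", describe(186, "cp437"))
```

With Python's tables, byte 186 is U+00BA MASCULINE ORDINAL INDICATOR in cp1252 and U+2551 BOX DRAWINGS DOUBLE VERTICAL in cp437; neither is U+2218.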


Please do not lose any sleep over these two questions. However, if you do have the answer, then I would appreciate hearing from you.

I learned quite a bit from your macro !

Thanks,

peter

ian binnie
February 1st, 2016, 06:26 PM
U+00BA (byte 0xBA) is º MASCULINE ORDINAL INDICATOR
U+2218 is ∘ RING OPERATOR
They are different characters.

I don't have access to a Windows machine so can't check OEM.

From memory the tables were derived from a Microsoft API which provided close equivalents.

rejto12
February 2nd, 2016, 04:24 AM
Ian,

A big thank you for the information. In particular, a big thank you for telling me that U+00BA and U+2218 are different characters.

It took me some time to realize this fact. Here are the details.

I had never heard the name "MASCULINE ORDINAL INDICATOR" before. I visited the MSDN website and looked up the ANSI code page, which has, I believe, the number 1252 embedded in its name. There I learned that the name of the character in position 0xBA of the ANSI code page is DEGREE. In short, the ANSI code page on the MSDN website is a slight enhancement of the result of the Vedit {Misc, ASCII Table} command. Specifically, MSDN gives the names of the characters in the ASCII table, while Vedit does not. (I certainly had heard the name DEGREE before. In fact, I learned it in high school as a unit of angle. Calculators, however, refer to it as deg. I am slowly starting to understand why. I guess that calculators can display ASCII characters but cannot display Unicode, so they create their own markup language.)

Recently, very recently, I saw the term RING OPERATOR on the web. I kind of "understand" it. In fact, I ended up with U+2218 by searching the web for something like composition. Furthermore, the standard mathematical notation for the composition of two functions is to put the "little circle" between them.

I also appreciate your telling me that

"From memory, the tables were derived from a Microsoft API which provided close equivalents."

In short, I shall not try to follow up the references to these tables. After all, in your other email you mentioned that your notes go back to 2004. In other words, they are about 12 years old.

I certainly learned quite a bit from asking the question "What is the ASCII code of U+2218?" Specifically, I learned about a specific mathematical character that cannot be displayed in Vedit. Of course, I knew to begin with that Vedit is not a mathematical typesetting program. Now I can add U+2218 to my list of favorite counterexamples.

Thanks again, for all your help.


-peter

rejto12
February 4th, 2016, 02:22 AM
Hi Ian,

I am re-reading your email. I do not know why it took me so long to answer my original question. As per your suggestion, I had to reformulate it to make it precise.

So here it is:
What is the ASCII code of the character that corresponds to the code U+2218 in Code Page 437? (A valid possible answer is that there is no such character in Code Page 437.)

I claim that there is no such character in the Code Page 437.

Here is my proof: I googled Code Page 437 and hit one of the M(icro) S(oft) D(eveloper) N(etwork) hits. More, specifically, I went to the URL:

https://msdn.microsoft.com/en-us/goglobal/cc305156.aspx

This opened up an .html file in my Firefox Browser. I quote the first two lines of that .html file:

OEM 437

This table is provided to help developers move their applications to Unicode.

Then I used the "Find" command and typed U+2218 into the popup window. The Find command did not yield any hits. Therefore, the table contains no character with Unicode code point U+2218.
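The Firefox "Find" step can be mimicked in Python by rendering a code page as text lines, roughly in the style of the MSDN tables (the exact MSDN layout is not reproduced here), and searching that text. The function name msdn_style_listing is made up for this sketch, and the table comes from Python's cp437 codec rather than from the MSDN page itself.

```python
import unicodedata

def msdn_style_listing(codepage: str) -> str:
    """Render a code page as lines '0xNN<TAB>U+XXXX<TAB>NAME', roughly like the MSDN tables."""
    lines = []
    for code in range(256):
        try:
            ch = bytes([code]).decode(codepage)
        except UnicodeDecodeError:
            continue  # byte not defined in this code page
        lines.append(f"0x{code:02X}\tU+{ord(ch):04X}\t{unicodedata.name(ch, '<control>')}")
    return "\n".join(lines)

listing = msdn_style_listing("cp437")
print("U+2218" in listing)  # the equivalent of Firefox's Find coming up empty
print("U+2219" in listing)  # while the neighbouring BULLET OPERATOR is present
```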

Actually, this MSDN website also contains many other code pages. In particular, it contains some Mac code pages. So, I do hope that you will also get something out of these email exchanges.

The same procedure also works for Code Page 1252, which also displays the parameters Latin 1, ASCII. I believe that the Latin 2, ASCII parameter corresponds to the Turkish alphabet. Since I ended up with the U+2218 code by searching for the unicode of a Mathematical Symbol, and since it is unlikely that the Turkish alphabet does contain Mathematical Symbols, I did not search the Latin 2, ASCII code page.
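Instead of searching code pages one by one, Python can try to encode the character in several of them at once. In Python's codec names, iso8859_2 is Latin 2 (Central European), and Turkish is usually covered by iso8859_9 (Latin 5) or cp1254. The candidate list below is an arbitrary selection for illustration, and the sketch relies only on Python's codec tables.

```python
# Python codec names; iso8859_2 is Latin 2 (Central European), iso8859_9 is Latin 5 (Turkish).
CANDIDATES = ("cp437", "cp850", "cp1250", "cp1252", "cp1254",
              "iso8859_2", "iso8859_9", "mac_roman")

def pages_containing(ch: str, codepages=CANDIDATES) -> list[str]:
    """Return the (single-byte) code pages in which ch can be encoded."""
    found = []
    for cp in codepages:
        try:
            ch.encode(cp)
        except UnicodeEncodeError:
            continue  # this code page has no byte for ch
        found.append(cp)
    return found

print(pages_containing("\u2218"))  # RING OPERATOR
print(pages_containing("\u00fc"))  # u with umlaut, for comparison
```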

Now the pieces start to fit together. In your previous email you mentioned that the difficulty you treated in your macro was optimizing the lookup procedure.
Furthermore, a lookup procedure is just a fancy collection of Search commands.

So, if I need to lookup only a few unicodes, then I might just as well Google those codes.

I also remember that a few years ago I used your macro to get an ASCII version of a copy of my Windows registry. (When I copied my registry, I renamed the copy right away.) To my surprise, most of the stuff in my Windows registry turned out to be Unicode text from various languages.
For the benefit of other Vedit users, I shall upload the MSDN Code Page 437 in my next message.

Thanks again, for all your help.

-peter

chriz
February 7th, 2016, 03:28 PM
From time to time I'm reading the messages here, just for fun now...

Peter:
Regarding Unicode: VEDIT isn't the right tool.
The conversion macros were helpful as a workaround in those days, but not much more.

VEDIT is still a great editor and multi-purpose tool, but for Unicode you should consider using other software. In my humble opinion.


Regards

Christian

rejto12
February 7th, 2016, 07:52 PM
Hello Christian,

It was good to hear from you. In particular, it was good to hear you officially confirm what Scott had told me unofficially. Specifically, Scott searched the web for my problem and came up with an answer. Now your opinion is official inasmuch as you own (33+1/3)% of that macro.

Actually, it was an interesting learning experience for me to answer my question, which was not precisely formulated, as per Ian's critique. So, here is the precise question: does there exist a character in either Code Page 437 (OEM) or Code Page 1252 (ANSI) whose Unicode code point is U+2218?

My answer is no, and I can even prove it. Here is my proof: I searched the web and ended up at the MSDN website. Specifically, I ended up at the URL:

https://msdn.microsoft.com/en-us/goglobal/cc305156.aspx

As it turned out, this URL is an .html file. So, I opened it in my Firefox browser and was glad to see that it has two tables. The first one is an enhancement of the Vedit {Misc, ASCII Table} output, and the second one is the table of corresponding Unicode code points and character names. Then I launched the Firefox Find command and typed U+2218 into the popup window. Since Firefox did not find any matches, this proves that, indeed, there is no character with this code point in the table. Then I treated the remaining case of Code Page 1252 (ANSI) by Mathematical Induction.

As Ian would say, Firefox performed a lookup procedure for me on a 16 x 16 table. I have a hunch that your macro has a similar table and could perform a similar lookup procedure in Vedit. Since I do have the option of looking up what is under the hood of the Vedit lookup procedure, I would say this is even more impressive. So, all of us should say a big thank you to Ted for his family of Search commands.
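The 16 x 16 layout mentioned above (row = high hex digit, column = low hex digit) is easy to generate for any code page. This Python sketch prints only rows 8 to F, the "right half" discussed earlier, using Python's codec tables; a blank cell marks a byte the code page leaves undefined.

```python
def grid(codepage: str, first_row: int = 8) -> str:
    """Lay out bytes first_row*16..255 as a 16-column grid, like an ASCII-table dialog."""
    rows = ["    " + " ".join(f"{col:X}" for col in range(16))]
    for row in range(first_row, 16):
        cells = []
        for col in range(16):
            try:
                cells.append(bytes([row * 16 + col]).decode(codepage))
            except UnicodeDecodeError:
                cells.append(" ")  # byte not defined in this code page
        rows.append(f"{row:X}x  " + " ".join(cells))
    return "\n".join(rows)

print(grid("cp437"))
```

Position 186 (0xBA) is row B, column A of the grid.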

-peter

chriz
February 8th, 2016, 04:15 AM
Just for reference:

There are plenty of sites with Unicode conversion tables and infos. For example:


The "master" site: http://unicode.org/ (The Unicode Consortium)

Microsoft with links to code page tables supported by Windows: https://msdn.microsoft.com/en-us/goglobal/bb964653
e.g.:
https://msdn.microsoft.com/en-us/goglobal/cc305156 OEM 437
https://msdn.microsoft.com/en-us/goglobal/cc305145 Windows 1252
https://msdn.microsoft.com/en-us/goglobal/cc305167 Windows 28591 / ISO-8859-1 (Latin 1)
https://msdn.microsoft.com/en-us/goglobal/cc305176 Windows 28605 / ISO-8859-15 (Latin 9)



Christian

rejto12
February 8th, 2016, 11:37 AM
Thanks Christian,

My final question is, which one of your references treats the "German" alphabet ? Like umlaut and friends.

My (next+1) project is to use Scott's german.vdf file to spell check a German text. That is to say, to use his German dictionary file and his Scribe 8 to spell check a German text.

Now I am going back to my next and favorite project. In other words, I would like to play with the HiSync project, the one that was started by Fritz.
Incidentally, have you heard from him recently?

-peter

chriz
February 8th, 2016, 11:55 AM
Each one!
They are simply different representations for the respective codepoints/characters.

That's why Unicode has been developed: To be able to handle this mess...
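Christian's "Each one!" can be checked with a short Python sketch that encodes ü using the Python codecs corresponding to the code pages in his list (cp437, cp1252, and iso8859_1 / iso8859_15 as Python's names for Latin 1 and Latin 9), plus UTF-8 for comparison. These are Python codec names, not Windows identifiers such as 28591.

```python
CODECS = ("cp437", "cp1252", "iso8859_1", "iso8859_15", "utf-8")

def encodings_of(ch: str) -> dict[str, bytes]:
    """Encode one character with each codec in CODECS."""
    return {codec: ch.encode(codec) for codec in CODECS}

for codec, raw in encodings_of("\u00fc").items():  # u with umlaut
    print(f"{codec:>10}: {raw.hex(' ')}")
```

Every table contains ü, but not always at the same position (0x81 in OEM 437, 0xFC in the three ANSI/ISO tables, two bytes in UTF-8).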

EOT