Hi All, The current UTF-8 keyboard input (for the console) of the Linux kernel does not support "composing" or writing characters with accents. This affects quite a few languages that require accents (French, German, Danish, Swedish?, Greek, cyrillic-based?, others?.).
In general, UTF-8 console support is good at displaying text in different character sets, which makes it possible to configure a distribution to use UTF-8 locales for both console and Xorg. However, while it used to be possible to type German, Spanish, French, etc. on the console, in UTF-8 mode it is not possible anymore.
While looking into the problem, I noticed that there is work to make Linux console handle Unicode better.
> The current UTF-8 keyboard input (for the console) of the Linux kernel
> does not support "composing" or writing characters with accents.
Yes, I recently found this out when trying to switch my whole system to UTF-8. But the patch from Chris you mention below works very well for me (and, I guess, for anybody who needs to type composed characters for languages based on the latin1 encoding).
> affects quite a few languages that require accents (French, German,
> Danish, Swedish?, Greek, cyrillic-based?, others?.).
Chris told me on the utf-8 mailing list that he doesn't think his patch to make the kernel generate UTF-8 characters from the compose tables will be included in the main kernel, basically because it is not a full solution that covers all the cases... But there is nothing better, so maybe it would be a good idea to include it. The current state is that, for the 2.6 kernel, the text console is broken in UTF-8 mode because it cannot generate UTF-8 composed characters.
> Is there an interest for re-submission of mentioned patches for
> inclusion in the kernel (yeah, provided coding style is "normalised")?
>> The current UTF-8 keyboard input (for the console) of the Linux kernel
>> does not support "composing" or writing characters with accents.
That's weird, because "ö" (LATIN O WITH DIAERESIS) -- which clearly lies outside the 7-bit range, is working on my system without myself poking the kernel. Both hitting the key or using compose mode. This also applies to A-with-DIAERESIS, U-with-DIAERESIS, sharp german S, but does not for anything else, e.g. compose-'-e to generate E with accent aigu.
>Yes, I recently found this out when trying to switch my whole system to
>UTF-8. But the patch from Chris you mention below works very well
>for me (and, I guess, for anybody who needs to type composed characters
>for languages based on the latin1 encoding).
>> Is there an interest for re-submission of mentioned patches for
>> inclusion in the kernel (yeah, provided coding style is "normalised")?
>At least, I am _really_ interested :)
So am I. I have to use xterm for anything fancy now... (especially for the even-more fancy stuff that begins at three-byte UTF8 sequences, such as Japanese :-)
> >> The current UTF-8 keyboard input (for the console) of the Linux kernel
> >> does not support "composing" or writing characters with accents.
> That's weird, because "ö" (LATIN O WITH DIAERESIS) -- which clearly lies
> outside the 7-bit range, is working on my system without myself poking the
> kernel.
Indeed, that is weird. Are you sure your keyboard is generating a UTF-8 encoded "ö"? Just check it with echo:
$ echo -n ö | od -t x1
0000000 c3 b6
0000002
I'm using kernel 2.6.9 + Chris patch
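The same byte-level check can be sketched without a console at all. This is a minimal Python sketch, assuming Python 3.8 or later; the variable names are purely illustrative. It shows the two representations of "ö" (U+00F6) that the thread keeps comparing: the two-byte UTF-8 sequence and the single Latin-1 byte.

```python
# Compare how "ö" (U+00F6) is represented in UTF-8 and in Latin-1.
ch = "\u00f6"  # ö, LATIN SMALL LETTER O WITH DIAERESIS

utf8_bytes = ch.encode("utf-8")      # what a UTF-8 console should deliver
latin1_bytes = ch.encode("latin-1")  # what an 8-bit (latin1) console delivers

print("UTF-8  :", utf8_bytes.hex(" "))
print("Latin-1:", latin1_bytes.hex(" "))
```

If `od -t x1` shows the first form, the keyboard path is producing UTF-8; if it shows only the second, the byte was passed through unconverted.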
> So am I. I have to use xterm for anything fancy now...
> (especially for the even-more fancy stuff that begins at three-byte UTF8
> sequences, such as Japanese :-)
I know :)). By the way, and this is off-topic, have you checked uim? I was testing it the other day with good results, and I like it a lot as a Japanese input method (it handles other scripts too, although I only use it for Japanese). I've used it with anthy; I just have to check it with skk.
>Indeed, that is weird. Are you sure your keyboard is generating a UTF-8
>encoded "ö"? Just check it with echo:
>$ echo -n ö | od -t x1
>0000000 c3 b6
>0000002
Yes it does generate 0xC3B6 (otherwise it would show up as garbage, because it would not be utf8-compliant if it only output 0xF6)
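The "garbage" argument can be checked mechanically: a lone 0xF6 byte is not a valid UTF-8 sequence, while 0xC3 0xB6 is. The following is a small Python sketch; `is_valid_utf8` is a hypothetical helper name, and it relies only on the standard strict UTF-8 decoder.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"\xc3\xb6"))  # two-byte UTF-8 sequence for ö
print(is_valid_utf8(b"\xf6"))      # lone Latin-1 byte for ö
```

A terminal in UTF-8 mode that receives the second form has no valid character to draw, which is consistent with the line "screwing up" as described in this thread.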
>I'm using kernel 2.6.9 + Chris patch
I am using SUSE's KOTD 20041202 (2.6.8 + 2.6.9-rc2)
>I know :)). By the way, and this is off-topic, have you checked uim? I
>was testing it the other day with good results, and I like it a lot as
>a Japanese input method (it handles other scripts too, although I only
>use it for Japanese). I've used it with anthy; I just have to check it
>with skk.
On Saturday 11 December 2004 16:39, Jan Engelhardt wrote:
>>Indeed, that is weird. Are you sure your keyboard is generating a UTF-8
>>encoded "ö"? Just check it with echo:
>>$ echo -n ö | od -t x1
>>0000000 c3 b6
>>0000002
>Yes it does generate 0xC3B6 (otherwise it would show up as garbage,
> because it would not be utf8-compliant if it only output 0xF6)
Which is exactly (0xF6) what I'm getting. Kernel version 2.6.10-rc2-mm3-V0.7.32-18
As an American, I've often wondered how to go about getting those accented characters out of a standard American keyboard. I used to be able to get all those accented characters and other symbols out of my Amiga's keyboard, things like the Beta sign and so on. No can do now, and I miss it.
[...]
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.30% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
>I am a bit confused. Could you please comment on the following, as a
>common set of test steps?
I do it a bit faster (i.e. without "od"): if I see something after a Compose operation, it must have been UTF8. If not (e.g. only 8 bits were output), the current line gets screwed up a little. That is my test strategy; it does not need `od` to confirm.
>I am not sure how you wrote the above characters. According to UTF-8,
>characters with code points above 0x7F require at least two bytes to be
>valid. Do you compose "ö" (pressing something like ";", then "o") in
>the console?
ö is a "native key" on my keyboard, i.e. i do not need to play with compose to generate ö.
>For simplicity, let's assume you do something like
I did that (a shortened form of yours), and saw with `dumpkeys` that there was now only one compose table entry, so I think `loadkeys --unicode` overwrites the table? Rightly so. Still, even though there are now no compose table entries apart from that one, I can still generate ö. <compose><"><o> gives two 7-bit characters (rightly so at this point).
>Good. I hope more people raise their hands for this.
...want kanji-on-console, but I guess that will not come true with VGA, which only supports 256 (512) chars. OTOH, [free]BSD's mouse support uses a graphical mouse pointer rather than a "block" one like gpm does, and as I think of it, such a graphical mouse (most Norton apps for DOS also had such) needs some VGA magic or so to set single "pixels"/bits. If single bits within the 8x{16,etc.} char cell can be set, we could have kanji. Can anyone elaborate on this graphical mouse stuff?
> >I am not sure how you wrote the above characters. According to UTF-8,
> >characters with code points above 0x7F require at least two bytes to be
> >valid. Do you compose "ö" (pressing something like ";", then "o") in
> >the console?
> ö is a "native key" on my keyboard, i.e. i do not need to play with compose to
> generate ö.
Aaahh ;), you should have said that before. The whole problem with the kernel is with the compose tables. If you have a native key for "ö" on your keyboard you will not have problems. I can type, for example, an 'n with tilde' on my keyboard because it too is a native key, but for accented characters it is necessary to apply the patch to get UTF-8 output :-/
Jan Engelhardt wrote:
> >> The current UTF-8 keyboard input (for the console) of the Linux kernel
> >> does not support "composing" or writing characters with accents.
> That's weird, because "ö" (LATIN O WITH DIAERESIS) -- which clearly lies
> outside the 7-bit range, is working on my system without myself poking the
> kernel. Both hitting the key or using compose mode. This also applies to
> A-with-DIAERESIS, U-with-DIAERESIS, sharp german S, but does not for anything
> else, e.g. compose-'-e to generate E with accent aigu.
I am a bit confused. Could you please comment on the following, as a common set of test steps?
I am not sure how you wrote the above characters. According to UTF-8, characters with code points above 0x7F require at least two bytes to be valid. Do you compose "ö" (pressing something like ";", then "o") in the console?
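The byte-count rule can be made concrete with a hand-written encoder. This is a minimal Python sketch of the standard UTF-8 bit layout, not the kernel's code; `utf8_encode` is a hypothetical name chosen for illustration. Code points up to U+007F fit in one byte, U+0080 to U+07FF need two, U+0800 to U+FFFF need three, and the rest need four.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:                      # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in (0x7F, 0x80, 0xF6, 0x100, 0x3042):
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

All the accented Latin characters discussed here fall in the two-byte range, while Japanese kana and kanji fall in the three-byte range.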
For simplicity, let's assume you do something like

% loadkeys --unicode
keycode 53 = 0x0d2f
compose '/' 'q' to U+00F6
compose '/' 'w' to U+00F7
compose '/' 'e' to U+00F8
compose '/' 'r' to U+00F9
compose '/' 't' to U+0100
compose '/' 'y' to U+0101
keycode 2 = U+00F6
keycode 3 = U+00F7
keycode 4 = U+00F8
keycode 5 = U+00F9
keycode 6 = U+0100
keycode 7 = U+0101
^D
%
Dead key (due to "0d") is the character "/" (0x2f). Keycodes 2-7 are keys for numbers 1-6. To test, I type

% cat > test.txt
<we try out all key compositions to generate U+00F6-U+0101>
^D
When we try keys 1-6, we get

% od -x test.txt
0000000 b6c3 b7c3 b8c3 b9c3 80c4 81c4 000a
0000015
%

which is correct.
When we try using the dead key "/" and q-y, we get

% od -x test.txt
0000000 f7f6 f9f8 0100 000a
0000007
%
To get the keyboard back into a sane mode, run "loadkeys --unicode -d".
From here we see that there is no conversion to UTF-8 whatsoever.
In the second case, the kernel cannot return the full character when it is in Unicode mode.
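Both dumps can be reproduced offline. The sketch below is a Python simulation, not kernel code. It rests on one assumption inferred from the second dump: that the compose path emits only the low 8 bits of each code point, which would explain why U+0100 and U+0101 show up as 00 and 01. The `od_x` helper is hypothetical and mimics only the data columns of `od -x` on a little-endian machine.

```python
def od_x(data: bytes) -> str:
    """Format bytes like the data columns of `od -x` (little-endian words)."""
    if len(data) % 2:
        data += b"\x00"  # od pads an odd trailing byte with zero
    words = [int.from_bytes(data[i:i + 2], "little")
             for i in range(0, len(data), 2)]
    return " ".join(f"{w:04x}" for w in words)


code_points = [0xF6, 0xF7, 0xF8, 0xF9, 0x100, 0x101]

# Direct keys 1-6: each code point is converted to UTF-8.
direct = "".join(chr(cp) for cp in code_points).encode("utf-8") + b"\n"

# Dead key + q-y: ASSUMPTION, only the low byte of each code point is emitted.
composed = bytes(cp & 0xFF for cp in code_points) + b"\n"

print(od_x(direct))
print(od_x(composed))
```

Under that assumption the two printed lines match the two `od -x` dumps above, and the total lengths (13 and 7 bytes) match the final offsets 0000015 and 0000007 in octal.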
> >Yes, I recently found this out when trying to switch my whole system to
> >UTF-8. But the patch from Chris you mention below works very well
> >for me (and, I guess, for anybody who needs to type composed characters
> >for languages based on the latin1 encoding).
> >> Is there an interest for re-submission of mentioned patches for
> >> inclusion in the kernel (yeah, provided coding style is "normalised")?
> >At least, I am _really_ interested :)
> So am I. I have to use xterm for anything fancy now...
> (especially for the even-more fancy stuff that begins at three-byte UTF8
> sequences, such as Japanese :-)
Good. I hope more people raise their hands for this.
Simos
[I am sending this again. It did not make it to the kernel mailing list in the first^Wsecond post for some reason..]
On Sun, Dec 12, 2004 at 01:05:49AM +0100, Jan Engelhardt <jeng...@linux01.gwdg.de> wrote:
> Can anyone elaborate on this graphical mouse stuff?
What norton does is simply use a few characters that happen to look like a mouse cursor (or, more correctly, that norton forces to look like one). You can do that for a single object (like the mouse cursor), and a few more, but of course you can display far fewer characters that way than with a standard method, as it eats 4 characters/object.
--
The choice of a GNU generation
Marc Lehmann
p...@goof.com
http://schmorp.de/
XX11-RIPE
> > >I am not sure how you wrote the above characters. According to UTF-8,
> > >characters with code points above 0x7F require at least two bytes to be
> > >valid. Do you compose "ö" (pressing something like ";", then "o") in
> > >the console?
> > ö is a "native key" on my keyboard, i.e. i do not need to play with compose to
> > generate ö.
> Aaahh ;), you should have said that before. The whole problem with the
> kernel is with the compose tables. If you have a native key for "ö" on
> your keyboard you will not have problems. I can type, for example, an 'n
> with tilde' on my keyboard because it too is a native key, but for
> accented characters it is necessary to apply the patch to get UTF-8 output :-/
And that's the whole issue.
As soon as the kernel is in Unicode mode for the console, there is currently no way to input accented characters through a dead key (composed). Some years back, when 8-bit encodings were used, there was no problem; now, however, all distros are broken with regard to this.
I do not know what the next step is to get the patch considered. Do we get a kernel maintainer related to console I/O to speak up and say "Hmm, I *might* consider a patch, if I see it and people say they are happy"?
>> Aaahh ;), you should have said that before. The whole problem with the
>> kernel is with the compose tables. If you have a native key for "ö" on
>> your keyboard you will not have problems. I can type, for example, an 'n
>> with tilde' on my keyboard because it too is a native key, but for
>> accented characters it is necessary to apply the patch to get UTF-8 output :-/
>As soon as the kernel is in Unicode mode for the console, there is
>currently no way to input accented characters through a dead key
>(composed).
>Some years back, when 8-bit encodings were used, there was no problem;
>now, however, all distros are broken with regard to this.
Take it; AFAIK, the DOS box in Windows XP does not support UTF-8 either.
>I do not know what the next step is to get the patch considered. Do we
>get a kernel maintainer related to console I/O to speak up and say "Hmm, I
>*might* consider a patch, if I see it and people say they are happy"?
The proposed patch is working and that's ok. I am happy ÷) (first composed smiley hehe <compose><:><-><)> )
> > Aaahh ;), you should have said that before. The whole problem with the
> > kernel is with the compose tables. If you have a native key for "ö" on
> > your keyboard you will not have problems. I can type, for example, an 'n
> > with tilde' on my keyboard because it too is a native key, but for
> > accented characters it is necessary to apply the patch to get UTF-8 output :-/
> And that's the whole issue.
> As soon as the kernel is in Unicode mode for the console, there is
> currently no way to input accented characters through a dead key
> (composed).
True.
> Some years back, when 8-bit encodings were used, there was no problem;
> now, however, all distros are broken with regard to this.
I guess that some distros use their own patches, as seems to be the case with SuSE, but this is something that is broken in the Linux console and should be fixed.
> I do not know what is the next step to consider adding the patch.
Submitting the patch to lkml to discuss its possible inclusion would be a good start. I don't know who the console maintainer is, Vojtech Pavlik perhaps?
On Sun, Dec 12, 2004 at 10:08:22PM +0000, Simos Xenitellis wrote:
> I do not know what the next step is to get the patch considered. Do we
> get a kernel maintainer related to console I/O to speak up and say "Hmm, I
> *might* consider a patch, if I see it and people say they are happy"?
You can send me patches if you want. If I like them I'll submit them.
Very long ago I used to take care of console stuff.