How can I use gedit to create an RTF file?

**The Coder** · 11-09-2008, 12:41 PM

Most of the time I need a really simple text editor, just to type up some basic things without much formatting, so I use gedit, which seems fine to me. However, I would like to keep all my files as RTF documents so that I can share them between Windows and Linux if I have to. Recently I created a bunch of text documents in gedit. When I tried to open them in OOWriter, it asked me what encoding I wanted. What is this all about? How can I get the file to open without this prompt? Can I get gedit to create the file as an RTF file?

**psych-major** · 11-09-2008, 02:03 PM

I believe gedit, like notepad in Windows, can create only text files. You would need to create the document in OOWriter in order to save it as an rtf.

**The Coder** · 11-09-2008, 03:21 PM

OK, so gedit can only create text files. But why does OOWriter ask about the character encoding when it opens them? Why not just open the file?

**psych-major** · 11-09-2008, 03:35 PM

Did you save them as .rtf?

**bwkaz** · 11-09-2008, 06:04 PM

Originally Posted by The Coder

OK, so gedit can only create text files. But why does OOWriter ask about the character encoding when it opens them? Why not just open the file?

Because, um, it doesn't know the character encoding?

Background:

There are about a million different ways to encode characters into bytes. Some are even standardized (e.g. ASCII, ISO8859-x (x from 1 to 16, skipping 12), KOI8-R, etc.). All of the encodings mentioned above are alike in that they represent each character (that they're able to represent) in a single byte. ASCII uses 7 bits (all defined ASCII characters' bytes have the highest bit clear), while the others use all 8 bits. (The others are also "compatible" with ASCII, in that the first 128 byte values map to the same characters as in ASCII. So if you have a true 7-bit-ASCII file, it can be loaded using any of these character sets, and you'll get the right characters.) But none of them is compatible with the others: they assign different characters to the byte values above 127. KOI8-R assigns Cyrillic characters; ISO8859-1 assigns various characters with accents for Western European languages, etc.
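To make that concrete, here is a small Python sketch (my choice of language, nothing to do with gedit or OO) that decodes the same byte with two of those 8-bit tables. The byte value 0xE9 is just an arbitrary example above 127.

```python
# One byte value above 127, decoded with two different single-byte tables.
raw = bytes([0xE9])

as_latin1 = raw.decode("iso-8859-1")  # Western European table
as_koi8r = raw.decode("koi8_r")       # Cyrillic table

print(repr(as_latin1), repr(as_koi8r))  # two different characters

# Pure 7-bit ASCII bytes come out the same under all of these encodings.
ascii_bytes = b"plain text"
decoded = {
    enc: ascii_bytes.decode(enc)
    for enc in ("ascii", "iso-8859-1", "koi8_r")
}
print(decoded)
```

Same byte, different character, depending only on which table the reader picks.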

There's also Unicode, which defines two different levels of conversion. One is from a character to a code point -- a given character always has the same code point value. Code points are (currently) up to 21 bits long: the highest one is U+10FFFF.

Code point assignments for ASCII characters (e.g. "LATIN SMALL LETTER A") have the same numerical value as that character does in ASCII (e.g. 97). Code point assignments for ISO8859-1 characters match as well: the first 256 code points are the same as the 256 byte values of ISO8859-1. Code point assignments for the upper half of any other 8-bit encoding will generally not match.
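If you want to check the code point claims yourself, a rough Python sketch could look like this. It relies only on the standard `ord`, `chr`, and `unicodedata` facilities; the variable names are mine.

```python
import unicodedata

# ord() gives the code point of a character, chr() goes the other way.
print(ord("a"), unicodedata.name("a"))

# Decode every possible byte as ISO 8859-1 and compare the result with
# the Unicode character that has the same number.
latin1_matches = all(
    bytes([b]).decode("iso-8859-1") == chr(b) for b in range(256)
)

# Same comparison for the upper half of KOI8-R: count the disagreements.
koi8r_mismatches = sum(
    bytes([b]).decode("koi8_r") != chr(b) for b in range(128, 256)
)

print(latin1_matches, koi8r_mismatches)
```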

But then again, when you create a file using Unicode, you don't store the code point directly as bytes in the file, either: you use a separate encoding that translates code point values to byte sequences. Possible encodings are UTF-32, UTF-16, UTF-8, and UCS-2/4 (AFAIK UCS-2 should not be used because it can't represent the entire Unicode character range, and UCS-4 is the same as UTF-32).

In UTF-32, you store each code point as it is, embedded in a 32-bit word. So you're wasting at least 11 bits for every character (a code point needs at most 21), and even more than that if you're only using "low" code points (e.g. ASCII characters). And you can't take an ASCII file and interpret it as UTF-32, either: you'll be taking a group of four characters and trying to interpret them as one.
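A quick Python sketch of both points, using the big-endian variant without a byte-order mark so the bytes are easy to read:

```python
text = "abc"

# Each character becomes one 32-bit word: three zero bytes plus the ASCII byte.
utf32 = text.encode("utf-32-be")
print(len(utf32), utf32)

# Reading four ASCII bytes as one UTF-32 word gives 0x61626364, which is
# far above the highest code point (0x10FFFF), so the decoder refuses it.
try:
    b"abcd".decode("utf-32-be")
    ascii_as_utf32 = "decoded"
except UnicodeDecodeError:
    ascii_as_utf32 = "rejected"

print(ascii_as_utf32)
```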

In UTF-16, you store any code point below 65536 as-is, and use surrogate pairs (that is, pairs of 16-bit units) to represent the code points in the other planes. (In Unicode, each block of 65536 code points is a plane. The first -- the one that UTF-16 can represent directly -- is the Basic Multilingual Plane, but it isn't quite big enough to store everything.) So most code points in UTF-16 take two bytes, but some take four.

(This is where UTF-16 and UCS-2 differ: UCS-2 doesn't allow surrogate pairs. So it can only represent code points in the BMP; this isn't enough. UCS-2 does have a range of values that it can't use, though: U+D800 through U+DFFF, which are the values that UTF-16 uses as the two halves of a surrogate pair.)

(You also can't take an ASCII file and interpret it as UTF-16, for the same reason as UTF-32.)
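Here is a sketch of how a code point outside the BMP gets split into a pair of 16-bit units. The helper name `to_surrogate_pair` is made up for this post; the last lines compare its result with what Python's own UTF-16 encoder produces.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into two 16-bit UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need a pair")
    offset = code_point - 0x10000      # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the first unit
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the second
    return high, low


high, low = to_surrogate_pair(0x1F600)
encoded = chr(0x1F600).encode("utf-16-be")

print(hex(high), hex(low), encoded.hex())
print(len("A".encode("utf-16-be")))  # a BMP character is a single unit
```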

In UTF-8, each code point is split into a variable number of bytes. The first 128 code points are stored in a single byte, and are the same as in ASCII (so you can interpret an ASCII file as UTF-8), but byte values above 127 don't mean what they mean in any of the old 8-bit encodings. If the high bit is set in the first byte of a character, that means that more bytes are to come: the number of leading 1 bits in the first byte (110, 1110, or 11110) gives the total length of the sequence (two, three, or four bytes). Every byte after the first starts with the bits 10, which marks it as a continuation byte -- this is what makes error checking possible, and it lets a reader find the start of the next character from anywhere in the stream. This ends up being able to encode all of the Unicode code points in at most four bytes. (The original design allowed 5- and 6-byte sequences, but those are no longer valid now that UTF-8 is limited to the Unicode range.)
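To see the UTF-8 bit patterns, here is a short Python sketch. `sequence_length` is a helper I made up for illustration; it only looks at the leading bits and ignores the handful of lead-byte values that are never valid.

```python
samples = ["a", "\u00e9", "\u20ac", chr(0x1F600)]  # a, e-acute, euro sign, an emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    bits = " ".join(f"{byte:08b}" for byte in encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s): {bits}")


def sequence_length(lead_byte: int):
    """Length implied by a first byte, or None if it cannot start a sequence."""
    if lead_byte < 0x80:
        return 1                  # 0xxxxxxx: plain ASCII
    if lead_byte >> 5 == 0b110:
        return 2                  # 110xxxxx
    if lead_byte >> 4 == 0b1110:
        return 3                  # 1110xxxx
    if lead_byte >> 3 == 0b11110:
        return 4                  # 11110xxx
    return None                   # 10xxxxxx (continuation) or invalid


lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)
```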

ANYWAY.

What OO is asking you is which of those encodings it should use when it interprets the bytes in the file you gave it. Plain text files don't store their encoding in the file (there's nowhere to put it), and most filesystems don't store the encoding in the metadata either (though I believe Macs might). (To be fair, though, some file formats (like XML) do contain their encoding: but it has to be declared in a way that lets you read far enough in the file to find the declaration, without interpreting any bytes incorrectly. A real RTF file names its character set in its header as well, but a plain text file that merely has a .rtf name does not.)

All that OO sees is a bunch of bytes; it has no idea how to turn those into characters.
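A program could try to guess instead of asking. I don't know what OO does internally, so treat this as a hypothetical sketch of one common heuristic: try UTF-8 first (random 8-bit text rarely happens to be valid UTF-8), then fall back to a single-byte encoding. The function name and the default fallback are my own choices.

```python
def decode_with_guess(raw: bytes, fallback: str = "iso-8859-1"):
    """Return (text, encoding_used) for a blob of bytes of unknown encoding."""
    try:
        # Strict decoding: any malformed sequence raises UnicodeDecodeError.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO 8859-1 accepts every byte value, so this never fails,
        # but the result is only right if the guess is right.
        return raw.decode(fallback), fallback


print(decode_with_guess("na\u00efve".encode("utf-8")))
print(decode_with_guess("na\u00efve".encode("iso-8859-1")))
```

The fallback is still a guess: a KOI8-R file would also land in the second branch and come out as Western European gibberish, which is why asking the user is the safe option.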

Now, if your file contained only plain ASCII characters (Latin letters, Arabic numerals, and some punctuation -- but no accents), then you can probably choose any encoding you like from the list, and it probably won't make any difference. (Just don't use EBCDIC.) But if you used UTF-8, and you used non-ASCII characters, then if you choose anything except UTF-8 as the encoding to use when opening the file, you'll see wrong characters. OTOH, if you used ISO8859-1 and used non-ASCII characters, but then try to open the file as UTF-8, the reader will probably break -- UTF-8 bytes have to have a certain structure, and your file's bytes very likely do not have that structure. If a UTF-8 reader encounters a malformed set of bytes, it either raises some kind of error or substitutes a replacement character.
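Both failure modes are easy to reproduce. A small Python sketch, with "café" as an arbitrary example word:

```python
word = "caf\u00e9"  # "café"

utf8_bytes = word.encode("utf-8")          # the accented letter takes 2 bytes
latin1_bytes = word.encode("iso-8859-1")   # the accented letter takes 1 byte

# UTF-8 bytes read as ISO 8859-1: no error, just wrong characters.
wrong = utf8_bytes.decode("iso-8859-1")
print(wrong)

# ISO 8859-1 bytes read as UTF-8: the lone high byte is malformed.
try:
    latin1_bytes.decode("utf-8")
    strict_result = "decoded"
except UnicodeDecodeError:
    strict_result = "error"
print(strict_result)

# A lenient reader substitutes U+FFFD instead of giving up.
lenient = latin1_bytes.decode("utf-8", errors="replace")
print(lenient)
```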

It has nothing to do with the .rtf name. It has everything to do with the file not advertising its own encoding (because a plain text file can't). What you have is a plain text file, even though you named it <something>.rtf (which is fine as long as you don't need formatting, which gedit can't do anyway), and any plain text file will get you the same question.

**ph34r** · 11-10-2008, 09:18 AM

If you really want an RTF editor, look at "ted"

**Darkbolt** · 11-16-2008, 03:13 AM

abiword will also do the job you're looking for

gedit is a text editor, not a word processor
