OmegaT features highly customizable filters, enabling you to configure numerous aspects. File filters are pieces of code capable of:
Reading the document in some specific file format. For instance, plain text files.
Extracting the translatable content out of the file.
Creating the translated document by replacing the translatable content with its translation, and automating modifications of the translated document's file name.
Most users will find the default file filter options sufficient. If this is not the case, open the main dialog by selecting Options → File Filters... from the main menu.
Warning! Should you change filter options whilst a project is open, you must reload the project in order for the changes to take effect.
This dialog lists available file filters. Should you wish not to use OmegaT to translate files of a certain type, you can turn off the corresponding filter by unticking the check box beside its name. OmegaT will then omit the appropriate files while loading projects, and will copy them unmodified when creating target documents. When you wish to use the filter again, just tick the check box. Click Defaults to reset the file filters to the default settings. To edit which files in which encodings the filter is to process, select the filter from the list and click Edit.
Six filters (Text files, PO files, XHTML files, HTML and XHTML files, OpenDocument/OpenOffice.org files and Microsoft Office Open XML files) have one or more specific options. To modify the options, select the filter from the list and click Options. The available options are:
Text files
Paragraph segmentation on line breaks, empty lines or never: if sentence segmentation rules are active, the text will further be segmented according to the option selected here.
PO files
Allow blank translations in the target file: If on, when a PO segment (which may be a whole paragraph) is not translated, the translation will be empty in the target file. Technically, msgstr will be empty. As this is the standard behavior for PO files, it is on by default. If the option is off, the source text will be copied to the target segment.
XHTML files
Translate the following attributes: the selected attributes will appear as segments in the Editor window.
Start a new paragraph on: the <br> HTML tag will constitute a paragraph for segmentation purposes.
Skip text matching regular expression: The text matching the regular expression will be skipped.
Do not translate the content attribute of meta-tags ... : the attribute key-value pairs, separated by commas, will be left untranslated
Microsoft Office Open XML files
You can select which elements are to be translated. They will appear as separate segments in the translation.
Word: Non-visible Instruction Text, Comments, Footnotes, Endnotes, Footers
Excel: Comments, Sheet names
PowerPoint: Slide Comments, Slide Masters, Slide Layouts
Global: Charts, Diagrams, Drawings, WordArt
Other Options:
Aggregate tags: If checked, tags without translatable text between them will be aggregated into single tags.
Preserve spaces for all tags: if checked, "white space" (i.e., spaces and newlines) will be preserved, even if not set technically in the document
HTML and XHTML files
Add or rewrite encoding declaration in HTML and XHTML files
Always (default), Only if (X)HTML file has a header, Only if (X)HTML file has an encoding declaration, Never
Translate the following attributes: the selected attributes will appear as segments in the Editor window.
Start a new paragraph on: the <br> HTML tag will constitute a paragraph for segmentation purposes.
Skip text matching regular expression: The text matching the regular expression will be skipped.
Do not translate the content attribute of meta-tags ... : the attribute key-value pairs, separated by commas, will be left untranslated
OpenDocument/OpenOffice.org files
You can select which of the following items are to be translated:
Index entries, Bookmarks, Bookmark references, Notes, Comments, Presentation notes, Links (URL), Sheet names
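To make the PO files option "Allow blank translations in the target file" more concrete, here is a minimal Python sketch of the behavior described for it. The function name `po_entry` and its parameters are hypothetical and are not part of OmegaT; string escaping and multi-line entries are deliberately ignored.

```python
def po_entry(msgid, translation=None, allow_blank=True):
    """Render one PO entry (illustrative sketch only, not OmegaT code).

    msgid       -- the source text of the segment
    translation -- the translated text, or None if the segment is untranslated
    allow_blank -- mirrors the "Allow blank translations" option (on by default)
    """
    if translation is None:
        # Untranslated segment: leave msgstr empty when the option is on,
        # otherwise copy the source text into the target.
        translation = "" if allow_blank else msgid
    return f'msgid "{msgid}"\nmsgstr "{translation}"'


print(po_entry("Open file"))                     # option on: empty msgstr
print(po_entry("Open file", allow_blank=False))  # option off: source copied
```

With the option on, the untranslated entry keeps an empty msgstr, which is what PO tools normally expect; with it off, the source text appears in the target file instead.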
The dialog opened by clicking Edit enables you to set up the source filename patterns of files to be processed by the filter, customize the filenames of translated files, and select which encodings should be used for loading the file and saving its translated counterpart. To modify a file filter pattern, either modify the fields directly or click Edit. To add a new file filter pattern, click Add. The same dialog is used both to add a pattern and to edit an existing one; it includes a special target filename pattern editor with which you can customize the names of output files.
When OmegaT encounters a file in its source folder, it attempts to select the filter based upon the file's extension. More precisely, OmegaT attempts to match each filter's source filename patterns against the filename. For example, the pattern *.xhtml matches any file with the .xhtml extension.
If the appropriate filter is found, the file is assigned to it for processing. For example, by default, XHTML filters are used for processing files with the .xhtml extension. You can change or add filename patterns for files to be handled by each filter. Source filename patterns use wildcard characters similar to those used in Searches. The '*' character matches zero or more characters. The '?' character matches exactly one character. All other characters represent themselves. For example, if you wish the text filter to handle readme files (readme, read.me, and readme.txt), you should use the pattern read*.
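The wildcard rules above can be sketched in a few lines of Python. This is only an illustration of the matching rules as described here, not OmegaT's actual implementation: the function names, the `FILTERS` table, case-sensitive matching and the "first matching filter wins" order are all assumptions.

```python
import re


def pattern_to_regex(pattern):
    """Translate a source filename pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # '*' matches zero or more characters
        elif ch == "?":
            parts.append(".")            # '?' matches exactly one character
        else:
            parts.append(re.escape(ch))  # all other characters represent themselves
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, filename):
    """Return True if the whole filename matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(filename) is not None


def pick_filter(filename, filters):
    """Return the name of the first filter with a matching pattern, or None."""
    for name, patterns in filters.items():
        if any(matches(p, filename) for p in patterns):
            return name
    return None


# Hypothetical filter table, for illustration only.
FILTERS = {
    "Text files": ["read*", "*.txt"],
    "XHTML files": ["*.xhtml"],
}

for name in ("readme", "read.me", "readme.txt", "index.xhtml", "photo.png"):
    print(name, "->", pick_filter(name, FILTERS))
```

Note that a pattern such as read* also matches names like reading-list.doc, so broad patterns are best kept for folders whose contents you know.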
Only a limited number of file formats specify a mandatory encoding. File formats that do not specify their encoding will use the encoding you set up for the extension that matches their name. For example, by default .txt files will be loaded using the default encoding of your operating system. You may change the source encoding for each different source filename pattern. Such files may also be written out in any encoding. By default, the translated file encoding is the same as the source file encoding. Source and target encoding fields use combo boxes with all supported encodings included. <auto> leaves the encoding choice to OmegaT. This is how it works:
OmegaT identifies the source file encoding by using its encoding declaration, if present (HTML files, XML based files)
OmegaT is instructed to use a mandatory encoding for certain file formats (Java properties etc)
OmegaT uses the default encoding of the operating system for text files.
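The three <auto> rules above can be read as a simple fallback chain. The following Python sketch illustrates that reading; the function `choose_source_encoding` and its parameters are hypothetical, and the order in which the first two rules are checked is an assumption, since a given file format normally falls under only one of them.

```python
import locale


def choose_source_encoding(declared=None, mandatory=None):
    """Pick a source encoding in the spirit of <auto> (illustrative sketch).

    declared  -- encoding named in the file's own declaration, if any
                 (for example in HTML or XML-based files)
    mandatory -- encoding required by the file format, if any
    """
    if declared:
        return declared    # rule 1: use the file's encoding declaration
    if mandatory:
        return mandatory   # rule 2: use the format's mandatory encoding
    # rule 3: fall back to the default encoding of the operating system
    return locale.getpreferredencoding(False)


print(choose_source_encoding(declared="UTF-8"))
print(choose_source_encoding())  # depends on the system the code runs on
```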
Sometimes you may wish to rename the files you translate automatically, for example by adding a language code after the file name. The target filename pattern uses a special syntax, so the easiest way to edit this field is to click Edit... and use the Edit Pattern dialog, although you may also modify the name directly in the target filename pattern field of the file filters dialog. To revert to the default configuration of the filter, click Defaults. The Edit Pattern dialog offers the following options:
${filename} (default): the full filename of the source file, with extension. In this case the name of the translated file is the same as that of the source file.
${nameOnly}: inserts only the name of the source file, without the extension.
${targetLocale}: the target locale code (of the form "xx_YY").
${targetLanguage}: the target language and country code together (of the form "XX-YY").
${targetLanguageCode}: the target language only ("XX").
${targetCountryCode}: the target country only ("YY").
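As an illustration of how these variables combine, here is a minimal Python sketch that expands a target filename pattern. It is not OmegaT's implementation: the function `expand_target_pattern` is hypothetical, and the letter case of each value simply follows the forms quoted above ("xx_YY", "XX-YY", "XX", "YY"), which is an assumption about the exact output.

```python
import os
import re

# Matches variables of the form ${name}.
VARIABLE = re.compile(r"\$\{(\w+)\}")


def expand_target_pattern(pattern, source_filename, language, country):
    """Expand the variables in a target filename pattern (illustrative sketch)."""
    name_only, _extension = os.path.splitext(source_filename)
    lang, ctry = language.upper(), country.upper()
    values = {
        "filename": source_filename,                      # full name with extension
        "nameOnly": name_only,                            # name without extension
        "targetLocale": f"{language.lower()}_{ctry}",     # "xx_YY"
        "targetLanguage": f"{lang}-{ctry}",               # "XX-YY"
        "targetLanguageCode": lang,                       # "XX"
        "targetCountryCode": ctry,                        # "YY"
    }
    # Replace every known variable in one pass; unknown ones are left untouched.
    return VARIABLE.sub(lambda m: values.get(m.group(1), m.group(0)), pattern)


print(expand_target_pattern("${filename}", "readme.txt", "fr", "CA"))
print(expand_target_pattern("${nameOnly}_${targetLocale}.txt", "readme.txt", "fr", "CA"))
```

Under these assumptions, the first call keeps the source name unchanged, while the second inserts the locale code between the name and the extension.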