LV Homepage

lv: a Powerful Multilingual File Viewer

The latest version is ver 4.21: Download

Japanese page is here .

Copyright
Feature
Download lv
Installation
Usage
Management of logical lines
Coding systems
Annotation about coding systems
Auto selection of a coding system
- Default coding system
- How does lv select a coding system?
Extension for displaying a colored text
Customization
Known bugs
Bug report
Release note
Acknowledgement
Reference

Copyright

lv is freeware. You are permitted to use and copy lv and all contents of its archive. You are also permitted to modify lv and distribute the modified software, provided your documentation clearly notes that the software is derived from lv. Naturally, we disclaim any kind of warranty for lv; that is to say, you use lv at your own risk.
All rights reserved. Copyright (C) 1994,1997 by NARITA Tomio.

Feature

Multilingual file viewer
lv is a multilingual file viewer that lets you read texts written in multiple languages, especially Asian languages. In appearance and operation, lv resembles less, the representative file viewer on UNIX, so UNIX users (and less users on other OSs) do not have to learn a burdensome new interface. lv can be used on MSDOS ANSI terminals and almost all UNIX platforms. lv is still under active development, so your feedback is welcome and helps us refine future versions.
Multiple coding systems
lv can decode and encode texts through many coding systems, for example, ISO 2022 based coding systems such as iso-2022-jp, and EUC (Extended Unix Code) like euc-japan. Furthermore, localized coding systems such as shift-jis and big5 are also supported. lv can be used not only as a file viewer but also as a code conversion filter among several coding systems.
Supporting the Unicode standard
lv provides Unicode facilities that enable you to read Unicode texts encoded in UTF-7 or UTF-8, and to convert code-points between Unicode and other charsets. Using code conversion among several charsets via Unicode, you can display Unicode or foreign texts on your terminal. (However, the MSDOS version of lv has no Unicode facilities.)
Regular expressions of multi-bytes characters
Because keyboard input can be decoded in the same way as file input, you can input a string that consists of multi-bytes characters, and search the string as a regular expression.
Colored text through
lv can recognize ANSI escape sequences which make texts colored or decorated with character attributes. So you can display pre-decorated texts such as colored source codes generated by another software through lv on ANSI terminals.
Completely original
lv is completely original software; it includes no code drawn from less or any other program.

Download lv

You can download the lv ver 4.21 archive. Changes from older versions are described in the release note (in Japanese).

Installation

lv is written in ANSI C, so an ANSI C compiler is required to build lv. The lv binary for MSDOS is built with LSI C-86 Compiler ver 3.30 [Aug 01 1991] (a limited version of LSI C-86 for sample usage). On UNIX, please use gcc or a cc that supports ANSI C.
lv is configurable for BSD or SystemV on UNIX platforms, and for termcap or terminfo terminal controls. You have to edit Makefile to switch the compiler flags to match your environment (or add flags if no adequate ones are provided).
Pre-configured compiler flags are available for gcc on SunOS, FreeBSD, NEWS-OS (RISC News, CISC News), Solaris, HP-UX, Linux, or cc on IRIX.
Please take notice that the target name for MSDOS binary is ``dos''.
Installation:
1. Expand lv archive, using LHa on MSDOS, or gunzip/tar on UNIX.
2. Edit ``CONFIG'' to configure compiler flags.
3. Execute ``make dos'' on MSDOS, or only ``make'' on UNIX.
4. Copy ``lv.hlp'', brief help description, to the same directory as lv.exe located on MSDOS, or /usr/local/lib/lv/ on UNIX.
However, lv archive for MSDOS does not contain source files. ``lv.exe'' is bundled instead.
The MSDOS version of lv outputs ANSI escape sequences directly, without regard to termcap or terminfo. Normally, you need an ANSI escape sequence driver such as ``ANSI.SYS'' (or a more sophisticated one) on MSDOS or in the DOS prompt on MS-Windows. WindowsNT does not seem to include such a driver for the DOS prompt by default, so please check the driver if lv fails to handle the screen correctly.

Usage

How to run lv?
When you wish to display a file on a terminal, please run lv from command line as follows:

% lv <options> <file>

Or, using pipe or redirect:

% <another command> | lv <option>
% lv <option> < <file>

Compressed files that have suffix ``gz'', ``z'', or ``Z'' are extracted by lv using zcat. Please install zcat that can expand all of them, which is usually a symbolic link to gzip.
In case that standard output is connected not to a terminal but to redirect or pipe, lv works as a code conversion filter among several coding systems.
Command line options
-A<coding-system>
Set all coding systems to coding-system.
-I<coding-system>
Set input coding system to coding-system.
-K<coding-system>
Set keyboard coding system to coding-system.
-O<coding-system>
Set output coding system to coding-system.
coding-system
a: auto-select (input only)
c: iso-2022-cn
j: iso-2022-jp
k: iso-2022-kr
e: Extended Unix Code

ec: euc-china
ej: euc-japan
ek: euc-korea
et: euc-taiwan

u: UCS transformation format

u7: UTF-7
u8: UTF-8

l: iso-8859-1..9

l1..9: iso-8859-1..9

s: shift-jis
b: big5
r: raw mode
Code conversions:

iso-2022-cn, -jp, -kr can be converted into euc-china or -taiwan, euc-japan, euc-korea, respectively, and vice versa. shift-jis uses the same internal code-points as iso-2022-jp and euc-japan. Similarly, big5 can be handled like iso-2022-cn and euc-taiwan inside lv. You can convert charsets among these coding systems from input to output.
The search function of lv does not work correctly when lv additionally performs ``code'' conversion (as opposed to ``coding system conversion''), because the visible codes and the internal codes then differ. This problem happens when you output CNS characters through big5, and in other conversions, especially those via Unicode. You can avoid the problem by reading a pre-converted stream. For example, when you wish to read a big5 stream on a UTF-8 terminal, you can use a pipe like this:
```
lv -Ib foo.big5 -Ou | lv -Au
```
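As a rough illustration of the coding system conversion described above, the following Python sketch re-encodes a byte stream from one coding system into another. It relies on Python's built-in codecs (`big5`, `iso2022_jp`, `euc_jp`) rather than on lv's own mapping tables, which may differ, and the helper name `convert` is hypothetical.

```python
def convert(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from the src codec and re-encode them in the dst codec."""
    # Assumption: Python's codec tables stand in for lv's internal tables.
    return data.decode(src).encode(dst)


# A big5 stream pre-converted to UTF-8, as in the pipe example.
big5_bytes = "中文".encode("big5")
utf8_bytes = convert(big5_bytes, "big5", "utf-8")

# iso-2022-jp and euc-japan carry the same characters in different encodings.
jis_bytes = "日本語".encode("iso2022_jp")
euc_bytes = convert(jis_bytes, "iso2022_jp", "euc_jp")
```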
-W<number>
Screen width
-H<number>
Screen height
-z
Assert there is no delete/insert-lines control
Please set this option on MSDOS ANSI terminals which do not have control to delete and/or insert lines. Meanwhile, as to termcap and terminfo version, it will be set automatically.

-Ss<seq>
Set ANSI Standout sequence to <seq> (default "7")
-Sr<seq>
Set ANSI Reverse sequence to <seq> (default "7")
-Sb<seq>
Set ANSI Blink sequence to <seq> (default "5")
-Su<seq>
Set ANSI Underline sequence to <seq> (default "4")
-Sh<seq>
Set ANSI Highlight sequence to <seq> (default "1")

-T<number>
Set Threshold-code which divides Unicode code-points in two regions. Characters belonging to the lower region are assumed to have a width of one, and the higher characters are equated to a width of two. (Default: 12288)
-m
Force Unicode code-points which have the same glyphs as iso-8859-* to be Mapped iso-8859-* in a conversion from Unicode to another character set which also has the corresponding code-points, in particular, Asian charsets.
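The threshold rule of option -T can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and treating the threshold code-point itself (12288, U+3000) as part of the higher region is an assumption, since the text does not say which side of the boundary it falls on.

```python
DEFAULT_THRESHOLD = 12288  # default value of -T, i.e. U+3000


def display_width(char: str, threshold: int = DEFAULT_THRESHOLD) -> int:
    """Return 1 for code-points below the threshold, 2 otherwise."""
    # Assumption: the threshold itself belongs to the higher (wide) region.
    return 1 if ord(char) < threshold else 2


def string_width(text: str, threshold: int = DEFAULT_THRESHOLD) -> int:
    """Sum the assumed column widths of all characters in text."""
    return sum(display_width(char, threshold) for char in text)
```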

-c
Allow ANSI escape sequences for text decoration (Color)
-d
Make regexp-searches ignore case (case folD search)
-f
Substitute Fixed strings for regular expressions
-p
Force non-regular files to be Printed immediately
-s
Force old pages to be swept out from the screen Smoothly
-u
Unify several character sets, eg. JIS X0208 and C6226. In addition, lv equates ISO 646 variants and unknown charsets with ASCII.

-@
Clear all options
You can turn OFF specified options, using ``+<option>'' like +c, +d, ... +z.

-
Treat the following arguments as filenames

-v
Show lv version
-h
Show this help
Configuration
Options can be described in the configuration file ``.lv'' (``_lv'' on MSDOS) located in your home directory and/or the current working directory. They can also be described in the environment variable LV.
Configurations are read in the following order, if present, and later settings override earlier ones. Command line options are always read last.
1. .lv located at your home directory
2. .lv located at current working directory
3. Environment variable LV
4. Command line options
Examples:
- MSDOS (Input coding system is shift-jis, Screen height is 25 lines)
  set LV=-Is -H25
- UNIX csh (Keyboard and output coding systems are both euc-japan)
  setenv LV '-Kej -Oej'
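The override order above can be illustrated with a small Python sketch. It is not lv's actual option parser: the function `merge_options` is hypothetical, it covers only a subset of the options, and it assumes that -A sets the input, keyboard, and output coding systems together. Each source is a list of option strings, given in reading order.

```python
# Options that take an argument in this sketch (a subset of lv's options).
VALUE_OPTIONS = {"I", "K", "O", "W", "H", "T"}


def merge_options(*sources):
    """Merge option lists; sources that come later override earlier ones."""
    values, flags = {}, set()
    for options in sources:
        for option in options:
            if len(option) < 2:
                continue  # a bare "-" is not handled in this sketch
            sign, name, argument = option[0], option[1], option[2:]
            if name == "A":
                # Assumption: -A sets input, keyboard and output together.
                for target in ("I", "K", "O"):
                    values[target] = argument
            elif name in VALUE_OPTIONS:
                values[name] = argument
            elif sign == "-":
                flags.add(name)      # e.g. -c turns the option on
            elif sign == "+":
                flags.discard(name)  # e.g. +c turns it off again
    return values, flags


home_rc = ["-Is", "-H25", "-c"]   # .lv in the home directory
cwd_rc = ["-d"]                   # .lv in the current working directory
env_lv = ["-Kej", "-Oej"]         # environment variable LV
command_line = ["-Iej", "+c"]     # read last
values, flags = merge_options(home_rc, cwd_rc, env_lv, command_line)
```

Here the command line's -Iej replaces the -Is from the home directory file, and +c cancels the earlier -c.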
Run-time commands

0-9:
Argument
g, <:
Jump to the line number (default: top of the file)
G, >:
Jump to the line number (default: bottom of the file)
p:
Jump to the percentage position in line numbers (0-100)
b, C-b:
Previous page
u, C-u:
Previous half page
k, C-k, y, C-y, C-p:
Previous line
j, C-j, e, C-e, C-n, CR:
Next line
d, C-d:
Next half page
f, C-f, C-v, SP:
Next page
/<string>:
Find a string in the forward direction (regular expression)
?<string>:
Find a string in the backward direction (regular expression)
n:
Repeat previous search in the forward direction
N:
Repeat previous search in the backward direction (not REVERSE)
C-l:
Redisplay all lines
r, C-r:
Refresh screen and memory
R:
Reload the current file
C-g:
Show file information (filename, position, coding system)
V:
Show LV version
C-z:
Suspend (call SHELL or ``command.com'' under MSDOS)
q, Q:
Quit
How to input search strings?
You can input a string which consists of multi-bytes characters and search the string as a regular expression. lv's regular expression is similar to Mule's one.
The following keys have special meanings in the keyboard input:

C-m (CR)
Enter the current string
C-h (BS)
Delete one character (backspace)
C-u
Cancel the current string and try again
C-p
Restore a few old strings incrementally (history)
C-g
Quit
Regular expressions
- `. (period)'
  matches any single character. For example, ``a.b'' matches any three-character string which begins with `a' and ends with `b'.
- `*'
  matches zero or more repetitions of the preceding expression. For example, ``ab*'' matches `a', `ab', `abb', etc.
- `+'
  matches one or more repetitions of the preceding expression. For example, ``ab+'' matches `ab', `abb', but not `a'.
- `?'
  matches the preceding expression either once or not at all. For example, ``ca?r'' matches `car' or `cr'; nothing else.
- `[ ... ]'
  makes a character set. For example, ``[ab]+'' matches any string composed of just `a's and `b's. You can also include character ranges in a character set by writing two characters with a `-' between them. For example, ``[a-z]'' matches any lower-case letter. If the characters belong to a multi-bytes charset, lv makes a multi-bytes range, ordering code-points as unsigned integers. Mutually overlapping ranges (or charsets) are not guaranteed to work.
- `[^ ... ]'
  makes a complemented character set. For example, ``[^a-z0-9A-Z]'' matches all characters *except* letters and digits.
- `^'
  matches the empty string at the beginning of a line.
- `$'
  is similar to `^' but matches only at the end of a line.
- `\'
  quotes the special characters.
- `\1'
  matches characters each of which has a width of 1 column.
- `\2'
  matches characters each of which has a width of 2 columns.
- `\|'
  specifies an alternative. For example, ``foo\|bar'' matches either `foo' or `bar' but no other string.
- `\( ... \)'
  \(, \) is a grouping construct. For example, ``ba\(na\)*'' matches `ba', `bana', `banana', etc.
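The examples in the list above can be checked with Python's `re` module, as in the sketch below. This is only an approximation of lv's matcher: Python spells alternation and grouping differently from lv, so those two patterns are rewritten in Python syntax, and the width classes `\1` and `\2` have no counterpart in `re`. The helper `matches` is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# (pattern in Python syntax, sample text, expected result)
EXAMPLES = [
    ("a.b", "axb", True),
    ("ab*", "a", True),
    ("ab*", "abb", True),
    ("ab+", "a", False),
    ("ca?r", "cr", True),
    ("ca?r", "caar", False),
    ("[ab]+", "abba", True),
    ("[^a-z0-9A-Z]", "-", True),
    ("foo|bar", "bar", True),      # alternation
    ("ba(na)*", "banana", True),   # grouping
]

for pattern, text, expected in EXAMPLES:
    assert matches(pattern, text) == expected
```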

Management of logical lines

Up to 1023 bytes per a logical line
lv manages file location pointers by logically separating lines at LF (line feed). The length of a logical line is limited to 1023 bytes, which is configurable, and lv forcibly inserts an LF when a line is longer than 1023 bytes. Note that all CRs (carriage returns) are omitted during decoding.
Up to 64 physical lines per a logical line
A logical line is divided into physical lines to fit the screen width. lv manages at most 64 physical lines per logical line. Of course, you can change this limit. Note that when a logical line needs more than 64 physical lines, the lines beyond the 64th are not displayed at all.
Limitation of the number of logical lines
The number of logical lines is also limited. Currently, lv can handle up to about 0.5 million lines on UNIX (65000 lines on MSDOS). This value is configurable. Note that lines which exceed this limit cannot be displayed at all.
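The line limits described in this section can be sketched in Python as follows. This is a simplified model, not lv's implementation: the function names are hypothetical, long lines are cut at plain byte offsets (which could split a multi-bytes character), and every character is assumed to occupy one column when wrapping.

```python
from typing import List

MAX_LOGICAL_BYTES = 1023   # limit on the length of a logical line
MAX_PHYSICAL_LINES = 64    # limit on physical lines per logical line


def split_logical_lines(data: bytes, limit: int = MAX_LOGICAL_BYTES) -> List[bytes]:
    """Drop CRs, split at LF, and cut over-long lines into limit-sized pieces."""
    text = data.replace(b"\r", b"")
    if text.endswith(b"\n"):
        text = text[:-1]  # a final LF ends the last line, it does not start a new one
    lines: List[bytes] = []
    for raw in text.split(b"\n"):
        if not raw:
            lines.append(b"")
            continue
        # Simplification: cutting here stands in for the forcibly inserted LF.
        for start in range(0, len(raw), limit):
            lines.append(raw[start:start + limit])
    return lines


def physical_lines(line: str, width: int = 80,
                   max_rows: int = MAX_PHYSICAL_LINES) -> List[str]:
    """Wrap a logical line to the screen width; rows past max_rows are dropped."""
    rows = [line[i:i + width] for i in range(0, len(line), width)] or [""]
    return rows[:max_rows]
```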

Coding systems

ISO 2022 based coding systems
lv handles ISO 2022 based coding systems as if they were stateless at the logical line level. You therefore have to specify a coding system before decoding, and lv may add redundant codes during encoding.
- iso-2022-cn
  RFC 1922 tailored coding system.
  
  Designation:
  G0: ASCII
  G1: GB 2312-80, CNS 11643-1992 Plane 1, ISO-IR-165
  G2: CNS 11643-1992 Plane 2
  G3: CNS 11643-1992 Plane 3..7
- iso-2022-jp
  RFC 1468 and 1554 tailored coding system. All 94charsets use G0, and all 96charsets use G2 with single shift inside lv.
- iso-2022-kr
  RFC 1557 tailored coding system. All charsets except ASCII use only G1 with locking shift inside lv.
Extended Unix Code
lv can decode mixture texts of euc-* and iso-2022-*, when you select euc-* as the input coding system.
- euc-china
  
  Designation:
  G0: ASCII
  G1: GB 2312-80
  G2: not used
  G3: not used
- euc-japan
  
  Designation:
  G0: ASCII
  G1: JIS X 0208
  G2: JIS X 0201 Katakana
  G3: JIS X 0212
- euc-korea
  
  Designation:
  G0: ASCII
  G1: KS C 5601-1987
  G2: not used
  G3: not used
- euc-taiwan
  
  Designation:
  G0: ASCII
  G1: CNS 11643 Plane 1
  G2: CNS 11643 Plane 2-7
  G3: not used

UCS transformation format

UTF-7
A Mail-Safe Transformation Format of Unicode. See RFC 1642 (Experimental) and UTF-7 Encoding Form.
UTF-8
8bit Unicode encoding. See UCS Transformation Format 8 (UTF-8).

lv can convert character codesets between Unicode and the following charsets: GB 2312-80, JIS X 0208, JIS X 0212, KSC 5601-1987, Big Five, CNS 11643-1992 Plane 1-2, and ISO 8859-1..9.

Currently lv's mapping table is based on Unicode 1.1.

Charsets used for mapping, by encoding:
- iso-2022-cn: GB 2312-80 (primary), CNS 11643-1992 (secondary), (ISO 8859-*)
- iso-2022-jp: JIS X0208, JIS X0212, JIS X0201, (ISO 8859-*)
- iso-2022-kr: KSC 5601-1987, (ISO 8859-*)
- euc-china: GB 2312-80
- euc-japan: JIS X0208, JIS X0212, JIS X0201
- euc-korea: KSC 5601-1987
- euc-taiwan: CNS 11643-1992 Plane 1-2
- shift-jis: JIS X0208, JIS X0201
- big5: Big Five

When you output Unicode CJK unified ideographs through iso-2022-cn, GB 2312-80 is used primarily, and the rest which are not included in GB are mapped into CNS 11643-1992.

Other coding systems
- iso-8859-*
  ASCII and one of ISO 8859/1-9 are designated to G0 and G1, which are invoked into GL and GR, respectively.
- shift-jis
  lv can decode mixture texts of shift-jis and iso-2022-jp, when you select shift-jis as the input coding system.
  (*) euc-japan and shift-jis are mutually exclusive for decoding.
- big5
  Since big5 characters can be partially converted into CNS 11643-1992 Plane 1-2, lv can load big5 texts and output them through ISO 2022 based coding systems or euc-taiwan. In such case, however, the search function of lv does not work correctly, because visible codes and internal codes are different from each other. This problem also happens when you output CNS characters through big5. Several big5 characters which have no correspondence to CNS are output as ``?'' (question mark).
- raw mode
  No decoding and encoding are performed.

Annotation about coding systems

Handling of invalid codes
Characters belonging to invalid character sets, for example, JIS X 0212 for shift-jis, are printed as ASCII at its code-point up to originally supposed width.
Invalid characters which cause error state under specified coding system might be ignored partially. If it is printable, it will be output as a control character.
Backspace
BS (backspace) characters included in files are interpreted as follows:
- <char> BS <char>
  Highlighted <char>
- ``_'' BS <char>
  Underlined <char>
- Otherwise
  BS deletes a character on the left side of it.
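The three backspace rules above can be sketched in Python as follows. The function name and the attribute labels are hypothetical, and the sketch assumes that the `<char> BS <char>` rule applies only when both characters are the same and that it takes precedence over the underline rule for ``_'' BS ``_''.

```python
def interpret_backspaces(text: str):
    """Return a list of (char, attribute) pairs; attribute is None when plain."""
    output = []
    i = 0
    while i < len(text):
        char = text[i]
        if char == "\b":
            # Otherwise: BS deletes the character on its left side.
            if output:
                output.pop()
            i += 1
            continue
        if i + 2 < len(text) and text[i + 1] == "\b":
            following = text[i + 2]
            if following == char:
                output.append((following, "highlight"))  # <char> BS <char>
                i += 3
                continue
            if char == "_":
                output.append((following, "underline"))  # ``_'' BS <char>
                i += 3
                continue
        output.append((char, None))
        i += 1
    return output
```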
How to look in a binary file?
Decoding of lv is robust even for binary files. You can look in a binary file and decode embedded strings in it. However, there might be ignored characters if you decode binary files through a particular coding system. Option -Ir, raw decoding, saves such ignored characters other than CRs.

Auto selection of a coding system

Default coding system
The default input coding system is auto-select, described below. In auto selection state, lv decodes an input stream as iso-2022-kr. The default output coding system is iso-2022-jp on UNIX, or shift-jis on MSDOS (in the Japanese version of lv).
If you don't specify any input coding system, that is, when auto-select is in effect, lv selects the input coding system automatically. The Japanese version of lv can detect 8bit codes and select either euc-japan or shift-jis.
How does lv select a coding system?
Currently, auto selection is implemented in a simple way. When an 8bit code is found during file loading and the input coding system is auto-select, lv examines ``the first line that contains the first 8bit code''. If the line contains code-points in the shift-jis region, the coding system is assumed to be shift-jis; otherwise it is assumed to be euc-japan. Auto selection state continues as long as no 8bit code is found, and the selection is performed on demand. If a text contains only JIS X 0201 Katakana in shift-jis, or for some other unfortunate reason, it will be misinterpreted as a euc-japan text.
If the result of auto selection is incorrect and you know the input coding system, please set it by command line options, which disables auto selection.
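The selection procedure can be sketched in Python as follows. The text does not define the ``shift-jis region'' precisely, so this sketch assumes it means bytes 0x81-0x9F other than 0x8E and 0x8F (which euc-japan uses as single shifts); lv's real test may differ, and the function name is hypothetical.

```python
def guess_japanese_coding_system(data: bytes):
    """Guess from the first line that contains an 8bit code; None if there is none."""
    for line in data.split(b"\n"):
        if not any(byte >= 0x80 for byte in line):
            continue  # still in auto selection state
        # Assumption: these bytes never occur in euc-japan text.
        in_sjis_region = any(
            0x81 <= byte <= 0x9F and byte not in (0x8E, 0x8F) for byte in line
        )
        return "shift-jis" if in_sjis_region else "euc-japan"
    return None
```

Under this assumption, a shift-jis text that contains only JIS X 0201 Katakana (bytes 0xA1-0xDF) is guessed as euc-japan, which matches the misinterpretation described above.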

Extension for displaying a colored text

Option -c enables ANSI escape sequences in the form of ESC [ ps ; ... ; ps m, where ps takes following values:
- 1: Highlight
- 4: Underline
- 5: Blink
- 7: Reverse
- 30: Black
- 31: Red
- 32: Green
- 33: Yellow
- 34: Blue
- 35: Magenta
- 36: Cyan
- 37: White
- 40-47: Reverse of 30-37
Every sequence is independent of one another. lv will reset all values before new value is set.
Every sequence is effective only within a logical line. On crossing logical lines, all attributes are reset automatically.
Only one color can be specified at a time. When multiple colors are specified, the last one takes effect.
As to reversed characters, a specified color is applied to the ``reversed background color''. You cannot specify the color of ``out-clipped characters''.
You can customize actual sequences to be output to the screen. Please specify them by option -S.
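The sequence form described in this section can be parsed with a short Python sketch like the one below. It is an illustration only: the names are hypothetical, and treating 40-47 as the corresponding color from 30-37 combined with the reverse attribute is an assumption about what ``Reverse of 30-37'' means.

```python
import re

# ESC [ ps ; ... ; ps m
SGR_PATTERN = re.compile(r"\x1b\[([0-9;]*)m")

ATTRIBUTES = {1: "highlight", 4: "underline", 5: "blink", 7: "reverse"}
COLORS = {30: "black", 31: "red", 32: "green", 33: "yellow",
          34: "blue", 35: "magenta", 36: "cyan", 37: "white"}


def parse_sgr(sequence: str):
    """Return (attributes, color) for one sequence, or None if it is not one."""
    match = SGR_PATTERN.fullmatch(sequence)
    if match is None:
        return None
    attributes, color = set(), None
    for field in match.group(1).split(";"):
        if not field:
            continue
        ps = int(field)
        if ps in ATTRIBUTES:
            attributes.add(ATTRIBUTES[ps])
        elif ps in COLORS:
            color = COLORS[ps]  # the last color specified wins
        elif 40 <= ps <= 47:
            # Assumption: 40-47 means the matching 30-37 color, reversed.
            attributes.add("reverse")
            color = COLORS[ps - 10]
    return attributes, color
```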

Customization

Customization for command key bindings
Please modify the keybind table in keybind.h.
Customization for terminal controls
When you add a new terminal control, please add codes to console.c. When you wish to change interpretation of escape sequences, please modify console.c and escape.c. However, some ANSI escape sequences are configurable through options.
Changing default screen size of MSDOS ANSI terminals
Default screen size is 80 columns by 24 rows. To change this, please modify console.c. However, screen size can be specified through options.
Changing default coding systems
Currently, Japanese version of lv uses following values:

- Input: auto-select (MSDOS), auto-select (UNIX)
- Keyboard: shift-jis (MSDOS), iso-2022-jp (UNIX)
- Output: shift-jis (MSDOS), iso-2022-jp (UNIX)

To change above, please modify lv.c. However, those coding systems can be specified through options.
Customization for coding systems
Currently, an ISO 2022 universal decoder and EUC, shift-jis, big5, UTF-7, and UTF-8 decoders are implemented. When you wish to add another coding system, please add source code, referring to ctable_t.h, ctable.c, encode.c, decode.c, iso2022.c, etc.
Customization for character sets
Please add your favorite character sets, referring to itable_t.h, itable.c, etc. Currently recognized character sets are itemized below. You have to specify the code length (bytes) and graphical width (columns) of each character as attributes. Code length and graphical width need not be equal. The current implementation does not support per-character lengths, but you can specify their maximum length in itable, which should not cause problems. You cannot add charsets whose code length is more than 3 bytes. (If you wish to, only a small modification to lv is needed to support charsets of up to 4 bytes.)

ISO 646 United States (ANSI X3.4-1968)
JIS X0201-1976 Japanese Roman
JIS X0201-1976 Japanese Katakana
ISO 8859/1 Latin alphabet No.1 Right part
ISO 8859/2 Latin alphabet No.2 Right part
ISO 8859/3 Latin alphabet No.3 Right part
ISO 8859/4 Latin alphabet No.4 Right part
ISO 8859/5 Cyrillic alphabet
ISO 8859/6 Arabic alphabet
ISO 8859/7 Greek alphabet
ISO 8859/8 Hebrew alphabet
ISO 8859/9 Latin alphabet No.5 Right part
JIS C 6226-1978 Japanese kanji
GB 2312-80 Chinese hanzi
JIS X 0208-1983 Japanese kanji
KS C 5601-1987 Korean graphic charset
JIS X 0212-1990 Supplementary charset
ISO-IR-165
CNS 11643-1992 Plane 1
CNS 11643-1992 Plane 2
CNS 11643-1992 Plane 3
CNS 11643-1992 Plane 4
CNS 11643-1992 Plane 5
CNS 11643-1992 Plane 6
CNS 11643-1992 Plane 7
Big5 Traditional Chinese
Unicode

These charsets are merely recognized by lv; whether you can actually display them depends on your terminal's capability.
Conversely, you can handle charsets not listed above as latin-1, for example when an 8bit coding system is displayed on an 8bit terminal (provided there is no code conversion and each character occupies one column).

Known bugs

When you use lv from sh on UNIX, the suspend command does not work well.

Bug report

Please send a bug report to narita@mt.cs.keio.ac.jp when you find any bugs in lv.

Release note

: Click here. (in Japanese)

Acknowledgement

: Click here. (in Japanese)

Reference

JIS X 0202-1991 情報交換用符号の拡張法
Information processing - ISO 7-bit and 8-bit coded character sets - Code extension techniques
JIS X 0208-1990 情報交換用漢字符号
Code of the Japanese graphic character set for information interchange
JIS X 0212-1990 情報交換用漢字符号 - 補助漢字
Code of the supplementary Japanese graphic character set for information interchange
RFC 1922 Chinese Character Encoding for Internet Messages
RFC 1642 (Experimental) UTF-7 - A Mail-Safe Transformation Format of Unicode
RFC 1557 Korean Character Encoding for Internet Messages
RFC 1554 ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP
RFC 1468 Japanese Character Encoding for Internet Messages
Understanding Japanese Information Processing (『日本語情報処理』)
Ken Lunde O'Reilly & Associates, Inc. ISBN 1-56592-043-0
CJK.INF Version 2.1 (July 12, 1996)
Online Companion to "Understanding Japanese Information Processing"
Ken Lunde
UTF-7 Encoding Form
UCS Transformation Format 8 (UTF-8)
Compilers - Principles, Techniques, and Tools
Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman Addison-Wesley, ISBN 0-201-10088-6

NARITA Tomio
email: narita@mt.cs.keio.ac.jp
Homepage: http://www.mt.cs.keio.ac.jp/person/narita.html
