Ini agent description

Ini agent description
  Purpose
  Specification
    Ini file entities
    Patterns
    Configuration file for ini files
      Configuration map
      Params:
      Comments:
      Sections:
      Subindent:
      Options:
      rewrite
    Configuration file for sysconfig files (former rc files)
  Interface for ini-agent
    Accessing keys
      All-at-once Read/Write
    SysConfig/flat mode
    Writes are Cached
  TODOs
    Allowed section names and key names
    Browser
    Re-ordering
    Section renaming
    I am the highest authority!
  Examples
    Dialup providers (package providers)
    smb.conf
    /etc/lilo.conf
    Rewrite rules
  Implementation
  Changes


Purpose

The ini agent is a general agent for reading a variety of ini files. Its main purpose was handling configuration files for Samba (/etc/samba.conf).

Specification

Text like this means that the described feature will be implemented later (that is, once all other features are implemented).

Ini file entities

The agent sees an ini file as containing these basic entities: entries, sections and comments. Entries have a name and a value and an optional comment. Sections have a name and can contain entries or other sections (TODO comments?). Entries and Sections may be of multiple kinds. (Different kinds of sections are useful for lilo.conf which distinguishes "image" and "other" sections. Different kinds of entries are currently used as a kludge to cope with single quoted, double quoted and unquoted values.) This general view can be restricted by some options (TODO link).

The ini file agent is line-oriented: it reads the input line by line and processes each line separately. If required, a line may continue on the next line by using \ as its last character. Such lines are read at once and combined into one string before further processing. See also the option line_can_continue.
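
The joining of continued lines can be sketched in Python as follows. This is only an illustration of the behaviour described above, not the agent's code; the function name join_continued_lines is hypothetical.

```python
def join_continued_lines(lines):
    """Join physical lines that end in a backslash into logical lines."""
    logical = []
    pending = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            # Drop the trailing backslash and keep collecting.
            pending += line[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # The last line ended with a backslash: keep what was collected.
        logical.append(pending)
    return logical


joined = join_continued_lines(["Key=first \\\n", "second\n", "Other=1\n"])
print(joined)
```

Only after this step would the regular expressions for comments, sections and entries be applied to each logical line.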

A generalized ini file may look like this:

Comment
Key=Value
Key=Value
Comment
Begin-section-delimiter name
  Key=Value
  Key=Value
  Begin-sub-section-delimiter name
    Key=Value
    Key=Value
  End-sub-section-delimiter name
End-section-delimiter name

Most ini files have no subsections. A classical ini file has no End-section-delimiter (and therefore cannot have subsections). Most ini files also have no Key/Value pairs at the top level. A section name may be the same as a key name. In some ini files, however, there may be several keys with the same name or several sections with the same name.

Patterns

Let's use regular expressions to find out what kind of line we are processing. We need a regular expression to read and recognize a line, and a pattern string to write it back to the file. Example:
To recognize a section beginning we may use:
^\[([a-zA-Z0-9\-_]+)\]
Such a section delimiter must start in the first column. Its name may contain [a-zA-Z0-9\-_]. No spaces are allowed. The pattern for writing may be:
[%s]
There is no need for \n at the end of the pattern. As mentioned above, the agent is line-oriented, so it writes \n automatically after each line it ships out.
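
The read/write pair can be tried out with a small Python sketch. It uses Python's re module, which differs from glibc regexps in some details (for example in how a backslash inside brackets is treated), so take it as an illustration only. The helper names are hypothetical.

```python
import re

# Read side: the regexp from the text. Write side: the printf-like pattern.
SECTION_BEGIN_RE = re.compile(r"^\[([a-zA-Z0-9\-_]+)\]")
SECTION_BEGIN_FMT = "[%s]"


def read_section_name(line):
    """Return the section name, or None if the line is not a section begin."""
    m = SECTION_BEGIN_RE.search(line)
    return m.group(1) if m else None


def write_section_begin(name):
    """Format a section begin line; the newline is added by the writer."""
    return SECTION_BEGIN_FMT % name


print(read_section_name("[global]"))      # recognized
print(read_section_name("  [global]"))    # not in the first column
print(read_section_name("[my section]"))  # contains a space
print(write_section_begin("global"))
```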

Comments in ini files must occupy a whole line by default. When they do not, we have a small problem: should we remove the comment first, or process the value first? Example:
Values must be quoted by " and comments are delimited by #
Key = "Value # anything"
How should we understand this? Anyway, it is solvable. There will be an option that switches between 3 states:

See also the option comments_last.

Note: Comments starting from the beginning of line always have the top priority.
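
The ambiguity can be demonstrated with a short Python sketch that processes the example line in both orders. The two regexps and the function names are assumptions made for this illustration; they are not taken from the agent.

```python
import re

LINE = 'Key = "Value # anything"'
VALUE_RE = re.compile(r'^([A-Za-z0-9_]+)[ \t]*=[ \t]*"([^"]*)"')
COMMENT_RE = re.compile(r"#.*")


def comments_first(line):
    """Strip the comment first, then look for the key/value pair."""
    stripped = COMMENT_RE.sub("", line)
    m = VALUE_RE.search(stripped)
    return m.groups() if m else None


def value_first(line):
    """Match the key/value pair first, then look for a trailing comment."""
    m = VALUE_RE.search(line)
    if not m:
        return None
    rest = line[m.end():]
    c = COMMENT_RE.search(rest)
    return m.group(1), m.group(2), (c.group(0) if c else "")


print(comments_first(LINE))  # the closing quote was cut off with the "comment"
print(value_first(LINE))     # the # stays inside the value
```

Removing the comment first destroys the quoted value, while matching the value first keeps the # as part of it.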

There may be Begin-section and End-section delimiters. The section end delimiter is not mandatory, but when no End-section delimiter is defined, there are no nested sections. We must specify whether nested sections may be used. The default is autodetect (which means that subsections are allowed if an End-section delimiter is defined). We must also say whether top-level values (not belonging to any section) are allowed. We might have more types of section delimiters.

There is only one problem with values: they may be accidentally split over several lines (without using a backslash). We must then detect that the value continues on the next line. We will restrict this feature to quoted values only.

Of course it is also possible to detect split lines when no delimiters are used. It would mean defining a regexp that detects that the current line is not a continuation line (so it would detect comments, empty lines, key=value pairs and section delimiters). But I believe we do not need it.


Configuration file for ini files

Example:

`IniAgent ("/file/name",
$[
  "params" : [
    $[
      "match" :     [ "regexp", "pattern" ],
      "multiline" : [ "regexp", "regexp" ],
    ],
    ...
  ],
  "comments" : [ "regexp", "regexp", ... ],
  "sections" : [
    $[
      "begin" : [ "regexp", "pattern" ],
      "end"   : [ "regexp", "pattern" ],
    ],
    ...
  ],
  "subindent": "string",
  "rewrite" : [
        [ "regexp", "pattern", ],
        [ "regexp", "pattern", ],
         ...
  ],
  "options"  : [ "option1", "option2", ... ],
]
)

`IniAgent (File specification, Configuration map)

File Specification:

Single file mode
In most cases you will specify a single file here as a string.
Multiple files mode
If you specify a list of file specifications (which may contain wildcards), the file names are used as section names and the content of each file is read "into" the corresponding section. Please keep in mind that in this mode you may create files as well! If you do Write (.....s.new_section, "#comment\n"); then the new file new_section is created on save.

You should use absolute paths; otherwise you will have to take extreme care not to cd in your program :-)

See also the option read_only.

Configuration map

Params:

The most important item in the configuration map is params. It specifies how entries are parsed and written. Recall that there may be several kinds of entries, so params is a list. Each param is a map with a mandatory "match" item and an optional "multiline" item.

"match" items are input-output pairs: The first pair item is an extended regular expression containing two subexpressions ("...(...)...(...)..."), the name and the value of the entry. It is used at parse time. The second pair item is a printf format string containing two %s placeholders for the name and the value. (TODO check how integers are handled. Misquote on output?)

Format of regexps: glibc regexps will be used. Hence the regexp format must be understood by glibc. (try man 7 regex). See also the option ignore_case_regexps.

See also the options prefer_uppercase and first_upper.

"multiline" : [ "begin_re", "end_re" ]
- if values may be spread over more than one line, this item defines how they are parsed. Please note that its main purpose is lines broken by accident, for example when some editor breaks longer lines. Example:

      Key="value value
      still value
      still value"

Then the begin regexp is ([^=]+)="([^"]*) and the end regexp is ([^"]*)". These are compared last, so they are the last possibility. But once we get into such an accidentally divided line, matching becomes greedy, so be careful about a forgotten ". If "multiline" is not present, this mechanism of course has no effect.

See also the option join_multiline.
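
A rough Python sketch of this mechanism follows. It assumes a quoted "match" regexp that is tried first, anchors the regexps for clarity, and joins the pieces with a space as join_multiline does. The function name parse and the join_with parameter are hypothetical, and Python's re stands in for glibc regexps.

```python
import re

MATCH_RE = re.compile(r'^([^=]+)="([^"]*)"$')  # complete entry on one line
BEGIN_RE = re.compile(r'^([^=]+)="([^"]*)$')   # opening quote, no closing quote
END_RE = re.compile(r'^([^"]*)"')              # text up to the closing quote


def parse(lines, join_with=" "):
    entries = {}
    key, parts = None, []
    for line in lines:
        if key is not None:
            # In-multiline mode: everything belongs to the open value.
            end = END_RE.search(line)
            if end:
                parts.append(end.group(1))
                entries[key] = join_with.join(parts)
                key, parts = None, []
            else:
                parts.append(line)
            continue
        m = MATCH_RE.search(line)
        if m:
            entries[m.group(1)] = m.group(2)
            continue
        b = BEGIN_RE.search(line)
        if b:
            key, parts = b.group(1), [b.group(2)]
    return entries


good = parse(['Key="value value', 'still value', 'still value"', 'Other="x"'])
bad = parse(['A="oops', 'B="fine"'])  # forgotten closing quote after oops
print(good)
print(bad)
```

The second call shows the greediness: the forgotten quote makes the next entry part of the value of A, and B is lost.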

"names" : [list]
- list of allowed names
- currently the list may contain only strings. Maybe it will be extended to regexps too.

See also the option global_values.

Comments:

A list of regular expressions to check. Note that combining all the comment expressions into a single one gives faster processing. If you allow comments that do not start in the first column, you must add "[ \t]*" before the comment regexp. If you want to allow only comments that occupy a whole line, prepend ^ to the regexp.

See also the option no_finalcomment_kill.

See also the option comments_last.

Sections:

Several kinds of sections may be defined under the key "sections" to allow more types of section beginnings/ends, so this is a list. Each list item is a map with a mandatory "begin" pair and an optional "end" pair.

"begin": ["regexp", "pattern"]
- regexp to find section begin and pattern to write section begin

"end" : ["regexp", "pattern"]
- ...

See also the options prefer_uppercase and first_upper.

"names" : [list]
- list of allowed names
- currently the list may contain only strings. Maybe it will be extended to regexps too.

See also the option no_nested_sections.

Subindent:

This string will be added before each data line in subsections. If you want to have indented subsections, use this.

Example: "subindent" : " ", or "subindent" : "\t".

Formerly the comments were indented too but it turns out that there are some configuration files that allow indenting data but not comments. Also the comment indentation would grow endlessly.

Options:

"ignore_case_regexps":
case is not significant when processing regexps.
"ignore_case":
case is not significant when reading key/section names. Does not work with multiple files.
"first_upper":
if "ignore_case" is set, save keys and sections with the first letter in upper case and the rest in lower case
"prefer_uppercase":
if "ignore_case" is set, save keys and sections in upper case
"line_can_continue":
if a line ends with a backslash, it continues on the next line
"no_nested_sections":
nested sections are forbidden and reported as an error in the log file
"global_values":
values at the top level are allowed
"repeat_names":
there may be several keys with the same name and several sections with the same name. Note that sections with the same name as keys are allowed.
This option also disables merging of data if the file is modified externally.
"comments_last":
the line is parsed first for single-line comments, then for [key,value], and finally for an additional comment. This comment is moved above the [key,value] pair
"join_multiline":
multiline values (parsed by "params":"multiline") are joined into one using a space character.
"no_finalcomment_kill":
do not strip the empty space at the end of the last comment
"read_only":
the file is not written at the end
"flat":
special mode for files with a flat structure, that is, only values without sections. Read/Write/Dir commands work without the need to specify what to read (key/section) because everything is a key.

rewrite

This list takes effect only if multiple files are specified. It contains rules for rewriting a file name to a section name, and patterns for getting back from the section name to the file name. Example:

"rewrite" : [
    [ "/etc/sysconfig/network/isdn/(.*)$",  "/etc/sysconfig/network/isdn/%s", ],
    [ "/etc/sysconfig/network/modem/(.*)$", "/etc/sysconfig/network/modem/%s", ],
],

If the name of the processed file matches the regexp, its first subexpression is taken as the section name. When the section is saved, the file name is created as printf(pattern, section_name). The first rewrite rule that matches is used for the file name to section name mapping. If the file name does not match any rule, it is left untouched.
Be extremely careful when using rewrite rules. There are 2 possible caveats.
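
The two directions of the mapping can be sketched in Python. The capturing subexpression is written as (.*) here, since Python's re rejects a bare (*). The function names are hypothetical, and returning -1 when no rule matches is an assumption for this sketch (it mirrors the "file": -1 shown in the all-at-once map).

```python
import re

REWRITE = [
    (r"/etc/sysconfig/network/isdn/(.*)$", "/etc/sysconfig/network/isdn/%s"),
    (r"/etc/sysconfig/network/modem/(.*)$", "/etc/sysconfig/network/modem/%s"),
]


def file_to_section(filename):
    """Return (section name, index of the first matching rule)."""
    for index, (regexp, _pattern) in enumerate(REWRITE):
        m = re.search(regexp, filename)
        if m:
            return m.group(1), index
    # No rule matched: the file name is left untouched.
    return filename, -1


def section_to_file(section, index):
    """Build the file name back from the section name."""
    if index == -1:
        return section
    return REWRITE[index][1] % section


name, rule = file_to_section("/etc/sysconfig/network/modem/provider0")
print(name, rule)
print(section_to_file(name, rule))
```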

Configuration file for sysconfig files (former rc files)

The ini agent may easily be used for parsing files in the sysconfig directory, but care must be taken to keep the files clean. No such mess as in rc.config is allowed. Please do not place bash code into sysconfig files. Only key="value". Nothing else.

# comment

.mount.path
`ag_ini(
        `SysConfigFile (filename)
)

A SysConfigFile call is just a shortcut to an IniAgent call. The equivalent IniAgent is:

`IniAgent (filename,
    $[
        "options" : [ "line_can_continue", "global_values", "join_multiline", "comments_last", "flat", ],
        "comments": [ "^[ \t]*#.*$", "#.*", "^[ \t]*$", ],
        "params" : [
            $[
                "match" : [ "([a-zA-Z0-9_]+)[ \t]*=[ \t]*\"([^\"]*)\"", "%s=\"%s\"" ],
                "multiline" : [ "([a-zA-Z0-9_]+)[ \t]*=[ \t]*\"([^\"]*)", "([^\"]*)\"", ],
            ],
            $[
                "match" : [ "([a-zA-Z0-9_]+)[ \t]*=[ \t]*([^\"]*[^ \t\"]|)[ \t]*$", "%s=\"%s\"",],
            ],
        ],
])
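
The two "match" rules can be exercised with a small Python sketch. The regexps are the ones from the configuration above with the YCP string escaping removed; Python's re stands in for glibc regexps, and the function name parse_entry is hypothetical. The index of the matching rule corresponds to what the agent reports as the value type.

```python
import re

# Rule 0: double-quoted value. Rule 1: unquoted (possibly empty) value.
QUOTED_RE = re.compile(r'([a-zA-Z0-9_]+)[ \t]*=[ \t]*"([^"]*)"')
UNQUOTED_RE = re.compile(r'([a-zA-Z0-9_]+)[ \t]*=[ \t]*([^"]*[^ \t"]|)[ \t]*$')
WRITE_FMT = '%s="%s"'  # both rules write the value back quoted


def parse_entry(line):
    """Return (name, value, index of the matching rule) or None."""
    for value_type, regexp in enumerate((QUOTED_RE, UNQUOTED_RE)):
        m = regexp.search(line)
        if m:
            return m.group(1), m.group(2), value_type
    return None


print(parse_entry('HOSTNAME="linux"'))
print(parse_entry("HOSTNAME = linux  "))
print(parse_entry("EMPTY="))
print(WRITE_FMT % ("HOSTNAME", "linux"))
```

Because both rules share the same output pattern, an unquoted value read by the second rule is written back with quotes.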

Interface for ini-agent

Assume the agent is mounted on the path .ini

Accessing keys

Because the ini file structure may be hierarchical and there can be sections and keys with any name, we must use a keyword to distinguish between keys, sections and comments. Paths look like this:

.ini.value.sectionname.sectionname.key identifies value that belongs to key.
.ini.value_comment.sectionname.sectionname.key identifies comment that belongs to key.
.ini.value_type.sectionname.sectionname.key identifies type of the key, which is the index of rule this key was read by.
.ini.section.sectionname.sectionname identifies section key.
.ini.section_comment.sectionname.sectionname identifies comment that belongs to section key.
.ini.section_type.sectionname.sectionname identifies type of the section, which is the index of rule this section was read by.
.ini.section_private.sectionname a boolean write-only property for sections corresponding to files. If true, the file will not be readable by group and others.
.ini.section_file.sectionname.sectionname identifies type of the section, which is the index of rewrite rule this section was read by.
(So far, this is accessed by section_type if there are rewrite rules. Yuck.)
.ini.all.sectionname.sectionname
.ini.all
Accesses all the contents of a section or the whole file at once.
Abbreviations v,s,vc,sc,vt,st,sf are allowed. Imagine what would happen if we did not use these prefixed identifiers: if there were a section S1 with a subsection named comment, we would not know what .ini.S1.comment means.
These paths work with Read and Dir. Write is a special case. You may write new values and their comments. You may write new sections and their comments. But when you write a section, the section name is encoded in the path, so the parameter passed to Write is always understood as a comment.
All Reads return strings. All Writes expect strings or nil as the parameter. If nil is passed to Write, the value or section is removed. All Dirs return a list of strings.
In Write, only strings can be section names and only strings or integers may be values. Integers are converted to strings when written and are read back as strings.
If Write is requested to write (create) a key or section on a non-existing path, the sections on the path are created (like in a recursive mkdir).

With the repeat_names option, a list of values is returned/expected for a given key where a single value would be returned/expected without the option. That is also true if there is only one such key or none.
Deleting only one instance of many values/sections with the same name is not possible. All of them are deleted by Write (.ini.value|section.foo.bar, nil)
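
How the leading keyword removes the ambiguity can be sketched in Python. The sketch treats paths as plain dot-separated strings, which is a simplification, and the function name split_path is hypothetical.

```python
ABBREVIATIONS = {
    "v": "value", "s": "section",
    "vc": "value_comment", "sc": "section_comment",
    "vt": "value_type", "st": "section_type", "sf": "section_file",
}
KEYWORDS = set(ABBREVIATIONS.values()) | {"section_private", "all"}


def split_path(path, mount=".ini"):
    """Split a path into (keyword, list of section/key names)."""
    if not path.startswith(mount + "."):
        raise ValueError("path is not below " + mount)
    parts = path[len(mount) + 1:].split(".")
    # Expand an abbreviation such as "v" to its long form.
    keyword = ABBREVIATIONS.get(parts[0], parts[0])
    if keyword not in KEYWORDS:
        raise ValueError("unknown keyword: " + parts[0])
    return keyword, parts[1:]


print(split_path(".ini.v.S1.comment"))         # a key named comment in S1
print(split_path(".ini.section_comment.S1"))   # the comment of section S1
print(split_path(".ini.all"))
```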

All-at-once Read/Write

Read (.ini.all.sectionname.sectionname) or simply Read (.ini.all) returns a map as follows:
    $[
      "kind": "section",
      "name": "Foo",
      "type": 1,                        /* section_type */
      "file": -1,                       /* "rewrite", section_file */
      "comment": "huhly",
      "value": [ /*recursion*/ ],
    ],
where the "value" list contains maps describing subsections (see above) or values:
    $[
      "kind": "value",
      "name": "Foo",
      "type": 0,                        /* value_type */
      "comment": "blah",
      "value": "Bar",
    ],
Write (.ini.all.section1.section2, new_section2_map) replaces section2 with new_section2_map. Write (.ini.all, new_map) replaces all the data.

Multiple names of the same kind are allowed.

With multiple files, files must be erased explicitly by writing nil to the section value. (Works since yast2-core-2.13.16.)
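
A client typically walks this nested structure recursively. The following Python sketch models the maps as dicts and lists; the sample tree, the empty name of the top-level node and the function name flatten are assumptions made for the illustration.

```python
def flatten(node, prefix=()):
    """Yield (path, value) for every value in an all-at-once map."""
    if node["kind"] == "value":
        yield prefix + (node["name"],), node["value"]
        return
    # Assumption: the top-level node has an empty name and adds no path part.
    inner = prefix + (node["name"],) if node["name"] else prefix
    for child in node["value"]:
        yield from flatten(child, inner)


tree = {
    "kind": "section", "name": "", "type": -1, "file": -1, "comment": "",
    "value": [
        {"kind": "value", "name": "Foo", "type": 0,
         "comment": "blah", "value": "Bar"},
        {"kind": "section", "name": "S1", "type": 1, "file": -1,
         "comment": "huhly",
         "value": [
             {"kind": "value", "name": "Foo", "type": 0,
              "comment": "", "value": "Baz"},
         ]},
    ],
}

pairs = list(flatten(tree))
print(pairs)
```

The same name Foo appears as a top-level value and inside S1, and the path keeps the two apart.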

SysConfig/flat mode

This mode is used for reading sysconfig files and is compatible with rc-agent. Values are accessible under .ini.key and their comments under .ini.key.comment. A value may be removed by storing nil into it. If you specify a nonexistent key when writing a value, a new value is created. You may not create a value by writing its comment.

Writes are Cached

Changes are written to disk when Write (.ini, nil) is called, or in the agent destructor if you do not call Write at the end.
The agent handles external changes of the ini file; in other words, it writes only changed values (except when repeat_names is used).
By default the file is written only if something changed. Writing may be forced by calling Write (.ini, "force");
If you do not want the agent to write changed data, call Write (.ini, "clean");. It sets the dirty flag to false. Use with care! It does not revert data to their original values! It is safest to use it just before leaving the module (and destroying the agent).
When Write is unable to open some file for writing, it returns false. Otherwise it returns true.
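
The dirty-flag behaviour can be modelled with a toy Python class. It is a sketch of the rules above under simplifying assumptions: it counts flushes instead of touching the disk, and it ignores external changes and the destructor. The class and method names are hypothetical.

```python
class WriteCache:
    """Toy model of cached writes with a dirty flag."""

    def __init__(self):
        self.data = {}
        self.dirty = False
        self.flushes = 0  # how many times the "file" was written

    def write_value(self, key, value):
        if value is None:
            self.data.pop(key, None)  # nil removes the value
        else:
            self.data[key] = value
        self.dirty = True

    def write_root(self, arg=None):
        if arg == "clean":
            # Only the flag is reset; the data stays modified in memory.
            self.dirty = False
            return True
        if arg == "force" or self.dirty:
            self.flushes += 1
            self.dirty = False
        return True


cache = WriteCache()
cache.write_root()              # nothing changed, nothing written
cache.write_value("k", "v")
cache.write_root()              # written once
cache.write_root("force")       # written again although clean
cache.write_value("k", None)
cache.write_root("clean")       # change discarded from the write queue
cache.write_root()              # nothing is written
print(cache.flushes, cache.data)
```

The last two calls show why "clean" needs care: the in-memory data no longer matches the file, yet nothing will be written.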

TODOs

I am going to implement the following features when someone requests them. I do not want to fill the agent with features that will never be used.

Allowed section names and key names

There can be a list of allowed names for sections/keys. It is no problem to implement; just tell me if you need it. But do you really need it?

Browser

There could be a Read function that returns a structure that can be passed to a tree widget. Creating it in C++ is really the fastest way, but I am not sure whether it can be made versatile enough. Another problem is with changes.

Re-ordering

We should be able to change the order of values/sections.

Section renaming

We should be able to rename a section. Easy to implement. Does anybody need it?

I am the highest authority!

The client should be able to tell the agent not to re-read an externally changed file. Sometimes, when the client completely re-creates the file, re-reading it could cause a big mess.

Examples

Each feature should be tested in the testsuite. Look there for inspiration. Here are some of the most common or interesting examples:

Dialup providers (package providers)

.ini

`ag_ini(
  `IniAgent( [ "/usr/share/providers/*.wvdial.conf" ],
    $[
      "options" : [ "read_only" ],
      "comments": [ "^[ \t]*#.*", "^[ \t]*$" ],
      "sections" : [
        $[
        "begin" : [ "[ \t]*\\[Dialer[ \t]+(.*[^ \t])[ \t]*\\][ \t]*", "[Dialer %s]" ],
        ],
      ],
      "params" : [
        $[
        "match" : [ "^[ \t]*([^=]*[^ \t=])[ \t]*=[ \t]*(.*[^ \t]|)[ \t]*$" , "%s = %s" ],
        ],
      ],
    ]
  )
)

smb.conf

.ini

`ag_ini(
  `IniAgent("/etc/smb.conf",
    $[
      "options" : [ "no_nested_sections", "ignore_case",
                    "line_can_continue",
                ],
      "comments": [ "^[ \t]*;.*", ";.*", "^[ \t]*$" ],
      "sections" : [
        $[
        "begin" : [ "[ \t]*\\[[ \t]*(.*[^ \t])[ \t]*\\][ \t]*", "[%s]" ],
        ],
      ],
      "params" : [
        $[
        "match" : [ "^[ \t]*([^=]*[^ \t=])[ \t]*=[ \t]*(.*[^ \t]|)[ \t]*$" , "   %s = %s"],
      ],
    ],
    ]
  )
)

/etc/lilo.conf

.ini

`ag_ini(
  `IniAgent("/tmp/lilo.conf",
    $[
      "options" : [ "no_nested_sections", "ignore_case",
                "global_values",
                ],
      "subindent" : "  ",
      "comments": [ "^[ \t]*#.*", "#.*", "^[ \t]*$" ],
      "sections" : [
        $[
        "begin" : [ "other[ \t]*=[\t ]*(.*[^ \t])[ \t]*$", "  other\t= %s" ],
        ],
        $[
        "begin" : [ "image[ \t]*=[\t ]*(.*[^ \t])[ \t]*$", "  image\t= %s" ],
        ],
      ],
      "params" : [
        $[
        "match" : [ "^[ \t]*([^=]*[^ \t=])[ \t]*=[ \t]*(.*[^ \t]|)[ \t]*$" , "%s = %s"],
        ],
        $[
        "match" : [ "^[ \t]*([^=]*[^ \t=])()*$" , "%s"],
        ],
    ],
    ]
  )
)
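
The two section kinds of the lilo.conf example can be tried out in Python. The regexps and write patterns are copied from the configuration above; Python's re stands in for glibc regexps and the function names are hypothetical. The index of the matching rule is what the agent exposes as the section type.

```python
import re

SECTION_RULES = [
    (re.compile(r"other[ \t]*=[\t ]*(.*[^ \t])[ \t]*$"), "  other\t= %s"),
    (re.compile(r"image[ \t]*=[\t ]*(.*[^ \t])[ \t]*$"), "  image\t= %s"),
]


def match_section(line):
    """Return (section name, index of the matching rule) or None."""
    for section_type, (regexp, _fmt) in enumerate(SECTION_RULES):
        m = regexp.search(line)
        if m:
            return m.group(1), section_type
    return None


def write_section(name, section_type):
    """Write the section begin line using the rule it was read by."""
    return SECTION_RULES[section_type][1] % name


print(match_section("image = /boot/vmlinuz"))
print(match_section("other=/dev/hda1  "))
print(match_section("timeout = 50"))
print(repr(write_section("/boot/vmlinuz", 1)))
```

Keeping the rule index with each section is what lets an "image" section be written back as "image" and not as "other".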

Rewrite rules

See testsuites, multi/rewrite*. TODO FIXME: insert an example here.

Implementation

Modes:
  normal
  in-multiline

read line
if comment matches line
  -- add this line to last comment

if comments-first
  if line contains comment
    -- add comment to last comment
    -- strip comment off the line

if in-multiline mode
  if line matches params::current::multiline[1] (end)
    -- add value to last param->value
    -- set param->shipout = 1
  else
    -- add value to last param->value

else we are in normal mode
  if line matches section[*]->begin
    if !in-section
      -- create new sub-section and recursively-call parser
    if in-section && !defined (section[current_section]->end)
      -- unget line
      return
    if in-section && defined (section[current_section]->end) && subsections-allowed
      -- create new sub-section and recursively-call parser
    if in-section && defined (section[current_section]->end) && !subsections-allowed
      -- syntax error

  if line matches section[current_section]->end
    -- close section and return from recursive function
    next

  if line matches params[*]::match
    -- add pair key, value
    -- set param->shipout = 1

  if line matches params[*]::multiline[0]
    -- create pair key, value
    -- set param->shipout = 0

  if comments-last && contains line, comment
    -- add comment to last comment

  if param->shipout
    -- add last param to section

Note: The complete development documentation is available in the autodocs/ directory.
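
A heavily simplified Python sketch of the normal-mode branch of the pseudocode follows. It covers only whole-line comments, sections without an end delimiter and single-line params; multiline values, nested sections and trailing comments are left out. The regexps are modelled on the smb.conf example, Python's re stands in for glibc regexps, and the result is shaped like the all-at-once map without the type and file items. This is an illustration, not the agent's implementation.

```python
import re

COMMENT_RES = [re.compile(r"^[ \t]*#.*"), re.compile(r"^[ \t]*$")]
SECTION_RE = re.compile(r"^[ \t]*\[[ \t]*(.*[^ \t])[ \t]*\][ \t]*$")
PARAM_RE = re.compile(
    r"^[ \t]*([^=]*[^ \t=])[ \t]*=[ \t]*(.*[^ \t]|)[ \t]*$")


def parse(lines):
    root = {"kind": "section", "name": "", "comment": "", "value": []}
    current = root
    comment = ""
    for line in lines:
        line = line.rstrip("\n")
        if any(r.search(line) for r in COMMENT_RES):
            comment += line + "\n"  # add this line to the last comment
            continue
        m = SECTION_RE.search(line)
        if m:
            # No end delimiter is defined, so a new section closes the old one.
            current = {"kind": "section", "name": m.group(1),
                       "comment": comment, "value": []}
            root["value"].append(current)
            comment = ""
            continue
        m = PARAM_RE.search(line)
        if m:
            current["value"].append({"kind": "value", "name": m.group(1),
                                     "comment": comment, "value": m.group(2)})
            comment = ""
        # Lines matching nothing are silently ignored in this sketch.
    return root


root = parse([
    "# global settings\n",
    "[global]\n",
    "  workgroup = TUX-NET\n",
    "\n",
    "[homes]\n",
    "  comment = Home Directories\n",
])
print([s["name"] for s in root["value"]])
```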

Changes

Martin Vidner <mvidner@suse.cz>,
based on original docs by Petr Blahos