
Mkpkg

  1. Introduction
  2. Overview
  3. Example
  4. Software packaging process
    1. Create manifest
    2. Determine dependencies
    3. Develop control scripts
    4. Gather components
    5. Assemble package
  5. Automation
    1. Package manifest generation
    2. Shared-library dependency detection
    3. Fileset and subproduct generation
    4. Assigning files to filesets
    5. Control script generation
    6. Error checking
  6. Actions
  7. Subproducts
  8. Filesets
  9. Control Scripts
  10. Build Attributes
  11. Defaults
  12. Internals
    1. Data structures
    2. Backends
  13. Software Library

Introduction

Mkpkg is a tool to help software publishers automatically create binary software distribution packages. It automates as much of the process as possible, and it catches most of the packaging mistakes that cause installed software to fail on users' machines.

Mkpkg is meant to be used after the software is debugged and can install itself on the packager's machine. Given software that is ready for distribution, mkpkg helps the publisher develop a description of the software package, including manifests, dependencies, and control scripts. Using mkpkg, a publisher can create a software package with three minutes' effort, even for a large and complex package such as TeX.

Publishers use mkpkg actions to accomplish the tasks associated with creating an installation package. In addition, publishers can specify or modify package configuration information through the various views of the package. Many actions automatically fill in parts of the package configuration, such as the manifest, but other configuration must be done by the publisher.

Packaging Overview

Mkpkg can help the user configure the package, build and save distributable copies of the software, and finally assemble the complete installation package.

In order to assemble a binary installation package, the installation tool needs to know a great deal about the software it is installing. The basic elements of an installation package are:

Package manifest
A list of all the files and directories that will be installed as part of the package
Dependencies
A list of all the packages that need to be present in order for the software to work correctly
Control scripts
A set of scripts which do the package-specific tasks necessary to complete the installation/de-installation of the package. This might include actions such as installing a configuration file, or adding a user to the remote system.
Package meta-data
Information about the package itself, such as whether the files may be relocated to a new location during installation

Example package

We will walk through the whole process of developing a simple package using mkpkg. This simple package will not utilize or require all of mkpkg's functionality, but it will demonstrate how to build a simple package with features common to most packages.

The first step is to prepare the software for packaging. Our software must be able to compile and install correctly on our machine. We should be able to automatically compile and install the software without human intervention. Of course, if we are building a package for pre-compiled software, we can skip compilation. In our case, we have a Makefile with three targets: all, install, and clean.

Our software sources are in less-1.0, and our package contains the following files:
/usr/local/bin/less
/usr/local/bin/X11/xless
/usr/local/lib/X11/app-defaults/Xless
/usr/local/man/man1/less.1
/usr/local/man/man1/xless.1
Since the package is small, and since it only has a few man pages, we will ship the whole product in a single fileset less-RUN.

We need to decide if we will distribute code that has been statically linked or dynamically linked. In general, software that is released as part of HP-UX will be dynamically linked, while software that is shipped by third parties or is shipped independently of HP-UX may be statically linked. The advantage of static linking is that the executables do not depend on specific shared libraries and are more likely to work correctly on a wider range of platforms, but at the cost of additional disk space consumption. Our package will be shipped with dynamically linked executables. We are now ready to begin building our Software Distributor (SD-UX) product.

We first start mkpkg within our project directory less-1.0. The mkpkg interface has a menu bar across the top. Under the View menu, we can see all of the pages that contain information that we may need to provide, verify, or modify. The Action menu lists all the actions that we need to produce an installation package.

First, we must provide mkpkg with enough information to be able to build, install, and locate the software in our product. We are currently viewing the configuration page for the product. You should notice that mkpkg has already provided default values for many of the attributes. Some of the defaults come from the default values on the system pages, but others have been computed. For example, the product name (less) and version (1.0) have been computed from the current directory name.

On the configuration page, we need to fill in the directory attribute with /usr/local. This attribute is used in the PSF file, but it is also used by mkpkg during manifest generation to locate the files installed as part of the package. In some cases, we may not know where every file will be installed. Mkpkg has a (long) list of directories to check for new software, in order to catch these wayward files.

We have told mkpkg where to look for installed files, now we need to tell it how to build and install our software. Go to the build page under the View menu. Here, we need to check the attributes build, install, and clean. In this case, the defaults make, make install, and make clean are correct because those are the targets used by our Makefile.

We need to create the manifest, so try the Create file list option on the Action menu. This action may take a long time, since it will compile and install the software, and then search your system for newly installed files. For large packages, just compiling the software may take hours, while on large systems it may take hours to search the file system for installed files. Once this action is complete, you should see a fileset less-RUN and a subproduct Runtime under the View menu. You should now check every attribute on each page, correcting or providing information as necessary. You should also check that the fileset less-RUN contains all the files from our package (and no more!), and that the subproduct Runtime contains just one fileset. Also, less-RUN should have dependencies on various filesets from OS-CORE and X11.

We are now ready to create the installation package with the Create dynamic package action. This action compiles and installs the software. It then copies the software to a "safe" place (check the parent directory of less-1.0, there should be a directory something like less-1.0__10.20__dynamic). It then generates a PSF file (in the "safe" place), and uses that PSF to create an installation package. The installation package is left in the parent directory.

Software packaging process

During software packaging the publisher must prepare all the elements needed by the installation tool. For many small packages, this is a very simple process, but for larger packages it can be quite difficult to ensure that all the elements have been properly and completely prepared.

Create manifest

The manifest is a list of all the files that will be installed by the package. For small packages it is easy to determine which files belong to a given package. However, for larger packages, such as X11R6 or TeX, it is not necessarily easy to locate all the files installed by the package. It is important to include all the files that belong to the package. Using manual techniques it is very easy to miss files and make mistakes.

Mkpkg can automatically determine which files were installed by the package on the publisher's machine. Since the technique is based on timestamps, it can reliably detect all files installed by the package. However, independent system activity can also update files, and these files will incorrectly appear in the manifest. Most such files are modified by standard system daemons. Since mkpkg has a list of such files, it can automatically remove most of those spurious files from the manifest.

Determine dependencies

Many packages require other software in order to operate correctly. For example, cvs uses rcs, so cvs depends on rcs. Packages can depend on other packages for many reasons, but the two most common are executing programs from other packages and linking with shared libraries from other packages. Publishers are usually aware of the dependencies caused by executing programs, but often overlook shared library dependencies.

Mkpkg automatically discovers all shared library dependencies by searching every executable in the package to create a list of all shared libraries used by the package. Mkpkg also has a list of all the shared libraries installed on the publisher's machine and the package to which each one belongs, and it uses this list to identify all shared library dependencies.

When I first developed mkpkg, the vast majority of bugs were caused by shared library dependencies that I had overlooked. Once I added this module to mkpkg the number of bug reports diminished dramatically.

Develop control scripts

Some software packages require special configuration during the installation process. For example, many database systems require a special userid to be added to the system. The installation tools typically allow the publisher to add scripts to the package that are executed by the installation tool during software installation. Other scripts can be executed during de-installation to undo the actions and erase all trace of the software.

Writing these install/de-install scripts correctly is very difficult, but many of the actions they perform can be specified in a general fashion. To simplify the development of correct scripts, the publisher simply specifies the desired result of executing the script, and mkpkg generates all the scripts needed for the package.

Gather components

Once the package is specified, mkpkg gathers all the components, such as the customization scripts and installed files, and saves them in a temporary location. When multiple versions of a single package may be built (e.g. once with statically linked binaries and once with dynamically linked binaries), mkpkg may gather multiple copies of the installed software and save them in different locations.

After the components have been gathered, the user may still modify the package configuration, as long as no files are added to the package manifest.
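The gathering step can be pictured as copying each manifest entry into a staging directory that mirrors its installed path. The following is a minimal Python sketch, not mkpkg's own (Tcl) code; the function name `gather` and the mirrored directory layout are assumptions made for illustration.

```python
import os
import shutil


def gather(manifest, staging_dir):
    """Copy every file in `manifest` into `staging_dir` (hypothetical helper),
    mirroring its absolute installed path under the staging directory."""
    copied = []
    for path in manifest:
        # /usr/local/bin/less -> <staging_dir>/usr/local/bin/less
        dest = os.path.join(staging_dir, path.lstrip(os.sep))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # copy2 keeps permission bits and timestamps; follow_symlinks=False
        # recreates symbolic links as links instead of copying their targets.
        shutil.copy2(path, dest, follow_symlinks=False)
        copied.append(dest)
    return copied
```

Keeping the installed path intact inside the staging area means a later step can describe each file by its final location without consulting the live installation again.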

Assemble package

The last step is to assemble an installation package from the package configuration, customization scripts, and saved installation. Mkpkg will create a Product Specification File (PSF) and all the automatically generated customization scripts, and then it will use swpackage to assemble and generate the completed package.

Automation

Since many of the tasks associated with building binary installation packages are structured and common across packages, it is possible to automate most tasks. Accurate automation has the benefit of increasing the uniformity of package configuration and operation across packages.

Mkpkg has automated or partially automated the following tasks:

  1. package manifest generation
  2. dependency detection
  3. fileset and subproduct generation
  4. assigning files to filesets
  5. control script generation
  6. error checking

Package manifest generation

The first task faced by most package creators is creating a manifest, or list of all the files installed as part of the package. It is critical that all files belonging to a package be included in the package, so it is important to reduce human error. For some packages, creating a manifest is a trivial task that can be easily accomplished by visual inspection of the software. However, packages often include dozens of files, and some packages include thousands of files. In these cases, it is very difficult to manually generate a complete and accurate manifest.

Mkpkg can automatically generate a package manifest which includes all files installed as part of the software, and which may include some files not belonging to the package. The manifest generation scheme relies on file timestamps to detect files that were installed by the package. Mkpkg creates a new file to use as a timestamp, and then builds and installs the software. It then searches (part of) the file system for files with modification or creation times that are newer than the saved timestamp. Since running systems often have daemons which update log files independently of the software installation process, mkpkg has a list of "spurious files" which are removed from the raw list.
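The timestamp scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not mkpkg's implementation: the function names, the list of search roots, and the representation of the spurious-file list as a set of paths are all hypothetical. On Unix systems `st_ctime` is the inode change time rather than a true creation time; it is used here as the closest stand-in for the "creation time" mentioned above.

```python
import os
import subprocess
import tempfile


def find_new_files(roots, since, spurious=()):
    """Return sorted paths under `roots` whose mtime or ctime is at least `since`,
    minus any path listed in `spurious` (e.g. daemon log files)."""
    skip = set(spurious)
    found = set()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue  # the file vanished during the walk
                # ">=" rather than ">" so coarse (e.g. one-second) timestamps do
                # not hide files written in the same tick as the stamp file.
                if max(st.st_mtime, st.st_ctime) >= since and path not in skip:
                    found.add(path)
    return sorted(found)


def create_manifest(install_cmd, roots, spurious=()):
    # 1. Create a fresh file and use its modification time as the timestamp.
    fd, stamp = tempfile.mkstemp(prefix="mkpkg-stamp-")
    os.close(fd)
    since = os.stat(stamp).st_mtime
    try:
        # 2. Build and install the software (e.g. "make install").
        subprocess.run(install_cmd, shell=True, check=True)
        # 3. Search the given directories for anything newer than the stamp.
        return find_new_files(roots, since, spurious)
    finally:
        os.remove(stamp)
```

Erring toward including too many files matches the trade-off described above: a spurious entry can be filtered or removed by hand, while a missed file silently breaks the installed package.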

Shared-library dependency detection

Mkpkg automatically detects all shared library dependencies. It checks every file in the product to discover which shared libraries the product uses. It has a list of all shared libraries on the system, together with the fileset that contains each library. Mkpkg then automatically lists each fileset containing a linked shared library as a corequisite.
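Once the shared libraries used by each executable and the fileset that installs each library are known, computing the dependencies is a table lookup. The Python sketch below assumes both inputs are already available as dictionaries; how the per-executable library lists are extracted is platform-specific and left out. Reporting libraries that no known fileset provides is an added assumption, and all paths and fileset tags in the usage are hypothetical.

```python
def shlib_corequisites(exe_libs, lib_to_fileset, own_filesets=()):
    """exe_libs: {executable: [shared libraries it links against]}
    lib_to_fileset: {shared library path: fileset that installs it}
    own_filesets: filesets of the product itself (not external dependencies)
    Returns ({fileset: sorted libraries needing it}, sorted unknown libraries)."""
    needed = {}
    unknown = set()
    for libs in exe_libs.values():
        for lib in libs:
            fileset = lib_to_fileset.get(lib)
            if fileset is None:
                unknown.add(lib)  # no known fileset installs this library
            elif fileset not in own_filesets:
                needed.setdefault(fileset, set()).add(lib)
    return {fs: sorted(libs) for fs, libs in needed.items()}, sorted(unknown)
```

For instance, if xless links against a C library and an X11 library, both owning filesets come back as dependencies, even though the publisher may only have thought about X11.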

Fileset and subproduct generation

Software Distributor allows a given product to contain multiple filesets and subproducts (or bundles of filesets). Hewlett-Packard has extensive standards for fileset and subproduct naming and semantics. For example, English-language man pages should be contained in an XXX-MAN fileset, while foreign-language man pages should have one fileset per language (e.g. XXX-SPA-I-MAN for Spanish with the ISO character set). Fortunately, simple regular expression patterns can recognize when particular filesets are needed. Similarly, there is an extensive set of standards for subproduct naming based on the filesets in a product (e.g. ManualsByLanguage includes all filesets for non-English man pages).

Mkpkg has two ordered sets of rules for determining when to create filesets and subproducts. Each rule contains a regular expression, a threshold value, and a pattern. During fileset creation, the system iterates through the rules in order. For each rule, it first creates a list of all files in the product that match the regular expression. If the number of matching files is greater than the threshold value, then a fileset is created using the pattern (if necessary), and all the matching files are assigned to that fileset.
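The rule loop can be sketched as follows. This Python version is not mkpkg's code, and two behaviours in it are assumptions the text does not state: a file claimed by an earlier rule is not considered by later rules, and files that no rule claims go into a default `{product}-RUN` fileset. The `/man/man[0-9]/` regular expression and the thresholds in the usage are hypothetical values chosen to mirror the less example.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    regex: str      # searched for in each file path
    threshold: int  # a fileset is created only if MORE than this many files match
    pattern: str    # fileset name template; "{product}" becomes the product tag


def make_filesets(product, files, rules, default="{product}-RUN"):
    """Apply the ordered rules and return {fileset name: [files]}."""
    filesets = {}
    remaining = list(files)
    for rule in rules:
        rx = re.compile(rule.regex)
        matches = [f for f in remaining if rx.search(f)]
        if len(matches) > rule.threshold:
            name = rule.pattern.format(product=product)
            # Create the fileset only if needed, then assign all matches to it.
            filesets.setdefault(name, []).extend(matches)
            claimed = set(matches)
            remaining = [f for f in remaining if f not in claimed]
    if remaining:
        filesets.setdefault(default.format(product=product), []).extend(remaining)
    return filesets
```

With a threshold of 5, the two less man pages do not justify a separate fileset and everything lands in less-RUN, which matches the single-fileset decision in the example; with a threshold of 1, the man pages move into less-MAN.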

Assigning files to filesets

The same rules that determine when to create filesets are also used to assign files to those filesets. This is particularly useful for large packages, where manual assignment of files to filesets would be tedious.

Control script generation

One of the most difficult tasks is developing all the control scripts that customize the remote system. Fortunately, most control scripts are used to accomplish a handful of common tasks, and in many cases it is possible to automatically detect the need for these tasks.

Control scripts may be used at both the product and fileset levels. Mkpkg allows the user to specify customization actions at either level. The user specifies high-level actions, which mkpkg knows how to map into low-level script fragments for each of the ten possible control scripts. Mkpkg only generates control scripts when necessary, so it won't generate empty control files.
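The mapping from high-level actions to script fragments can be sketched as a table lookup that only emits scripts which received at least one fragment. In this Python sketch only the configure and unconfigure scripts appear, and the fragment text and shell helpers (`add_user`, `start_daemon`, and so on) are hypothetical placeholders, not real HP-UX commands. Running undo fragments in reverse order is a design assumption.

```python
# Hypothetical fragments: each high-level action contributes shell lines to
# one or more control scripts.
FRAGMENTS = {
    "user": {"configure": 'add_user "{arg}"', "unconfigure": 'remove_user "{arg}"'},
    "daemon": {"configure": 'start_daemon "{arg}"', "unconfigure": 'stop_daemon "{arg}"'},
}
# Undo scripts run their fragments in reverse, so the last thing configured
# is the first thing removed.
REVERSED = {"unconfigure"}


def generate_scripts(actions):
    """actions: ordered (action, argument) pairs for one product or fileset.
    Returns {script name: script text}; scripts with no fragments are omitted."""
    bodies = {}
    for action, arg in actions:
        for script, template in FRAGMENTS[action].items():
            bodies.setdefault(script, []).append(template.format(arg=arg))
    scripts = {}
    for script, lines in bodies.items():
        if script in REVERSED:
            lines = lines[::-1]
        scripts[script] = "#!/sbin/sh\n" + "\n".join(lines) + "\n"
    return scripts
```

Because a script name only enters `bodies` when a fragment is appended to it, an empty control script can never be produced.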

Error checking

In general, it is very difficult to perform error checking for binary packages. However, there are a number of common errors which can be detected. Mkpkg flags as many errors as possible, but there is still room for "pilot error."

Each attribute of a product, subproduct, or fileset can be marked as "required". Before assembling the package, mkpkg can check that every required attribute has a value.

Mkpkg can also check to ensure that hard links do not cross fileset boundaries. In other words, if two files are joined by a hard link, then they must be in the same fileset.
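Both checks are easy to sketch. The Python below is an illustration, not mkpkg's implementation. Hard links are found by grouping files on their (device, inode) pair, and a group that spans more than one fileset is reported. Only links among files in the manifest are examined, and the attribute names in the usage are hypothetical.

```python
import os
from collections import defaultdict


def missing_required(attributes, required):
    """Return the required attribute names that are unset or empty."""
    return sorted(name for name in required if not attributes.get(name))


def hard_links_crossing_filesets(file_to_fileset):
    """file_to_fileset: {path: fileset}. Return sorted path groups that share
    an inode but are assigned to more than one fileset."""
    groups = defaultdict(list)
    for path in file_to_fileset:
        st = os.lstat(path)
        if st.st_nlink > 1:  # only multiply-linked files can be hard links
            groups[(st.st_dev, st.st_ino)].append(path)
    problems = []
    for paths in groups.values():
        if len({file_to_fileset[p] for p in paths}) > 1:
            problems.append(sorted(paths))
    return problems
```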

Actions

Users cause mkpkg to do work on their behalf through the various Actions. Each action automates a major step in the software packaging process. The actions available to the user are:
Create file list
Automatically builds a package manifest, creates filesets for the files according to the guidelines, and creates subproducts for the filesets according to the guidelines.
This action may take a very long time because it must compile and install the software, and then search the system for the newly installed files.
Create Static package
Automatically builds an installation package containing statically linked executables. It will build and install the software, copy the installed software to a safe place, write a PSF, and assemble a complete product depot.
Create Dynamic Package
Automatically builds an installation package containing dynamically linked executables. It will build and install the software, copy the installed software to a safe place, write a PSF, and assemble a complete product depot.
Search for system shared libraries
In order for mkpkg to correctly detect and assign the shared library dependencies, mkpkg must first build a complete picture of all shared libraries currently installed on the system. This action queries the packager's machine to determine which fileset contains each shared library.

Subproducts

Most products should have subproducts, and the subproduct names should conform to the conventions described in the Subproduct help. Mkpkg uses a set of rules to automatically generate subproducts based on the filesets contained in the product. Assuming that the product fileset names follow the fileset naming conventions, there should rarely be a need for the publisher to manually create any subproducts.

Filesets

The fileset is the atomic unit of installation; all other installation units (bundles, products, and subproducts) are composed of filesets. Filesets contain files, dependencies, and control files.

Fileset names should conform to the conventions described in the Fileset help. Mkpkg uses a number of rules to automatically generate filesets based on the files contained in the product. However, it is impossible to automatically follow all the guidelines (e.g. for -MIN or -AUX filesets), so the publishers must be aware of the guidelines and must use their best judgement in creating the filesets.

Build Attributes

clean
A single command that when executed will clean the build directory and remove any generated files (e.g. .o and executable files).

default is make clean

build
A single command that when executed will build/compile the software preparatory to installation.

default is make

install
A single command that when executed will install the software on the publisher's computer. Mkpkg copies this installation and builds the package that can replicate that installation on end users' machines.

default is make install

env variable
The name of the environment variable whose contents are modified based on the link type (e.g. statically or dynamically linked executables).

default is CCOPTS

static compile flags
The value to which env variable is set to build statically linked executables.

default is -Wl,-a,archive

dynamic compile flags
The value to which env variable is set to build dynamically linked executables.

default is ""
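The build attributes combine into a simple recipe: set the environment variable named by env variable to the flags for the chosen link type, then run the commands. This Python sketch uses the defaults listed above; the dictionary key names and the clean, build, install order are assumptions rather than documented mkpkg behaviour.

```python
import os
import subprocess

# Default build attributes from the list above; the key names are hypothetical.
DEFAULTS = {
    "clean": "make clean",
    "build": "make",
    "install": "make install",
    "env_variable": "CCOPTS",
    "static_flags": "-Wl,-a,archive",
    "dynamic_flags": "",
}


def build_environment(attrs, link_type, base_env=None):
    """Return a copy of the environment with env_variable set for link_type."""
    if link_type not in ("static", "dynamic"):
        raise ValueError(f"unknown link type: {link_type!r}")
    env = dict(os.environ if base_env is None else base_env)
    env[attrs["env_variable"]] = attrs[f"{link_type}_flags"]
    return env


def build_and_install(attrs, link_type, cwd="."):
    """Run the clean, build, and install commands (assumed order) in cwd."""
    env = build_environment(attrs, link_type)
    for step in ("clean", "build", "install"):
        # Each attribute is a single shell command; check=True stops at the first failure.
        subprocess.run(attrs[step], shell=True, check=True, cwd=cwd, env=env)
```

Passing the flags through the environment rather than editing the Makefile is what lets the same build commands produce both the static and the dynamic package.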

Control Scripts

Writing control scripts is a difficult and exacting task, particularly since the publisher often needs to write several scripts for each action (e.g. configure, verify, and unconfigure). Mkpkg can automatically generate all the necessary scripts for most common configuration/customization tasks. The publisher merely specifies a list of actions, and mkpkg generates all of the necessary scripts with all the appropriate code fragments. In addition, mkpkg can automatically detect when some control actions are needed, such as adding directories to /etc/PATH.

The publisher can also supply control scripts. If mkpkg also generates a control script for the same attribute, the generated script executes the publisher's script, which is included in the fileset as an additional control file.
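Chaining to the publisher's script can be sketched as appending one invocation line to the generated script text. In this Python sketch, running the publisher's script after the generated fragments, locating it in the same directory as the generated script, and the script name `configure.local` are all assumptions for illustration.

```python
def wrap_with_publisher_script(generated_lines, publisher_script=None):
    """Build the text of a generated control script; if the publisher supplied
    a script for the same attribute, run it and propagate its exit status."""
    lines = ["#!/sbin/sh"] + list(generated_lines)
    if publisher_script:
        # Assumption: the publisher's script ships as an extra control file
        # next to this generated script.
        lines.append('"$(dirname "$0")/%s" || exit $?' % publisher_script)
    lines.append("exit 0")
    return "\n".join(lines) + "\n"
```

Propagating the exit status keeps a failure in the publisher's script visible to the installation tool instead of being masked by the generated wrapper.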

The configuration/customization actions which mkpkg can automatically generate are:

newconfig
install/deinstall a configuration file
daemon
start/stop a daemon process
obsolete
remove obsolete files installed by previous versions of the fileset
kernel
insert/delete a kernel driver/parameter
user
add/delete a user
group
add/delete a group

Defaults

Internals

Mkpkg has a very modular design that provides a framework for adding new modules and functionality. The basic unit of functionality is the operation. Actions are composed of sequences of operations. The user requests that mkpkg execute actions, such as "create the product manifest".

Operations have a uniform function interface, and mkpkg executes the operations within an action in sequence. An operation may return an error code (in which case mkpkg may ask the packager whether to abort), and it can add text to the operation log. Operations are intended to run without user interaction, since mkpkg can be driven either from a command-line interface or from the GUI.

New functionality is added to the system by developing new operations, and then adding them to the appropriate action list or creating a new action.
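The operation and action framework can be sketched in Python as a list of functions that share one signature. The signature, the `should_abort` callback (which a GUI dialog or a command-line prompt could supply), and the example operations are assumptions; they mirror the description above rather than mkpkg's actual Tcl interface.

```python
from typing import Callable, Dict, List, Optional

# Uniform interface: an operation receives the shared state and the log,
# and returns 0 on success or a non-zero error code.
Operation = Callable[[Dict, List[str]], int]


def run_action(operations: List[Operation], state: Dict,
               should_abort: Optional[Callable[[str, int], bool]] = None) -> List[str]:
    """Run operations in order; on an error, ask should_abort (abort if None)."""
    log: List[str] = []
    for op in operations:
        code = op(state, log)
        if code != 0:
            log.append(f"{op.__name__} failed with code {code}")
            if should_abort is None or should_abort(op.__name__, code):
                break
    return log


# Hypothetical operations for illustration.
def make_timestamp(state, log):
    state["stamp"] = "created"
    log.append("timestamp file created")
    return 0


def check_required(state, log):
    return 0 if state.get("tag") else 1
```

Adding functionality then means writing another function with the same signature and placing it in an action's list, which is the extension path the text describes.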

Mkpkg's greatest weakness is its data structures for storing package configuration information.

Data structures

Mkpkg uses TCL arrays as the basic data structure container. There are two global arrays: database() and product(). Database() contains the system information that is used by all packages created on that system, while product() contains the information relevant to a particular product.

The array index is a comma-separated list of defining attributes of the data value. All of the context for a given piece of information is encoded in the index. For example, the list of prerequisites for mkpkg's fileset mkpkg-BIN is in product(mkpkg,fileset,mkpkg-BIN,prerequisite).
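The comma-joined indexing scheme can be imitated with a flat Python dictionary. This sketch only illustrates the idea and why it becomes cumbersome; the helper names are hypothetical, and the values in the usage are placeholders, not mkpkg's real prerequisites. Listing one level of the hierarchy requires scanning every key (in current Tcl, `array names product mkpkg,fileset,*` performs a similar pattern scan), and any name that contains a comma would corrupt the index.

```python
from typing import Dict, Sequence


def set_attr(array: Dict[str, object], path: Sequence[str], value: object) -> None:
    # All context is folded into one comma-joined index string.
    array[",".join(path)] = value


def get_attr(array: Dict[str, object], path: Sequence[str], default=None):
    return array.get(",".join(path), default)


def children(array: Dict[str, object], prefix: Sequence[str]):
    """List the distinct names one level below `prefix` by scanning every key;
    there is no nested structure to walk."""
    head = ",".join(prefix) + ","
    return sorted({k[len(head):].split(",", 1)[0] for k in array if k.startswith(head)})
```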

This system is cumbersome but effective for most of mkpkg's needs. As I have been developing the control script generation, its weaknesses for general hierarchical data have become more pronounced.

Operations

Operations are the basic building blocks of mkpkg. Each operation is atomic and may be used in many actions. Operations often modify the package or mkpkg state, but not always. For example, one operation creates the timestamp file used during manifest generation, while another builds the application. Other operations do not modify state at all; they are used to provide error checking.

Backends

The backends provide the installation system-specific code. Mkpkg is structured so that it can easily produce packages for a variety of software installation tools. In the past it has been able to create installation packages for both update and ninstall.

The backends provide two functions: dumpPSF and package. The dumpPSF function creates the PSF file for the package using all the available state and information. The package function merely executes the backend-specific packaging program to create an installation package.
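The two-function backend interface maps naturally onto an abstract base class. In this Python sketch the class and method names are hypothetical, the PSF text is a simplified illustration rather than a complete or validated PSF, and the `swpackage -s <psf> @ <depot>` invocation is an assumption that should be checked against the swpackage manual page. In the real flow the file paths in the PSF would refer to the copy saved in the "safe" place.

```python
import subprocess
from abc import ABC, abstractmethod


class Backend(ABC):
    """Installation-system-specific code: one subclass per packaging tool."""

    @abstractmethod
    def dump_psf(self, product: dict) -> str:
        """Return the package specification text for `product`."""

    @abstractmethod
    def package(self, psf_path: str, depot: str) -> None:
        """Run the tool's packaging program on the specification file."""


class SDBackend(Backend):
    def dump_psf(self, product):
        # Simplified PSF-like layout: product attributes, then one block per fileset.
        out = ["product",
               f"    tag {product['tag']}",
               f"    revision {product['revision']}",
               f"    directory {product['directory']}"]
        for fs_tag, files in product["filesets"].items():
            out += ["    fileset", f"        tag {fs_tag}"]
            out += [f"        file {path}" for path in files]
            out.append("    end")
        out.append("end")
        return "\n".join(out) + "\n"

    def package(self, psf_path, depot):
        # Assumed invocation; verify against swpackage(1M) before relying on it.
        subprocess.run(["swpackage", "-s", psf_path, "@", depot], check=True)
```

Supporting another installation tool would then mean adding one more subclass, leaving the operations that collect package state unchanged.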


(C) Copyright 1994, 1995, 1996 Hewlett-Packard Company