Hacking diskimage-builder for Fun and Profit

Cloud Computing · Mon, 2020-02-24 · Read time: 14 min.

Introduction

For the past year I have been responsible for redesigning and keeping a multi-region OpenStack deployment up to date. I will eventually go in-depth on what that entailed, but for now I will focus on my first challenge on that infrastructure: keeping the cloud images provided to users up to date and customized according to locally defined policies.

To achieve this, and after researching my options, I chose diskimage-builder. It may seem like overkill at first glance, but its modular architecture called to my OCD nature: you can make your own elements that depend on existing ones, so yes, it seemed perfect for the job. Besides, it is made by the OpenStack team, so it is more than trustworthy.

Before I continue, let me be clear about what this does: it generates stripped-down, bootable Linux cloud images, which you can use on any platform that supports booting the resulting filesystem. These images are cloud-ready and should be able to retrieve any metadata you configure them to fetch from your cloud provider.

By the end of this article you should be able to generate your own up-to-date cloud images weekly, in whatever CI solution you master =)

Requirements

I recommend having some background before delving into this use case, as it can get very complex when you need to debug more advanced generation scenarios (e.g. in CI). The cloud images provided by your favorite distribution should be enough to satisfy most cloud computing needs; this process helps reduce the cloud image footprint to your bare needs and lets you customize the image beyond what the distribution provides.

That said, I advise getting familiar with the tools described below.

diskimage-builder

As stated in its documentation, diskimage-builder is a tool for automatically building customized operating-system images for use in clouds and other environments.

It includes support for building images based on many major distributions and can produce cloud-images in all common formats (qcow2, vhd, raw, etc), bare metal file-system images and ram-disk images. These images are composed from the many included elements; diskimage-builder acts as a framework to easily add your own elements for even further customization.

One of the main reasons I chose diskimage-builder was its modular design, but I quickly discovered that it is also used extensively by the TripleO project and within OpenStack Infrastructure, which fit perfectly with our OpenStack infrastructure.

Elements

"An element is a particular set of code that alters how the image is built, or runs within the chroot to prepare the image." [2]

"Images are built using a chroot and bind mounted /proc /sys and /dev. The goal of the image building process is to produce blank slate machines that have all the necessary bits to fulfill a specific purpose in the running of an OpenStack cloud. Images produce either a filesystem image with a label of cloudimg-rootfs, or can be customised to produce whole disk images (but will still contain a filesystem labelled cloudimg-rootfs). Once the file system tree is assembled a loopback device with filesystem (or partition table and file system) is created and the tree copied into it. The file system created is an ext4 filesystem just large enough to hold the file system tree and can be resized up to 1PB in size." [2]

Take this in... envision it. Now we will split the image build process into its ten phases, each of which an element can hook into if it needs to. [3]

  1. root.d
  2. extra-data.d
  3. pre-install.d
  4. install.d
  5. post-install.d
  6. post-root.d
  7. block-device.d
  8. pre-finalise.d
  9. finalise.d
  10. cleanup.d
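To make this concrete, an element only implements the phases it actually needs: each phase is just a directory of executable scripts inside the element. Here is a throwaway sketch scaffolding an element that hooks only install.d (the element name my-element and the script name are made up for illustration):

```shell
# Sketch: scaffold a minimal element that hooks only one of the ten phases.
# The element name "my-element" and script name "10-hello" are made up.
cd "$(mktemp -d)"
mkdir -p my-element/install.d
cat > my-element/install.d/10-hello <<'EOF'
#!/bin/bash
# Runs inside the chroot during the install.d phase.
echo "hello from install.d"
EOF
chmod +x my-element/install.d/10-hello   # phase scripts must be executable
```

Scripts within a phase run in lexical order, which is why they are conventionally prefixed with a two-digit priority.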

root.d

Create or adapt the initial root filesystem content. This is where alternative distribution support is added, or customisations such as building on an existing image.

Only one element can use this at a time unless particular care is taken not to blindly overwrite but instead to adapt the context extracted by other elements.

runs:
outside chroot
inputs:
  • $ARCH=i386 OR amd64 OR armhf OR arm64
  • $TARGET_ROOT=/path/to/target/workarea

extra-data.d

Pull in extra data from the host environment that hooks may need during image creation. This should copy any data (such as SSH keys, http proxy settings and the like) somewhere under $TMP_HOOKS_PATH.

runs:
outside chroot
inputs:
  • $TMP_HOOKS_PATH
outputs:
None

Contents placed under $TMP_HOOKS_PATH will be available at /tmp/in_target.d inside the chroot.
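As a hedged sketch of how this phase is typically used, the hook below copies an authorized_keys file from the build host into $TMP_HOOKS_PATH so a later in-chroot phase could read it from /tmp/in_target.d/ssh-keys/. DIB_AUTHORIZED_KEYS is an assumed variable for illustration, not an official diskimage-builder one:

```shell
#!/bin/bash
# Hypothetical extra-data.d hook (sketch): stage an SSH public key from the
# build host under $TMP_HOOKS_PATH, which is visible inside the chroot at
# /tmp/in_target.d. DIB_AUTHORIZED_KEYS is an assumption for this example.
set -eu
: "${TMP_HOOKS_PATH:=$(mktemp -d)}"   # normally provided by diskimage-builder
src="${DIB_AUTHORIZED_KEYS:-$HOME/.ssh/id_rsa.pub}"
mkdir -p "$TMP_HOOKS_PATH/ssh-keys"
if [ -f "$src" ]; then
    cp "$src" "$TMP_HOOKS_PATH/ssh-keys/authorized_keys"
fi
```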

pre-install.d

Run code in the chroot before customisation or packages are installed. A good place to add apt repositories.

runs:
in chroot

install.d

Runs after pre-install.d in the chroot. This is a good place to install packages, chain into configuration management tools or do other image specific operations.

runs:
in chroot

post-install.d

Run code in the chroot. This is a good place to perform tasks you want to handle after the OS/application install but before the first boot of the image. Some examples of use would be:

  • Run chkconfig to disable unneeded services
  • Clean the cache left by the package manager to reduce the size of the image.
runs:
in chroot

post-root.d

Run code outside the chroot. This is a good place to perform tasks that cannot run inside the chroot and must run after installing things. The root filesystem content is rooted at $TMP_BUILD_DIR/mnt.

runs:
outside chroot

block-device.d

Customise the block device that the image will be made on (for example to make partitions). Runs after the target tree has been fully populated but before the cleanup.d phase runs.

runs:
outside chroot
inputs:
  • $IMAGE_BLOCK_DEVICE={path}
  • $TARGET_ROOT={path}
outputs:
  • $IMAGE_BLOCK_DEVICE={path}

pre-finalise.d

Final tuning of the root filesystem, outside the chroot. Filesystem content has been copied into the final file system which is rooted at $TMP_BUILD_DIR/mnt. You might do things like re-mount a cache directory that was used during the build in this phase (with subsequent unmount in cleanup.d).

runs:
outside chroot

finalise.d

Perform final tuning of the root filesystem. Runs in a chroot after the root filesystem content has been copied into the mounted filesystem: this is an appropriate place to reset SELinux metadata, install grub bootloaders and so on.

Because this happens inside the final image, it is important to limit operations here to only those necessary to affect the filesystem metadata and image itself. For most operations, post-install.d is preferred.

runs:
in chroot

cleanup.d

Perform cleanup of the root filesystem content. For instance, temporary settings to use the image build environment HTTP proxy are removed here in the dpkg element.

runs:
outside chroot
inputs:
  • $ARCH=i386 OR amd64 OR armhf OR arm64
  • $TARGET_ROOT=/path/to/target/workarea

cloud-init

Cloud-init is a method developed by Canonical for cross-platform cloud instance initialization. It is supported across major public cloud providers, provisioning systems for private cloud infrastructure, and bare-metal installations.

Cloud instances are initialized from a disk image and instance cloud metadata.

Cloud-init identifies the cloud it is running on during boot, reads any provided metadata from the cloud, and initializes the system accordingly. This may involve anything from setting up the network and storage devices to configuring SSH access keys, among many other aspects of a system. [4]

The cloud-init documentation shows its availability for different Linux distributions as well as its compatibility with different cloud execution environments.

We should note that there are two main metadata types, EC2 and OpenStack. They have different API specifications, but you can usually find support for the OpenStack metadata endpoints across different public clouds, or even both; be sure to know what your cloud provider offers!
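For orientation, both metadata services are conventionally exposed on the same link-local address inside a running instance. These are the usual default endpoints (verify against your provider's documentation):

```shell
# Conventional link-local metadata endpoints (defaults; providers may differ):
EC2_MD="http://169.254.169.254/latest/meta-data/"
OPENSTACK_MD="http://169.254.169.254/openstack/latest/meta_data.json"
# From inside a running instance you could inspect them with, e.g.:
#   curl -s "$EC2_MD"
#   curl -s "$OPENSTACK_MD" | python3 -m json.tool
```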

We will not go much into cloud-init's operation; just know that it exists and what purpose it serves. Of course, you are free to explore the documentation, just be careful with the rabbit-hole, Alice =)

Creating our Image

For this article we will be building a CentOS 8 cloud-ready image.

CentOS released its version 8 cloud images last month, and conveniently the diskimage-builder developers had already added the necessary changes to their centos-minimal element. However, I noticed that the cloud-init package was missing from the generated image (I do not know whether it is also missing upstream), so we should add that package to our image ourselves, or else our precious SSH key will not be added and we will not be able to access any launched instances.

Setup

First we should get the tutorial files:

t0rrant@testing:~$ wget https://implement.pt/files/hacking-diskimage-builder-for-fun-and-profit/dib-tutorial.tar.gz
t0rrant@testing:~$ tar xzf dib-tutorial.tar.gz
t0rrant@testing:~$ cd dib-tutorial

Now we should create a Python virtual environment using virtualenvwrapper and install diskimage-builder along with its dependencies.

t0rrant@testing:~/dib-tutorial$ mkvirtualenv dib-elements
(dib-elements) t0rrant@testing:~/dib-tutorial$ pip install diskimage-builder

Project Structure

For modularity and learning purposes we will have two elements, centos-eight and my_base. The my_base element will contain settings that may be common between different images, be it CentOS, Debian, Ubuntu, Gentoo, or any other, whereas centos-eight will take care only of things specific to CentOS 8.

Let us start with the my_base element.

We should have a directory structure that looks something like this:

 my_base/
 ├── element-deps
 ├── environment.d
 │   └── 10-cloud-init-datasources
 ├── package-installs.yaml
 ├── pkg-map
 ├── post-install.d
 │   └── 01-cloud-init-override
 └── README.rst

The element-deps file describes which other elements should run alongside ours:

base
cloud-init-datasources
dhcp-all-interfaces
enable-serial-console
growroot
openssh-server
package-installs
pkg-map
vm

Besides the "mandatory" base and vm elements, we have some important elements that work without further configuration:

  • growroot - grows the root partition on first boot [5]
  • dhcp-all-interfaces - autodetects network interfaces during boot and configures them for DHCP [6]
  • enable-serial-console - starts getty on active serial consoles [7]
  • openssh-server - ensures that the OpenSSH server is installed and enabled during boot [8]

Using the cloud-init-datasources element lets us configure where cloud-init fetches the metadata it feeds to our instance; the DIB_CLOUD_INIT_DATASOURCES environment variable must be set at image-creation time. One way of doing this is through the environment.d stage, and in our example we set up only the OpenStack data source. Let us create the 10-cloud-init-datasources file inside environment.d:

  export DIB_CLOUD_INIT_DATASOURCES="OpenStack"
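Under the hood this simply narrows cloud-init's datasource search; the net effect is roughly equivalent to shipping a drop-in configuration like the following (the file path is an assumption for illustration — the element generates its own):

```yaml
# e.g. /etc/cloud/cloud.cfg.d/91-dib-cloud-init-datasources.cfg (path assumed)
datasource_list: [ OpenStack ]
```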

Using the package-installs and pkg-map elements together gives us a flexible way to manage package installation across different distributions. We use pkg-map to map an arbitrary name to a specific package name in a given distribution family or release, and with package-installs we define which of those arbitrary names we want installed in our image. So we create the package-installs.yaml file in the root of our element:

curl:
dnsutils:
git:
htop:
iptables:
man:
nano:
netcat:
ntp:
ping:
resolvconf:
syslog:
tree:
vim:
wget:

and the corresponding package mapping in the pkg-map file, in JSON format:

{
    "release": {
      "centos": {
        "8": {
          "ntp": "chrony"
        }
      }
    },
    "family": {
        "redhat": {
            "curl": "curl",
            "dnsutils": "bind-utils",
            "git": "git",
            "htop": "htop",
            "iptables": "iptables",
            "man": "man-db",
            "nano": "nano",
            "netcat": "nc",
            "ntp": "ntp",
            "ping": "iputils",
            "resolvconf": "",
            "syslog": "rsyslog",
            "tree": "tree",
            "vim": "vim-enhanced",
            "wget": "wget"
        },
        "debian":{
            "curl": "curl",
            "dnsutils": "dnsutils",
            "git": "git",
            "htop": "htop",
            "iptables": "iptables",
            "man": "man",
            "nano": "nano",
            "netcat": "netcat-openbsd",
            "ntp": "ntp",
            "ping": "inetutils-ping",
            "resolvconf": "resolvconf",
            "syslog": "rsyslog",
            "tree": "tree",
            "vim": "vim",
            "wget": "wget"
        }
    }
}

In our case this maps everything under ["family"]["redhat"], except for the ntp package, which we override to chrony for CentOS 8.
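To make that precedence concrete, here is a small shell sketch (a hypothetical helper, not part of diskimage-builder) of the lookup pkg-map performs with the file above: a release entry wins over a family entry, and unmapped names fall through unchanged:

```shell
# Hypothetical sketch of the pkg-map lookup order for the file above:
# release (centos/8) beats family (redhat); unmapped names pass through.
resolve_pkg() {
    local name=$1 distro=$2 release=$3
    if [ "$distro" = centos ] && [ "$release" = 8 ] && [ "$name" = ntp ]; then
        echo chrony && return
    fi
    case $name in                      # the ["family"]["redhat"] table
        dnsutils) echo bind-utils ;;
        man)      echo man-db ;;
        netcat)   echo nc ;;
        ping)     echo iputils ;;
        vim)      echo vim-enhanced ;;
        *)        echo "$name" ;;
    esac
}
```

For example, `resolve_pkg ntp centos 8` yields chrony, while `resolve_pkg vim centos 8` yields vim-enhanced from the family table.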

For consistency between the different elements that use my_base, we may want to override the cloud-init configuration that ships with each distro's cloud-init package. To do that we use the post-install.d stage. Let us create the 01-cloud-init-override file in the post-install.d directory:

## DISCLAIMER: this file serves as a consistency point for all cloud-init options; if you need to enable modules, do it
# here. If you need to add options, use the 50-cloud-init-config or 99-cloud-init file in individual elements
tee /etc/cloud/cloud.cfg <<EOF
cloud_init_modules:
  - migrator
  - seed_random
  - bootcmd
  - write-files
  - growpart
  - resizefs
  - disk_setup
  - mounts
  - update_etc_hosts
  - resolv_conf
  - ca-certs
  - rsyslog
  - mounts
  - apt-configure
  - rh_subscription
  - yum-add-repo
  - package-update-upgrade-install
  - users-groups
  - ssh

cloud_config_modules:
  # Emit the cloud config ready event
  # this can be used by upstart jobs for 'start on cloud-config'.
  - emit_upstart
  - snap
  - snap_config  # DEPRECATED- Drop in version 18.2
  - ssh-import-id
  - locale
  - set-passwords
  - grub-dpkg
  - apt-pipelining
  - ubuntu-advantage
  - ntp
  - timezone
  - puppet
  - chef
  - salt-minion
  - mcollective
  - disable-ec2-metadata
  - runcmd
  - byobu

cloud_final_modules:
  - fan
  - landscape
  - lxd
  - ubuntu-drivers
  - rightscale_userdata
  - scripts-vendor
  - scripts-per-once
  - scripts-per-boot
  - scripts-per-instance
  - scripts-user
  - ssh-authkey-fingerprints
  - keys-to-console
  - phone-home
  - final-message
  - power-state-change

users:
   - default

disable_root: true

EOF

Note: the files under post-install.d must have the executable bit set (chmod +x), or they will not be executed.

Now we can move on to our centos-eight element, here is our directory structure:

 centos-eight/
 ├── element-deps
 ├── package-installs.yaml
 ├── pkg-map
 ├── post-install.d
 │   └── 50-cloud-init-config
 └── README.rst

Taking a look at the dependencies (element-deps file) for our final element, centos-eight:

centos-minimal
my_base
epel
package-installs
pkg-map

we can see that we are including the my_base element we created earlier, two new elements, and that we are repeating the use of package-installs and pkg-map. We include the latter two again because in this element we want to handle only one special case: installing the cloud-init package. We do not do this in the my_base element because other distributions may already ship cloud-init, and they may or may not use the package manager, so we leave that option open.

Let us configure package-installs.yaml for centos-eight:

  cloud-init:

and also the pkg-map:

{
    "release": {
      "centos": {
        "8": {
          "cloud-init": "cloud-init"
        }
      }
    }
}

The centos-minimal element creates a minimal image based on CentOS. Using this element requires yum and yum-utils to be installed on the build host if you are generating the image on Ubuntu or Debian; nothing additional is needed on Fedora or CentOS. [9]

The epel element installs the Extra Packages for Enterprise Linux (EPEL) repository GPG key as well as configuration for yum. [10]

In this case we want to make sure some of cloud-init's options are added to our already-overridden configuration, so we use the same stage as in the my_base element, but with a higher priority number so that it runs afterwards. We name the file 50-cloud-init-config within the post-install.d directory:

yum-config-manager --save --setopt=updates.skip_if_unavailable=true
yum-config-manager --save --setopt=extras.skip_if_unavailable=true

tee --append /etc/cloud/cloud.cfg <<EOF
mount_default_fields: [~, ~, 'auto', 'defaults,nofail,x-systemd.requires=cloud-init.service', '0', '2']
resize_rootfs_tmp: /dev
ssh_deletekeys:   0
ssh_genkeytypes:  ~
syslog_fix_perms: ~
ssh_pwauth:   0

system_info:
  default_user:
    name: centos
    lock_passwd: true
    gecos: CentOS
    groups: [wheel, adm, systemd-journal]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
  distro: rhel
  paths:
    cloud_dir: /var/lib/cloud
    templates_dir: /etc/cloud/templates
  ssh_svcname: sshd

ntp:
  enabled: true
  ntp_client: chrony
  conf:
     service_name: chronyd
  servers:
    - 0.europe.pool.ntp.org
    - 1.europe.pool.ntp.org
    - 2.europe.pool.ntp.org
    - 3.europe.pool.ntp.org

timezone: Europe/Lisbon

EOF

That is it for the centos-eight element; we are now (almost) ready to generate our customized cloud-ready image.

Generating the image

Now that we have our elements set up, we can set our environment variables:

  • we define where disk-image-create can find our custom elements
  • we define the main distro we are creating
  • we define the distro's release (this is relevant for *-minimal elements)
(dib-elements) t0rrant@testing:~/dib-tutorial$ export ELEMENTS_PATH=elements
(dib-elements) t0rrant@testing:~/dib-tutorial$ export DISTRO=centos
(dib-elements) t0rrant@testing:~/dib-tutorial$ export DIB_RELEASE=8

Finally we can create our image:

(dib-elements) t0rrant@testing:~/dib-tutorial$ disk-image-create -x --no-tmpfs -o centos8.qcow2 block-device-mbr centos-eight
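Before uploading, you may want a quick sanity check of the artifact (this assumes the qemu-utils package is installed; the check is guarded so it is a no-op when either the file or the tool is missing):

```shell
# Optional sanity check of the generated image (needs qemu-utils).
img=centos8.qcow2
if [ -f "$img" ] && command -v qemu-img >/dev/null 2>&1; then
    qemu-img info "$img"    # reports format, virtual size, cluster size, etc.
fi
```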

We can now upload the generated image to our cloud platform, e.g.:

(dib-elements) t0rrant@testing:~/dib-tutorial$ openstack image create my-centos-eight --file centos8.qcow2 --disk-format=qcow2 --container-format=bare

Pick up some java, wheat or malt, you've earned it! Then go play with the image you have just uploaded and see if you can customize it even further.

Conclusion

In this article we went through all of the components required to create a cloud-ready image (almost) from scratch. We saw how elements are defined in diskimage-builder and which phases they go through in the image building process, talked briefly about cloud-init and metadata types, and finished with an in-depth tutorial on building a customized cloud-ready CentOS 8 image.

Hopefully this was useful and you could replicate every example =)

Feel free to leave comments below, any improvement to this and other articles is always welcome.

References

[1] diskimage-builder - OpenStack Documentation, diskimage-builder 2.33.1.dev11. [link]
[2] Developer Guide (Design) - OpenStack Documentation, diskimage-builder 2.33.1.dev11. [link]
[3] Developer Guide (Developing Elements) - OpenStack Documentation, diskimage-builder 2.33.1.dev11. [link]
[4] cloud-init Documentation, cloud-init 20.1. [link]
[5] growroot element - OpenStack Documentation, diskimage-builder 2.33.1.dev11. [link]
[6] dhcp-all-interfaces element - OpenStack Documentation, diskimage-builder 2.33.1.dev11. [link]
[7] enable-serial-console element - OpenStack Documentation, diskimage-builder 2.33.1.dev11. [link]
[8] openssh-server element - OpenStack Documentation, diskimage-builder 2.33.1.dev11. [link]
[9] centos-minimal element - OpenStack Documentation, diskimage-builder 2.33.1.dev11. [link]
[10] epel element - OpenStack Documentation, diskimage-builder 2.33.1.dev11. [link]

Tags: linux centos openstack diskimage-builder yaml cloud-init cloud images filesystem


Manuel Torrinha is an information systems engineer, with more than 10 years of experience in managing GNU/Linux environments. Has an MSc in Information Systems and Computer Engineering. Work interests include High Performance Computing, Data Analysis, and IT management and Administration. Knows diverse programming, scripting and markup languages. Speaks Portuguese and English.
