Linux DNS Exploration

DNS is the phonebook of the internet. See the comparison of DNS server software to be aware of the alternatives. The Cloudflare DNS course is also well worth reading.

Issue

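This was an interesting problem. At first I did not realize it was a DNS issue at all; I only found out after digging deeper step by step and clearing away some distracting side errors.

It started when the docker registry in the cluster was already up and running, yet docker push and docker pull did not work: they kept retrying until they timed out. The push URL at the time was based on the hostname, for example: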

dal12-3m-3w-testcluster-03master-00.demo.ibmcloud.com:5000/is-realtime-busybox:latest

Running the same docker push on the host of the docker registry pod still failed, but it worked once I changed the address to localhost, or when I used the host VM's public IP from another machine:

localhost:5000/is-realtime-busybox:latest

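This was the first hint that name resolution was the problem. My first reaction was to check the /etc/hosts file on every node, but they were all fine, and ping worked too, which was strange.

Let's check the name resolution configuration more carefully. Following this article, /etc/nsswitch.conf shows the order in which sources are consulted for a host name lookup. Note that some malicious scripts or viruses may modify your nsswitch.conf file.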

#hosts:     db files nisplus nis dns
hosts: files dns

files means /etc/hosts and dns means the DNS server, so the local file /etc/hosts really is consulted first.
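To make the lookup order concrete, here is a minimal Python sketch that extracts the sources from the active hosts: line of an nsswitch.conf. The function name hosts_lookup_order and the inline SAMPLE text are illustrative assumptions; the sketch only parses text and does not perform any lookup itself.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the sources on the active `hosts:` line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            # drop action items such as [NOTFOUND=return]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            return rest.split()
    return []


# Sample content copied from the two lines shown above.
SAMPLE = """\
#hosts:     db files nisplus nis dns
hosts: files dns
"""

print(hosts_lookup_order(SAMPLE))  # ['files', 'dns']
```

The commented-out first line is ignored, so only the files and dns sources remain, in that order.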

Next, /etc/resolv.conf lists the DNS server addresses, and it looked fine as well.

nameserver 10.0.80.11
nameserver 10.0.80.12

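I guessed that some commands might not use the local file /etc/hosts at all. I tried the host command and that was indeed the case, see Why does the host command not resolve entries in /etc/hosts?. It seems docker push/pull behaves the same way. That answer also pointed me to another command, getent, which is handy for querying /etc/hosts.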

getent hosts halos1

You will find that dig and nslookup behave the same way as host: the purpose of all of these commands is to do DNS lookups, not to look in files such as /etc/hosts.
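The difference is easier to see with a small sketch of what the files step does. The Python below only mimics a lookup in /etc/hosts-style text; it is not how getent is implemented, and the function name lookup_in_hosts, the sample entries, and the address given to halos1 are all hypothetical.

```python
def lookup_in_hosts(hosts_text, name):
    """Return addresses whose hosts-file line lists `name` (canonical or alias)."""
    wanted = name.lower()
    matches = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        # fields[0] is the address, the rest are the canonical name and aliases
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            matches.append(fields[0])
    return matches


# Hypothetical hosts-file content for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
10.0.2.4    halos1.example.com halos1   # hypothetical entry
"""

print(lookup_in_hosts(SAMPLE_HOSTS, "halos1"))  # ['10.0.2.4']
```

A tool that only speaks the DNS protocol never runs a step like this, which is why an entry that exists only in /etc/hosts is invisible to host, dig, and nslookup.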

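Later I asked someone to add the master node's domain name and IP to the DNS server used by the cluster, and the problem was solved!

So the next time something similar happens, besides checking the local DNS configuration and files, also try the host command to see whether the external DNS server works properly. Most importantly, remember that some commands do not consult /etc/hosts.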

resolv.conf

The file is a plain-text file, usually created by the network administrator or by applications that manage the configuration tasks of the system. It is either maintained manually or rewritten with the settings received from a DHCP server. If you want to customize it by hand, you need to disable the resolved service (systemd-resolved).

The process of determining IP addresses from domain names is called resolving.

The content of resolv.conf is explained below; see also man resolv.conf:

# local domain name suffix
# obsolete: use the search directive instead
domain service.consul
# Which Domain to search
search service.consul node.consul
# DNS server IP, up to 3
# ipv4 or ipv6
# query in order
nameserver 127.0.0.1

So if we look up the hostname xxx, the resolver will try xxx.service.consul first, followed by xxx.node.consul, against the DNS server on localhost.
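The expansion can be sketched in Python. This is a simplified model of the resolver's behavior, assuming the default ndots value of 1 and ignoring other options; parse_resolv_conf and candidate_names are illustrative names, not part of any library.

```python
import re


def parse_resolv_conf(text):
    """Collect nameservers, the search list, and ndots from resolv.conf text."""
    conf = {"nameservers": [], "search": [], "ndots": 1}
    for raw in text.splitlines():
        fields = re.split(r"[#;]", raw, maxsplit=1)[0].split()
        if not fields:
            continue
        key, args = fields[0], fields[1:]
        if key == "nameserver" and args:
            conf["nameservers"].append(args[0])
        elif key == "search" and args:
            conf["search"] = args       # a later directive overrides an earlier one
        elif key == "domain" and args:
            conf["search"] = args[:1]   # domain acts like a one-entry search list
        elif key == "options":
            for opt in args:
                if opt.startswith("ndots:"):
                    conf["ndots"] = int(opt.split(":", 1)[1])
    return conf


def candidate_names(name, conf):
    """Return the fully qualified names to try, in order."""
    if name.endswith("."):
        return [name]  # already absolute, the search list is skipped
    searched = [f"{name}.{domain}" for domain in conf["search"]]
    if name.count(".") >= conf["ndots"]:
        return [name] + searched  # enough dots: try the name as-is first
    return searched + [name]


SAMPLE = """\
domain service.consul
search service.consul node.consul
nameserver 127.0.0.1
"""

conf = parse_resolv_conf(SAMPLE)
print(candidate_names("xxx", conf))
# ['xxx.service.consul', 'xxx.node.consul', 'xxx']
```

A name with a trailing dot, such as xxx.service.consul., bypasses the search list entirely, which is a quick way to test one specific fully qualified name.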

BIND

A single /etc/hosts file was no longer enough as the internet kept growing, so in 1984 seven top-level domains were created. See also the list of DNS record types, for example the A record.

BIND is an acronym for Berkeley Internet Name Domain. To install a DNS server using BIND 9 on CentOS 7, note that the package is called bind but the service is called named:

yum update -y
yum install -y bind bind-utils

# list files in bind package
# -q: query
# -l: list option under -q
rpm -ql bind

/etc/logrotate.d/named
/etc/named
/etc/named.conf
/etc/named.iscdlv.key
/etc/named.rfc1912.zones
/etc/named.root.key
/etc/rndc.conf
/etc/rndc.key
/etc/rwtab.d/named
/etc/sysconfig/named
...

# if a firewall is running, open port 53
firewall-cmd --permanent --add-port=53/tcp
firewall-cmd --permanent --add-port=53/udp
firewall-cmd --reload

# enable and start the dns service
systemctl enable named
systemctl start named

# you will see ports 53 and 953
# 953 is used by the rndc command to connect
netstat -tnlp

# query
dig @localhost www.google.com
# check dns version
named -v
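To show what a query like the dig command above puts on the wire, here is a hedged Python sketch that builds a minimal DNS query for an A record by hand. The helper names build_a_query and send_query and the fixed query ID are assumptions for illustration; send_query is defined but not called here, since it needs a DNS server listening on the given address.

```python
import socket
import struct


def build_a_query(name, query_id=0x1234):
    """Build a DNS query packet asking for the A record of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def send_query(packet, server="127.0.0.1", port=53, timeout=2.0):
    """Send the packet over UDP and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        return sock.recvfrom(512)[0]


packet = build_a_query("www.google.com")
print(len(packet))  # 32
```

The 32 bytes are the 12-byte header, the 16-byte encoded name, and 4 bytes for the query type and class.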

DNS information is stored in text files called zone files.

The rndc command is used to control the named service:

# see dns version
# system resources: CPU, threads, zones
# up and running state
rndc status
# reload config
rndc reload

Let’s look at the named systemd unit file, which is similar to ZooKeeper’s. Also keep an eye on how ExecStartPre uses bash to check the zone files before starting:

# systemctl cat named
[Unit]
Description=Berkeley Internet Name Domain (DNS)
Wants=nss-lookup.target
Wants=named-setup-rndc.service
Before=nss-lookup.target
After=network.target
After=named-setup-rndc.service

[Service]
Type=forking
Environment=NAMEDCONF=/etc/named.conf
EnvironmentFile=-/etc/sysconfig/named
Environment=KRB5_KTNAME=/etc/named.keytab
PIDFile=c

# check zone files
ExecStartPre=/bin/bash -c 'if [ ! "$DISABLE_ZONE_CHECKING" == "yes" ]; then /usr/sbin/named-checkconf -z "$NAMEDCONF"; else echo "Checking of zone files is disabled"; fi'
ExecStart=/usr/sbin/named -u named -c ${NAMEDCONF} $OPTIONS
ExecReload=/bin/sh -c '/usr/sbin/rndc reload > /dev/null 2>&1 || /bin/kill -HUP $MAINPID'
ExecStop=/bin/sh -c '/usr/sbin/rndc stop > /dev/null 2>&1 || /bin/kill -TERM $MAINPID'
PrivateTmp=true

[Install]
WantedBy=multi-user.target

As you see, /etc/named.conf is the config file.

Zone

What is a DNS zone and zone file? A zone file is a plain text file stored in a DNS server that contains an actual representation of the zone and contains all the records for every domain within the zone.

For example, a local configuration:

zone "example.com" IN {
    type master;
    file "db.example";
    allow-update { none; };
};

zone "2.0.10.in-addr.arpa" IN {
    type master;
    file "db.10.0.2";
    allow-update { none; };
};
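The name of the reverse zone is the network's octets in reverse order followed by in-addr.arpa. A minimal Python sketch using the standard ipaddress module shows the derivation; the helper name reverse_zone_for_24 is an assumption, and the sketch deliberately handles only IPv4 /24 networks like the one in this example.

```python
import ipaddress


def reverse_zone_for_24(network):
    """Return the in-addr.arpa zone name for an IPv4 /24 network."""
    net = ipaddress.ip_network(network)
    if net.version != 4 or net.prefixlen != 24:
        raise ValueError("this sketch only handles IPv4 /24 networks")
    # The network address 10.0.2.0 has the pointer name 0.2.0.10.in-addr.arpa;
    # dropping the first label (the host octet) leaves the zone name.
    host_ptr = net.network_address.reverse_pointer
    return host_ptr.split(".", 1)[1]


print(reverse_zone_for_24("10.0.2.0/24"))                 # 2.0.10.in-addr.arpa
print(ipaddress.ip_address("10.0.2.4").reverse_pointer)   # 4.2.0.10.in-addr.arpa
```

The second line is the name a PTR record for 10.0.2.4 would have inside that zone.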

Run syntax checks on configuration and zone files:

# check the configuration file syntax, no parameters needed
sudo named-checkconf
# check zone file syntax
sudo named-checkzone <zone name> <path to zone file>

Then you can create db.example file accordingly, for example:

$TTL 3h
$ORIGIN example.com.
example.com. IN SOA master.example.com. root.example.com. (
2020012323 ; Serial
8h ; Refresh
4h ; Retry
1w ; Expire
1h ; Negative TTL
)
example.com. IN NS master.example.com.
master IN A 10.0.2.4
gw IN A 10.0.2.1
mail IN A 10.0.2.2
$GENERATE 101-200 student-$ IN A 10.0.2.$
; Alias
ns1 IN CNAME master.example.com.

; Mail Servers
example.com. IN MX 5 mail.example.com.

Then check the zone file against its zone name:

# check syntax
named-checkzone example.com db.example
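The $GENERATE line in the zone file is shorthand for a range of similar records. As a rough illustration, the Python sketch below expands it the way the simple form works, replacing each $ with the counter. The function name expand_generate is hypothetical, and the sketch ignores the modifier syntax and escaped dollar signs that BIND also supports.

```python
def expand_generate(start, stop, lhs, rtype, rhs):
    """Expand a simple $GENERATE range into individual record lines."""
    return [
        f"{lhs.replace('$', str(i))} IN {rtype} {rhs.replace('$', str(i))}"
        for i in range(start, stop + 1)  # the range is inclusive on both ends
    ]


# Equivalent of: $GENERATE 101-200 student-$ IN A 10.0.2.$
records = expand_generate(101, 200, "student-$", "A", "10.0.2.$")
print(len(records))   # 100
print(records[0])     # student-101 IN A 10.0.2.101
print(records[-1])    # student-200 IN A 10.0.2.200
```

So one directive stands in for one hundred A records, student-101 through student-200.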

Dnsmasq

dnsmasq is one of the most commonly used DNS caching services and is also often used as a DHCP service. It is easy to install and configure, and its performance is sufficient for the DNS caching needs of most applications.

See the article Want Faster, Easier-to-Manage DNS? Use Dnsmasq. Dnsmasq (short for DNS masquerade) is a lightweight, easy-to-configure DNS forwarder designed to provide DNS (and optionally DHCP and TFTP) services to a small-scale network. It can serve the names of local machines which are not in the global DNS.

Dnsmasq accepts DNS queries and either answers them from a small, local cache or forwards them to a real, recursive DNS server. It loads the contents of /etc/hosts, so that local host names which do not appear in the global DNS can be resolved.
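The answer-from-cache-or-forward behavior can be modeled in a few lines of Python. This is only a conceptual sketch, not how Dnsmasq is implemented: the class name CachingForwarder, the upstream callable, and the injectable clock are all assumptions made so the idea can be shown without a network.

```python
import time


class CachingForwarder:
    """Answer from local hosts entries, then a TTL cache, then an upstream."""

    def __init__(self, upstream, hosts=None, clock=time.monotonic):
        self.upstream = upstream        # callable: name -> (address, ttl_seconds)
        self.hosts = dict(hosts or {})  # names loaded from a hosts file
        self.clock = clock
        self.cache = {}                 # name -> (address, expires_at)

    def resolve(self, name):
        if name in self.hosts:          # local names never go upstream
            return self.hosts[name]
        now = self.clock()
        cached = self.cache.get(name)
        if cached and cached[1] > now:  # still fresh: answer from the cache
            return cached[0]
        address, ttl = self.upstream(name)  # otherwise forward the query
        self.cache[name] = (address, now + ttl)
        return address


# Fake upstream and clock so the example runs without a DNS server.
calls = []
fake_now = [0.0]


def fake_upstream(name):
    calls.append(name)
    return ("192.0.2.10", 60)


forwarder = CachingForwarder(
    fake_upstream,
    hosts={"gw.example.com": "10.0.2.1"},
    clock=lambda: fake_now[0],
)
forwarder.resolve("example.org")
forwarder.resolve("example.org")
print(len(calls))  # 1
```

The second lookup is served from the cache, so the upstream is contacted only once until the 60-second TTL expires.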

By default, Dnsmasq will use the DNS servers set up in your /etc/resolv.conf file.

Dnsmasq will only access the first three servers listed in the resolv.conf file. I usually add one of the Google Public DNS servers, 8.8.8.8 or 8.8.4.4, one of Cisco’s OpenDNS servers, 208.67.222.222 or 208.67.220.220, and 1.1.1.1 operated by Cloudflare to the default DNS servers.

While you’re in the resolv.conf file, go ahead and add localhost (nameserver 127.0.0.1) as the first line. This enables Dnsmasq to cache DNS queries coming from the local machine.

yum install -y dnsmasq
# dnsmasq does not create its own unprivileged user and group
groupadd -r dnsmasq && useradd -rg dnsmasq dnsmasq
# then set this user and group in the config file /etc/dnsmasq.conf

# if a firewall is running, open port 53
firewall-cmd --permanent --add-port=53/tcp
firewall-cmd --permanent --add-port=53/udp
firewall-cmd --reload

systemctl enable dnsmasq
systemctl start dnsmasq

Example config options:

listen-address=127.0.0.1,10.0.2.15
port=53
domain-needed
bogus-priv
# do not read /etc/hosts
no-hosts
dns-forward-max=100
cache-size=500
# do not poll the resolv file for changes
no-poll
# specify resolv file location
resolv-file=/etc/resolv.conf
# upstream dns server ip
server=