DNS is the phonebook of the internet. See the comparison of DNS server software to be aware of the alternatives. The Cloudflare DNS course is also quite good.
Issue
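This problem was interesting. At first I did not realize it was a DNS issue at all; it only became apparent after I dug deeper and cleared up a few distracting side errors.
The trouble started when the docker registry in the cluster was already running normally, yet docker push and docker pull did not work: they kept retrying and then timed out. The push URL at the time was hostname-based, for example:
    dal12-3m-3w-testcluster-03master-00.demo.ibmcloud.com:5000/is-realtime-busybox:latest
Running the same docker push on the host of the docker registry pod still failed, but it worked once the address was changed to localhost, and it also worked from other machines when using the host VM's public IP:
    localhost:5000/is-realtime-busybox:latest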
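This first made me suspect name resolution. My first reaction was to check the /etc/hosts file on each node, which looked completely fine, and the ping command worked as well, which was strange.
Let's check the name resolution configuration more carefully: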
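Following this article, /etc/nsswitch.conf shows the order in which sources are consulted for name lookups. Note that malicious scripts or viruses may modify your nsswitch.conf file.
    #hosts: db files nisplus nis dns
files refers to /etc/hosts and dns refers to the DNS server, so the local file /etc/hosts is indeed consulted first.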
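To make the lookup order concrete, here is a minimal sketch that reads the active hosts: line out of nsswitch.conf-style text. The sample text is an assumption: only the commented-out line comes from the file above, and the uncommented `hosts: files dns myhostname` line is a typical value I made up for illustration.

```python
import re


def hosts_lookup_order(nsswitch_text):
    """Return the sources on the active `hosts:` line, in lookup order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()        # drop comments
        if line.startswith("hosts:"):
            body = line[len("hosts:"):]
            body = re.sub(r"\[[^\]]*\]", " ", body)  # drop actions like [NOTFOUND=return]
            return body.split()
    return []


# Hypothetical file content; the second line is assumed, not taken from the cluster.
sample = """\
#hosts:     db files nisplus nis dns
hosts:      files dns myhostname
"""
print(hosts_lookup_order(sample))
```

If files comes before dns in the result, anything that goes through the system resolver will see /etc/hosts entries before asking a DNS server.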
Next, /etc/resolv.conf holds the address of the DNS server, and that looked fine too.
    nameserver 10.0.80.11
I guessed that some commands might not use the local file /etc/hosts at all. Trying the host command confirmed it:
See "Why does the host command not resolve entries in /etc/hosts?"; it seems docker push/pull behaves the same way.
That answer also introduced me to another command, getent, which is handy for querying /etc/hosts.
    getent hosts halos1
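As a rough illustration of what a files lookup amounts to, the sketch below searches hosts-file text for a name and returns the matching addresses. It is not how getent is implemented; the function name and the sample entries (including the address for halos1) are hypothetical.

```python
def lookup_hosts(hosts_text, name):
    """Return the addresses mapped to `name` in hosts-file text."""
    wanted = name.lower()
    found = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()      # strip comments, split columns
        # first column is the address, the rest are the canonical name and aliases
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            found.append(fields[0])
    return found


# Hypothetical hosts file content.
sample_hosts = """\
127.0.0.1   localhost
10.0.80.20  halos1.example.internal halos1   # made-up entry
"""
print(lookup_hosts(sample_hosts, "halos1"))
print(lookup_hosts(sample_hosts, "missing"))
```

A tool such as host never performs this step, which is why a name that exists only in /etc/hosts looks fine to ping but unknown to host.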
You will find that dig and nslookup behave the same way as host, the purpose of all of these commands is to do DNS lookups, not to look in files such as /etc/hosts.
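Later I had someone add the master node's hostname and IP to the DNS server used by the cluster, and the problem was solved!
So the next time a similar problem shows up, besides checking the local DNS configuration and files, also try the host command to see whether the external DNS server is working properly. Most importantly, remember that some commands do not consult /etc/hosts at all.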
resolv.conf
The file is a plain-text file usually created by the network administrator or by applications that manage the configuration tasks of the system. It is either maintained manually or rewritten by the DHCP client. If you want to customize this file, you need to disable the systemd-resolved service.
The process of determining IP addresses from domain names is called resolving.
Explanation of the resolv.conf file content (see also man resolv.conf):
    # local domain name suffix
So if we look up the hostname xxx, the resolver will try xxx.service.consul, followed by xxx.node.consul, against the DNS server on localhost.
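The search-list behaviour can be sketched in a few lines. This follows the rules described in man resolv.conf (names with fewer dots than ndots, default 1, go through the search list first), but it is a simplified model rather than the actual resolver code. The sample file content is assumed: only the first comment line comes from the file above.

```python
def parse_resolv_conf(text):
    """Collect nameserver, search, and ndots settings from resolv.conf text."""
    nameservers, search, ndots = [], [], 1
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":          # blank line or comment
            continue
        key, *args = line.split()
        if key == "nameserver" and args:
            nameservers.append(args[0])
        elif key == "search":
            search = args
        elif key == "options":
            for opt in args:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return nameservers, search, ndots


def candidate_names(name, search, ndots=1):
    """List the names a stub resolver would try, in order."""
    if name.endswith("."):                       # trailing dot: already absolute
        return [name[:-1]]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded                 # enough dots: try as-is first
    return expanded + [name]                     # otherwise search list first


# Assumed content, reconstructed from the description above.
sample = """\
# local domain name suffix
search service.consul node.consul
nameserver 127.0.0.1
"""
servers, search, ndots = parse_resolv_conf(sample)
print(servers)
print(candidate_names("xxx", search, ndots))
```

With this input, the candidates for xxx come out as xxx.service.consul, then xxx.node.consul, then the bare name.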
BIND
The /etc/hosts file was no longer enough as the internet kept growing, and in 1984 seven top-level domains were created.
DNS record types, for example the A record.
BIND is an acronym for Berkeley Internet Name Domain. To install a DNS server using BIND 9 on CentOS 7, note that the package is called bind but the service is called named:
    yum update
DNS information is stored in text files called zones. The rndc command is used to control the named service:
    # see dns version
Let's look at the named systemd unit file, which is similar to ZooKeeper's; also keep an eye on ExecStartPre, which uses bash:
    # systemctl cat named
As you can see, /etc/named.conf is the config file.
Zone
What is a DNS zone and zone file? A zone file is a plain text file stored in a DNS server that contains an actual representation of the zone and contains all the records for every domain within the zone.
For example, a local configuration:
    zone "example.com" IN {
Run syntax checks on configuration and zone files:
    # no parameters needed
Then you can create the db.example file accordingly, for example:
    $TTL 3h
    # check syntax
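The $TTL 3h directive uses BIND's shorthand for durations. As a small sketch, assuming the usual unit suffixes (s, m, h, d, w) and that they can be concatenated, the helper below converts such a value to seconds. The function name is my own and not part of BIND.

```python
import re

# Seconds per unit suffix accepted in zone-file TTL values.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def ttl_to_seconds(value):
    """Convert a zone-file TTL such as '3h' or '1h30m' to seconds."""
    value = value.strip().lower()
    if value.isdigit():                          # plain number: already seconds
        return int(value)
    parts = re.findall(r"(\d+)([smhdw])", value)
    # reject input with leftover characters, e.g. '3x' or 'h3'
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unrecognised TTL: {value!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


print(ttl_to_seconds("3h"))
print(ttl_to_seconds("1h30m"))
```

So a default TTL of 3h means resolvers may cache records from this zone for 10800 seconds.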
Dnsmasq
dnsmasq is one of the most commonly used DNS caching services, and it is also often used as a DHCP server. It is simple to install and configure, and its performance is sufficient for the DNS caching needs of the vast majority of applications.
See "Want Faster, Easier-to-Manage DNS? Use Dnsmasq". Dnsmasq (short for DNS masquerade) is a lightweight, easy-to-configure DNS forwarder, designed to provide DNS (and optionally DHCP and TFTP) services to a small-scale network. It can serve the names of local machines which are not in the global DNS.
Dnsmasq accepts DNS queries and either answers them from a small, local cache or forwards them to a real, recursive DNS server. It loads the contents of /etc/hosts, so that local host names which do not appear in the global DNS can be resolved.
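The answer-from-cache-or-forward idea can be modelled in a few lines. This is only a toy sketch of the behaviour described above, not Dnsmasq's implementation: the class name, the fake upstream, and all addresses are made up, and the clock is injected so the example runs without waiting.

```python
import time


class CachingForwarder:
    """Toy model: local names first, then cache, then the upstream server."""

    def __init__(self, local_hosts, upstream, clock=time.monotonic):
        self.local_hosts = local_hosts   # name -> address, as if loaded from /etc/hosts
        self.upstream = upstream         # callable: name -> (address, ttl_seconds)
        self.clock = clock
        self.cache = {}                  # name -> (address, expires_at)

    def resolve(self, name):
        if name in self.local_hosts:     # local names never leave the machine
            return self.local_hosts[name]
        hit = self.cache.get(name)
        if hit and hit[1] > self.clock():
            return hit[0]                # fresh cache entry
        address, ttl = self.upstream(name)
        self.cache[name] = (address, self.clock() + ttl)
        return address


calls = []

def fake_upstream(name):
    """Stand-in for a recursive DNS server; returns a made-up answer."""
    calls.append(name)
    return "192.0.2.10", 60

now = [0.0]                              # fake clock, advanced by hand
fwd = CachingForwarder({"halos1": "10.0.80.20"}, fake_upstream, clock=lambda: now[0])
fwd.resolve("example.com")
fwd.resolve("example.com")               # served from cache
print(len(calls))
now[0] = 61.0                            # entry has expired
fwd.resolve("example.com")
print(len(calls))
print(fwd.resolve("halos1"))             # answered locally, no upstream call
```

The second lookup never reaches the upstream server, and the local name is answered without any forwarding, which is the whole appeal of running a cache on the machine itself.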
By default, Dnsmasq will use the DNS servers set up in your /etc/resolv.conf file.
Dnsmasq will only use the first three servers listed in the resolv.conf file. I usually add one of the Google Public DNS servers, 8.8.8.8 or 8.8.4.4, one of Cisco's OpenDNS servers, 208.67.222.222 or 208.67.220.220, and 1.1.1.1, operated by Cloudflare, to the default DNS servers.
While you're in the resolv.conf file, go ahead and add 127.0.0.1 (localhost) as the first entry. This enables Dnsmasq to cache DNS queries from the local machine.
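Since only the first three entries count, it helps to see which servers actually survive once 127.0.0.1 is put first. The sketch below builds the nameserver lines under that assumption; the function name is hypothetical and the upstream list simply reuses addresses mentioned above.

```python
def render_resolv_conf(upstreams, local="127.0.0.1", limit=3):
    """Put the local cache first, then upstreams, keeping at most `limit` entries."""
    ordered = []
    for server in [local] + list(upstreams):
        if server not in ordered:                # skip duplicates, keep order
            ordered.append(server)
    kept, dropped = ordered[:limit], ordered[limit:]
    text = "".join(f"nameserver {server}\n" for server in kept)
    return text, dropped


text, dropped = render_resolv_conf(["10.0.80.11", "8.8.8.8", "208.67.222.222", "1.1.1.1"])
print(text, end="")
print("not used:", dropped)
```

With four upstream servers plus localhost, the last two fall outside the limit, so the order of the list matters.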
    yum install -y dnsmasq
Config options, for example:
    listen-address=127.0.0.1,10.0.2.15