
Stop the virtual machine:

docker-machine stop name_virtual_system

Start a stopped virtual machine:

docker-machine start name_virtual_system

Delete virtual machine:

docker-machine rm name_virtual_system

Connect to virtual machine:

eval "$ (docker-machine env name_virtual_system)"

Disconnect Docker from VM:

eval $(docker-machine env -u)

Login via SSH:

docker-machine ssh name_virtual_system

Quit the virtual machine:

exit

Run the sleep 10 command in the virtual machine:

docker-machine ssh name_virtual_system 'sleep 10'

Running commands in BASH environment:

docker-machine ssh dev 'bash -c "sleep 10 && echo 1"'

Copy the dir folder to the virtual machine:

docker-machine scp -r /dir name_virtual_system:/dir

Make a request to the containers of the virtual machine:

curl $(docker-machine ip name_virtual_system):9000

Forward port 9005 of the host machine to port 9007 of the virtual machine:

docker-machine ssh name_virtual_system -f -N -L 9005:0.0.0.0:9007

Master initialization:

docker swarm init
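docker swarm init prints a join command with a one-time token; worker nodes are then attached with it (the token and address below are placeholders, the real ones are taken from the init output):

docker swarm join --token SWMTKN-1-<token> 192.168.99.100:2377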

Running multiple containers with the same EXPOSE:

essh@kubernetes-master:~/mongo-rs$ docker run --name redis -p 6379 -d redis

f3916da35b6ba5cd393c21d5305002b78c32b089a6cc01e3e2425930c9310cba

essh@kubernetes-master:~/mongo-rs$ docker ps | grep redis

f3916da35b6b redis "docker-entrypoint.s…" 8 seconds ago Up 6 seconds 0.0.0.0:32769->6379/tcp redis

essh@kubernetes-master:~/mongo-rs$ docker port reids

Error: No such container: reids

essh@kubernetes-master:~/mongo-rs$ docker port redis

6379/tcp -> 0.0.0.0:32769

essh@kubernetes-master:~/mongo-rs$ docker port redis 6379

0.0.0.0:32769
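For comparison, a second container with the same EXPOSE can be started next to the first; Docker will map it to another free host port (a sketch, the assigned port 32770 is illustrative):

essh@kubernetes-master:~/mongo-rs$ docker run --name redis2 -p 6379 -d redis

essh@kubernetes-master:~/mongo-rs$ docker port redis2 6379

0.0.0.0:32770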

The first build solution is to copy all the files and then install. As a result, when any file changes, all packages will be reinstalled:

COPY ./ /src/app

WORKDIR /src/app

RUN npm install

Let's use caching and split the static files and the installation:

COPY ./package.json /src/app/package.json

WORKDIR /src/app

RUN npm install

COPY . /src/app

Using the base image template node:7-onbuild:

$ cat Dockerfile

FROM node:7-onbuild

EXPOSE 3000

$ docker build .

In this case, files that do not need to be included in the image, such as the Dockerfile itself, .git, node_modules and files with keys, should be listed in .dockerignore.
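For example, a minimal .dockerignore for such a project might look like this (the exact entries depend on the project):

.git

node_modules

Dockerfile

*.key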

A config can be mounted as a volume (-v /config) and the configuration file copied into the running container:

docker cp config.conf name_container:/config/
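A minimal sketch of the whole flow, assuming an image name_image with an anonymous volume /config declared via -v:

docker run -d -v /config --name name_container name_image

docker cp config.conf name_container:/config/

docker exec name_container ls /config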

Real-time statistics of used resources:

essh@kubernetes-master:~/mongo-rs$ docker stats $(docker ps -q)

CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS

c8222b91737e mongo-rs_slave_1 19.83% 44.12MiB / 15.55GiB 0.28% 54.5kB / 78.8kB 12.7MB / 5.42MB 31

aa12810d16f5 mongo-rs_backup_1 0.81% 44.64MiB / 15.55GiB 0.28% 12.7kB / 0B 24.6kB / 4.83MB 26

7537c906a7ef mongo-rs_master_1 20.09% 47.67MiB / 15.55GiB 0.30% 140kB / 70.7kB 19.2MB / 7.5MB 57

f3916da35b6b redis 0.15% 3.043MiB / 15.55GiB 0.02% 13.2kB / 0B 2.97MB / 0B 4

f97e0697db61 node_api 0.00% 65.52MiB / 15.55GiB 0.41% 862kB / 8.23kB 137MB / 24.6kB 20

8c0d1adc9b9c portainer 0.00% 8.859MiB / 15.55GiB 0.06% 102kB / 3.87MB 57.8MB / 122MB 20

6018b7e3d9cd node_payin 0.00% 9.297MiB / 15.55GiB 0.06% 222kB / 3.04kB 82.4MB / 24.6kB 11

^C

When creating images, you need to consider:

* when a large layer changes, it is recreated entirely, so it is often better to split it, for example, create one layer with 'npm i' and copy the code in a second one;

* if a file in the image is large and the container changes it, the file will be completely copied from the read-only image layer to the writable layer; therefore, containers are supposed to be lightweight, and content is usually placed in special storage. code-as-a-service: the 12 factors (12factor.net):

* Codebase – one service – one repository;

* Dependencies – all dependent services are declared in the config;

* Config – configs are available through the environment;

* Backing services – data is exchanged with other services over the network via an API;

* Processes – one service – one process, which allows, in the event of a crash, unambiguously tracking it (the container itself terminates) and restarting it;

* Independence from the environment and no influence on it;

* CI/CD – code control (git) – build (Jenkins, GitLab) – release (Docker, Jenkins) – deploy (Helm, Kubernetes). Keeping the service lightweight is important, but there are programs not designed to run in containers, such as databases. Due to their peculiarities, certain requirements are imposed on their launch, and the profit is limited: because of big data, they are not only slow to scale, a rolling update is unlikely, and a restart must be performed on the same nodes as their data for reasons of access performance;

* Config – service relationships are defined in the configuration, for example, docker-compose.yml;

* Port binding – services communicate through ports; a port can be selected automatically, for example, if EXPOSE PORT is specified in the Dockerfile, then when a container is run with the -P flag it will be bound to a free port automatically (a short illustration follows this list);

* Env – environment settings are passed through environment variables, not through configs, which allows them to be added to the service composition configuration, for example, docker-compose.yml;

* Logs – logs are streamed over the network, for example, to ELK, or printed to the output, which Docker already streams.
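The port-binding illustration referenced above: with EXPOSE in the image and the -P flag, Docker picks a free host port itself (some_image and web are hypothetical names):

docker run -d -P --name web some_image

docker port web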

Dockerd internals:

essh@kubernetes-master:~/mongo-rs$ ps aux | grep dockerd

root 6345 1.1 0.7 3257968 123640 ? Ssl Jul05 76:11 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

essh 16650 0.0 0.0 21536 1036 pts/6 S+ 23:37 0:00 grep --color=auto dockerd

essh@kubernetes-master:~/mongo-rs$ pgrep dockerd

6345

essh@kubernetes-master:~/mongo-rs$ pstree -c -p -A $(pgrep dockerd)

dockerd(6345)-+-docker-proxy(720)-+-{docker-proxy}(721)

|             |-{docker-proxy}(722)

|             |-{docker-proxy}(723)

|             |-{docker-proxy}(724)

|             |-{docker-proxy}(725)

|             |-{docker-proxy}(726)

|             |-{docker-proxy}(727)

|             `-{docker-proxy}(728)

|-docker-proxy(7794)-+-{docker-proxy}(7808)

Dockerfile:

* clean the caches of package managers (apt-get, pip and others): this cache is not needed in production, it only takes up space and loads the network; nowadays this is often less relevant, since there are multi-stage builds, but more on that below;

* group commands on the same entities, for example, get the APT cache, install programs and delete the cache in one instruction: then the layer contains only the code of the programs, while with the split version it contains the code of the programs and the cache, because if you do not delete the cache in the same instruction, it will be saved in the layer regardless of subsequent actions (see the APT sketch after this list);

* separate instructions by frequency of change: for example, if the installation of software and the code are not split, then when anything changes in the code, instead of the ready-made layer with programs being reused, they will be reinstalled, which entails significant image preparation time, which is critical for developers:

ADD ./app/package.json /app

RUN npm install

ADD ./app /app
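The APT sketch referenced in the list above: the cache is fetched, used and removed within a single RUN instruction, so it never settles into a layer (curl here is just an example package):

RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*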

Docker alternatives

* Rocket, or rkt – containers for the CoreOS operating environment (CoreOS was later acquired by Red Hat), specially designed to use containers;

* Hyper-V – an environment for running Docker on the Windows operating system, which is a wrapper (a lightweight virtual machine) around the container.

Docker has spun off its core components, which it uses as primitives and which have become standard components for implementing containers such as rkt, into the containerd project:

* CRI-O – an OpenSource project aimed from the beginning at fully supporting the CRI (Container Runtime Interface) standards, the Runtime Specification (github.com/opencontainers/runtime-spec) and the Image Specification (github.com/opencontainers/image-spec), as a general interface for the interaction of the orchestration system with containers. Along with Docker, support for CRI-O 1.0 was added to Kubernetes (more on this below) in version 1.7 in 2017, as well as to MiniKube and Kubic. It has a CLI (Command Line Interface) implementation in the Podman project, which almost completely repeats Docker commands and is the default tool in Fedora Linux, but without orchestration (Docker Swarm).

* CRI (Container Runtime Interface, kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/) – an environment for running containers, universally providing primitives (Executor, Supervisor, Metadata, Content, Snapshot, Events and Metrics) for working with Linux containers (process namespaces, cgroups, etc.);

* CNI (Container Networking Interface) – work with the network.
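Since Podman's CLI mirrors Docker's, the basic commands look the same; a brief sketch (on a system with Podman installed):

podman pull nginx

podman run -d -p 8080:80 --name web nginx

podman ps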

Portainer

The simplest monitoring option would be Portainer:

essh@kubernetes-master:~/microKubernetes$ cat << EOF > docker-compose.monitoring.yml

version: '2'

services:

  portainer:

    image: portainer/portainer

    command: -H unix:///var/run/docker.sock

    restart: always

    ports:

      - 9000:9000

    volumes:

      - /var/run/docker.sock:/var/run/docker.sock

      - ./portainer_data:/data

EOF

essh@kubernetes-master:~/microKubernetes$ docker-compose -f docker-compose.monitoring.yml up -d
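You can check that the container has come up and reach the UI on port 9000 (assuming it is reachable locally):

essh@kubernetes-master:~/microKubernetes$ docker-compose -f docker-compose.monitoring.yml ps

essh@kubernetes-master:~/microKubernetes$ curl -I localhost:9000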

 

Monitoring with Prometheus

Monitoring – maintaining the continuity of operation, tracking the current situation (identifying and localizing incidents and sending notifications about them, for example, to the SaaS PagerDuty), predicting possible situations, visualization, and building models for the normal operation of AIOps (Artificial Intelligence for IT Operations, https://www.gartner.com/en/information-technology/glossary/aiops-artificial-intelligence-operations).

Monitoring contains the following steps:

* identification of the incident;

* notification of the incident;

* localization;

* decision.

Monitoring can be classified by level into the following types:

* infrastructure (operating system, servers, Kubernetes, DBMS);

* application (application logs, traces, application events);

* business processes (points in transactions, traces of transactions).

Monitoring can be classified according to the principle:

* distributed (traces);

* synthetic (availability);

* AIOps (forecasting, anomalies).

Monitoring is divided into two parts according to the depth of analysis: logging systems and incident investigation systems. An example of a logging system is the ELK stack; of an incident investigation system, Sentry (SaaS). For microservices, a request tracing system such as Jaeger or Zipkin is also added. The logging system simply writes all the logs that are available. The incident investigation system writes much more information, but writes it only in case of errors in the application, for example, environment parameters, versions of installed packages, the stack trace and so on, which allows you to get maximum information when viewing the error, rather than collecting it piece by piece from the server and the Git repository. But the set and format of the information depend on the environment, so the incident system needs to be integrated with various language platforms, and even better with specific frameworks. So Sentry sends environment variables, a piece of code with an indication of where the error occurred, parameters of the program and platform environment, and method calls.

Ecosystem monitoring can be divided into:

* Built into the cloud platform: Azure Monitoring, Amazon CloudWatch, Google Cloud Monitoring

* Provided as a service with support for various SaaS integrations: DataDog, NewRelic

* CloudNative: Prometheus

* For dedicated on-premises servers: Zabbix

Zabbix was developed in 1998 and released to OpenSource under the GPL in 2001. At that time it had a traditional interface: without any design, with a lot of tabs, selectors and the like. Since it was developed for the company's own needs, it contains specific solutions. It is oriented toward monitoring devices and their components such as disks, networks, printers, routers and the like. For interaction, the following can be used:

Agents – installed on servers, collect many metrics and send them to the Zabbix server;

HTTP – Zabbix makes requests over HTTP, for example, to printers;

SNMP – a network protocol for communicating with network devices;

IPMI – a protocol for communicating with server hardware.

In 2019, Gartner presented a rating of monitoring systems in its Magic Quadrant:

* Dynatrace;

* Cisco (AppDynamics);

* New Relic;

* Broadcom (CA Technologies);

* Riverbed and Microsoft;

* IBM;

* Oracle;

* SolarWinds;

* Micro Focus;

* ManageEngine and Tingyun.

Not included in the square:

* Correlsense;

* Datadog;

* Elastic;

* Honeycomb;

* Instana;

* JenniferSoft;

* Lightstep;

* Nastel Technologies;

* SignalFx;

* Splunk;

* Sysdig.

When we run an application in a Docker container, all the standard output (what is displayed in the console) of the running program (process) is buffered. We can view this buffer with the docker logs name_container command. If we follow the Docker ideology – "one process, one container" – we can view the logs of an individual program. It is convenient to use the less and tail commands to view logs. The first allows you to scroll through the logs with the keyboard arrows and search for what you need based on matches or a regular expression pattern, like the text editor vi. The second displays the number of last lines we need.
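For example (name_container is a placeholder for a real container name):

docker logs name_container | less

docker logs --tail 100 name_container

docker logs -f name_container

The first pipes the whole buffer into less, the second prints only the last 100 lines, and the third follows the log in real time.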

An important criterion for ensuring smooth operation is the control of free space. So, if there is no space left, the database will not be able to write data, and with other components the situation can be more dire than the loss of new data. Docker has limit settings, not only for individual containers: by default, at least 10% must remain free. During image building or container startup, an error may be thrown that the specified limits have been exceeded. To change the default settings, you need to pass the settings to the Dockerd server, after stopping it with service docker stop (all containers will be stopped) and resuming it with service docker start (the containers will be resumed). Settings can be set as options: /bin/dockerd --storage-opt dm.basesize=50G --storage-opt
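Docker itself can report how much space images, containers and volumes occupy and clean up unused data; two standard commands:

docker system df

docker system prune --volumes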

In Portainer, we have authorization and control over our containers, with the ability to create them for testing and to see graphs of processor and memory usage. More will require a monitoring system. There are quite a few monitoring systems, for example, Zabbix, Graphite, Prometheus, Nagios, InfluxData, OkMeter, DataDog, Bosun, Sensu and others, of which Zabbix and Prometheus are the most popular. The first is traditionally used, since it is the leading monitoring tool, which admins love for its ease of use (all you need is SSH access to the server) and its low level, which allows working not only with servers but also with other hardware, such as routers. The second is the opposite of the first: it is focused exclusively on collecting metrics and monitoring, designed as a ready-made solution rather than a framework, and loved by programmers: set it up, choose metrics and get graphs. The key difference between Zabbix and Prometheus is not in the preference of some to customize everything in detail and of others to spend much less time, but in their scope. Zabbix is focused on setting up work with specific hardware, which can be anything, and often very exotic in a corporate environment; for each such entity, a metric collection is written by hand and a chart is manually configured. For a dynamically changing environment of cloud solutions, even if it is just a Docker container, and even more so Kubernetes, in which a huge number of entities are constantly created and the entities themselves, apart from the general environment, are not of particular interest, it is not suitable; for this, Prometheus has Service Discovery built in and supports navigation for Kubernetes through the namespace, the balancer (service) and the group of containers (POD), which can be configured in Grafana in the form of tables. In Kubernetes, according to The New Stack's 2017 Kubernetes User and Experience survey, it is used in 63% of cases; in the rest, rarer cloud monitoring tools are used.

Metrics can be system metrics (for example, CPU, RAM, ROM) and application metrics (service and application metrics). System metrics are divided into core metrics, which are used by Kubernetes for scaling and the like, and non-core metrics, which are not used by Kubernetes. Here is an example of bundles for collecting metrics:

* cAdvisor + Heapster + InfluxDB

* cAdvisor + collectd + Heapster

* cAdvisor + Prometheus

* snapd + Heapster

* snapd + SNAP cluster-level agent

* Sysdig

There are many monitoring systems and services on the market. We will consider exactly the OpenSource ones, which can be installed in your cluster. They can be divided according to the model of obtaining metrics: into those that collect metrics by polling, and those that expect metrics to be pushed into them. The latter are simpler both in structure and in use on a small scale. An example would be InfluxDB, which is a database that you can write to. The downside of this solution is the difficulty of scaling, both in terms of support and load: if all services write at the same time, they can overload the monitoring system, and it is difficult to scale, since the endpoint is registered in each service. The first group, practicing the pull model of interaction, includes Prometheus. It is also a database, with a daemon that polls services based on their registrations in the configuration file and pulls metrics in a specific format, for example:

cpu_usage 2

cpu_usage{app="myapp"} 2
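These metrics are plain text served over HTTP, so they can be inspected directly; for example, Prometheus itself serves its own metrics on its port:

curl -s localhost:9090/metrics | head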

Prometheus is a mature product: it was developed in 2012, and in 2016 it was included in the CNCF (Cloud Native Computing Foundation) consortium. Prometheus consists of:

* a TSDB (Time Series Database), which looks more like a storage queue for metrics, with a specified retention period, for example, a week, allowing hundreds of thousands of metrics per second to be processed. The database is local to Prometheus and does not support horizontal scaling; in the case of Prometheus, it is achieved by raising several of its instances and sharding them. Prometheus supports data aggregation, which is useful for reducing the amount of accumulated data, as well as archiving the database from memory to disk;

* Service Discovery support for Kubernetes out of the box, through the public API, by polling PODs filtered according to the config on TCP port 9121 (a config sketch follows this list);

* Grafana (a separate product, added by default) – a universal UI with dashboards and charts that supports Prometheus via PromQL.
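The Service Discovery config sketch referenced above, assuming a minimal prometheus.yml fragment (the job name is arbitrary):

scrape_configs:

  - job_name: 'kubernetes-pods'

    kubernetes_sd_configs:

      - role: pod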

To expose metrics, you can use ready-made solutions or develop your own. For the vast majority of system metrics there is an exporter, while for application metrics you often have to provide your own. Exporters are general and specialized. For example, NodeExporter provides most node metrics, including a couple for processes, while more detailed process metrics come from specialized exporters. If you run Prometheus without exporters, it will give out almost a thousand metrics, but these are the metrics of Prometheus itself, and there will be no node_* prefixes among them. For these metrics to appear, you need to enable NodeExporter and write its URL into the Prometheus configuration to collect the metrics it provides. For NodeExporter, this can be localhost or the node address and port 9100. Usually, exporters specialize in product-specific metrics, for example:

* node_exporter – node metrics (CPU, Memory, Network);

* snmp_exporter – SNMP protocol metrics;

* mysqld_exporter – MySQL database metrics;

* consul_exporter – Consul database metrics;

* graphite_exporter – Graphite database metrics;

* memcached_exporter – Memcached database metrics;

* haproxy_exporter – HAProxy balancer metrics;

* cAdvisor – container metrics;

* process-exporter – detailed process metrics;

* metrics-server – CPU, Memory, File-descriptors, Disks;

* cAdvisor – Docker daemon metrics – container monitoring;

* kube-state-metrics – deployments, PODs, nodes.
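To try NodeExporter, it can be run as a container and registered as a static target in prometheus.yml (a sketch; localhost assumes Prometheus runs on the same node):

docker run -d -p 9100:9100 --name node-exporter prom/node-exporter

scrape_configs:

  - job_name: 'node'

    static_configs:

      - targets: ['localhost:9100']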

Prometheus supports remote data writing (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write), for example, to the distributed TSDB storage for Prometheus – Weave Works Cortex – using a setting in the configuration, which allows data analysis from multiple Prometheus instances:

remote_write:

  - url: "http://localhost:9000/receive"

Let's consider its work on a ready-made instance. I'll take www.katacoda.com/courses/istio/deploy-istio-on-kubernetes for this and go through it. Our Prometheus is located on its standard port 9090:

controlplane$ kubectl -n istio-system get svc prometheus

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE

prometheus ClusterIP 10.99.70.170 <none> 9090/TCP 6m59s

To open its UI, I'll go to the WEB tab and change the port in the address from 80 to 9090: https://2886795314-9090-ollie08.environments.katacoda.com/graph. In the input line, you need to enter the desired metric in PromQL (Prometheus query language) – the counterpart of InfluxQL for InfluxDB and SQL for TimescaleDB. For example, I will enter "cpu", and it will display a list of metrics containing it. There are two tabs under the line: a tab with a graph and a tab for displaying in tabular form. I will be looking at the tabular view. I selected machine_cpu_cores and clicked Execute. Common metrics usually have similar names, for example, machine_cpu_cores and node_cpu_cores. The metrics themselves consist of the name, tags in braces and the value of the metric; they are requested in the same form and displayed in the same form in the table.

 

machine_cpu_cores{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux"}

machine_cpu_cores{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux"}

If the metric of interest is memory, you can select machine_memory_bytes – the size of the RAM on the machine (server or virtual machine):

machine_memory_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux"}

machine_memory_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux"}

But bytes are hard to read, so we will use PromQL to convert them to GB: machine_memory_bytes / 1000 / 1000 / 1000

{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux"}

{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux"}

Let's enter memory_bytes in the search to find container_memory_usage_bytes – the memory used. The list contains all containers and their current memory consumption; I will give only three:

container_memory_usage_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0e619e5dc53ed9efcef63f5fe1d7ee71.slice/docker-b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope", image="k8s.gcr.io/pause:3.1", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux", name="k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0", namespace="kube-system", pod="etcd-controlplane", pod_name="etcd-controlplane"} 45056

container_memory_usage_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod5a815a40_f2de_11ea_88d2_0242ac110032.slice/docker-76711789af076c8f2331d8212dad4c044d263c5cc3fa333347921bd6de7950a4.scope", image="k8s.gcr.io/pause:3.1", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux", name="k8s_POD_kube-proxy-nhzhn_kube-system_5a815a40-f2de-11ea-88d2-0242ac110032_0", namespace="kube-system", pod="kube-proxy-nhzhn", pod_name="kube-proxy-nhzhn"}

container_memory_usage_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod6473aeea_f2de_11ea_88d2_0242ac110032.slice/docker-24ef0e898e1bb7dec9854b67291171aa9c5715d7683f53bdfc2cef49a19744fe.scope", image="k8s.gcr.io/pause:3.1", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux", name="k8s_POD_kube-proxy-6v49x_kube-system_6473aeea-f2de-11ea-88d2-0242ac110032_0", namespace="kube-system", pod="kube-proxy-6v49x", pod_name="kube-proxy-6v49x"}

Let's set a label that is contained in the metrics to filter out one: container_memory_usage_bytes{container_name="prometheus"}

container_memory_usage_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="prometheus", container_name="prometheus", id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice/docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope", image="sha256:b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux", name="k8s_prometheus_prometheus-5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0", namespace="istio-system", pod="prometheus-5b77b7d695-knf44", pod_name="prometheus-5b77b7d695-knf44"}

283443200

Let's convert to Mb: container_memory_usage_bytes{container_name="prometheus"} / 1000 / 1000

{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="prometheus", container_name="prometheus", id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice/docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope", image="sha256:b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux", name="k8s_prometheus_prometheus-5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0", namespace="istio-system", pod="prometheus-5b77b7d695-knf44", pod_name="prometheus-5b77b7d695-knf44"}

286.18752

Let's filter by instance: container_memory_usage_bytes{container_name="prometheus", instance="node01"} / 1000 / 1000

{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="prometheus", container_name="prometheus", id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice/docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope", image="sha256:b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux", name="k8s_prometheus_prometheus-5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0", namespace="istio-system", pod="prometheus-5b77b7d695-knf44", pod_name="prometheus-5b77b7d695-knf44"}

289.890304

And on the second node there is none: container_memory_usage_bytes{container_name="prometheus", instance="node02"}

no data

There are also aggregate functions: sum(container_memory_usage_bytes) / 1000 / 1000 / 1000

{} 22.812798976

max(container_memory_usage_bytes) / 1000 / 1000 / 1000

{} 3.6422983679999996

min(container_memory_usage_bytes) / 1000 / 1000 / 1000

{} 0

You can also group by labels, for example, by instance: max(container_memory_usage_bytes) by (instance) / 1000 / 1000 / 1000

{instance = "controlplane"} 1.641836544

{instance = "node01"} 3.6622745599999997

You can perform operations on metrics with the same label sets and filter the results: container_memory_mapped_file / container_memory_usage_bytes * 100 > 80

{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode45f10af1ae684722cbd74cb11807900.slice/docker-5cb2f2083fbc467b8b394b27b69686d309f951450bcb910d509572aea9922806.scope", image="k8s.gcr.io/pause:3.1", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux", name="k8s_POD_kube-controller-manager-controlplane_kube-system_e45f10af1ae684722cbd74cb11807900_0", namespace="kube-system", pod="kube-controller-manager-controlplane", pod_name="kube-controller-manager-controlplane"}

80.52631578947368

You can look at the file system metrics using container_fs_limit_bytes, which produces a large list; I will give a few entries from it:

container_fs_limit_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", device="/dev/vda1", id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0e619e5dc53ed9efcef63f5fe1d7ee71.slice/docker-b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope", image="k8s.gcr.io/pause:3.1", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux", name="k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0", namespace="kube-system", pod="etcd-controlplane", pod_name="etcd-controlplane"}