Getting Started: The Basics

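Once Kibana is installed, you can work through this tutorial to get up to speed quickly on its core features. The tutorial walks you through:

  • Loading data into Elasticsearch
  • Defining at least one index pattern
  • Using the Discover feature to browse and explore your data
  • Building visualizations to present your data graphically
  • Combining several visualizations into one dashboard

This tutorial assumes you already have a running Kibana instance that is correctly connected to Elasticsearch.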

Matching video tutorials are also available, although they are aimed at English speakers.

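Before you start: put some data into Elasticsearch

This tutorial needs the following data sets:

  • The complete works of Shakespeare, suitably parsed and stored in fields. Download: shakespeare.json
  • A set of fictitious, randomly generated user accounts. Download: accounts.zip
  • A set of fictitious, randomly generated log files. Download: logs.jsonl.gz

The second and third files are compressed. Extract them with the commands below. They assume a Unix-like shell; on Windows, use an equivalent archive tool.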

unzip accounts.zip
gunzip logs.jsonl.gz

The Shakespeare JSON data has the following format:

{
    "line_id": INT,
    "play_name": "String",
    "speech_number": INT,
    "line_number": "String",
    "speaker": "String",
    "text_entry": "String"
}

The account data has the following format:

{
    "account_number": INT,
    "balance": INT,
    "firstname": "String",
    "lastname": "String",
    "age": INT,
    "gender": "M or F",
    "address": "String",
    "employer": "String",
    "email": "String",
    "city": "String",
    "state": "String"
}

The log data has many different fields, but the notable ones, which this tutorial uses, are:

{
    "memory": INT,
    "geo.coordinates": "geo_point",
    "@timestamp": "date"
}

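Before loading the data, we need to set up a mapping for the fields. A mapping divides the documents in an index into logical groups and sets characteristics of each field, such as whether the field is searchable and whether it is tokenized.

Run the following command to create the mapping: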

curl -XPUT http://localhost:9200/shakespeare -d '
{
 "mappings" : {
  "_default_" : {
   "properties" : {
    "speaker" : {"type": "string", "index" : "not_analyzed" },
    "play_name" : {"type": "string", "index" : "not_analyzed" },
    "line_id" : { "type" : "integer" },
    "speech_number" : { "type" : "integer" }
   }
  }
 }
}
';

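The command above specifies the following:

  • The speaker field is a string that is not analyzed. The string in this field is treated as a single indivisible unit, even if it consists of several words.
  • The same applies to the play_name field.
  • The line_id and speech_number fields are integers.

For the log data, the latitude/longitude values need to be mapped to the geo_point type so that Elasticsearch correctly recognizes them as geographic locations.

The corresponding mapping commands are: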

curl -XPUT http://localhost:9200/logstash-2015.05.18 -d '
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
';
curl -XPUT http://localhost:9200/logstash-2015.05.19 -d '
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
';
curl -XPUT http://localhost:9200/logstash-2015.05.20 -d '
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
';

The account data does not need a manual mapping, so the next step is simply to load all three data sets into Elasticsearch with the bulk API:

curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl

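These commands may take a while to run, depending on the hardware resources available.

To check that the data loaded successfully, run this command:

curl 'localhost:9200/_cat/indices?v'

You should see output roughly like this: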

health status index               pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank                  5   1       1000            0    418.2kb        418.2kb
yellow open   shakespeare           5   1     111396            0     17.6mb         17.6mb
yellow open   logstash-2015.05.18   5   1       4631            0     15.6mb         15.6mb
yellow open   logstash-2015.05.19   5   1       4624            0     15.7mb         15.7mb
yellow open   logstash-2015.05.20   5   1       4750            0     16.4mb         16.4mb

Getting Started with Kibana

Now that you have Kibana installed, you can step through this tutorial to get fast hands-on experience with key Kibana functionality. By the end of this tutorial, you will have:

  • Loaded a sample data set into your Elasticsearch installation
  • Defined at least one index pattern
  • Used the Discover functionality to explore your data
  • Set up some visualizations to graphically represent your data
  • Assembled visualizations into a Dashboard

The material in this section assumes you have a working Kibana install connected to a working Elasticsearch install.

Video tutorials are also available:

  • High-level Kibana 4 introduction, pie charts
  • Data discovery, bar charts, and line charts
  • Tile maps
  • Embedding Kibana 4 visualizations

Before You Start: Loading Sample Data

The tutorials in this section rely on the following data sets:

  • The complete works of William Shakespeare, suitably parsed into fields. Download this data set by clicking here: shakespeare.json.
  • A set of fictitious accounts with randomly generated data. Download this data set by clicking here: accounts.zip
  • A set of randomly generated log files. Download this data set by clicking here: logs.jsonl.gz

Two of the data sets are compressed. Use the following commands to extract the files:

unzip accounts.zip
gunzip logs.jsonl.gz

The Shakespeare data set is organized in the following schema:

{
    "line_id": INT,
    "play_name": "String",
    "speech_number": INT,
    "line_number": "String",
    "speaker": "String",
    "text_entry": "String"
}

The accounts data set is organized in the following schema:

{
    "account_number": INT,
    "balance": INT,
    "firstname": "String",
    "lastname": "String",
    "age": INT,
    "gender": "M or F",
    "address": "String",
    "employer": "String",
    "email": "String",
    "city": "String",
    "state": "String"
}

The schema for the logs data set has dozens of different fields, but the notable ones used in this tutorial are:

{
    "memory": INT,
    "geo.coordinates": "geo_point",
    "@timestamp": "date"
}

Before we load the Shakespeare data set, we need to set up a mapping for the fields. Mapping divides the documents in the index into logical groups and specifies a field’s characteristics, such as the field’s searchability or whether or not it’s tokenized, or broken up into separate words.

Use the following command to set up a mapping for the Shakespeare data set:

curl -XPUT http://localhost:9200/shakespeare -d '
{
 "mappings" : {
  "_default_" : {
   "properties" : {
    "speaker" : {"type": "string", "index" : "not_analyzed" },
    "play_name" : {"type": "string", "index" : "not_analyzed" },
    "line_id" : { "type" : "integer" },
    "speech_number" : { "type" : "integer" }
   }
  }
 }
}
';

This mapping specifies the following qualities for the data set:

  • The speaker field is a string that isn’t analyzed. The string in this field is treated as a single unit, even if there are multiple words in the field.
  • The same applies to the play_name field.
  • The line_id and speech_number fields are integers.
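
To make the "single unit" point concrete, here is a rough sketch of the difference between an analyzed and a not_analyzed string. It only imitates the idea with tr (lowercasing and splitting on spaces); it is not Elasticsearch's actual analyzer, and the function name and the sample speaker value are made up for illustration.

```shell
# Hypothetical helper: a crude stand-in for what an analyzer does to a string.
# It lowercases the input and emits one token per line.
analyze_like() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '\n'
}

speaker='KING HENRY IV'   # illustrative value, not read from the data set

echo "analyzed (one term per word):"
analyze_like "$speaker"

echo "not_analyzed (one term for the whole value):"
printf '%s\n' "$speaker"
```

With an analyzed field, a terms aggregation on speaker would bucket by individual words; with not_analyzed, each full speaker name is its own bucket, which is what the later visualizations need.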

The logs data set requires a mapping to label the latitude/longitude pairs in the logs as geographic locations by applying the geo_point type to those fields.

Use the following commands to establish geo_point mapping for the logs:

curl -XPUT http://localhost:9200/logstash-2015.05.18 -d '
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
';
curl -XPUT http://localhost:9200/logstash-2015.05.19 -d '
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
';
curl -XPUT http://localhost:9200/logstash-2015.05.20 -d '
{
  "mappings": {
    "log": {
      "properties": {
        "geo": {
          "properties": {
            "coordinates": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
';
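
The three commands above differ only in the date suffix of the index name, so they can be generated from a loop. The following is a minimal sketch that prints the commands instead of running them; ES_URL, GEO_MAPPING, and geo_mapping_command are names introduced here for illustration, and the URL assumes the same local Elasticsearch used throughout this tutorial.

```shell
# Assumed address of the local Elasticsearch node, as used in the tutorial.
ES_URL="http://localhost:9200"

# The mapping body is identical for every daily index (same JSON as above, compacted).
GEO_MAPPING='{"mappings":{"log":{"properties":{"geo":{"properties":{"coordinates":{"type":"geo_point"}}}}}}}'

# Hypothetical helper: prints the index-creation command for one day of logs.
# $1 is the date suffix of the index, for example 2015.05.18
geo_mapping_command() {
    printf "curl -XPUT %s/logstash-%s -d '%s'\n" "$ES_URL" "$1" "$GEO_MAPPING"
}

# Dry run: print one command per daily index.
for day in 2015.05.18 2015.05.19 2015.05.20; do
    geo_mapping_command "$day"
done
```

Once the printed commands look right, the same loop output can be piped to sh to execute them.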

The accounts data set doesn’t require any mappings, so at this point we’re ready to use the Elasticsearch bulk API to load the data sets with the following commands:

curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -XPOST 'localhost:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @logs.jsonl

These commands may take some time to execute, depending on the computing resources available.
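
The bulk API expects newline-delimited JSON in which each document is preceded by an action line. This is also why the commands use --data-binary: curl's plain -d option strips newlines when reading from a file. The sketch below builds a tiny file in that layout and counts its documents; the two records are invented for illustration and are not taken from accounts.json, and bulk_doc_count is a hypothetical helper.

```shell
# Write a made-up two-document file in bulk format:
# an action line, then the document source, repeated.
bulk_sample=$(mktemp)
cat > "$bulk_sample" <<'EOF'
{"index":{"_id":"1"}}
{"account_number":1,"balance":100,"firstname":"Ada"}
{"index":{"_id":"2"}}
{"account_number":2,"balance":200,"firstname":"Bob"}
EOF

# Hypothetical helper: counts action lines, which equals the number of
# documents when every action in the file is "index".
bulk_doc_count() {
    grep -c '^{"index"' "$1"
}

echo "documents in sample: $(bulk_doc_count "$bulk_sample")"
```

Running the same count against a real bulk file before loading gives a figure to compare with docs.count afterwards.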

Verify successful loading with the following command:

curl 'localhost:9200/_cat/indices?v'

You should see output similar to the following:

health status index               pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank                  5   1       1000            0    418.2kb        418.2kb
yellow open   shakespeare           5   1     111396            0     17.6mb         17.6mb
yellow open   logstash-2015.05.18   5   1       4631            0     15.6mb         15.6mb
yellow open   logstash-2015.05.19   5   1       4624            0     15.7mb         15.7mb
yellow open   logstash-2015.05.20   5   1       4750            0     16.4mb         16.4mb
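
To check the counts without reading the table by eye, the docs.count column can be pulled out with awk. This sketch parses a saved copy of the sample output above so it runs without a cluster; in real use the text would come from curl -s 'localhost:9200/_cat/indices?v'. The function names are introduced here for illustration.

```shell
# Saved copy of the sample output shown above.
sample_output=$(cat <<'EOF'
health status index               pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank                  5   1       1000            0    418.2kb        418.2kb
yellow open   shakespeare           5   1     111396            0     17.6mb         17.6mb
yellow open   logstash-2015.05.18   5   1       4631            0     15.6mb         15.6mb
yellow open   logstash-2015.05.19   5   1       4624            0     15.7mb         15.7mb
yellow open   logstash-2015.05.20   5   1       4750            0     16.4mb         16.4mb
EOF
)

# Hypothetical helper: prints docs.count (column 6) for the index named in $1.
doc_count() {
    printf '%s\n' "$sample_output" | awk -v idx="$1" '$3 == idx { print $6 }'
}

# Hypothetical helper: sums docs.count over all logstash-* indices.
log_doc_total() {
    printf '%s\n' "$sample_output" | awk '$3 ~ /^logstash-/ { sum += $6 } END { print sum }'
}

echo "bank: $(doc_count bank)"
echo "shakespeare: $(doc_count shakespeare)"
echo "logstash total: $(log_doc_total)"
```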
