Esql support #233

Open
wants to merge 20 commits into
base: main

mashhurs
Contributor

@mashhurs mashhurs commented Apr 4, 2025

Description

ES|QL support:

  • response_type accepts an esql option to distinguish ES|QL from other query types. In the long term this will be deprecated and replaced by query_type if the team agrees.
  • adds ES|QL executor to execute ESQL query and parse/map response to event
  • validations
    • make sure LS (8.17.4+) supports ES|QL (new elasticsearch-ruby client)
    • make sure the connected ES version is 8.11 or later
    • make sure the query isn't empty and is meaningful, i.e. starts with a command syntax
  • informs the user if the query isn't using METADATA, which adds _id and _version to the response entries
  • informs the user about ineffective params such as size, search_api and target if they are configured
  • ES|QL result field names come in a dotted format. The plugin reproduces them as nested fields (example: {a.b.c: 'val'} => {'a':{'b':{'c':'val'}}})
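
The dotted-to-nested mapping in the last bullet can be sketched in plain Ruby. This is a minimal illustration, not the plugin's code: the helper name `nest_dotted_keys` is hypothetical, and it assumes no column is both a leaf and a parent of another column.

```ruby
# Hypothetical helper (not from the plugin): expands dotted keys into nested hashes.
def nest_dotted_keys(flat)
  flat.each_with_object({}) do |(dotted, value), nested|
    *parents, leaf = dotted.split('.')
    # Walk down one hash level per parent segment, creating levels as needed.
    target = parents.reduce(nested) { |node, key| node[key] ||= {} }
    target[leaf] = value
  end
end

nested = nest_dotted_keys({ 'a.b.c' => 'val', 'a.b.d' => 1 })
puts nested.dig('a', 'b', 'c')
```

Keys that share a prefix end up under the same parent hash, so `a.b.c` and `a.b.d` both land inside `nested['a']['b']`.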

FYI: failed docs CI isn't related to this change.

Sample minimal config to test:

    elasticsearch {
        cloud_id => "my-cloud-id"
        api_key => "api:key"
        response_type => "esql"
        query => "FROM my-index /* Query comment */ | LIMIT 1"
    }
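
The validations listed in the description could look roughly like the sketch below. Everything here is an assumption made for illustration: the method names, the list of source commands (`FROM`, `ROW`, `SHOW`) and the way versions are compared are not taken from the plugin.

```ruby
require 'rubygems' # Gem::Version, normally loaded by default

# Hypothetical pre-flight checks; names and the command list are assumptions.
ESQL_SOURCE_COMMANDS = %w[FROM ROW SHOW].freeze
MIN_ES_VERSION = Gem::Version.new('8.11.0')

# True when the connected Elasticsearch version is 8.11 or later.
def esql_supported?(es_version)
  Gem::Version.new(es_version) >= MIN_ES_VERSION
end

# True when the query is non-empty and its first word is a source command.
def valid_esql_query?(query)
  first_word = query.to_s.strip.split(/\s+/, 2).first
  return false if first_word.nil?

  ESQL_SOURCE_COMMANDS.include?(first_word.upcase)
end

# True when the query mentions METADATA (which adds _id, _version to the rows).
def uses_metadata?(query)
  query.match?(/\bMETADATA\b/i)
end

query = 'FROM my-index /* Query comment */ | LIMIT 1'
puts valid_esql_query?(query)
puts uses_metadata?(query)
```

One limitation of this sketch: a query that begins with a comment rather than a command would be rejected, so a real check would need to skip leading comments first.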

Author's check

  • More tests for common error cases (timeout, internal error, etc.)
  • Enrichment errors test (in case)
  • With multiple indices, if field types mismatch, unsupported field type will be filled (reference)
  • query includes comment
  • Unit tests to run on >8.17.4
  • Documentation
  • Integration tests

Logs

  • when credentials wrong
[2025-04-08T14:39:01,060][WARN ][logstash.inputs.elasticsearch.esql][main][6ecdbd14f1bdf461d566eb2807fb23bdf38e032ae8b36a3ff64ee9c4e112ef51] Attempt to ES|QL job but failed. Sleeping for 0.16 {:fail_count=>4, :exception=>"[401] {\"error\":{\"root_cause\":[{\"type\":\"security_exception\",\"reason\":\"unable to authenticate user [elastic] for REST request [/_query?format=json]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"ApiKey\"]}}],\"type\":\"security_exception\",\"reason\":\"unable to authenticate user [elastic] for REST request [/_query?format=json]\",\"header\":{\"WWW-Authenticate\":[\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\",\"ApiKey\"]}},\"status\":401}"}
  • when ES is unresponsive at startup
[2025-04-08T14:40:43,942][ERROR][logstash.javapipeline    ][main] Pipeline error {:pipeline_id=>"main", :exception=>#<Elastic::Transport::Transport::Error: Connect to localhost:9200 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused>, :backtrace=>["/logstash/vendor/bundle/jruby/3.1.0/gems/elastic-transport-8.4.0/lib/elastic/transport/transport/base.rb:324:in `perform_request'", "/logstash/vendor/bundle/jruby/3.1.0/gems/elastic-transport-8.4.0/lib/elastic/transport/transport/http/manticore.rb:91:in `perform_request'", "/logstash/vendor/bundle/jruby/3.1.0/gems/elastic-transport-8.4.0/lib/elastic/transport/client.rb:192:in `perform_request'", "/logstash/vendor/bundle/jruby/3.1.0/gems/elasticsearch-8.17.1/lib/elasticsearch.rb:86:in `verify_elasticsearch'", "/logstash/vendor/bundle/jruby/3.1.0/gems/elasticsearch-8.17.1/lib/elasticsearch.rb:69:in `method_missing'", "/logstash/vendor/bundle/jruby/3.1.0/gems/elasticsearch-api-8.17.1/lib/elasticsearch/api/actions/ping.rb:43:in `ping'", "/ls-plugins/logstash-input-elasticsearch/lib/logstash/inputs/elasticsearch.rb:632:in `test_connection!'", "/ls-plugins/logstash-input-elasticsearch/lib/logstash/inputs/elasticsearch.rb:350:in `register'", "/logstash/vendor/bundle/jruby/3.1.0/gems/logstash-mixin-ecs_compatibility_support-1.3.0-java/lib/logstash/plugin_mixins/ecs_compatibility_support/target_check.rb:48:in `register'", "/logstash/logstash-core/lib/logstash/java_pipeline.rb:245:in `block in register_plugins'", "org/jruby/RubyArray.java:1981:in `each'", "/logstash/logstash-core/lib/logstash/java_pipeline.rb:244:in `register_plugins'", "/logstash/logstash-core/lib/logstash/java_pipeline.rb:401:in `start_inputs'", "/logstash/logstash-core/lib/logstash/java_pipeline.rb:325:in `start_workers'", "/logstash/logstash-core/lib/logstash/java_pipeline.rb:198:in `run'", "/logstash/logstash-core/lib/logstash/java_pipeline.rb:150:in `block in start'"], 
"pipeline.sources"=>["/logstash/config/input-elasticsearch.conf"], :thread=>"#<Thread:0x2a7cfea9 /logstash/logstash-core/lib/logstash/java_pipeline.rb:138 run>"}
  • when ES is unresponsive with scheduler
[2025-04-08T14:44:00,925][WARN ][logstash.inputs.elasticsearch.esql][main][69ebd8c45e226f6cb40702a83fface05c973db75d4d163d6d546d0a1b38aa425] Attempt to ES|QL job but failed. Sleeping for 0.16 {:fail_count=>4, :exception=>"Connect to localhost:9200 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused"}
[2025-04-08T14:44:01,085][ERROR][logstash.inputs.elasticsearch.esql][main][69ebd8c45e226f6cb40702a83fface05c973db75d4d163d6d546d0a1b38aa425] ES|QL job failed with  {:message=>"Connect to localhost:9200 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused", :cause=>#<Manticore::SocketException: Connect to localhost:9200 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused>}
  • wrong query
[2025-04-08T14:46:00,908][ERROR][logstash.inputs.elasticsearch.esql][main][2b63d9b563a054f220f92dd2688502c86e75d08da26cc8c71facbd4ed07e8256] ES|QL job failed with  {:message=>"[400] {\"error\":{\"root_cause\":[{\"type\":\"verification_exception\",\"reason\":\"Found 1 problem\\nline 1:1: Unknown index [*datastream-my-index*]\"}],\"type\":\"verification_exception\",\"reason\":\"Found 1 problem\\nline 1:1: Unknown index [*datastream-my-index*]\"},\"status\":400}", :cause=>nil}

  • no enrichment policy found
[2025-04-08T16:38:54,115][ERROR][logstash.inputs.elasticsearch.esql][main][ef5a0e022f6af6062065dce8128d02a7e5f971da3ff3cd53a9414ff1b90ae97e] ES|QL job failed with  {:message=>"[400] {\"error\":{\"root_cause\":[{\"type\":\"verification_exception\",\"reason\":\"Found 1 problem\\nline 2:28: failed to resolve enrich policy [geo-match-esql-test]; reason [Unknown index [.enrich-geo-match-esql-test]]\"}],\"type\":\"verification_exception\",\"reason\":\"Found 1 problem\\nline 2:28: failed to resolve enrich policy [geo-match-esql-test]; reason [Unknown index [.enrich-geo-match-esql-test]]\"},\"status\":400}", :cause=>nil}

Name              | Type  | Source indices    | Match field | Enrich fields
match-test-policy | match | significant_month | depth       | place
{
      "@version" => "1",
    "@timestamp" => 2025-04-09T00:04:01.033127Z,
         "depth" => 10.0,
         "place" => [
        [0] "181 km ESE of Kimbe, Papua New Guinea",
        [1] "Reykjanes Ridge",
        [2] "Pacific-Antarctic Ridge",
        [3] "Burma (Myanmar)",
        [4] "2025 Mandalay, Burma (Myanmar) Earthquake",
        [5] "120 km SSE of Burica, Panama",
        [6] "34 km NE of Olonkinbyen, Svalbard and Jan Mayen"
    ]
}
  • the output filtered out null columns
[2025-04-30T16:50:16,820][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2025-04-30T16:50:16,820][INFO ][logstash.inputs.elasticsearch.esql][main][a8b5b8d000adeec849b012fb3d89adeb9927ded9aefdd6599a7883d07fe3821e] ES|QL executor has started
[2025-04-30T16:50:16,834][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
{

    "target-test" => {
                 "agent" => {
                "hostname" => "8790421be44c",
                    "name" => "8790421be44c",
                      "id" => "756c54d4-a328-47f6-a7a9-d9f068f01879",
                    "type" => "metricbeat",
            "ephemeral_id" => "ba48e086-a70b-4d40-8111-eed0a59257e3",
                 "version" => "8.17.0"
        },
                "_index" => ".ds-metricbeat-8.17.0-2025.02.19-000001",
                 "error" => {
            "message" => "HTTP error 503 in : 503 Service Unavailable"
        },
          "kibana_stats" => {
            "timestamp" => 2025-02-19T19:08:06.626Z
        },
            "@timestamp" => 2025-02-19T19:08:06.626Z,
        "logstash_stats" => {
            "timestamp" => 2025-02-19T19:08:06.626Z
        },
                   "ecs" => {
            "version" => "8.0.0"
        },
               "service" => {
            "address" => "https://8790421be44c:18804/api/monitoring_collection/node_rules",
               "type" => "kibana"
        },
                  "host" => {
            "name" => "8790421be44c"
        },
             "metricset" => {
            "period" => 10000,
              "name" => "node_rules"
        },
                   "_id" => "sM2fH5UByLwtR9eBJDuS",
                 "event" => {
            "duration" => 22548699,
              "module" => "kibana",
             "dataset" => "kibana.node_rules"
        },
              "_version" => 1,
           "beats_state" => {
                "state" => {
                "host" => {
                    "name" => "8790421be44c"
                }
            },
            "timestamp" => 2025-02-19T19:08:06.626Z
        },
             "timestamp" => 2025-02-19T19:08:06.626Z
    },
     "@timestamp" => 2025-04-30T23:50:17.016835Z,
       "@version" => "1"
}
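
As a rough illustration of how null columns can be filtered out, the sketch below turns an ES|QL JSON response (a `columns` array plus a `values` array of rows) into one hash per row and drops the null cells. The method name `rows_to_hashes` is hypothetical and the sample data is made up; the plugin's actual event mapping may differ.

```ruby
# Hypothetical mapper: one hash per result row, with null cells removed.
def rows_to_hashes(response)
  names = response.fetch('columns').map { |column| column.fetch('name') }
  response.fetch('values').map do |row|
    # Pair each cell with its column name, then drop nil (null) cells.
    names.zip(row).reject { |_name, value| value.nil? }.to_h
  end
end

response = {
  'columns' => [
    { 'name' => '_id', 'type' => 'keyword' },
    { 'name' => 'error.message', 'type' => 'keyword' },
    { 'name' => 'kibana.status', 'type' => 'keyword' }
  ],
  'values' => [['doc-1', 'HTTP error 503', nil]]
}

rows_to_hashes(response).each { |row| puts row.keys.join(', ') }
```

The resulting hashes still carry dotted keys; nesting them is a separate step.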

mashhurs added 2 commits April 8, 2025 07:36
…esql option, validations to make sure both LS and ES support the ESQL execution.
… adds by default - might be users are looking for by default.
@mashhurs mashhurs marked this pull request as ready for review April 10, 2025 23:30
mashhurs and others added 2 commits April 21, 2025 14:20
@jsvd jsvd self-requested a review April 22, 2025 15:49
…t timestampt converter to LogStash::Timestamp, dotted fields extended to nested fields.
Member

@jsvd jsvd left a comment

first round of review, overall it looks good, I'll give it a spin today/tomorrow to check on the overall user experience.

…tting the result into target if defined. Debug logs added which can help to investigate query and its result.

private

def get_query_object
Contributor Author

review note: moved to private area

@mashhurs mashhurs requested a review from jsvd May 1, 2025 04:14
@jsvd
Member

jsvd commented May 2, 2025

One use case that concerns me is the common default pattern of ES creating a "field.keyword" for each "field", which results in an error in the plugin during nest_keys, where the keyword is being nested in a field called "field" that also exists.

The ways to avoid this are to have a dedicated mapping without this overlap, or to be explicit about what to keep using KEEP columns, which unfortunately doesn't support something like KEEP field, field.keyword as field_kw, forcing the user to do KEEP field.keyword.

Also the error is not very helpful given it's coming straight from nest_keys and is too technical for the user to understand. Example:

{
      "@version" => "1",
       "columns" => [
        [0] {
            "name" => "hey",
            "type" => "text"
        },
        [1] {
            "name" => "hey.keyword",
            "type" => "keyword"
        }
    ],
    "@timestamp" => 2025-05-02T09:03:00.125781Z,
          "tags" => [
        [0] "_elasticsearch_input_failure"
    ],
        "values" => [
        [0] [
            [0] "you",
            [1] "you"
        ]
    ]
}
[2025-05-02T10:04:00,457][WARN ][logstash.inputs.elasticsearch.esql][main][c5566e5e0eb93e4d936f5b5e834d37f79db36b7a3ffe9d301585851d77880290] Event creation error,  {:message=>"string not matched", :exception=>IndexError, :data=>{"columns"=>[{"name"=>"hey", "type"=>"text"}, {"name"=>"hey.keyword", "type"=>"keyword"}], "values"=>[["you", "you"]]}}

Not sure yet what the solution should be, but at least we should catch this particular nesting scenario and bubble up a warning saying "you can't keep top level and nested fields".
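
The `string not matched` IndexError in the log above can be reproduced in plain Ruby. The sketch below is an illustration under assumptions, not the plugin's `nest_keys`: it uses a naive nesting loop (the method name `conflict_error` is hypothetical) to show that once `hey` holds a String, nesting `hey.keyword` ends up calling `String#[]=` with a substring that does not occur in the string.

```ruby
# Hypothetical reproduction: naive nesting of columns into a hash.
# Returns nil on success, or the error text when nesting fails.
def conflict_error(columns, row)
  event = {}
  columns.zip(row).each do |name, value|
    *parents, leaf = name.split('.')
    # For "hey.keyword", event['hey'] is already the String "you",
    # so `target` becomes that String instead of a Hash.
    target = parents.reduce(event) { |node, key| node[key] ||= {} }
    target[leaf] = value # String#[]= raises when "keyword" is not a substring
  end
  nil
rescue IndexError => e
  "#{e.class}: #{e.message}"
end

puts conflict_error(%w[hey hey.keyword], ['you'.dup, 'you'.dup])
```

With only `hey.keyword` in the result (for example after `KEEP hey.keyword`), the same loop succeeds, which matches the workaround discussed here.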

@jsvd
Member

jsvd commented May 2, 2025

which unfortunately doesn't support something like KEEP field, field.keyword as field_kw, forcing the user to do KEEP field.keyword.

I was wrong, if there is a field and field.keyword it is possible to write FROM events as FROM events | RENAME field.keyword as field_keyword

@mashhurs
Contributor Author

mashhurs commented May 5, 2025

which unfortunately doesn't support something like KEEP field, field.keyword as field_kw, forcing the user to do KEEP field.keyword.

I was wrong, if there is a field and field.keyword it is possible to write FROM events as FROM events | RENAME field.keyword as field_keyword

Right!
There are some cases, such as synthetic source or fielddata (unanalyzed keyword sub-field, enabled by default) with the text field type, where a field may contain a sub-field. Reference: https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/text
I have documented this conflict for the multi-field or sub-field cases: https://github.com/logstash-plugins/logstash-input-elasticsearch/pull/233/files#diff-cae5619b3d18ec99c5ccd0a9f6de0c6d3f53343c64692444551a7d29da6863e7R310-R311

Contributor

@yaauie yaauie left a comment

Overall this looks to be on track.

  1. I'd like to include the client-side mitigation of queries that come back with inner sub-fields to prevent crashes
  2. I'd like to align with the filter plugin on which parameter to specify the ESQL query in; if we determine that it is better to use esql_query in the filter due to the filter's inability to distinguish a QueryString query from an ES|QL query, I'd like to use it here too.
  3. I would prefer more validation of inputs; a user shouldn't be able to configure ESQL with irrelevant things like slices or docinfo.

NOTE: If your index has a mapping with sub-objects where `status.code` and `status.desc` are actually dotted fields, they appear in {ls} events as a nested structure.

[id="plugins-{type}s-{plugin}-esql-multifields"]
===== Conflict on multi-fields
Contributor

Since this is going to be a pretty common issue with things like text/keyword multi-fields, I've proposed in a separate channel that we could detect and drop sub-fields instead of allowing the plugin to crash, and have provided a prototype for doing so.

Contributor Author

I have optimized the prototype with memoization to reduce the time complexity from O(N log N) + O(N^2) to O(N log N) + O(N + K), where K is the max depth.

The (so-far) last commit

  • updates the doc to reflect this change;
  • logs a warn message (one message per query result, not per row) listing the ignored multi-field values, and includes guidance (use the RENAME command) if the user wants to include them in the event;
  • adds unit test as well

Real scenario test (also tested with type => keyword scenario):

// mapping
{
  "my-time-index-000001": {
    "mappings": {
      "properties": {
        "metrics": {
          "subobjects": false,
          "properties": {
            "time": {
              "type": "long"
            },
            "time.max": {
              "type": "long"
            },
            "time.min": {
              "type": "long"
            }
          }
        }
      }
    }
  }
}

// warn message
[2025-05-08T11:44:12,419][WARN ][logstash.inputs.elasticsearch.esql][main][8ff0da15d6ccf4b9d00dbcea466def36aa962864eb8638fa2b28e2f58af6d254] Multi-fields found in ES|QL result and they will not be available in the event. Please use `RENAME` command if you want to include them. {:found_multi_fields=>["metrics.time.max", "metrics.time.min"]}

// output event
{
    "@timestamp" => 2025-05-08T18:44:12.421771Z,
      "@version" => "1",
           "_id" => "metric_2",
       "metrics" => {
        "time" => 100
    }
}
{
    "@timestamp" => 2025-05-08T18:44:12.419639Z,
      "@version" => "1",
           "_id" => "metric_1",
       "metrics" => {
        "time" => 100
    }
}
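
The detection step in this scenario can be sketched as follows. This is a simplified, set-based illustration and not the memoized implementation from the commit: the method name `partition_sub_fields` is hypothetical, and a column is treated as a sub-field whenever any proper dotted prefix of its name is itself a column.

```ruby
require 'set'

# Hypothetical detector: splits column names into [kept, dropped],
# where dropped names have a proper dotted prefix that is also a column.
def partition_sub_fields(column_names)
  names = column_names.to_set
  column_names.partition do |name|
    segments = name.split('.')
    # Proper dotted prefixes: "a.b.c" -> ["a", "a.b"]
    prefixes = (1...segments.size).map { |n| segments.first(n).join('.') }
    prefixes.none? { |prefix| names.include?(prefix) }
  end
end

kept, dropped = partition_sub_fields(%w[_id metrics.time metrics.time.max metrics.time.min])
puts "kept: #{kept.join(', ')}"
puts "dropped: #{dropped.join(', ')}"
```

For the mapping above, the dropped names are `metrics.time.max` and `metrics.time.min`, the same two fields reported in the warn message.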

|This plugin |4.23.0+ (4.x series) or 5.2.0+ (5.x series)
|===

To configure ES|QL query in the plugin, set the `response_type` to `esql` and provide your ES|QL query in the `query` parameter.
Contributor

This feels cumbersome to me.

Could we align with the proposal in the filter PR to provide an ESQL query with esql_query instead of requiring the configuration of multiple separate parameters?

In this case, since the input plugin does require a JSON-encoded object for its query parameter when using the Query DSL, we could auto-detect that a given query parameter is ESQL (unlike the ES filter, which uses a QueryString query as its query parameter)

Contributor Author

When we discussed this with @jsvd, we had a similar idea: deprecate response_type and replace it with query_type in the future. In my experience, introducing a new param is not difficult, but deprecation -> obsoletion -> removal is a long, painful process.
From this point of view, I would support making a minimal change, but I am open to applying changes if anyone has a strong opinion.

Contributor

I've left a separate note on how to do it.

I don't personally care much about removing the response_type right away, but if a user starts using ESQL I'd like them to not start new usages of a config that we'd like to deprecate.

Since this is effectively a rename, we can easily use the with_deprecated_alias helper from NormalizeConfigSupport.

mashhurs and others added 2 commits May 6, 2025 17:40
Co-authored-by: Rye Biesemeyer <yaauie@users.noreply.github.com>
Co-authored-by: João Duarte <jsvd@users.noreply.github.com>
# hits: normal search request
# aggregations: aggregation request
# esql: ES|QL request
config :response_type, :validate => %w[hits aggregations esql], :default => 'hits'
Contributor

Migrating to query_type with auto-detection of ESQL queries would be pretty straight-forward with the NormalizeConfigSupport mixin:

Suggested change
-config :response_type, :validate => %w[hits aggregations esql], :default => 'hits'
+config :response_type, :validate => %w[hits aggregations], :deprecated => "use `query_type`"
+config :query_type, :validate => %w[hits aggregations esql] # default depends on query shape
   def register
+    @query_type = normalize_config("query_type") do |normalizer|
+      normalizer.with_deprecated_alias("response_type")
+    end || (@query.start_with?('{') ? 'hits' : 'esql')
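
The fallback at the end of the suggestion (pick a default from the query shape) can be tried in isolation. The sketch below is a standalone approximation, not plugin code: `infer_query_type` is a hypothetical name, and stripping leading whitespace before the check is an added assumption that the suggested change does not make.

```ruby
# Hypothetical standalone version of the "default depends on query shape" idea.
# An explicitly configured type always wins; otherwise a query that looks like
# a JSON object is treated as Query DSL ('hits') and anything else as ES|QL.
def infer_query_type(explicit_type, query)
  return explicit_type unless explicit_type.nil?

  query.to_s.lstrip.start_with?('{') ? 'hits' : 'esql'
end

puts infer_query_type(nil, '{ "query": { "match_all": {} } }')
puts infer_query_type(nil, 'FROM my-index | LIMIT 1')
puts infer_query_type('aggregations', '{ "aggs": {} }')
```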

Contributor Author

I was thinking to add the deprecation right after this ES|QL change.
One thing we need to agree on is naming. I personally do not like hits and aggregations alongside esql, since they indicate different contexts. The options I had were dsl_search, dsl_aggregation and esql.
Please let me know your opinion: I can either apply the change if we quickly come to an agreement, or create a follow-up issue right after this PR.

@mashhurs mashhurs requested a review from yaauie May 8, 2025 22:33
3 participants