logstash-input-jdbc: query results do not index nested fields

Here is my setup in logstash.conf:

input {
  jdbc {
    jdbc_driver_library => "C:/logstash-1.5.5/ojdbc6.jar"
    jdbc_driver_class => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@..."
    jdbc_user => "es_user"
    jdbc_password => "*************"
    statement => "SELECT e.id_number, e.name, a.address FROM customer e LEFT JOIN address a ON e.id_number = a.id_number"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch_http {
    host => "localhost:9200"
    index => "myindex"
    document_id => "%{id_number}"
  }
}

I am trying to index customer IDs together with their address information.

Customers and addresses have a 1:M relationship.

I expect one document to be indexed per unique customer ID, with each document possibly containing a nested array of addresses.

Instead, after my conf file runs, the document is overwritten for each row it encounters, so whichever row comes last wins. For example, if a customer has 5 addresses, the resulting document contains one address and is at version 5.

It's as if Logstash doesn't understand how to fold query results into normalized JSON objects. Is there a way to fold query results into nested fields?
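For reference, the folding I'm after looks like this (a minimal Ruby sketch with made-up rows; the `id_number`, `name`, and `address` keys match the SELECT columns above, and the `addresses` output field name is just an illustration):

```ruby
# Simulated JDBC result rows: the LEFT JOIN yields one row per
# (customer, address) pair, repeating the customer columns.
rows = [
  { "id_number" => 1, "name" => "Alice", "address" => "1 Main St" },
  { "id_number" => 1, "name" => "Alice", "address" => "2 Oak Ave" },
  { "id_number" => 2, "name" => "Bob",   "address" => "3 Elm Rd"  },
]

# Fold the rows into one document per id_number, collecting the
# addresses into a nested array instead of overwriting the document
# once per row.
docs = rows.group_by { |r| r["id_number"] }.map do |id, group|
  {
    "id_number" => id,
    "name"      => group.first["name"],
    "addresses" => group.map { |r| r["address"] },
  }
end

docs.each { |d| puts d }
# One document per customer; customer 1 has both addresses nested.
```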

About this issue

  • Original URL
  • State: open
  • Created 9 years ago
  • Reactions: 8
  • Comments: 45 (2 by maintainers)

Most upvoted comments

Good news, guys: I just released logstash-filter-aggregate 2.2.0 with a new feature called “push_previous_map_as_event”, which aims precisely to fix this issue!

You can see an example of what it can do here: https://github.com/logstash-plugins/logstash-filter-aggregate#example-4--no-end-event-and-tasks-come-one-after-the-other

Hope you’ll enjoy it 😃

@tademeit

  • If you have only one location associated with one person, add this filter:
	mutate {
		rename => { "name" => "[location][name]" }
	}
  • If you have potentially several locations associated with one person, add this filter configuration:
filter {
  aggregate {
    task_id => "%{id}"
    code => "
      map['location'] ||= { 'name' => [] }
      event.to_hash.each do |key,value|
        map[key] ||= value unless key == 'name'
        map['location']['name'] << value if key == 'name'
      end
    "
    push_previous_map_as_event => true
    timeout => 5
    timeout_tags => ['aggregated']
  }

  if "aggregated" not in [tags] {
    drop {}
  }
}
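To see why the `code` block above collects all names into one event, here is a plain-Ruby sketch of the same accumulation logic (the events and field values are made up; the real plugin manages the shared map and the event lifecycle itself):

```ruby
# Each hash stands in for a Logstash event's fields for one SQL row;
# both rows share the same id, i.e. the same task_id ("%{id}").
events = [
  { "id" => 7, "city" => "Paris", "name" => "Main office" },
  { "id" => 7, "city" => "Paris", "name" => "Warehouse"   },
]

# The aggregate map shared by all events with the same task_id.
map = {}

events.each do |event|
  # Same logic as the plugin's `code` option above: copy non-name
  # fields once, and append every name to a nested array.
  map["location"] ||= { "name" => [] }
  event.each do |key, value|
    map[key] ||= value unless key == "name"
    map["location"]["name"] << value if key == "name"
  end
end

# With push_previous_map_as_event => true, this map would be pushed
# as a new event (tagged "aggregated") once the id changes, while
# the per-row events are dropped by the conditional above.
puts map
```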

@Daan-ISAAC FYI, aggregate plugin v2.5.0 is compatible with Logstash 5. I hope you’ll enjoy it 😃

+1 on this enhancement request.

This was the default way the jdbc river plugin treated dots in field names, and this enhancement would ease the transition away from rivers for many users. Thanks!