nushell: Don't keep duplicate entries in history file

Related problem

Currently, the history file contains duplicates. Deduplicating them with uniq takes quadratic time:

❯ benchmark { history | reverse | where exit_status != 1 | uniq }
39sec 115ms 678µs 598ns

❯ benchmark { history | reverse | where exit_status != 1 }
39ms 329µs 969ns

That is 39 seconds versus 39 milliseconds; the gap is huge. One alternative is to pass only one column to uniq:

benchmark { history | select command exit_status | where exit_status != 1 | get command | uniq }

This is faster in the command-line benchmark, but the lag is still noticeable in the history menu.
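Recent nushell releases also provide uniq-by, which deduplicates rows on the given columns while keeping the rest of the record. A sketch, assuming uniq-by is available in your version (whether it avoids the quadratic behaviour depends on its implementation, but it at least narrows the comparison to one column):

    history
    | where exit_status != 1
    | reverse
    | uniq-by command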

Here is an excerpt of my history menu configuration:

    {
        # List all unique successful commands
        name: all_history_menu
        only_buffer_difference: true
        marker: "? "
        type: {
            layout: list
            page_size: 10
        }
        style: {
            text: green
            selected_text: green_reverse
        }
        source: { |buffer, position|
            history
            | select command exit_status
            | where exit_status != 1
            | where command =~ $buffer
            | each { |it| {value: $it.command } }
            | reverse
            | uniq # ⚠️ 
        }
    }

Without uniq it is instantaneous, but you get duplicates everywhere.
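Until deduplication happens at write time, a workaround sketch for the menu itself (again assuming uniq-by, as above) is to deduplicate on the command column before building the menu records:

    source: { |buffer, position|
        history
        | where exit_status != 1
        | where command =~ $buffer
        | reverse
        | uniq-by command
        | each { |it| {value: $it.command} }
    }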

Describe the solution you’d like

We should have a feature similar to HISTCONTROL=ignoreboth:erasedups in bash.
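For reference, the bash setting lives in ~/.bashrc: ignoreboth skips lines starting with a space as well as immediate duplicates, and erasedups removes all older occurrences of a re-entered command from the history list.

    # in ~/.bashrc
    export HISTCONTROL=ignoreboth:erasedups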

Describe alternatives you’ve considered

Another user mentioned this alternative: meaningful-ooo/sponge — "🧽 Clean fish history from typos automatically".

Sponge quietly runs in the background and keeps your shell history clean from typos, incorrectly used commands, and everything you don’t want to store due to privacy reasons.

Additional context and details

This issue grew out of our discussion about why the history menu took so long.

In the meantime, I use this script:

#!/usr/bin/env nu
# clean-nushell-db

# Expand the tilde so the path works when passed to the external sqlite3 binary
let db = ("~/dotfiles/nushell/.config/nushell/history.sqlite3" | path expand)

# Print how many rows the history table currently holds
def get_current_row [] {
  let current_row = (^sqlite3 $db "SELECT COUNT(*) FROM history")
  echo $"current rows: ($current_row)"
}

get_current_row
# Remove failed commands
sqlite3 $db "DELETE FROM history WHERE exit_status != 0"
# Remove duplicates. But keep one.
# https://stackoverflow.com/a/53693544/6000005
sqlite3 $db "DELETE FROM history WHERE id NOT IN (SELECT MIN(id) FROM history h  GROUP BY command_line);"
get_current_row
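Run it like any other nu script (the file name here is just what I saved it as):

    chmod +x clean-nushell-db
    ./clean-nushell-db    # or: nu clean-nushell-db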


Most upvoted comments

An option to skip duplicates on the fly would be great. Keeping an everlasting history for statistics may be useful, but only if you need statistics. And how many users are actually interested in analyzing their history?

At the moment I have 4500 history entries in .zsh_history with deduplication enabled, collected over a few years, and 10000 entries in ~/.config/nushell/history.txt from a few weeks of usage. This is definitely not scalable. While housekeeping is an option (like removing duplicates or repacking the history once a week or so), on-the-fly deduplication is better (and faster, as with unique entries you are unlikely to need to handle thousands of lines).

Is it possible to add a unique constraint on the command_line column so that deduplication is handled at insert time by SQLite? That would be far more efficient than deduplicating at read time.
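A sketch of that idea; the command_line column name comes from the script above, and the index name is made up. Note the index can only be created after the existing duplicates have been deleted:

    -- deduplicate first (see the script above), then enforce uniqueness
    CREATE UNIQUE INDEX IF NOT EXISTS idx_history_command_line
        ON history (command_line);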

It would be even better to have a setting that updates the previous occurrence of the same command with the current timestamp.
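With the unique index above in place, that behaviour could look like the following upsert sketch; start_timestamp is an assumption about the history schema, and the example command is arbitrary:

    -- hypothetical: refresh the timestamp instead of inserting a duplicate
    INSERT INTO history (command_line, start_timestamp)
    VALUES ('git status', strftime('%s', 'now') * 1000)
    ON CONFLICT (command_line)
    DO UPDATE SET start_timestamp = excluded.start_timestamp;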