Compare files & deduplicate file without sorting?



  1. #1
    Join Date
    Apr 2014
    Posts
    34

    Hello,

    I have two files with several thousand lines, and I want to deduplicate file1 (remove from it the lines that already exist in file2).

    The lines contain various symbols such as quotation marks, $, etc.

    The command must not change the line order; it should only remove the duplicate lines.

    I found several on-topic tutorials, but they rely on sorting, which I can't use because the line order has to stay as it is.

    Thank you
    Last edited by postcd; 07-05-2015 at 07:18 PM.

  2. #2
    Join Date
    Nov 2002
    Location
    Powys, UK
    Posts
    18
    While I don't have a ready-made solution, I would have thought there must be a way of doing this with sed. Might be worth exploring?

  3. #3
    Join Date
    Oct 2002
    Location
    AZ, USA
    Posts
    110
    I would use a scripting language (Perl, Python, ...).
    Are the lines in the same order in both files? If so:
    Read a line from the smaller file.
    Read lines from the big file and, for each one that is not a match, output it to file3.
    On a match, get the next line from the small file and repeat.
    When done, copy file3 to file1.
    If they are not in the same order, you are looking at a painful process of reading through one file for each possible line in the other file (lots of passes).
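
    For what it's worth, here is a minimal Python sketch of one way to do this without sorting. Note that it is not the line-by-line walk described above: it loads the lines of file2 into a set and then streams file1, keeping only lines that are not in the set, so the original order of file1 is preserved and the files do not need to be in the same order. It assumes file2 fits in memory and that lines are compared exactly (ignoring only the trailing newline). The function names and the output file name are hypothetical, chosen for illustration.

```python
def remove_lines_present_in(target_lines, reference_lines):
    """Return target_lines without any line that occurs in reference_lines.

    The relative order of the surviving lines is unchanged.
    """
    reference = set(reference_lines)  # set membership tests are O(1) on average
    return [line for line in target_lines if line not in reference]


def dedupe_file(file1_path, file2_path, output_path):
    """Write to output_path every line of file1 that does not appear in file2.

    Hypothetical helper: lines are compared as plain strings, so quotation
    marks, $ and other symbols need no escaping.
    """
    with open(file2_path, encoding="utf-8") as f2:
        reference = {line.rstrip("\n") for line in f2}

    with open(file1_path, encoding="utf-8") as f1, \
            open(output_path, "w", encoding="utf-8") as out:
        for line in f1:
            if line.rstrip("\n") not in reference:
                out.write(line)


if __name__ == "__main__":
    # Example file names; writes to file3 so file1 is not clobbered mid-read.
    dedupe_file("file1", "file2", "file3")
```

    As in the steps above, the result goes to a third file first; once it looks right, file3 can be copied over file1. Lines that are repeated within file1 itself but absent from file2 are all kept, since only file2 is used as the reference.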
