-
Compare files & deduplicate file without sorting?
Hello,
I have two files with several thousand lines
and I want to deduplicate file1 (remove from it the lines that already exist in file2).
The lines contain various symbols such as quotation marks, $, etc.
The command must not change the line order, just remove the duplicate lines.
I found several on-topic tutorials, but they all use sorting, which I can't use because it is important that the line order stays as it is.
Thank you
Last edited by postcd; 07-05-2015 at 07:18 PM.
-
While I don't have a ready-made solution, I would have thought there must be a way of doing this with sed. Might be worth exploring?
-
I would use a scripting language (Perl, Python, ...).
Are the lines in the same order in both files? If so:
read a line from the smaller file
read lines from the big file and, if a line is not a match, output it to file3
on a match, get the next line from the small file and repeat
when done, copy file3 to file1
If they are not in the same order, you are looking at a PITA process of reading through one file for each possible line in the other file (lots of passes).
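For the case where the order differs, the repeated passes can probably be avoided by loading the lines of file2 into a set first, so each line of file1 needs only one lookup. A minimal Python sketch, assuming both files fit in memory and that two lines count as duplicates only when they are exactly identical; the function names and the separate output path are made up for illustration:

```python
def remove_lines_present_in(file1_lines, file2_lines):
    """Return the lines of file1 that do not occur in file2, keeping file1's order."""
    seen = set(file2_lines)  # set membership is a single hash lookup, no sorting involved
    return [line for line in file1_lines if line not in seen]


def dedupe_files(path1, path2, out_path):
    """Write to out_path every line of path1 that is absent from path2 (hypothetical helper)."""
    with open(path2, encoding="utf-8") as f2:
        file2_lines = [line.rstrip("\n") for line in f2]
    with open(path1, encoding="utf-8") as f1:
        file1_lines = [line.rstrip("\n") for line in f1]
    kept = remove_lines_present_in(file1_lines, file2_lines)
    with open(out_path, "w", encoding="utf-8") as out:
        out.writelines(line + "\n" for line in kept)
    return kept


if __name__ == "__main__":
    # Write to file3 first, then copy file3 over file1 once the result looks right.
    dedupe_files("file1", "file2", "file3")
```

Lines are compared as plain strings, so quotation marks, $ and similar symbols need no escaping. Lines repeated inside file1 that do not appear in file2 are all kept.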