use sed to place "hard returns" in a massive (one-line!) text file?

**root.veg** · 08-31-2007, 05:49 PM

Hi people,

Long time no log-in. Any ideas on this?

I have a long list of e-mail addresses which I want to use for a mass-subscription to a Mailman mailing list. Don't worry, I'm not spamming or anything, just streamlining a previously unwieldy manual mailing list for an NGO in Peru.

The curious problem is that all these e-mail addresses are on one line in a text file, having been prepared in Microsoft Word (shudder!). Each address is separated from the next by one comma and one space. However, Mailman requires one address per line, each line separated by a "hard return".

I know only enough sed to do this:

Code:

sed 's/, /\n/g' mailing-list.txt

Which basically does the job, replacing each occurrence of ", " with "\n" (a hard return).
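For reference, `\n` in the replacement half of an `s` command is a GNU sed extension, and some other sed implementations may not turn it into a newline. A minimal, more portable sketch of the same naive split uses awk's `gsub`, where `"\n"` inside an awk string is always a newline. The name `split_list` is only an illustrative choice, and like the sed one-liner it still breaks "Smith, John D" entries in two.

```shell
#!/bin/sh
# split_list (hypothetical helper): naive split that turns every ", " into a
# line break. Reads the named files, or stdin if none are given.
split_list() {
  # gsub replaces every ", " on the current line; "\n" is a newline in awk strings
  awk '{ gsub(/, /, "\n"); print }' "$@"
}
```

Usage: `split_list mailing-list.txt > FirstPass`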

BUT...

some of the addresses are of the form:

Code:

Smith, John D <blah@blah.org>

Is there a way to NOT remove the comma when it's NOT separating two e-mail addresses, merely separating last names from first names?

Yours hopefully,

Andrew.

**bwkaz** · 08-31-2007, 07:19 PM

How is sed supposed to know when a ", " sequence is separating two email addresses versus when it's separating a person's last and first names? It's not like sed is clairvoyant...

Unless all the email addresses end in a > character? Then it may work to match ">, " and replace it with ">\n". But if the only thing you can go on is the ", " sequence, then I'm pretty sure the lack of clairvoyance is going to sink the effort. Maybe you should get whoever made the list to redo it.
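A minimal sketch of the ">, " idea, assuming GNU sed (for `\n` in the replacement) and assuming every address in the file is wrapped in angle brackets; `split_on_angle` is a hypothetical name. Only the comma right after a closing `>` is touched, so the comma inside "Smith, John D" survives. Bare addresses without `<...>` would not be split this way.

```shell
#!/bin/sh
# split_on_angle (hypothetical helper): break the line only where ", "
# follows a closing ">". Assumes GNU sed, which turns \n in the
# replacement into a newline.
split_on_angle() {
  sed 's/>, />\n/g' "$@"
}
```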

**ArgPirate** · 08-31-2007, 10:33 PM

I don't see why you can't just match .org and replace it with .org\n, and do the same with the other top-level domains. You could grab them all with one long regexp, but I don't know if it's worth the trouble to make it a little more elegant.
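Rather than listing every top-level domain, a variation on the same idea (not what was posted above) is to split only on a ", " that directly follows a token containing "@". That covers both bare addresses and ones wrapped in `<...>`. A sketch assuming GNU sed; `split_after_address` is a hypothetical name.

```shell
#!/bin/sh
# split_after_address (hypothetical helper): turn ", " into a line break only
# when it directly follows an "@..." token, so "Smith, John D" stays together.
# Assumes GNU sed (-E for extended regexps, \n in the replacement).
split_after_address() {
  # \1 is the "@domain" part (possibly ending in ">"); the ", " after it is dropped
  sed -E 's/(@[^ ,]+), /\1\n/g' "$@"
}
```

Usage: `split_after_address mailing-list.txt > FirstPass`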

**flukshun** · 09-01-2007, 09:54 AM

That may work as well, but matching on ">", as bwkaz suggested, seems to be the simplest solution. One pass and he's done.

**ArgPirate** · 09-01-2007, 10:21 AM

Agreed, I was thinking that perhaps he placed the <> on the email addresses for illustrative purposes.

**nabetse** · 09-01-2007, 01:06 PM

Just threw this together:

Code:

(\w+\.)*\w+@(\w+\.)*\w+\.\w{2,}

You can use vim to do a PCRE substitution to place something unique after each email address:

Code:

:perldo s/(\w+\.)*\w+@(\w+\.)*\w+\.\w{2,}/$&*|*/g

should place "*|*" after each email address.

Then you can use sed to replace each "*|*" with a hard return.

Code:

sed 's/\*|\*/\n/g' textfile.txt

Of course you'd probably want to strip away all the other junk like "Smith, John <" and ">".
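If only the bare addresses are wanted, the stripping can be done in a single pass with `grep -o`, which prints each match on its own line (GNU and BSD grep both support `-o`). The pattern below is a simplified approximation of an address, not a full RFC 5322 parser; it accepts two-letter country domains such as `.pe`. `extract_addresses` is a hypothetical name, and the real names are discarded.

```shell
#!/bin/sh
# extract_addresses (hypothetical helper): print every address-like substring
# on its own line, dropping display names, commas and angle brackets.
# Simplified pattern: local@domain.tld with a TLD of two or more letters.
extract_addresses() {
  grep -oE '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]{2,}' "$@"
}
```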

**root.veg** · 09-01-2007, 04:14 PM

"sed is not clairvoyant"! How true...

I have two lines of enquiry now:

1) do what nabetse said and concentrate on stripping out all the cruft ("Smith, John S" and <>).

2) try to re-attach the real names that I've split apart using sed.

Once I've done the simple ", " to "\n" replacement, a grep for all lines containing "@" shows that only 140 addresses have this problem (2966 lines in total, 2826 of them with an "@"):

Code:

veg@purplemonster:~$ sed 's/, /\n/'g MailingListOriginal > FirstPass
veg@purplemonster:~$ wc -l MailingListOriginal
2 MailingListOriginal
veg@purplemonster:~$ wc -l FirstPass
2966 FirstPass
veg@purplemonster:~$ grep @ FirstPass | wc -l
2826
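To look at those 140 lines directly instead of only counting them, `grep -v` inverts the match and `-n` prefixes line numbers; `grep -c` gives the same count as piping to `wc -l`. Both function names are hypothetical.

```shell
#!/bin/sh
# list_fragments (hypothetical helper): show the lines with no "@"
# (the split-off name parts), prefixed by their line numbers.
list_fragments() {
  grep -n -v '@' "$@"
}

# count_addresses (hypothetical helper): count the lines that do contain "@",
# equivalent to grep @ FILE | wc -l
count_addresses() {
  grep -c '@' "$@"
}
```

Usage: `list_fragments FirstPass | less`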

Hmm. I'm going with option 2 right now, as it doesn't involve much mucking around with the lines which actually contain the useful addresses.

I think I'm going to try finding all lines with no "@" sign and concatenating each one with the line that follows it. If anyone knows how to do this, you may save me much man-page hell over the next hour or two! Thanks for the help so far, guys!
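One way to do that, as a sketch: a sed loop that, whenever the current line has no "@", appends the next line with `N` and rejoins the two with the ", " that the first pass removed, repeating until the joined line contains an "@". Each command is given its own `-e`, the traditional way to keep a label and branch unambiguous. `join_fragments` is a hypothetical name.

```shell
#!/bin/sh
# join_fragments (hypothetical helper): glue every line without an "@" onto the
# line after it, restoring the ", " separator, so "Smith" followed by
# "John D <blah@blah.org>" becomes "Smith, John D <blah@blah.org>".
#   :a        loop label
#   /@/!{     only for pattern spaces that contain no "@" yet:
#   N         append the next input line
#   s/\n/, /  replace the embedded newline with ", "
#   ba        go back and test the joined line again
join_fragments() {
  sed -e ':a' \
      -e '/@/!{' \
      -e 'N' \
      -e 's/\n/, /' \
      -e 'ba' \
      -e '}' "$@"
}
```

Usage: `join_fragments FirstPass > SecondPass` (the output file name is just an example).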

**knute** · 09-01-2007, 05:54 PM

I'd think that opening it in OpenOffice and then saving it as a text file might work as well.

**Davy** · 09-02-2007, 07:31 PM

What knute said, except:

do a Replace All that replaces the separators with '\n' (with regular expressions turned on), and each of the names will have a hard return after it.

THEN save it as a text file.

**ghostdog74** · 09-03-2007, 02:25 AM

Show a sample of what your email file looks like.
