Need ideas on how to make this code faster than a speeding turtle
Ilya Zakharevich  
Newsgroups: comp.lang.perl.misc
From: Ilya Zakharevich <nospam-ab...@ilyaz.org>
Date: Fri, 16 May 2008 21:23:40 +0000 (UTC)
Local: Fri, May 16 2008 5:23 pm
Subject: Re: Need ideas on how to make this code faster than a speeding turtle
[A complimentary Cc of this posting was sent to
Uri Guttman
<u...@stemsystems.com>], who wrote in article <x7tzgy1aus....@mail.sysarch.com>:

>   >> better but forking off lynx is still slow. LWP should be much faster. if
>   >> you want speed (and with the data size you have, you want it), use LWP.

>   IZ> This may depend on many parameters, but the overhead of
>   IZ> system()ing may be quite low.  The overhead of opening a new HTTP
>   IZ> connection for each line may be larger.  LWP will have a chance to
>   IZ> use persistent connections...

> i highly doubt forking lynx and it doing a fetch with passing the page
> back via a pipe would be faster than a direct call to lwp and getting
> the page in ram. it would have to be a very odd system for the lynx
> solution to be faster.

> and lynx would have to always open a new connection as forked procs have
> no memory.

I do not think you understood what I wrote.

I'm not claiming that *this* overhead is small.  What I say is that
*other* overheads may not be negligible.

Anyway, all overheads I know are in favor of LWP.
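
To make the comparison concrete, here is a minimal sketch of the in-process approach, assuming LWP::UserAgent is installed. The helper names fetch_page and fetch_all are made up for illustration and are not from the original poster's script. A single agent created with keep_alive keeps a connection cache, so consecutive requests to the same host have a chance to reuse one connection, and no lynx process is forked per line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# One agent for the whole run. keep_alive => 1 sets up a connection
# cache, so repeated requests to one host may reuse a connection.
my $ua = LWP::UserAgent->new(
    keep_alive => 1,
    timeout    => 30,
);

# Hypothetical helper: fetch one page in-process.
# Returns the page body, or undef if the request failed.
sub fetch_page {
    my ($url) = @_;
    my $response = $ua->get($url);
    return $response->is_success ? $response->decoded_content : undef;
}

# Hypothetical helper: fetch each URL in turn and hand the body to a
# callback. Returns the number of pages fetched successfully.
sub fetch_all {
    my ($on_page, @urls) = @_;
    my $ok = 0;
    for my $url (@urls) {
        my $page = fetch_page($url);
        if (defined $page) {
            $on_page->($url, $page);
            $ok++;
        }
        else {
            warn "fetch failed: $url\n";
        }
    }
    return $ok;
}
```

Whether the cached connection is actually reused depends on the server honoring keep-alive, so this is worth measuring rather than assuming.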

Hope this helps,
Ilya


Gordon Etly  
Newsgroups: comp.lang.perl.misc
From: "Gordon Etly" <g.e...@bentsys.INVALID.com>
Date: Fri, 16 May 2008 14:30:03 -0700
Local: Fri, May 16 2008 5:30 pm
Subject: Re: Need ideas on how to make this code faster than a speeding turtle

Keith Keller wrote:
> On 2008-05-16, Gordon Etly <g.e...@bentsys.INVALID.com> wrote:
> > Any email address is not an identity. It's an email address. The
> > "Name" field is your identity, and I have not changed that.
> There is no "Name" field.  The From: header often includes both a name
> and an email address.

Many readers separate the "name" and "email" fields. I never changed my
name. The email address part of the From: line is not a static entity;
one can always change their email address. It's anyone's right to do so,
as it's their info. You're not suggesting an email address is a reliable
way of tracking someone, are you?

> Changing one's From: header as often as you have is a strong
> indicator of a troll.

Or someone who does not wish to satisfy someone's false notion that they
can force the last word using that tired old method. If they are going to
reply and then inform you that you're killfiled, as if the public really
needs to know (#1), then it is no less wrong to circumvent their
killfile; it's attack and counter, something that's existed as long as
man.

If one really wants to ignore me, they can either not read my posts or
block my name, as that remains constant.

> > Lastly, attempting to pose that "identity" on a medium like UseNet
> > actually meaning something is idiotic at best. There is no guarantee
> > that a name you see is a real name, and in many cases it is not.
> > Many people use a "nick" name of sorts, and it is quite common to use
> > a false or munged email address to thwart spammer email harvesting.
> It is not common to alter the From: header

This is untrue. I see many people post one day with one name and/or
email and the next time I see a variant of their Name (or a nick name)
and/or a differing email address.

> no matter whether your name is Gordon Etly, Gordon Gekko, or Trolly
> McTroll.

My name has always been Gordon Etly. That is my identity; my name. If
one wishes to killfile me using that, then they are welcome to do so. If
they killfile me by email address then

(#1)
If you truly need to ignore someone, you don't need to announce the fact,
or for that matter, one doesn't need a killfile either, though it can be
nice.

--
G.Etly


A. Sinan Unur  
Newsgroups: comp.lang.perl.misc
From: "A. Sinan Unur" <1...@llenroc.ude.invalid>
Date: Fri, 16 May 2008 23:57:05 GMT
Local: Fri, May 16 2008 7:57 pm
Subject: Re: Need ideas on how to make this code faster than a speeding turtle
cha...@lonemerchant.com wrote in news:53668582-db24-4534-8950-de30e9c96e10
@k10g2000prm.googlegroups.com:

> I 'll eventually have the input file filled with 350 million items.

Incidentally, if you could do three pages in a second, this corresponds to
about 3.7 years of continuous scraping.
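
For anyone who wants to check the arithmetic or try other rates, here is a small sketch. The item count and rate are the figures quoted in this thread; the function name years_to_scrape and the 365.25-day year are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Years of non-stop fetching needed for $items pages at $pages_per_sec.
# Assumes a constant rate and a 365.25-day year.
sub years_to_scrape {
    my ($items, $pages_per_sec) = @_;
    my $seconds = $items / $pages_per_sec;
    my $days    = $seconds / 86_400;
    return $days / 365.25;
}

# Figures from the thread: 350 million items, three pages per second.
printf "%.1f years\n", years_to_scrape(350_000_000, 3);   # prints "3.7 years"
```

That is roughly 116.7 million seconds, or about 1350 days, before allowing for failures or retries.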

If you try to do this in massively parallel way, then it might be
considered a denial of service attack.

Of course, if you could do that, then the performance constraints of the
web server on the other end of the connection kick in.

I am not sure if it is a good idea for you to invest any more time &
resources into improving the performance of your script.

Sinan
--
A. Sinan Unur <1...@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/


A. Sinan Unur  
Newsgroups: comp.lang.perl.misc
From: "A. Sinan Unur" <1...@llenroc.ude.invalid>
Date: Sat, 17 May 2008 01:05:41 GMT
Local: Fri, May 16 2008 9:05 pm
Subject: Re: Need ideas on how to make this code faster than a speeding turtle
"Gordon Etly" <g.e...@bentsys.INVALID.com> wrote in
news:696881F30sdfjU1@mid.individual.net:

> A. Sinan Unur wrote:
>> "Gordon Etly" wrote in

>> > Jürgen Exner wrote:

>> Noting from the Anti-Troll FAQ:

>> Subject: 7.6  Morphed Identity

>> A morphed identity is when a poster has one usenet identity,

> Any email address is not an identity. It's an email address. The
> "Name" field is your identity, and I have not changed that.
> I am free to change my email address field however I wish,
> as are you and anyone else.

In newsgroups, your identity is your full handle. It does not matter if
that does not correspond to your real life identity. So, so long as you
pick one, and stick with it, no one has a problem with it.

Except,

>> Sender Address
>> The e-mail addresses given in "From:", "Reply-To:", and "Sender:"
>> should be valid (= should not bounce because of invalidity). Using
>> addresses and name space of other people without their permission is
>> prohibited.

You snipped the source of that rule. That is a rule stated by the
service provider you chose.

> Being in control of your mail server actually allows you to fulfill
> the "should not bounce because of invalidity" if you want to get down
> to that.

That's funny because most of the domain names you use are not
registered. I am not sure how you are running mail servers for
non-existent domains.

Second, some of the domains you use are registered but do not seem to be
owned by someone named Gordon Etly.

> How a poster writes their email address is completely up to
> that person. A rather large amount of people munge their email
> addresses, so this isn't even an issue.

From other users' perspective, what matters is that you pick one and
stick with it. It seems like your service provider has explicit policies  
prohibiting you from using non-existent domains or domains owned by
others. So, you should argue this point with them.

> Lastly, attempting to pose that "identity" on a medium like UseNet
> actually meaning something is idiotic at best. There is no guarantee
> that a name you see is a real name, and in many cases it is not. Many
> people use a "nick" name of sorts, and it is quite common to use a
> false or munged email address to thwart spammer email harvesting.

And that is completely irrelevant.

Sinan

--
A. Sinan Unur <1...@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/


Jürgen Exner  
Newsgroups: comp.lang.perl.misc
From: Jürgen Exner <jurge...@hotmail.com>
Date: Sat, 17 May 2008 02:04:11 GMT
Local: Fri, May 16 2008 10:04 pm
Subject: Re: Need ideas on how to make this code faster than a speeding turtle

"Gordon Etly" <g.e...@bentsys.INVALID.com> wrote:
>Jürgen Exner wrote:
>> "Gordon Etly" <g.e...@bent-INVALID-sys.com> wrote:

>> > I'm just pointing out what is. It's you who keep bringing this upon
>> > yourself. You are constantly rude and arrogant to people, then you

>> Changing your identity again because everyone filtered you?

>1) My identity has never changed.

Oh really? So
        Author: Gordon Etly <g...@bentsys.com>
        Author: Gordon Etly <ge...@bentsys-INVALID.com>
        Author: Gordon Etly <g.e...@bent-INVALID-sys.com>
was not you? How come that I don't believe you?

And now using identity number 4:
        Author: Gordon Etly <g.e...@bentsys.INVALID.com>?

You must have a _REALLY_ bad reputation that you feel the need to change
your ID every other day.

>2) Why are you trying to speak for everyone. While certain people may
>share your view (and vice versa), it doesn't mean you speak for the
>whole of the group.

I never claimed to speak for anyone but myself.

jue


Jürgen Exner  
Newsgroups: comp.lang.perl.misc
From: Jürgen Exner <jurge...@hotmail.com>
Date: Sat, 17 May 2008 02:12:07 GMT
Local: Fri, May 16 2008 10:12 pm
Subject: Re: Need ideas on how to make this code faster than a speeding turtle

"Gordon Etly" <g.e...@bentsys.INVALID.com> wrote:
>Keith Keller wrote:
>> On 2008-05-16, Gordon Etly <g.e...@bentsys.INVALID.com> wrote:
>> There is no "Name" field.  The From: header often includes both a name
>> and an email address.

>Many readers separate the "name" and "email" fields.

Nonsense. There is a From header. And maybe a ReplyTo header. And maybe
a FollowupTo header. But there is no such thing as a "Name" or an
"Email" header field in the first place.
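
To illustrate the point being argued, here is a simplified sketch of how a newsreader might derive separate "name" and "email" values from the single From: header. The function name split_from and the example.invalid addresses are hypothetical, and the regex only handles the common `Display Name <address>` form; it is not a full RFC 5322 parser.

```perl
use strict;
use warnings;

# Split a From: header value into (display name, address).
# Handles only the common forms  Name <addr>  and  "Name" <addr>.
sub split_from {
    my ($value) = @_;
    if ($value =~ /^\s*"?([^"<]*?)"?\s*<([^>]*)>\s*$/) {
        return ($1, $2);
    }
    return ('', $value);    # bare address with no display name
}

# Hypothetical header values: same display name, different addresses.
my @headers = (
    '"Gordon Etly" <one@example.invalid>',
    'Gordon Etly <two@example.invalid>',
);

for my $header (@headers) {
    my ($name, $address) = split_from($header);
    print "name=[$name] address=[$address]\n";
}
```

Both values come out of the same header line, so a filter can match on either part, or on the whole header.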

>I never changed my
>name. The email address part of the From: line is not a static entity;
>one can always change their email address. It's anyone's right to do so,
>as it's their info. You're not suggesting an email address is a reliable
>way of tracking someone, are you?

If someone has to change it frequently then it is a very good indication
that that person has something to hide in their past. Why else would
they change their ID frequently?

Back you go to where you crawled out from

jue


Gordon Etly  
Newsgroups: comp.lang.perl.misc
From: "Gordon Etly" <g.e...@bentsys.INVALID.com>
Date: Fri, 16 May 2008 20:20:16 -0700
Local: Fri, May 16 2008 11:20 pm
Subject: Re: Need ideas on how to make this code faster than a speeding turtle

Jürgen Exner wrote:
> "Gordon Etly" <g.e...@bentsys.INVALID.com> wrote:
> > Jürgen Exner wrote:
> > > "Gordon Etly" <g.e...@bent-INVALID-sys.com> wrote:
> > > > I'm just pointing out what is. It's you who keep bringing this
> > > > upon yourself. You are constantly rude and arrogant to people,
> > > > then you
> > > Changing your identity again because everyone filtered you?
> > 1) My identity has never changed.
> Oh really? So
> Author: Gordon Etly <g...@bentsys.com>
> Author: Gordon Etly <ge...@bentsys-INVALID.com>
> Author: Gordon Etly <g.e...@bent-INVALID-sys.com>
> was not you?

My name never changed. An email address is not an identity; it's an
email address. It is a variable field. One can always change it, so stop
trying to use that as an argument here. I said before my name never
changed and you just proved that for me.

> > 2) Why are you trying to speak for everyone. While certain people
> > may share your view (and vice versa), it doesn't mean you speak for
> > the whole of the group.
> I never claimed to speak for anyone but myself.

Not true:

( from above )

> > > Changing your identity again because everyone filtered you?

You clearly implied you knew -everyone- had done it. Stop trying to
misrepresent things in order to formulate your arguments.

--
G.Etly


Gordon Etly  
Newsgroups: comp.lang.perl.misc
From: "Gordon Etly" <g.e...@bentsys.INVALID.com>
Date: Fri, 16 May 2008 20:28:42 -0700
Local: Fri, May 16 2008 11:28 pm
Subject: Re: Need ideas on how to make this code faster than a speeding turtle

Jürgen Exner wrote:
> "Gordon Etly" <g.e...@bentsys.INVALID.com> wrote:
> > Keith Keller wrote:
> > > On 2008-05-16, Gordon Etly <g.e...@bentsys.INVALID.com> wrote:
> > > There is no "Name" field.  The From: header often includes both a
> > > name and an email address.
> > Many readers separate the "name" and "email" fields.
> Nonsense. There is a From header. And maybe a ReplyTo header. And
> maybe a FollowupTo header. But there is no such thing as a "Name" or
> an "Email" header field in the first place.

No, most readers that I've used give separate fields for Name and Email.
They write the From: header behind the scenes. Either way, it doesn't
change the fact that the Email part is a variable field that can change at
any time. Whether it's from changing email providers, or any number of
reasons (which one is not required to disclose), it is a person's own
choice what they want to display to the public as an email address.

Hell, some providers don't even require an email address (I once had one
when I was in Europe for a few months that allowed "Name < >" (a space
for an email), which I realized when I forgot to enter an email). Granted
most don't allow it, but the point is whatever it is, it's up to the
poster.

> If someone has to change it frequently then it is a very good
> indication that that person has something to hide in their past.

Err... I never changed my name, so how could I possibly be trying to
hide? Actually quite the opposite, I change the way my email appears in
the From: line so I am -NOT- hidden :)

--
G.Etly


Gordon Etly  
Newsgroups: comp.lang.perl.misc
From: "Gordon Etly" <g.e...@bentsys.INVALID.com>
Date: Fri, 16 May 2008 20:39:38 -0700
Local: Fri, May 16 2008 11:39 pm
Subject: Re: Need ideas on how to make this code faster than a speeding turtle

So what? I am not violating it.

> > Being in control of your mail server actually allows you to fulfill
> > the "should not bounce because of invalidity" if you want to get
> > down to that.
> That's funny because most of the domain names you use are not
> registered.

Please stop playing stupid. I am not the first to add "invalid" or
"nospam" or so to my email address. It's a common practice and it's
never been prohibited by any provider I've come across. Bottom line: the
email address you enter is for public display and that's what many
harvesters look for.

> Second, some of the domains you use are registered

I only use one domain. You know very well about munging practices so
please stop feigning ignorance so suddenly.

> but do not seem to be owned by someone named Gordon Etly.

Come on, really. How many @aol, @yahoo, etc etc etc own those domains?
You know better than to make such an argument. Most people -don't- own
the domain their email is in.

> > How a poster writes their email address is completely up to
> > that person. A rather large amount of people munge their email
> > addresses, so this isn't even an issue.
> From other users' perspective, what matters is that you pick one and
> stick with it.

One does not have to use the same email address. One is free to change
that to whatever they wish.

--
G.Etly


Ben Morrow  
Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Sat, 17 May 2008 10:03:09 +0100
Local: Sat, May 17 2008 5:03 am
Subject: Re: Need ideas on how to make this code faster than a speeding turtle

Quoth Jürgen Exner <jurge...@hotmail.com>:

> "Gordon Etly" <g.e...@bentsys.INVALID.com> wrote:
> >Keith Keller wrote:
> >> On 2008-05-16, Gordon Etly <g.e...@bentsys.INVALID.com> wrote:
> >> There is no "Name" field.  The From: header often includes both a name
> >> and an email address.

> >Many readers separate the "name" and "email" fields.

> Nonsense. There is a From header. And maybe a ReplyTo header. And maybe
> a FollowupTo header. But there is no such thing as a "Name" or an
> "Email" header field in the first place.

          +-------------------+             .:\:\:/:/:.
          |   PLEASE DO NOT   |            :.:\:\:/:/:.:
          |  FEED THE TROLLS  |           :=.' -   - '.=:
          |                   |           '=(\ 9   9 /)='
          |   Thank you,      |              (  (_)  )
          |       Management  |              /`-vvv-'\
          +-------------------+             /         \
                  |  |        @@@          / /|,,,,,|\ \
                  |  |        @@@         /_//  /^\  \\_\
    @x@@x@        |  |         |/         WW(  (   )  )WW
    \||||/        |  |        \|           __\,,\ /,,/__
     \||/         |  |         |          (______Y______)
/\/\/\/\/\/\/\/\//\/\\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
==================================================================

Ben

--
  Joy and Woe are woven fine,
  A Clothing for the Soul divine       William Blake
  Under every grief and pine          'Auguries of Innocence'
  Runs a joy with silken twine.                                b...@morrow.me.uk

