Discussion:
No for each loop comment?
Gary Gregory
2014-09-25 03:12:47 UTC
Permalink
What does this "//noinspection ForLoopReplaceableByForEach" comment mean?

Why not use an enhanced for-each loop?

private static boolean contains(final Marker parent, final Marker... localParents) {
    //noinspection ForLoopReplaceableByForEach
    for (int i = 0, localParentsLength = localParents.length; i < localParentsLength; i++) {
        final Marker marker = localParents[i];
        if (marker == parent) {
            return true;
        }
    }
    return false;
}
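
For comparison, here is a minimal sketch of the for-each form that the IDE inspection would suggest, next to the indexed form quoted above. The `Marker` interface below is a stand-in invented so the snippet compiles on its own (it is not the log4j-api type), and the class and method names are hypothetical.

```java
// Hypothetical stand-in for the real Marker type, present only so this compiles alone.
interface Marker {
    String getName();
}

final class ContainsSketch {

    // Indexed form, as quoted above: the array length is read once into a local.
    static boolean containsIndexed(final Marker parent, final Marker... localParents) {
        for (int i = 0, localParentsLength = localParents.length; i < localParentsLength; i++) {
            if (localParents[i] == parent) {
                return true;
            }
        }
        return false;
    }

    // Enhanced for-each form: same identity comparison, no explicit index.
    static boolean containsForEach(final Marker parent, final Marker... localParents) {
        for (final Marker marker : localParents) {
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        final Marker a = () -> "A";
        final Marker b = () -> "B";
        System.out.println(containsIndexed(a, b, a)); // true: a is among the parents
        System.out.println(containsForEach(a, b));    // false: only b is a parent
    }
}
```

Both methods return the same answer for any input; the open question in this thread is only whether they run at different speeds.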

Thanks,
Gary

--
E-Mail: ***@gmail.com | ***@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
Matt Sicker
2014-09-25 03:46:02 UTC
Permalink
From what I remember, it had something to do with the incredibly large
difference in speed between for loops and foreach loops on arrays. And by
incredibly large, I mean most likely negligible.

--
Matt Sicker <***@gmail.com>
Remko Popma
2014-09-25 04:10:32 UTC
Permalink
> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>
> From what I remember, it had something to do with the incredibly large difference in speed between for loops and foreach loops on arrays. And by incredibly large, I mean most likely negligible.
:-)
I do remember reading that someone found a speed difference. But I've never verified it. (Note to self: write a quick JMH benchmark for this.)

On the other hand, this is configuration, so it only happens once and is very unlikely to be "hot" code, so there is probably not much value in optimizing this loop.

Ralph Goers
2014-09-25 15:18:08 UTC
Permalink
Configuration? If I recall correctly this method is called on every log event that contains a Marker. But I am just guessing since Gary neglected to say what class this is. But I do remember doing extensive testing when this code was written. And I also remember someone (probably Gary) mentioning then that it should use a for-loop and we had this same conversation then. I think that is why the comment was added.

Ralph

Remko Popma
2014-09-25 15:47:53 UTC
Permalink
Hm.. Why did I think it was configuration? I must have gotten mixed up with
another commit email...
The class is MarkerManager in log4j-api.

Paul Benedict
2014-09-25 15:51:01 UTC
Permalink
I would be surprised if foreach over an array makes a speed difference.
AFAIK, foreach is syntactic sugar. There is no iterator for an array, so it
has to be desugared using a for/index loop like you have there. I don't
think this code is saving anything.
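
As a sketch of that desugaring: JLS §14.14.2 defines an enhanced for over an array in terms of an indexed loop on a hidden copy of the array reference. The hand-written expansion below is illustrative only. The names `a` and `i` stand in for compiler-generated identifiers, and `DesugarSketch` is a made-up class; the bytecode javac actually emits, and what the JIT makes of it, is not shown here.

```java
final class DesugarSketch {

    // What the programmer writes.
    static int sumForEach(final int[] values) {
        int sum = 0;
        for (final int v : values) {
            sum += v;
        }
        return sum;
    }

    // Roughly what the language spec says it means: evaluate the array
    // expression once, then walk it with an int index.
    static int sumExpanded(final int[] values) {
        int sum = 0;
        final int[] a = values;              // stands in for the hidden array reference
        for (int i = 0; i < a.length; i++) { // stands in for the hidden index
            final int v = a[i];
            sum += v;
        }
        return sum;
    }

    public static void main(final String[] args) {
        final int[] data = {1, 2, 3, 4};
        System.out.println(sumForEach(data) == sumExpanded(data)); // true
    }
}
```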


Cheers,
Paul

Gary Gregory
2014-09-25 15:53:50 UTC
Permalink
On Thu, Sep 25, 2014 at 11:51 AM, Paul Benedict <***@apache.org>
wrote:

> I would be surprised if foreach over an array makes a speed difference.
> AFAIK, foreach is syntactic sugar. There is no iterator for an array, so it
> has to be desugared using a for/index loop like you have there. I don't
> think this code is saving anything.
>

It would be a sad day for any compiler if it did!

Gary


Ralph Goers
2014-09-25 15:55:27 UTC
Permalink
You can think that, but in the testing I did at the time the difference was quite noticeable. I would have left it as a foreach if it wasn’t.

Ralph

Paul Benedict
2014-09-25 16:28:17 UTC
Permalink
I wonder if accessing the .length instance variable of the array is slower
than referencing the local variable you stored it in. I am actually
surprised to hear of your results, Ralph, but you did the testing so I
believe you.
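
The two indexed variants in question can be put side by side. This is only a sketch for comparing their shape (the class and method names are made up). It says nothing about which is faster; that would need a proper benchmark harness such as JMH.

```java
final class LengthAccessSketch {

    // Reads array.length in the loop condition on every iteration (at source level).
    static int xorFieldLength(final int[] array) {
        int result = 0;
        for (int i = 0; i < array.length; i++) {
            result ^= array[i];
        }
        return result;
    }

    // Copies array.length into a local once, as the loop under discussion does.
    static int xorLocalLength(final int[] array) {
        int result = 0;
        for (int i = 0, length = array.length; i < length; i++) {
            result ^= array[i];
        }
        return result;
    }

    public static void main(final String[] args) {
        final int[] data = {1, 2, 4};
        System.out.println(xorFieldLength(data) == xorLocalLength(data)); // true
    }
}
```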


Cheers,
Paul

Gary Gregory
2014-09-25 16:33:19 UTC
Permalink
On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com> wrote:

> I do remember reading that someone found a speed difference. But I've
> never verified it. (Note to self: write a quick JMH benchmark for this.)

I'd be curious to see the results!

Gary


Matt Sicker
2014-09-25 17:56:17 UTC
Permalink
The foreach over an array looks like it's supposed to compile to the same
thing:

https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html

The same goes for .length, which is specified as a final field; I'd imagine
that lets the JIT inline it (hence why we use final everywhere):

http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7

Remko Popma
2014-09-26 13:43:52 UTC
Permalink
On Windows it looks like normal for loops are slightly faster than for-each
loops, especially for small arrays of primitives. This could be noise,
since we are talking about 5 nanoseconds where the baseline (an empty
method invocation) is 12 nanos.

On Solaris 10 and Red Hat Enterprise Linux the baseline is so large (255
nanos and 1910 nanos respectively) that any difference we are seeing is
just noise.

All benchmarks were run with one fork, one thread, 10 warmup iterations and
10 test iterations.

*Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU @1.70GHz
with hyperthreading switched on (4 virtual cores)*
Benchmark                                            Mode    Samples      Score  Score error  Units
o.a.l.l.p.j.LoopsBenchmark.baseline                  sample   154947     12.432        0.550  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop  sample   126597   2759.592        3.431  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop      sample   126494   2761.729        3.127  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop   sample   154124    292.880        1.065  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop       sample   156155    288.751        1.101  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop    sample   191980     41.826        0.870  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop        sample   193770     36.894        0.782  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop     sample   190847     22.393        0.618  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop         sample   192552     17.146        0.560  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop  sample   173839  31959.057       14.341  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop      sample   171137  32461.985       14.353  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop   sample    97495   3591.200        4.852  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop       sample   101560   3445.998        4.010  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop    sample   102796    438.207        1.923  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop        sample   102333    439.576        2.139  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop     sample   113924     58.957        1.247  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop         sample   120416     60.712        1.284  ns/op


// The loops for Object arrays are similar, but return the XOR of the
// element hash codes.

// Enhanced for-each loop: XOR all elements together.
private int forEachLoop(final int[] array) {
    int result = 0;
    for (final int element : array) {
        result ^= element;
    }
    return result;
}

// Indexed for loop computing the same value.
private int forLoop(final int[] array) {
    int result = 0;
    for (int i = 0; i < array.length; i++) {
        result ^= array[i];
    }
    return result;
}
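
The Object-array variants are not shown in this message. Going only by the description (XOR of the element hash codes), they might look like the following. The class name, method names, and parameter type are guesses for illustration, not the actual benchmark source.

```java
final class ObjectLoopsSketch {

    // for-each over Object[]: XOR together each element's hashCode().
    static int forEachLoop(final Object[] array) {
        int result = 0;
        for (final Object element : array) {
            result ^= element.hashCode();
        }
        return result;
    }

    // Indexed loop over Object[] computing the same value.
    static int forLoop(final Object[] array) {
        int result = 0;
        for (int i = 0; i < array.length; i++) {
            result ^= array[i].hashCode();
        }
        return result;
    }

    public static void main(final String[] args) {
        // Integer.hashCode() is the int value itself, so the XOR is 1 ^ 2 ^ 4.
        final Object[] data = {1, 2, 4};
        System.out.println(forEachLoop(data)); // 7
        System.out.println(forLoop(data));     // 7
    }
}
```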



*Solaris 10 (64bit) with JDK1.7.0_06-b24 (Oracle Hotspot), 2 quad-core Xeon
X5570 dual CPUs @2.93GHz with hyperthreading switched on (16 virtual cores)*
Benchmark                                            Mode    Samples      Score  Score error  Units
o.a.l.l.p.j.LoopsBenchmark.baseline                  sample   110212    255.300        0.201  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop  sample   187808   3938.055        1.207  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop      sample   187897   3937.929        0.748  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop   sample   123989    606.631        0.626  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop       sample   123973    609.565        0.416  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop    sample   126933    294.204        0.280  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop        sample   127070    296.411        0.223  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop     sample   113400    261.519        0.181  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop         sample   111637    260.435        0.115  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop  sample   115872  48154.673       18.846  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop      sample   116777  47793.868       17.615  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop   sample   138432   5256.767        2.451  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop       sample   136644   5325.377        2.388  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop    sample   166653    773.541        0.330  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop        sample   166570    774.513        0.574  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop     sample   178754    317.232        0.134  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop         sample   180165    316.189        0.238  ns/op

*64 bit RHEL 6.5 (Linux 2.6.32-431.el6.x86_64) with JDK1.7.0_05-b06 (Oracle
Hotspot), 4 quad-core Xeon X5570 CPUs @2.93GHz with hyperthreading switched
on (16 virtual cores)*
Benchmark                                             Mode    Samples      Score  Score error  Units
o.a.l.l.p.j.LoopsBenchmark.baseline                   sample   114783   1910.576       29.256  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop   sample   194584   5132.885       25.137  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop       sample   196672   4811.572       52.072  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop    sample   133119   1967.213       28.970  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop        sample   133804   2004.501       31.554  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop     sample   142439   1575.329        6.457  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop         sample   142215   1957.714       27.815  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop      sample   130826   1980.301       30.818  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop          sample   132654   1589.120        8.449  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop   sample   126947  43301.320       50.589  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop       sample   126117  43574.129       55.272  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop    sample   143697   5831.250       19.667  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop        sample   163244   4823.096       13.180  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop     sample   162502   1930.819       24.136  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop         sample   171619   1625.806       10.385  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop      sample   172226   1888.683       22.554  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop          sample   188838   1581.979        6.322  ns/op
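
A rough way to read these tables is to subtract the baseline (the empty method invocation) from each score and look at what is left. The sketch below does that for the 10-element int[] rows on each platform, using numbers copied from the results in this thread. The class and method names are made up for illustration, and treating the baseline as a fixed overhead that can simply be subtracted is an assumption, not something the benchmark harness guarantees.

```java
// Illustrative only: names are hypothetical, figures are copied from the thread.
final class BaselineCheck {

    // Score minus baseline, both in ns/op. Assumes the baseline is a fixed
    // per-invocation overhead, which is a simplification.
    static double netOfBaseline(final double score, final double baseline) {
        return score - baseline;
    }

    public static void main(final String[] args) {
        // Windows 7, baseline 12.432 ns/op
        System.out.printf("Windows forEach: %.3f%n", netOfBaseline(22.393, 12.432));
        System.out.printf("Windows for:     %.3f%n", netOfBaseline(17.146, 12.432));
        // Solaris 10, baseline 255.300 ns/op
        System.out.printf("Solaris forEach: %.3f%n", netOfBaseline(261.519, 255.300));
        System.out.printf("Solaris for:     %.3f%n", netOfBaseline(260.435, 255.300));
        // RHEL 6.5, baseline 1910.576 ns/op
        System.out.printf("RHEL forEach:    %.3f%n", netOfBaseline(1980.301, 1910.576));
        System.out.printf("RHEL for:        %.3f%n", netOfBaseline(1589.120, 1910.576));
    }
}
```

On the RHEL numbers the plain for loop comes out below the empty baseline, which is consistent with the differences on that machine being noise.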




On Fri, Sep 26, 2014 at 2:56 AM, Matt Sicker <***@gmail.com> wrote:

> The foreach over an array looks like it's supposed to compile to the same
> thing:
>
> https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html
>
> Same goes for .length which is supposed to be a final field which would
> allow for inlining by the JIT I'd imagine (hence why we use final
> everywhere):
>
> http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7
>
> On 25 September 2014 11:33, Gary Gregory <***@gmail.com> wrote:
>
>> On Thu, Sep 25, 2014 at 11:55 AM, Ralph Goers <***@dslextreme.com
>> > wrote:
>>
>>> You can think that, but in the testing I did at the time the
>>> difference was quite noticeable. I would have left it as a foreach if it
>>> wasn’t.
>>>
>>> Ralph
>>>
>>> On Sep 25, 2014, at 8:51 AM, Paul Benedict <***@apache.org> wrote:
>>>
>>> I would be surprised if foreach over an array makes a speed difference.
>>> AFAIK, foreach is syntactic sugar. There is no iterator for an array so it
>>> has to be desugared using a for/index loop like you have there. I don't
>>> think this code is saving anything.
>>>
>>>
>>> Cheers,
>>> Paul
>>>
>>> On Thu, Sep 25, 2014 at 10:47 AM, Remko Popma <***@gmail.com>
>>> wrote:
>>>
>>>> Hm.. Why did I think it was configuration? I must have gotten mixed up
>>>> with another commit email...
>>>> The class is MarkerManager in log4j-api.
>>>>
>>>> On Fri, Sep 26, 2014 at 12:18 AM, Ralph Goers <
>>>> ***@dslextreme.com> wrote:
>>>>
>>>>> Configuration? If I recall correctly this method is called on every
>>>>> log event that contains a Marker. But I am just guessing since Gary
>>>>> neglected to say what class this is. But I do remember doing extensive
>>>>> testing when this code was written. And I also remember someone (probably
>>>>> Gary) mentioning then that it should use a for-loop and we had this same
>>>>> conversation then. I think that is why the comment was added.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>>>>>
>>>>> From what I remember, it had something to do with the incredibly large
>>>>> difference in speed between for loops and foreach loops on arrays. And by
>>>>> incredibly large, I mean most likely negligible.
>>>>>
>>>>> :-)
>>>>> I do remember reading that someone found a speed difference. But I've
>>>>> never verified it. (Note to self: write a quick jmh benchmark for this.)
>>>>>
>>>>>
>> I'd be curious to see the results!
>>
>> Gary
>>
>>
>>>
>>>>> On the other hand, this is configuration, so it only happens once and
>>>>> is very unlikely to be "hot" code so there is probably not much value in
>>>>> optimizing this loop.
>>>>>
>>>>>
>>>>> On 24 September 2014 22:12, Gary Gregory <***@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> What does this "//noinspection ForLoopReplaceableByForEach" comment
>>>>>> mean?
>>>>>>
>>>>>> Why not use an enhanced for-each loop?
>>>>>>
>>>>>> private static boolean contains(final Marker parent,
>>>>>>         final Marker... localParents) {
>>>>>>     //noinspection ForLoopReplaceableByForEach
>>>>>>     for (int i = 0, localParentsLength = localParents.length;
>>>>>>             i < localParentsLength; i++) {
>>>>>>         final Marker marker = localParents[i];
>>>>>>         if (marker == parent) {
>>>>>>             return true;
>>>>>>         }
>>>>>>     }
>>>>>>     return false;
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> Gary
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Matt Sicker <***@gmail.com>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> Matt Sicker <***@gmail.com>
>
Mikael Ståldal
2014-09-26 14:23:01 UTC
Permalink
Have you compared the generated byte code (using javap -c) for the two
cases?
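
A minimal sketch of how that comparison could be set up: put both loop forms in one small class, compile it, and disassemble it. The class name LoopForms and the third method are assumptions added for illustration. The third method spells out the expansion that JLS 14.14.2 gives for an enhanced for statement over an array, so its bytecode can be compared against the other two.

```java
// LoopForms.java (hypothetical file and class name)
//
// Compile and disassemble with:
//   javac LoopForms.java
//   javap -c LoopForms
final class LoopForms {

    // Enhanced for loop over an array.
    static int forEachLoop(final int[] array) {
        int result = 0;
        for (final int element : array) {
            result ^= element;
        }
        return result;
    }

    // Indexed for loop that reads array.length in the loop condition.
    static int forLoop(final int[] array) {
        int result = 0;
        for (int i = 0; i < array.length; i++) {
            result ^= array[i];
        }
        return result;
    }

    // Hand-written form of the JLS 14.14.2 expansion of the enhanced for:
    // copy the array reference, then index through it.
    static int desugared(final int[] array) {
        int result = 0;
        final int[] a = array;
        for (int i = 0; i < a.length; i++) {
            final int element = a[i];
            result ^= element;
        }
        return result;
    }
}
```

The javap output lists each method's instructions separately, so the three loop bodies can be compared side by side.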


--
Mikael Ståldal
Chief Software Architect
*Appear*
Phone: +46 8 545 91 572
Email: ***@appearnetworks.com
Remko Popma
2014-09-26 14:27:47 UTC
Permalink
Nope. Maybe I'll get around to that next week. If you have time to do that,
please share!

On Fri, Sep 26, 2014 at 11:23 PM, Mikael Ståldal <
***@appearnetworks.com> wrote:

> Have you compared the generated byte code (using javap -c) for the two
> cases?
Ralph Goers
2014-09-26 15:02:39 UTC
Permalink
FWIW, as I recall I modified the MarkerManager class and was running some tests against it on my Mac. I just kept making various modifications until I got the best performance I could get.

Ralph


On Sep 26, 2014, at 7:27 AM, Remko Popma <***@gmail.com> wrote:

> Nope. Maybe I'll get around that next week. If you have time to do that, please share!
>
> On Fri, Sep 26, 2014 at 11:23 PM, Mikael Ståldal <***@appearnetworks.com> wrote:
> Have you compared the generated byte code (using javap -c) for the two cases?
>
> On Fri, Sep 26, 2014 at 3:43 PM, Remko Popma <***@gmail.com> wrote:
> On Windows it looks like normal for loops are slightly faster than for-each loops, especially for small arrays of primitives. This could be noise, since we are talking about 5 nanoseconds where the baseline (an empty method invocation) is 12 nanos.
>
> On Solaris 10 and Red Hat Enterprise Linux the baseline is so large (255 nanos and 1910 nanos respectively) that any difference we are seeing is just noise.
>
> All benchmarks were run with one fork, one thread, 10 warmup iterations and 10 test iterations.
>
> Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU @1.70Ghz with hyperthreading switched on (4 virtual cores)
> Benchmark Mode Samples Score Score error Units
> o.a.l.l.p.j.LoopsBenchmark.baseline sample 154947 12.432 0.550 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 126597 2759.592 3.431 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 126494 2761.729 3.127 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 154124 292.880 1.065 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 156155 288.751 1.101 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 191980 41.826 0.870 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 193770 36.894 0.782 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 190847 22.393 0.618 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 192552 17.146 0.560 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 173839 31959.057 14.341 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 171137 32461.985 14.353 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 97495 3591.200 4.852 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 101560 3445.998 4.010 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 102796 438.207 1.923 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 102333 439.576 2.139 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 113924 58.957 1.247 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 120416 60.712 1.284 ns/op
>
>
> // For loops for Object arrays are similar but return the total XOR of the element hashcodes.
>
> private int forEachLoop(final int[] array) {
> int result = 0;
> for (final int element : array) {
> result ^= element;
> }
> return result;
> }
>
> private int forLoop(final int[] array) {
> int result = 0;
> for (int i = 0; i < array.length; i++) {
> result ^= array[i];
> }
> return result;
> }
>
>
>
> Solaris 10 (64bit) with JDK1.7.0_06-b24 (Oracle Hotspot), 2 quad-core Xeon X5570 dual CPUs @2.93Ghz with hyperthreading switched on (16 virtual cores)
> Benchmark Mode Samples Score Score error Units
> o.a.l.l.p.j.LoopsBenchmark.baseline sample 110212 255.300 0.201 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 187808 3938.055 1.207 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 187897 3937.929 0.748 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 123989 606.631 0.626 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 123973 609.565 0.416 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 126933 294.204 0.280 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 127070 296.411 0.223 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 113400 261.519 0.181 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 111637 260.435 0.115 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 115872 48154.673 18.846 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 116777 47793.868 17.615 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 138432 5256.767 2.451 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 136644 5325.377 2.388 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 166653 773.541 0.330 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 166570 774.513 0.574 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 178754 317.232 0.134 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 180165 316.189 0.238 ns/op
>
> 64 bit RHEL 6.5 (Linux 2.6.32-431.el6.x86_64) with JDK1.7.0_05-b06 (Oracle Hotspot), 4 quad-core Xeon X5570 CPUs @2.93GHz with hyperthreading switched on (16 virtual cores)
> Benchmark Mode Samples Score Score error Units
> o.a.l.l.p.j.LoopsBenchmark.baseline sample 114783 1910.576 29.256 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 194584 5132.885 25.137 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 196672 4811.572 52.072 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 133119 1967.213 28.970 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 133804 2004.501 31.554 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 142439 1575.329 6.457 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 142215 1957.714 27.815 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 130826 1980.301 30.818 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 132654 1589.120 8.449 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 126947 43301.320 50.589 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 126117 43574.129 55.272 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 143697 5831.250 19.667 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 163244 4823.096 13.180 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 162502 1930.819 24.136 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 171619 1625.806 10.385 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 172226 1888.683 22.554 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 188838 1581.979 6.322 ns/op
>
>
>
>
> On Fri, Sep 26, 2014 at 2:56 AM, Matt Sicker <***@gmail.com> wrote:
> The foreach over an array looks like it's supposed to compile to the same thing:
>
> https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html
>
> Same goes for .length which is supposed to be a final field which would allow for inlining by the JIT I'd imagine (hence why we use final everywhere):
>
> http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7
>
> On 25 September 2014 11:33, Gary Gregory <***@gmail.com> wrote:
> On Thu, Sep 25, 2014 at 11:55 AM, Ralph Goers <***@dslextreme.com> wrote:
> You can think that, but in the testing I did at the time the difference was quite noticeable. I would have left it as a foreach if it wasn’t.
>
> Ralph
>
> On Sep 25, 2014, at 8:51 AM, Paul Benedict <***@apache.org> wrote:
>
>> I would be surprised if foreach over an array makes a speed difference. AFAIK, foreach is syntactic sugar. There is no iterator for an array so it has to be desugared using a for/index loop like you have there. I don't think this code is saving anything.
>>
>>
>> Cheers,
>> Paul
>>
>> On Thu, Sep 25, 2014 at 10:47 AM, Remko Popma <***@gmail.com> wrote:
>> Hm.. Why did I think it was configuration? I must have gotten mixed up with another commit email...
>> The class is MarkerManager in log4j-api.
>>
>> On Fri, Sep 26, 2014 at 12:18 AM, Ralph Goers <***@dslextreme.com> wrote:
>> Configuration? If I recall correctly this method is called on every log event that contains a Marker. But I am just guessing since Gary neglected to say what class this is. But I do remember doing extensive testing when this code was written. And I also remember someone (probably Gary) mentioning then that it should use a for-loop and we had this same conversation then. I think that is why the comment was added.
>>
>> Ralph
>>
>> On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com> wrote:
>>
>>>
>>> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>>>
>>>> From what I remember, it had something to do with the incredibly large difference in speed between for loops and foreach loops on arrays. And by incredibly large, I mean most likely negligible.
>>> :-)
>>> I do remember reading that someone found a speed difference. But I've never verified it. (Note to self: write a quick jmh benchmark for this.)
>
>
> I'd be curious to see the results!
>
> Gary
>
>>>
>>> On the other hand, this is configuration, so it only happens once and is very unlikely to be "hot" code so there is probably not much value in optimizing this loop.
>>>
>>>>
>>>> On 24 September 2014 22:12, Gary Gregory <***@gmail.com> wrote:
>>>> What does this "//noinspection ForLoopReplaceableByForEach" comment mean?
>>>>
>>>> Why not use an enhanced for-each loop?
>>>>
>>>> private static boolean contains(final Marker parent, final Marker... localParents) {
>>>>     //noinspection ForLoopReplaceableByForEach
>>>>     for (int i = 0, localParentsLength = localParents.length; i < localParentsLength; i++) {
>>>>         final Marker marker = localParents[i];
>>>>         if (marker == parent) {
>>>>             return true;
>>>>         }
>>>>     }
>>>>     return false;
>>>> }
>>>>
>>>> Thanks,
>>>> Gary
>>>>
Remko Popma
2014-09-26 15:07:31 UTC
Permalink
So both on the Mac and on Windows it is slightly faster to use a normal for
loop.
Then there is no reason to switch to a for-each loop, is there?
Let's change it back and put a comment on there that perf tests showed a
slight advantage for normal for loops.
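For illustration, the method with such a comment could look roughly like this. It is only a sketch: the nested `Marker` interface is a hypothetical stand-in for the log4j-api type so the snippet compiles on its own, the class name `MarkerLoopSketch` is made up, and the comment wording is a suggestion.

```java
public class MarkerLoopSketch {
    /** Hypothetical stand-in for the log4j-api Marker type. */
    public interface Marker { }

    public static boolean contains(final Marker parent, final Marker... localParents) {
        // Performance tests showed a slight advantage for a plain indexed loop
        // over a for-each loop here, so the IDE inspection is suppressed on purpose.
        //noinspection ForLoopReplaceableByForEach
        for (int i = 0, localParentsLength = localParents.length; i < localParentsLength; i++) {
            final Marker marker = localParents[i];
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }
}
```

The loop body is unchanged from the code quoted at the start of the thread; only the explanatory comment is new.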

On Sat, Sep 27, 2014 at 12:02 AM, Ralph Goers <***@dslextreme.com>
wrote:

> FWIW, as I recall I modified the MarkerManager class and was running some
> test against that on my Mac. I just kept making various modifications
> until I got the best performance I could get.
>
> Ralph
Gary Gregory
2014-09-26 15:18:04 UTC
Permalink
What about the byte codes? If the byte codes are the same, then the test
results show noise and the enhanced for loops are fine.

Gary
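One way to check this is to compile the loop shapes side by side and disassemble them. A minimal sketch follows; the class `LoopShapes` and its method names are hypothetical. As far as I know, javac lowers a for-each over an array to an indexed loop over a copy of the array reference with the length read once, so its bytecode should be closer to the cached-length loop in MarkerManager than to the `i < array.length` form used in the benchmark. That is an assumption worth verifying on the JDK in question.

```java
public class LoopShapes {
    // Enhanced for loop over an array.
    public static int forEachLoop(final int[] array) {
        int result = 0;
        for (final int element : array) {
            result ^= element;
        }
        return result;
    }

    // Indexed loop that re-reads array.length on every iteration (benchmark shape).
    public static int forLoop(final int[] array) {
        int result = 0;
        for (int i = 0; i < array.length; i++) {
            result ^= array[i];
        }
        return result;
    }

    // Indexed loop with the length cached in a local (MarkerManager shape).
    public static int cachedLengthLoop(final int[] array) {
        int result = 0;
        for (int i = 0, length = array.length; i < length; i++) {
            result ^= array[i];
        }
        return result;
    }
}
```

Compile with `javac LoopShapes.java`, then run `javap -c LoopShapes` and compare the three method bodies. Identical bytecode would settle the question at the javac level; different bytecode would still leave open what the JIT makes of each shape.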

On Fri, Sep 26, 2014 at 11:07 AM, Remko Popma <***@gmail.com> wrote:

> So both on the Mac and on Windows it is slightly faster to use a normal
> for loop.
> Then there is no reason to switch to a for-each loop, is there?
> Let's change it back and put a comment on there that perf tests showed a
> slight advantage for normal for loops.


Remko Popma
2014-09-26 15:23:38 UTC
Permalink
Fair enough. But since I spent a good amount of time on these benchmarks I
think the burden is on you now to either produce evidence that for-each
loops are equivalent, or change the code back.
:-)

On Sat, Sep 27, 2014 at 12:18 AM, Gary Gregory <***@gmail.com>
wrote:

> What about the byte codes? If the byte codes are the same, then the test
> results show noise and the enhanced for loops are fine.
>
> Gary
>
Mikael Ståldal
2014-09-26 15:34:18 UTC
Permalink
The byte code is actually different. This seems like a weakness of the JDK's
javac to me.

With standard loop:

private static boolean contains(org.apache.logging.log4j.Marker,
org.apache.logging.log4j.Marker...);
Code:
0: iconst_0
1: istore_2
2: aload_1
3: arraylength
4: istore_3
5: iload_2
6: iload_3
7: if_icmpge 29
10: aload_1
11: iload_2
12: aaload
13: astore 4
15: aload 4
17: aload_0
18: if_acmpne 23
21: iconst_1
22: ireturn
23: iinc 2, 1
26: goto 5
29: iconst_0
30: ireturn


With for-each:

private static boolean contains(org.apache.logging.log4j.Marker,
org.apache.logging.log4j.Marker...);
Code:
0: aload_1
1: astore_2
2: aload_2
3: arraylength
4: istore_3
5: iconst_0
6: istore 4
8: iload 4
10: iload_3
11: if_icmpge 34
14: aload_2
15: iload 4
17: aaload
18: astore 5
20: aload 5
22: aload_0
23: if_acmpne 28
26: iconst_1
27: ireturn
28: iinc 4, 1
31: goto 8
34: iconst_0
35: ireturn
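
Comparing the two listings above, the for-each version differs only in that it first copies the array reference into an extra local (aload_1 / astore_2) and uses higher local slots afterwards. The loop body itself is the same. Below is a minimal sketch for reproducing the comparison. The class name LoopForms and the use of Object in place of Marker are assumptions made to keep it self-contained, and the third method is a hand-written guess at the for-each expansion, not javac's actual output.

```java
// Sketch only: LoopForms is a hypothetical class, and Object stands in for
// org.apache.logging.log4j.Marker so the file compiles without log4j-api.
final class LoopForms {

    // Indexed loop with the length hoisted into a local, as in the original method.
    static boolean containsIndexed(final Object parent, final Object... localParents) {
        for (int i = 0, localParentsLength = localParents.length; i < localParentsLength; i++) {
            final Object marker = localParents[i];
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    // Enhanced for loop over the same array.
    static boolean containsForEach(final Object parent, final Object... localParents) {
        for (final Object marker : localParents) {
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    // Hand-written approximation of the for-each listing: the array reference
    // is copied to a second local before the indexed loop starts.
    static boolean containsExpanded(final Object parent, final Object... localParents) {
        final Object[] copy = localParents; // corresponds to aload_1 / astore_2
        final int length = copy.length;     // corresponds to arraylength / istore_3
        for (int i = 0; i < length; i++) {
            final Object marker = copy[i];
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        final Object a = new Object();
        final Object b = new Object();
        final Object[] parents = {a};
        // All three forms should agree for a present and an absent element.
        System.out.println(containsIndexed(a, parents) + " " + containsForEach(a, parents)
                + " " + containsExpanded(a, parents));
        System.out.println(containsIndexed(b, parents) + " " + containsForEach(b, parents)
                + " " + containsExpanded(b, parents));
    }
}
```

After compiling with javac, running "javap -c LoopForms" prints the byte code of all three methods side by side, which makes it easy to check whether the extra local is the only difference on a given JDK.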


On Fri, Sep 26, 2014 at 4:27 PM, Remko Popma <***@gmail.com> wrote:

> Nope. Maybe I'll get around that next week. If you have time to do that,
> please share!
>
> On Fri, Sep 26, 2014 at 11:23 PM, Mikael Ståldal <
> ***@appearnetworks.com> wrote:
>
>> Have you compared the generated byte code (using javap -c) for the two
>> cases?


--
Mikael Ståldal
Chief Software Architect
*Appear*
Phone: +46 8 545 91 572
Email: ***@appearnetworks.com
Paul Benedict
2014-09-26 15:59:52 UTC
Permalink
My recommendation is to write to the OpenJDK core mailing list. Raise this
issue and see what they say about your findings... because there is not
supposed to be any performance impact from using foreach.
On Sep 26, 2014 10:34 AM, "Mikael Ståldal" <
***@appearnetworks.com> wrote:

> The byte code is actually different. This seems like a weakness of the JDK
> Javac to me.
>
> With standard loop:
>
> private static boolean contains(org.apache.logging.log4j.Marker,
> org.apache.logging.log4j.Marker...);
> Code:
> 0: iconst_0
> 1: istore_2
> 2: aload_1
> 3: arraylength
> 4: istore_3
> 5: iload_2
> 6: iload_3
> 7: if_icmpge 29
> 10: aload_1
> 11: iload_2
> 12: aaload
> 13: astore 4
> 15: aload 4
> 17: aload_0
> 18: if_acmpne 23
> 21: iconst_1
> 22: ireturn
> 23: iinc 2, 1
> 26: goto 5
> 29: iconst_0
> 30: ireturn
>
>
> With for-each:
>
> private static boolean contains(org.apache.logging.log4j.Marker,
> org.apache.logging.log4j.Marker...);
> Code:
> 0: aload_1
> 1: astore_2
> 2: aload_2
> 3: arraylength
> 4: istore_3
> 5: iconst_0
> 6: istore 4
> 8: iload 4
> 10: iload_3
> 11: if_icmpge 34
> 14: aload_2
> 15: iload 4
> 17: aaload
> 18: astore 5
> 20: aload 5
> 22: aload_0
> 23: if_acmpne 28
> 26: iconst_1
> 27: ireturn
> 28: iinc 4, 1
> 31: goto 8
> 34: iconst_0
> 35: ireturn
>
>
> On Fri, Sep 26, 2014 at 4:27 PM, Remko Popma <***@gmail.com>
> wrote:
>
>> Nope. Maybe I'll get around that next week. If you have time to do that,
>> please share!
>>
>> On Fri, Sep 26, 2014 at 11:23 PM, Mikael Ståldal <
>> ***@appearnetworks.com> wrote:
>>
>>> Have you compared the generated byte code (using javap -c) for the two
>>> cases?
>>>
>>> On Fri, Sep 26, 2014 at 3:43 PM, Remko Popma <***@gmail.com>
>>> wrote:
>>>
>>>> On Windows it looks like normal for loops are slightly faster than
>>>> for-each loops, especially for small arrays of primitives. This could be
>>>> noise, since we are talking about 5 nanoseconds where the baseline (an
>>>> empty method invocation) is 12 nanos.
>>>>
>>>> On Solaris 10 and Red Hat Enterprise Linux the baseline is so large
>>>> (255 nanos and 1910 nanos respectively) that any difference we are seeing
>>>> is just noise.
>>>>
>>>> All benchmarks were run with one fork, one thread, 10 warmup iterations
>>>> and 10 test iterations.
>>>>
>>>> *Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU
>>>> @1.70Ghz with hyperthreading switched on (4 virtual cores)*
>>>> Benchmark                                            Mode    Samples      Score  Score error  Units
>>>> o.a.l.l.p.j.LoopsBenchmark.baseline                  sample   154947     12.432        0.550  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop  sample   126597   2759.592        3.431  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop      sample   126494   2761.729        3.127  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop   sample   154124    292.880        1.065  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop       sample   156155    288.751        1.101  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop    sample   191980     41.826        0.870  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop        sample   193770     36.894        0.782  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop     sample   190847     22.393        0.618  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop         sample   192552     17.146        0.560  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop  sample   173839  31959.057       14.341  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop      sample   171137  32461.985       14.353  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop   sample    97495   3591.200        4.852  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop       sample   101560   3445.998        4.010  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop    sample   102796    438.207        1.923  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop        sample   102333    439.576        2.139  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop     sample   113924     58.957        1.247  ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop         sample   120416     60.712        1.284  ns/op
>>>>
>>>>
>>>> // For loops for Object arrays are similar but return the total XOR of
>>>> the element hashcodes.
>>>>
>>>> private int forEachLoop(final int[] array) {
>>>>     int result = 0;
>>>>     for (final int element : array) {
>>>>         result ^= element;
>>>>     }
>>>>     return result;
>>>> }
>>>>
>>>> private int forLoop(final int[] array) {
>>>>     int result = 0;
>>>>     for (int i = 0; i < array.length; i++) {
>>>>         result ^= array[i];
>>>>     }
>>>>     return result;
>>>> }
>>>>
>>>>
>>>>
>>>> *Solaris 10 (64bit) with JDK1.7.0_06-b24 (Oracle Hotspot), 2 quad-core
>>>> Xeon X5570 dual CPUs @2.93Ghz with hyperthreading switched on (16 virtual
>>>> cores)*
>>>> Benchmark Mode Samples
>>>> Score Score error Units
>>>> o.a.l.l.p.j.LoopsBenchmark.baseline sample 110212
>>>> 255.300 0.201 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 187808
>>>> 3938.055 1.207 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 187897
>>>> 3937.929 0.748 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 123989
>>>> 606.631 0.626 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 123973
>>>> 609.565 0.416 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 126933
>>>> 294.204 0.280 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 127070
>>>> 296.411 0.223 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 113400
>>>> 261.519 0.181 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 111637
>>>> 260.435 0.115 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 115872
>>>> 48154.673 18.846 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 116777
>>>> 47793.868 17.615 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 138432
>>>> 5256.767 2.451 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 136644
>>>> 5325.377 2.388 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 166653
>>>> 773.541 0.330 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 166570
>>>> 774.513 0.574 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 178754
>>>> 317.232 0.134 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 180165
>>>> 316.189 0.238 ns/op
>>>>
>>>> *64 bit RHEL 6.5 (Linux 2.6.32-431.el6.x86_64) with JDK1.7.0_05-b06
>>>> (Oracle Hotspot), 4 quad-core Xeon X5570 CPUs @2.93GHz with hyperthreading
>>>> switched on (16 virtual cores)*
>>>> Benchmark Mode Samples
>>>> Score Score error Units
>>>> o.a.l.l.p.j.LoopsBenchmark.baseline sample 114783
>>>> 1910.576 29.256 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 194584
>>>> 5132.885 25.137 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 196672
>>>> 4811.572 52.072 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 133119
>>>> 1967.213 28.970 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 133804
>>>> 2004.501 31.554 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 142439
>>>> 1575.329 6.457 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 142215
>>>> 1957.714 27.815 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 130826
>>>> 1980.301 30.818 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 132654
>>>> 1589.120 8.449 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 126947
>>>> 43301.320 50.589 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 126117
>>>> 43574.129 55.272 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 143697
>>>> 5831.250 19.667 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 163244
>>>> 4823.096 13.180 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 162502
>>>> 1930.819 24.136 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 171619
>>>> 1625.806 10.385 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 172226
>>>> 1888.683 22.554 ns/op
>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 188838
>>>> 1581.979 6.322 ns/op
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Sep 26, 2014 at 2:56 AM, Matt Sicker <***@gmail.com> wrote:
>>>>
>>>>> The foreach over an array looks like it's supposed to compile to the
>>>>> same thing:
>>>>>
>>>>> https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html
>>>>>
>>>>> Same goes for .length which is supposed to be a final field which
>>>>> would allow for inlining by the JIT I'd imagine (hence why we use final
>>>>> everywhere):
>>>>>
>>>>> http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7
>>>>>
>>>>> On 25 September 2014 11:33, Gary Gregory <***@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Sep 25, 2014 at 11:55 AM, Ralph Goers <
>>>>>> ***@dslextreme.com> wrote:
>>>>>>
>>>>>>> You can think that, but in the testing I did at the time the
>>>>>>> difference was quite noticeable. I would have left it as a foreach if
>>>>>>> it wasn’t.
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> On Sep 25, 2014, at 8:51 AM, Paul Benedict <***@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> I would be surprised if foreach over an array makes a speed
>>>>>>> difference. AFAIK, foreach is syntactic sugar. There is no iterator for an
>>>>>>> array so it has to be desugared using a for/index loop like you have there.
>>>>>>> I don't think this code is saving anything.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Paul
>>>>>>>
>>>>>>> On Thu, Sep 25, 2014 at 10:47 AM, Remko Popma <***@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hm.. Why did I think it was configuration? I must have gotten mixed
>>>>>>>> up with another commit email...
>>>>>>>> The class is MarkerManager in log4j-api.
>>>>>>>>
>>>>>>>> On Fri, Sep 26, 2014 at 12:18 AM, Ralph Goers <
>>>>>>>> ***@dslextreme.com> wrote:
>>>>>>>>
>>>>>>>>> Configuration? If I recall correctly this method is called on
>>>>>>>>> every log event that contains a Marker. But I am just guessing since Gary
>>>>>>>>> neglected to say what class this is. But I do remember doing extensive
>>>>>>>>> testing when this code was written. And I also remember someone (probably
>>>>>>>>> Gary) mentioning then that it should use a for-loop and we had this same
>>>>>>>>> conversation then. I think that is why the comment was added.
>>>>>>>>>
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>> On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> From what I remember, it had something to do with the incredibly
>>>>>>>>> large difference in speed between for loops and foreach loops on arrays.
>>>>>>>>> And by incredibly large, I mean most likely negligible.
>>>>>>>>>
>>>>>>>>> :-)
>>>>>>>>> I do remember reading that someone found a speed difference. But
>>>>>>>>> I've never verified it. (Note to self: write a quick jmh benchmark for
>>>>>>>>> this.)
>>>>>>>>>
>>>>>>>>>
>>>>>> I'd be curious to see the results!
>>>>>>
>>>>>> Gary
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>> On the other hand, this is configuration, so it only happens once
>>>>>>>>> and is very unlikely to be "hot" code so there is probably not much value
>>>>>>>>> in optimizing this loop.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 24 September 2014 22:12, Gary Gregory <***@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> What does this "//noinspection ForLoopReplaceableByForEach"
>>>>>>>>>> comment mean?
>>>>>>>>>>
>>>>>>>>>> Why not use an enhanced for-each loop?
>>>>>>>>>>
>>>>>>>>>> private static boolean contains(final Marker parent, final
>>>>>>>>>> Marker... localParents) {
>>>>>>>>>> //noinspection ForLoopReplaceableByForEach
>>>>>>>>>> for (int i = 0, localParentsLength =
>>>>>>>>>> localParents.length; i < localParentsLength; i++) {
>>>>>>>>>> final Marker marker = localParents[i];
>>>>>>>>>> if (marker == parent) {
>>>>>>>>>> return true;
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>> return false;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Gary
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> E-Mail: ***@gmail.com | ***@apache.org
>>>>>>>>>> Java Persistence with Hibernate, Second Edition
>>>>>>>>>> <http://www.manning.com/bauer3/>
>>>>>>>>>> JUnit in Action, Second Edition
>>>>>>>>>> <http://www.manning.com/tahchiev/>
>>>>>>>>>> Spring Batch in Action <http://www.manning.com/templier/>
>>>>>>>>>> Blog: http://garygregory.wordpress.com
>>>>>>>>>> Home: http://garygregory.com/
>>>>>>>>>> Tweet! http://twitter.com/GaryGregory
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Matt Sicker <***@gmail.com>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> E-Mail: ***@gmail.com | ***@apache.org
>>>>>> Java Persistence with Hibernate, Second Edition
>>>>>> <http://www.manning.com/bauer3/>
>>>>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>>>>>> Spring Batch in Action <http://www.manning.com/templier/>
>>>>>> Blog: http://garygregory.wordpress.com
>>>>>> Home: http://garygregory.com/
>>>>>> Tweet! http://twitter.com/GaryGregory
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Matt Sicker <***@gmail.com>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Mikael Ståldal
>>> Chief Software Architect
>>> *Appear*
>>> Phone: +46 8 545 91 572
>>> Email: ***@appearnetworks.com
>>>
>>
>>
>
>
> --
> Mikael Ståldal
> Chief Software Architect
> *Appear*
> Phone: +46 8 545 91 572
> Email: ***@appearnetworks.com
>
Ralph Goers
2014-09-26 17:28:29 UTC
Permalink
While true, if it is coded as an old-style for loop there is nothing really to be gained by converting it to a foreach except for making people happy that it uses the newer syntax. Even if Java is changed so that they are equivalent it won’t be faster.

Ralph
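
A note for readers comparing the two javap listings quoted below: the visible difference is that the for-each version first copies the array reference into an extra local (aload_1, astore_2) and initialises the length before the index, which shifts the later slot numbers. The comparison and branch instructions inside the loop are otherwise the same. A minimal sketch of the three shapes, assuming Object as a stand-in for Marker; the class and method names are hypothetical and not taken from log4j:

```java
final class ForEachDesugarSketch {

    // Indexed loop with the length cached in a local, as in the contains() method under discussion.
    static boolean containsIndexed(final Object parent, final Object... localParents) {
        for (int i = 0, n = localParents.length; i < n; i++) {
            final Object marker = localParents[i];
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    // The enhanced for loop that the IDE inspection would suggest.
    static boolean containsForEach(final Object parent, final Object... localParents) {
        for (final Object marker : localParents) {
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    // Hand-written approximation of what the quoted for-each bytecode does.
    static boolean containsDesugared(final Object parent, final Object... localParents) {
        final Object[] copy = localParents;  // aload_1, astore_2: extra copy of the array reference
        final int length = copy.length;      // arraylength, istore_3: length read once
        for (int i = 0; i < length; i++) {   // index kept in local slot 4
            final Object marker = copy[i];   // aaload, astore 5
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }
}
```

All three methods return the same result for the same arguments; whether the extra local makes any measurable difference after JIT compilation is exactly what the benchmarks in this thread are trying to settle.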

On Sep 26, 2014, at 8:59 AM, Paul Benedict <***@apache.org> wrote:

> My recommendation is to write to the core OpenJDK mailing list. Raise this issue and see what they say about your findings... because there is not supposed to be any performance impact using foreach.
>
> On Sep 26, 2014 10:34 AM, "Mikael Ståldal" <***@appearnetworks.com> wrote:
> The byte code is actually different. This seems like a weakness of the JDK Javac to me.
>
> With standard loop:
>
> private static boolean contains(org.apache.logging.log4j.Marker, org.apache.logging.log4j.Marker...);
> Code:
> 0: iconst_0
> 1: istore_2
> 2: aload_1
> 3: arraylength
> 4: istore_3
> 5: iload_2
> 6: iload_3
> 7: if_icmpge 29
> 10: aload_1
> 11: iload_2
> 12: aaload
> 13: astore 4
> 15: aload 4
> 17: aload_0
> 18: if_acmpne 23
> 21: iconst_1
> 22: ireturn
> 23: iinc 2, 1
> 26: goto 5
> 29: iconst_0
> 30: ireturn
>
>
> With for-each:
>
> private static boolean contains(org.apache.logging.log4j.Marker, org.apache.logging.log4j.Marker...);
> Code:
> 0: aload_1
> 1: astore_2
> 2: aload_2
> 3: arraylength
> 4: istore_3
> 5: iconst_0
> 6: istore 4
> 8: iload 4
> 10: iload_3
> 11: if_icmpge 34
> 14: aload_2
> 15: iload 4
> 17: aaload
> 18: astore 5
> 20: aload 5
> 22: aload_0
> 23: if_acmpne 28
> 26: iconst_1
> 27: ireturn
> 28: iinc 4, 1
> 31: goto 8
> 34: iconst_0
> 35: ireturn
>
>
> On Fri, Sep 26, 2014 at 4:27 PM, Remko Popma <***@gmail.com> wrote:
> Nope. Maybe I'll get around to that next week. If you have time to do that, please share!
>
> On Fri, Sep 26, 2014 at 11:23 PM, Mikael Ståldal <***@appearnetworks.com> wrote:
> Have you compared the generated byte code (using javap -c) for the two cases?
>
> On Fri, Sep 26, 2014 at 3:43 PM, Remko Popma <***@gmail.com> wrote:
> On Windows it looks like normal for loops are slightly faster than for-each loops, especially for small arrays of primitives. This could be noise, since we are talking about 5 nanoseconds where the baseline (an empty method invocation) is 12 nanos.
>
> On Solaris 10 and Red Hat Enterprise Linux the baseline is so large (255 nanos and 1910 nanos respectively) that any difference we are seeing is just noise.
>
> All benchmarks were run with one fork, one thread, 10 warmup iterations and 10 test iterations.
>
> Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU @1.70Ghz with hyperthreading switched on (4 virtual cores)
> Benchmark Mode Samples Score Score error Units
> o.a.l.l.p.j.LoopsBenchmark.baseline sample 154947 12.432 0.550 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 126597 2759.592 3.431 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 126494 2761.729 3.127 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 154124 292.880 1.065 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 156155 288.751 1.101 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 191980 41.826 0.870 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 193770 36.894 0.782 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 190847 22.393 0.618 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 192552 17.146 0.560 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 173839 31959.057 14.341 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 171137 32461.985 14.353 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 97495 3591.200 4.852 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 101560 3445.998 4.010 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 102796 438.207 1.923 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 102333 439.576 2.139 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 113924 58.957 1.247 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 120416 60.712 1.284 ns/op
>
>
> // For loops for Object arrays are similar but return the total XOR of the element hashcodes.
>
> private int forEachLoop(final int[] array) {
> int result = 0;
> for (final int element : array) {
> result ^= element;
> }
> return result;
> }
>
> private int forLoop(final int[] array) {
> int result = 0;
> for (int i = 0; i < array.length; i++) {
> result ^= array[i];
> }
> return result;
> }
>
>
>
> Solaris 10 (64bit) with JDK1.7.0_06-b24 (Oracle Hotspot), 2 quad-core Xeon X5570 dual CPUs @2.93Ghz with hyperthreading switched on (16 virtual cores)
> Benchmark Mode Samples Score Score error Units
> o.a.l.l.p.j.LoopsBenchmark.baseline sample 110212 255.300 0.201 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 187808 3938.055 1.207 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 187897 3937.929 0.748 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 123989 606.631 0.626 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 123973 609.565 0.416 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 126933 294.204 0.280 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 127070 296.411 0.223 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 113400 261.519 0.181 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 111637 260.435 0.115 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 115872 48154.673 18.846 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 116777 47793.868 17.615 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 138432 5256.767 2.451 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 136644 5325.377 2.388 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 166653 773.541 0.330 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 166570 774.513 0.574 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 178754 317.232 0.134 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 180165 316.189 0.238 ns/op
>
> 64 bit RHEL 6.5 (Linux 2.6.32-431.el6.x86_64) with JDK1.7.0_05-b06 (Oracle Hotspot), 4 quad-core Xeon X5570 CPUs @2.93GHz with hyperthreading switched on (16 virtual cores)
> Benchmark Mode Samples Score Score error Units
> o.a.l.l.p.j.LoopsBenchmark.baseline sample 114783 1910.576 29.256 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 194584 5132.885 25.137 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 196672 4811.572 52.072 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 133119 1967.213 28.970 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 133804 2004.501 31.554 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 142439 1575.329 6.457 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 142215 1957.714 27.815 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 130826 1980.301 30.818 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 132654 1589.120 8.449 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 126947 43301.320 50.589 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 126117 43574.129 55.272 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 143697 5831.250 19.667 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 163244 4823.096 13.180 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 162502 1930.819 24.136 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 171619 1625.806 10.385 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 172226 1888.683 22.554 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 188838 1581.979 6.322 ns/op
>
>
>
>
> On Fri, Sep 26, 2014 at 2:56 AM, Matt Sicker <***@gmail.com> wrote:
> The foreach over an array looks like it's supposed to compile to the same thing:
>
> https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html
>
> Same goes for .length which is supposed to be a final field which would allow for inlining by the JIT I'd imagine (hence why we use final everywhere):
>
> http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7
>
> On 25 September 2014 11:33, Gary Gregory <***@gmail.com> wrote:
> On Thu, Sep 25, 2014 at 11:55 AM, Ralph Goers <***@dslextreme.com> wrote:
> You can think that, but in the testing I did at the time the difference was quite noticeable. I would have left it as a foreach if it wasn’t.
>
> Ralph
>
> On Sep 25, 2014, at 8:51 AM, Paul Benedict <***@apache.org> wrote:
>
>> I would be surprised if foreach over an array makes a speed difference. AFAIK, foreach is syntactic sugar. There is no iterator for an array so it has to be desugared using a for/index loop like you have there. I don't think this code is saving anything.
>>
>>
>> Cheers,
>> Paul
>>
>> On Thu, Sep 25, 2014 at 10:47 AM, Remko Popma <***@gmail.com> wrote:
>> Hm.. Why did I think it was configuration? I must have gotten mixed up with another commit email...
>> The class is MarkerManager in log4j-api.
>>
>> On Fri, Sep 26, 2014 at 12:18 AM, Ralph Goers <***@dslextreme.com> wrote:
>> Configuration? If I recall correctly this method is called on every log event that contains a Marker. But I am just guessing since Gary neglected to say what class this is. But I do remember doing extensive testing when this code was written. And I also remember someone (probably Gary) mentioning then that it should use a for-loop and we had this same conversation then. I think that is why the comment was added.
>>
>> Ralph
>>
>> On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com> wrote:
>>
>>>
>>> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>>>
>>>> From what I remember, it had something to do with the incredibly large difference in speed between for loops and foreach loops on arrays. And by incredibly large, I mean most likely negligible.
>>> :-)
>>> I do remember reading that someone found a speed difference. But I've never verified it. (Note to self: write a quick jmh benchmark for this.)
>
>
> I'd be curious to see the results!
>
> Gary
>
>>>
>>> On the other hand, this is configuration, so it only happens once and is very unlikely to be "hot" code so there is probably not much value in optimizing this loop.
>>>
>>>>
>>>> On 24 September 2014 22:12, Gary Gregory <***@gmail.com> wrote:
>>>> What does this "//noinspection ForLoopReplaceableByForEach" comment mean?
>>>>
>>>> Why not use an enhanced for-each loop?
>>>>
>>>> private static boolean contains(final Marker parent, final Marker... localParents) {
>>>> //noinspection ForLoopReplaceableByForEach
>>>> for (int i = 0, localParentsLength = localParents.length; i < localParentsLength; i++) {
>>>> final Marker marker = localParents[i];
>>>> if (marker == parent) {
>>>> return true;
>>>> }
>>>> }
>>>> return false;
>>>> }
>>>>
>>>> Thanks,
>>>> Gary
>>>>
>>>> --
>>>> E-Mail: ***@gmail.com | ***@apache.org
>>>> Java Persistence with Hibernate, Second Edition
>>>> JUnit in Action, Second Edition
>>>> Spring Batch in Action
>>>> Blog: http://garygregory.wordpress.com
>>>> Home: http://garygregory.com/
>>>> Tweet! http://twitter.com/GaryGregory
>>>>
>>>>
>>>>
>>>> --
>>>> Matt Sicker <***@gmail.com>
>>
>>
>>
>
>
>
>
> --
> E-Mail: ***@gmail.com | ***@apache.org
> Java Persistence with Hibernate, Second Edition
> JUnit in Action, Second Edition
> Spring Batch in Action
> Blog: http://garygregory.wordpress.com
> Home: http://garygregory.com/
> Tweet! http://twitter.com/GaryGregory
>
>
>
> --
> Matt Sicker <***@gmail.com>
>
>
>
>
> --
> Mikael Ståldal
> Chief Software Architect
> Appear
> Phone: +46 8 545 91 572
> Email: ***@appearnetworks.com
>
>
>
>
> --
> Mikael Ståldal
> Chief Software Architect
> Appear
> Phone: +46 8 545 91 572
> Email: ***@appearnetworks.com
Remko Popma
2014-09-27 14:29:50 UTC
Permalink
I think we have consensus that in this performance-sensitive code it is
better to be safe and use a normal for loop.
I've taken the liberty of making the change in master.
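
For anyone who wants to re-check the numbers quoted below: the Object-array loops are described there only in a comment ("similar but return the total XOR of the element hashcodes"). A sketch of what those two methods could look like, assuming they simply mirror the int[] versions; the class name is hypothetical and this is not the actual LoopsBenchmark source:

```java
final class ObjectLoopsSketch {

    // Enhanced for loop: XOR together the hash codes of all elements.
    static int forEachLoop(final Object[] array) {
        int result = 0;
        for (final Object element : array) {
            result ^= element.hashCode();
        }
        return result;
    }

    // Indexed loop computing the same value.
    static int forLoop(final Object[] array) {
        int result = 0;
        for (int i = 0; i < array.length; i++) {
            result ^= array[i].hashCode();
        }
        return result;
    }
}
```

Returning the XOR gives the JIT a value it cannot discard, so neither loop can be optimised away entirely. Both methods assume the array contains no null elements.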

On Sat, Sep 27, 2014 at 2:28 AM, Ralph Goers <***@dslextreme.com>
wrote:

> While true, if it is coded as an old-style for loop there is nothing
> really to be gained by converting it to a foreach except for making people
> happy that it uses the newer syntax. Even if Java is changed so that they
> are equivalent it won’t be faster.
>
> Ralph
>
> On Sep 26, 2014, at 8:59 AM, Paul Benedict <***@apache.org> wrote:
>
> My recommendation is to write to the core OpenJDK mailing list. Raise this
> issue and see what they say about your findings... because there is not
> supposed to be any performance impact using foreach.
> On Sep 26, 2014 10:34 AM, "Mikael Ståldal" <
> ***@appearnetworks.com> wrote:
>
>> The byte code is actually different. This seems like a weakness of the
>> JDK Javac to me.
>>
>> With standard loop:
>>
>> private static boolean contains(org.apache.logging.log4j.Marker,
>> org.apache.logging.log4j.Marker...);
>> Code:
>> 0: iconst_0
>> 1: istore_2
>> 2: aload_1
>> 3: arraylength
>> 4: istore_3
>> 5: iload_2
>> 6: iload_3
>> 7: if_icmpge 29
>> 10: aload_1
>> 11: iload_2
>> 12: aaload
>> 13: astore 4
>> 15: aload 4
>> 17: aload_0
>> 18: if_acmpne 23
>> 21: iconst_1
>> 22: ireturn
>> 23: iinc 2, 1
>> 26: goto 5
>> 29: iconst_0
>> 30: ireturn
>>
>>
>> With for-each:
>>
>> private static boolean contains(org.apache.logging.log4j.Marker,
>> org.apache.logging.log4j.Marker...);
>> Code:
>> 0: aload_1
>> 1: astore_2
>> 2: aload_2
>> 3: arraylength
>> 4: istore_3
>> 5: iconst_0
>> 6: istore 4
>> 8: iload 4
>> 10: iload_3
>> 11: if_icmpge 34
>> 14: aload_2
>> 15: iload 4
>> 17: aaload
>> 18: astore 5
>> 20: aload 5
>> 22: aload_0
>> 23: if_acmpne 28
>> 26: iconst_1
>> 27: ireturn
>> 28: iinc 4, 1
>> 31: goto 8
>> 34: iconst_0
>> 35: ireturn
>>
>>
>> On Fri, Sep 26, 2014 at 4:27 PM, Remko Popma <***@gmail.com>
>> wrote:
>>
>>> Nope. Maybe I'll get around to that next week. If you have time to do that,
>>> please share!
>>>
>>> On Fri, Sep 26, 2014 at 11:23 PM, Mikael Ståldal <
>>> ***@appearnetworks.com> wrote:
>>>
>>>> Have you compared the generated byte code (using javap -c) for the two
>>>> cases?
>>>>
>>>> On Fri, Sep 26, 2014 at 3:43 PM, Remko Popma <***@gmail.com>
>>>> wrote:
>>>>
>>>>> On Windows it looks like normal for loops are slightly faster than
>>>>> for-each loops, especially for small arrays of primitives. This could be
>>>>> noise, since we are talking about 5 nanoseconds where the baseline (an
>>>>> empty method invocation) is 12 nanos.
>>>>>
>>>>> On Solaris 10 and Red Hat Enterprise Linux the baseline is so large
>>>>> (255 nanos and 1910 nanos respectively) that any difference we are seeing
>>>>> is just noise.
>>>>>
>>>>> All benchmarks were run with one fork, one thread, 10 warmup
>>>>> iterations and 10 test iterations.
>>>>>
>>>>> *Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU
>>>>> @1.70Ghz with hyperthreading switched on (4 virtual cores)*
>>>>> Benchmark Mode Samples
>>>>> Score Score error Units
>>>>> o.a.l.l.p.j.LoopsBenchmark.baseline sample 154947
>>>>> 12.432 0.550 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 126597
>>>>> 2759.592 3.431 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 126494
>>>>> 2761.729 3.127 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 154124
>>>>> 292.880 1.065 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 156155
>>>>> 288.751 1.101 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 191980
>>>>> 41.826 0.870 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 193770
>>>>> 36.894 0.782 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 190847
>>>>> 22.393 0.618 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 192552
>>>>> 17.146 0.560 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 173839
>>>>> 31959.057 14.341 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 171137
>>>>> 32461.985 14.353 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 97495
>>>>> 3591.200 4.852 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 101560
>>>>> 3445.998 4.010 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 102796
>>>>> 438.207 1.923 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 102333
>>>>> 439.576 2.139 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 113924
>>>>> 58.957 1.247 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 120416
>>>>> 60.712 1.284 ns/op
>>>>>
>>>>>
>>>>> // For loops for Object arrays are similar but return the total XOR of
>>>>> the element hashcodes.
>>>>>
>>>>> private int forEachLoop(final int[] array) {
>>>>> int result = 0;
>>>>> for (final int element : array) {
>>>>> result ^= element;
>>>>> }
>>>>> return result;
>>>>> }
>>>>>
>>>>> private int forLoop(final int[] array) {
>>>>> int result = 0;
>>>>> for (int i = 0; i < array.length; i++) {
>>>>> result ^= array[i];
>>>>> }
>>>>> return result;
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> *Solaris 10 (64bit) with JDK1.7.0_06-b24 (Oracle Hotspot), 2 quad-core
>>>>> Xeon X5570 dual CPUs @2.93Ghz with hyperthreading switched on (16 virtual
>>>>> cores)*
>>>>> Benchmark Mode Samples
>>>>> Score Score error Units
>>>>> o.a.l.l.p.j.LoopsBenchmark.baseline sample 110212
>>>>> 255.300 0.201 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 187808
>>>>> 3938.055 1.207 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 187897
>>>>> 3937.929 0.748 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 123989
>>>>> 606.631 0.626 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 123973
>>>>> 609.565 0.416 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 126933
>>>>> 294.204 0.280 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 127070
>>>>> 296.411 0.223 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 113400
>>>>> 261.519 0.181 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 111637
>>>>> 260.435 0.115 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 115872
>>>>> 48154.673 18.846 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 116777
>>>>> 47793.868 17.615 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 138432
>>>>> 5256.767 2.451 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 136644
>>>>> 5325.377 2.388 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 166653
>>>>> 773.541 0.330 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 166570
>>>>> 774.513 0.574 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 178754
>>>>> 317.232 0.134 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 180165
>>>>> 316.189 0.238 ns/op
>>>>>
>>>>> *64 bit RHEL 6.5 (Linux 2.6.32-431.el6.x86_64) with JDK1.7.0_05-b06
>>>>> (Oracle Hotspot), 4 quad-core Xeon X5570 CPUs @2.93GHz with hyperthreading
>>>>> switched on (16 virtual cores)*
>>>>> Benchmark Mode Samples
>>>>> Score Score error Units
>>>>> o.a.l.l.p.j.LoopsBenchmark.baseline sample 114783
>>>>> 1910.576 29.256 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 194584
>>>>> 5132.885 25.137 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 196672
>>>>> 4811.572 52.072 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 133119
>>>>> 1967.213 28.970 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 133804
>>>>> 2004.501 31.554 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 142439
>>>>> 1575.329 6.457 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 142215
>>>>> 1957.714 27.815 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 130826
>>>>> 1980.301 30.818 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 132654
>>>>> 1589.120 8.449 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 126947
>>>>> 43301.320 50.589 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 126117
>>>>> 43574.129 55.272 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 143697
>>>>> 5831.250 19.667 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 163244
>>>>> 4823.096 13.180 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 162502
>>>>> 1930.819 24.136 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 171619
>>>>> 1625.806 10.385 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 172226
>>>>> 1888.683 22.554 ns/op
>>>>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 188838
>>>>> 1581.979 6.322 ns/op
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 26, 2014 at 2:56 AM, Matt Sicker <***@gmail.com> wrote:
>>>>>
>>>>>> The foreach over an array looks like it's supposed to compile to the
>>>>>> same thing:
>>>>>>
>>>>>> https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html
>>>>>>
>>>>>> Same goes for .length which is supposed to be a final field which
>>>>>> would allow for inlining by the JIT I'd imagine (hence why we use final
>>>>>> everywhere):
>>>>>>
>>>>>> http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7
>>>>>>
>>>>>> On 25 September 2014 11:33, Gary Gregory <***@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Thu, Sep 25, 2014 at 11:55 AM, Ralph Goers <
>>>>>>> ***@dslextreme.com> wrote:
>>>>>>>
>>>>>>>> You can think that, but in the testing I did at the
>>>>>>>> time the difference was quite noticeable. I would have left it as a
>>>>>>>> foreach if it wasn’t.
>>>>>>>>
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> On Sep 25, 2014, at 8:51 AM, Paul Benedict <***@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I would be surprised if foreach over an array makes a speed
>>>>>>>> difference. AFAIK, foreach is syntactic sugar. There is no iterator for an
>>>>>>>> array so it has to be desugared using a for/index loop like you have there.
>>>>>>>> I don't think this code is saving anything.
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Paul
>>>>>>>>
>>>>>>>> On Thu, Sep 25, 2014 at 10:47 AM, Remko Popma <
>>>>>>>> ***@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hm.. Why did I think it was configuration? I must have gotten
>>>>>>>>> mixed up with another commit email...
>>>>>>>>> The class is MarkerManager in log4j-api.
>>>>>>>>>
>>>>>>>>> On Fri, Sep 26, 2014 at 12:18 AM, Ralph Goers <
>>>>>>>>> ***@dslextreme.com> wrote:
>>>>>>>>>
>>>>>>>>>> Configuration? If I recall correctly this method is called on
>>>>>>>>>> every log event that contains a Marker. But I am just guessing since Gary
>>>>>>>>>> neglected to say what class this is. But I do remember doing extensive
>>>>>>>>>> testing when this code was written. And I also remember someone (probably
>>>>>>>>>> Gary) mentioning then that it should use a for-loop and we had this same
>>>>>>>>>> conversation then. I think that is why the comment was added.
>>>>>>>>>>
>>>>>>>>>> Ralph
>>>>>>>>>>
>>>>>>>>>> On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> From what I remember, it had something to do with the incredibly
>>>>>>>>>> large difference in speed between for loops and foreach loops on arrays.
>>>>>>>>>> And by incredibly large, I mean most likely negligible.
>>>>>>>>>>
>>>>>>>>>> :-)
>>>>>>>>>> I do remember reading that someone found a speed difference. But
>>>>>>>>>> I've never verified it. (Note to self: write a quick jmh benchmark for
>>>>>>>>>> this.)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>> I'd be curious to see the results!
>>>>>>>
>>>>>>> Gary
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>> On the other hand, this is configuration, so it only happens once
>>>>>>>>>> and is very unlikely to be "hot" code so there is probably not much value
>>>>>>>>>> in optimizing this loop.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 24 September 2014 22:12, Gary Gregory <***@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Why does this "//noinspection ForLoopReplaceableByForEach"
>>>>>>>>>>> comment mean?
>>>>>>>>>>>
>>>>>>>>>>> Why not for an enhanced for each loop?
>>>>>>>>>>>
>>>>>>>>>>> private static boolean contains(final Marker parent, final
>>>>>>>>>>> Marker... localParents) {
>>>>>>>>>>> //noinspection ForLoopReplaceableByForEach
>>>>>>>>>>> for (int i = 0, localParentsLength =
>>>>>>>>>>> localParents.length; i < localParentsLength; i++) {
>>>>>>>>>>> final Marker marker = localParents[i];
>>>>>>>>>>> if (marker == parent) {
>>>>>>>>>>> return true;
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>> return false;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Gary
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> E-Mail: ***@gmail.com | ***@apache.org
>>>>>>>>>>> Java Persistence with Hibernate, Second Edition
>>>>>>>>>>> <http://www.manning.com/bauer3/>
>>>>>>>>>>> JUnit in Action, Second Edition
>>>>>>>>>>> <http://www.manning.com/tahchiev/>
>>>>>>>>>>> Spring Batch in Action <http://www.manning.com/templier/>
>>>>>>>>>>> Blog: http://garygregory.wordpress.com
>>>>>>>>>>> Home: http://garygregory.com/
>>>>>>>>>>> Tweet! http://twitter.com/GaryGregory
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Matt Sicker <***@gmail.com>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> E-Mail: ***@gmail.com | ***@apache.org
>>>>>>> Java Persistence with Hibernate, Second Edition
>>>>>>> <http://www.manning.com/bauer3/>
>>>>>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>>>>>>> Spring Batch in Action <http://www.manning.com/templier/>
>>>>>>> Blog: http://garygregory.wordpress.com
>>>>>>> Home: http://garygregory.com/
>>>>>>> Tweet! http://twitter.com/GaryGregory
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Matt Sicker <***@gmail.com>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Mikael Ståldal
>>>> Chief Software Architect
>>>> *Appear*
>>>> Phone: +46 8 545 91 572
>>>> Email: ***@appearnetworks.com
>>>>
>>>
>>>
>>
>>
>> --
>> Mikael Ståldal
>> Chief Software Architect
>> *Appear*
>> Phone: +46 8 545 91 572
>> Email: ***@appearnetworks.com
>>
>
>
Paul Benedict
2014-09-29 14:29:13 UTC
Permalink
Open JDKers, I am forwarding an email to get some clarification. It's been
a common understanding that foreach should perform no differently than the
equivalent for-loop. However, some fellow developers claim there is a
noticeable difference in their microbenchmarking. Can you help explain what
is really going on? Either there is a true difference (a result that would
surprise me) or the results are within a margin of error that makes them
insignificant. Please advise.

Cheers,
Paul

---------- Forwarded message ----------
From: Remko Popma <***@gmail.com>
Date: Fri, Sep 26, 2014 at 8:43 AM
Subject: Re: No for each loop comment?
To: Log4J Developers List <log4j-***@logging.apache.org>


On Windows it looks like normal for loops are slightly faster than for-each
loops, especially for small arrays of primitives. This could be noise,
since we are talking about 5 nanoseconds where the baseline (an empty
method invocation) is 12 nanos.
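
One rough way to judge "noise or not" is to check whether the two score ± error ranges overlap. The sketch below does that for two pairs taken from the Windows table in this message. It assumes the "Score error" column can be read as a symmetric half-width around the score; the class and method names are made up for illustration.

```java
// Sketch: do two "score ± error" ranges overlap?
// Assumption: "Score error" is a symmetric half-width around "Score".
final class ScoreIntervals {

    /** True when [a - aErr, a + aErr] and [b - bErr, b + bErr] share at least one point. */
    static boolean overlap(final double a, final double aErr, final double b, final double bErr) {
        return Math.abs(a - b) <= aErr + bErr;
    }

    public static void main(final String[] args) {
        // intArray10: for-each 22.393 ± 0.618 vs. for 17.146 ± 0.560
        System.out.println(overlap(22.393, 0.618, 17.146, 0.560));   // prints false
        // objArray100: for-each 438.207 ± 1.923 vs. for 439.576 ± 2.139
        System.out.println(overlap(438.207, 1.923, 439.576, 2.139)); // prints true
    }
}
```

Non-overlapping ranges from a single fork on one machine are a hint that something differs between the two runs, not proof that the loop form is the cause.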

On Solaris 10 and Red Hat Enterprise Linux the baseline is so large (255
nanos and 1910 nanos respectively) that any difference we are seeing is
just noise.

All benchmarks were run with one fork, one thread, 10 warmup iterations and
10 test iterations.

*Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU @1.70Ghz
with hyperthreading switched on (4 virtual cores)*
Benchmark                                           Mode   Samples      Score  Score error  Units
o.a.l.l.p.j.LoopsBenchmark.baseline                 sample  154947     12.432        0.550  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample  126597   2759.592        3.431  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop     sample  126494   2761.729        3.127  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop  sample  154124    292.880        1.065  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop      sample  156155    288.751        1.101  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop   sample  191980     41.826        0.870  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop       sample  193770     36.894        0.782  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop    sample  190847     22.393        0.618  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop        sample  192552     17.146        0.560  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample  173839  31959.057       14.341  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop     sample  171137  32461.985       14.353  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop  sample   97495   3591.200        4.852  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop      sample  101560   3445.998        4.010  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop   sample  102796    438.207        1.923  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop       sample  102333    439.576        2.139  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop    sample  113924     58.957        1.247  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop        sample  120416     60.712        1.284  ns/op


// The loops for Object arrays are similar but return the total XOR of the
// element hashcodes.

private int forEachLoop(final int[] array) {
    int result = 0;
    for (final int element : array) {
        result ^= element;
    }
    return result;
}

private int forLoop(final int[] array) {
    int result = 0;
    for (int i = 0; i < array.length; i++) {
        result ^= array[i];
    }
    return result;
}



*Solaris 10 (64bit) with JDK1.7.0_06-b24 (Oracle Hotspot), 2 quad-core Xeon
X5570 dual CPUs @2.93Ghz with hyperthreading switched on (16 virtual cores)*
Benchmark                                           Mode   Samples      Score  Score error  Units
o.a.l.l.p.j.LoopsBenchmark.baseline                 sample  110212    255.300        0.201  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample  187808   3938.055        1.207  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop     sample  187897   3937.929        0.748  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop  sample  123989    606.631        0.626  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop      sample  123973    609.565        0.416  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop   sample  126933    294.204        0.280  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop       sample  127070    296.411        0.223  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop    sample  113400    261.519        0.181  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop        sample  111637    260.435        0.115  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample  115872  48154.673       18.846  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop     sample  116777  47793.868       17.615  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop  sample  138432   5256.767        2.451  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop      sample  136644   5325.377        2.388  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop   sample  166653    773.541        0.330  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop       sample  166570    774.513        0.574  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop    sample  178754    317.232        0.134  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop        sample  180165    316.189        0.238  ns/op

*64 bit RHEL 6.5 (Linux 2.6.32-431.el6.x86_64) with JDK1.7.0_05-b06 (Oracle
Hotspot), 4 quad-core Xeon X5570 CPUs @2.93GHz with hyperthreading switched
on (16 virtual cores)*
Benchmark                                           Mode   Samples      Score  Score error  Units
o.a.l.l.p.j.LoopsBenchmark.baseline                 sample  114783   1910.576       29.256  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample  194584   5132.885       25.137  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop     sample  196672   4811.572       52.072  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop  sample  133119   1967.213       28.970  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop      sample  133804   2004.501       31.554  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop   sample  142439   1575.329        6.457  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop       sample  142215   1957.714       27.815  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop    sample  130826   1980.301       30.818  ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop        sample  132654   1589.120        8.449  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample  126947  43301.320       50.589  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop     sample  126117  43574.129       55.272  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop  sample  143697   5831.250       19.667  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop      sample  163244   4823.096       13.180  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop   sample  162502   1930.819       24.136  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop       sample  171619   1625.806       10.385  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop    sample  172226   1888.683       22.554  ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop        sample  188838   1581.979        6.322  ns/op




On Fri, Sep 26, 2014 at 2:56 AM, Matt Sicker <***@gmail.com> wrote:

> The foreach over an array looks like it's supposed to compile to the same
> thing:
>
> https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html
>
> Same goes for .length which is supposed to be a final field which would
> allow for inlining by the JIT I'd imagine (hence why we use final
> everywhere):
>
> http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7
>
>
>>>> On 24 September 2014 22:12, Gary Gregory <***@gmail.com>
>>>> wrote:
>>>>
>>>>> Why does this "//noinspection ForLoopReplaceableByForEach" comment
>>>>> mean?
>>>>>
>>>>> Why not for an enhanced for each loop?
>>>>>
>>>>> private static boolean contains(final Marker parent, final Marker...
>>>>> localParents) {
>>>>> //noinspection ForLoopReplaceableByForEach
>>>>> for (int i = 0, localParentsLength = localParents.length;
>>>>> i < localParentsLength; i++) {
>>>>> final Marker marker = localParents[i];
>>>>> if (marker == parent) {
>>>>> return true;
>>>>> }
>>>>> }
>>>>> return false;
>>>>> }
>>>>>
>>>>> Thanks,
>>>>> Gary
>>>>>
>>>>>
>>>>>
Andrew Haley
2014-09-29 15:21:55 UTC
Permalink
On 09/29/2014 03:29 PM, Paul Benedict wrote:
> Open JDKers, I am forwarding an email to get some clarification. It's been
> a common understanding that foreach should perform no differently than the
> equivalent for-loop . However, some fellow developers claim there is a
> noticable difference in their microbenchmarking. Can you help explain what
> is really going on? It's either the case there is a true difference (a
> result that would surprise me) or the results are within a margin of error
> that make the results insignificant. Please advise.

The actual code that such a forEach loop generates is this:

private int forLoop(final int[] array) {
    int result = 0;
    int[] a = array;
    int len = a.length;
    for (int i = 0; i < len; i++) {
        int element = a[i];
        result ^= element;
    }
    return result;
}

If you get different timings for this one, then the measurements are
suspect.
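
For anyone who wants to check the equivalence by hand, here is a small sketch that puts the for-each form next to the expansion shown above and compares their results. The class name and sample data are made up for illustration; the two method bodies follow the code already posted in this thread.

```java
// Sketch: the enhanced for loop over an array next to its index-based expansion.
final class ForEachDesugared {

    static int forEachLoop(final int[] array) {
        int result = 0;
        for (final int element : array) {
            result ^= element;
        }
        return result;
    }

    // Same loop written out: copy the array reference, read the length once,
    // then index through the elements.
    static int desugared(final int[] array) {
        int result = 0;
        final int[] a = array;
        final int len = a.length;
        for (int i = 0; i < len; i++) {
            final int element = a[i];
            result ^= element;
        }
        return result;
    }

    public static void main(final String[] args) {
        final int[] data = {1, 2, 3, 4, 5};
        System.out.println(forEachLoop(data)); // prints 1
        System.out.println(desugared(data));   // prints 1
    }
}
```

Equal results only show the two forms compute the same thing; whether they also run at the same speed is what the timings and the generated code have to answer.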

Java microbenchmarking is notoriously difficult. Please try to use
jmh; you'll get better and easier to interpret results.

Andrew.


http://openjdk.java.net/projects/code-tools/jmh/
Vitaly Davidovich
2014-09-29 16:31:11 UTC
Permalink
I think Paul's email already has jmh output.

I looked at the generated asm on 7u60 x64 linux, and didn't see any
material difference.

Paul, have you or anyone else looked at the machine code diffs between the
two? Looking at timing is useful, but it's possible to get caught up in
noise; the generated assembly should provide more conclusive data on
whether any real difference is there or not.

Andrew Haley
2014-09-29 16:48:39 UTC
Permalink
On 09/29/2014 05:31 PM, Vitaly Davidovich wrote:
> I think Paul's email already has jmh output.

Oh duh, sorry Paul.

Andrew.
Paul Benedict
2014-09-29 17:58:48 UTC
Permalink
Bytecode output courtesy of Mikael Ståldal:

With standard loop:

private static boolean contains(org.apache.logging.log4j.Marker,
org.apache.logging.log4j.Marker...);
Code:
0: iconst_0
1: istore_2
2: aload_1
3: arraylength
4: istore_3
5: iload_2
6: iload_3
7: if_icmpge 29
10: aload_1
11: iload_2
12: aaload
13: astore 4
15: aload 4
17: aload_0
18: if_acmpne 23
21: iconst_1
22: ireturn
23: iinc 2, 1
26: goto 5
29: iconst_0
30: ireturn


With for-each:

private static boolean contains(org.apache.logging.log4j.Marker,
org.apache.logging.log4j.Marker...);
Code:
0: aload_1
1: astore_2
2: aload_2
3: arraylength
4: istore_3
5: iconst_0
6: istore 4
8: iload 4
10: iload_3
11: if_icmpge 34
14: aload_2
15: iload 4
17: aaload
18: astore 5
20: aload 5
22: aload_0
23: if_acmpne 28
26: iconst_1
27: ireturn
28: iinc 4, 1
31: goto 8
34: iconst_0
35: ireturn
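
To reproduce listings like the two above, something along these lines can be compiled and then disassembled with `javap -c -p`. This is only a sketch: the nested `Marker` interface is a stand-in for org.apache.logging.log4j.Marker, the class and method names are made up, and the methods are package-private here (the original is private) so they are easy to call.

```java
// Sketch: both loop forms of contains(), for comparison with javap -c -p.
final class ContainsVariants {

    /** Stand-in for org.apache.logging.log4j.Marker (assumption for illustration). */
    interface Marker { }

    // Index-based loop, as in the code quoted at the start of this thread.
    static boolean containsIndexed(final Marker parent, final Marker... localParents) {
        for (int i = 0, localParentsLength = localParents.length; i < localParentsLength; i++) {
            final Marker marker = localParents[i];
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    // Enhanced for loop doing the same identity comparison.
    static boolean containsForEach(final Marker parent, final Marker... localParents) {
        for (final Marker marker : localParents) {
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    public static void main(final String[] args) {
        final Marker a = new Marker() { };
        final Marker b = new Marker() { };
        System.out.println(containsIndexed(a, b, a)); // prints true
        System.out.println(containsForEach(a, b));    // prints false
    }
}
```

The exact offsets and local variable slots may differ between compiler versions.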



Cheers,
Paul

Vitaly Davidovich
2014-09-29 18:03:39 UTC
Permalink
Bytecode isn't that interesting when discussing peak performance of jit'd
code. Do you have assembly dumps?

The only noteworthy aspect of the bytecode is that the enhanced for loop
version is slightly bigger, and combined with other code in some method,
may inhibit inlining.

Speaking of which, have you tried running the jmh benchmarks with tiered
compilation disabled? If not, please do, as tiered compilation may introduce
variance/noise.
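
A quick way to confirm that a run really had tiered compilation switched off is to print the JVM arguments from inside the benchmarked JVM. The sketch below assumes the flag was passed literally as -XX:-TieredCompilation; the class and method names are made up for illustration.

```java
// Sketch: report whether this JVM was started with -XX:-TieredCompilation.
final class TieredFlagCheck {

    /** True when the given JVM arguments contain the literal disable flag. */
    static boolean tieredDisabled(final java.util.List<String> jvmArgs) {
        return jvmArgs.contains("-XX:-TieredCompilation");
    }

    public static void main(final String[] args) {
        // Arguments this JVM was launched with (excluding arguments to main).
        final java.util.List<String> jvmArgs =
                java.lang.management.ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("tiered compilation disabled by flag: " + tieredDisabled(jvmArgs));
    }
}
```

This only looks at the command line; it says nothing about what a given JVM does by default when the flag is absent.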

Chris Newland
2014-10-03 17:04:55 UTC
Permalink
Hi Paul,

I've created a tool called JITWatch[1] which might be useful here. It
parses the LogCompilation output, shows the source/bytecode/assembly and
can highlight inlining failures due to exceeding the thresholds.

If you could point me to the code you're having trouble with I'd be happy
to run it through the tool and let you know the findings.

Kind regards,

Chris

[1] https://github.com/AdoptOpenJDK/jitwatch


Gary Gregory
2014-09-29 19:42:24 UTC
Permalink
FWIW, I've verified the same byte codes on Oracle Java 7 and 8 on Windows 7
(all 64 bit).

Gary

Gary Gregory
2014-09-29 15:39:27 UTC
Permalink
It's important to note that our experiments show that the byte codes are
different.

Gary

On Mon, Sep 29, 2014 at 10:29 AM, Paul Benedict <***@apache.org>
wrote:

> Open JDKers, I am forwarding an email to get some clarification. It's been
> a common understanding that foreach should perform no differently than the
> equivalent for-loop . However, some fellow developers claim there is a
> noticable difference in their microbenchmarking. Can you help explain what
> is really going on? It's either the case there is a true difference (a
> result that would surprise me) or the results are within a margin of error
> that make the results insignificant. Please advise.
>
> Cheers,
> Paul
>
> ---------- Forwarded message ----------
> From: Remko Popma <***@gmail.com>
> Date: Fri, Sep 26, 2014 at 8:43 AM
> Subject: Re: No for each loop comment?
> To: Log4J Developers List <log4j-***@logging.apache.org>
>
>
> On Windows it looks like normal for loops are slightly faster than
> for-each loops, especially for small arrays of primitives. This could be
> noise, since we are talking about 5 nanoseconds where the baseline (an
> empty method invocation) is 12 nanos.
>
> On Solaris 10 and Red Hat Enterprise Linux the baseline is so large (255
> nanos and 1910 nanos respectively) that any difference we are seeing is
> just noise.
>
> All benchmarks were run with one fork, one thread, 10 warmup iterations
> and 10 test iterations.
>
> *Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU @1.70Ghz
> with hyperthreading switched on (4 virtual cores)*
> Benchmark Mode Samples
> Score Score error Units
> o.a.l.l.p.j.LoopsBenchmark.baseline sample 154947
> 12.432 0.550 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 126597
> 2759.592 3.431 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 126494
> 2761.729 3.127 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 154124
> 292.880 1.065 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 156155
> 288.751 1.101 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 191980
> 41.826 0.870 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 193770
> 36.894 0.782 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 190847
> 22.393 0.618 ns/op
> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 192552
> 17.146 0.560 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 173839
> 31959.057 14.341 ns/op
> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 171137
> 32461.985 14.353 ns/op


Ralph Goers
2014-09-29 17:34:08 UTC
Permalink
Remko,

While testing the performance of for vs. foreach in isolation is interesting, it would be more meaningful to test a MarkerManager implementation that uses for loops against one that uses foreach loops. As I recall, I measured this with tests similar to those in MarkerTest, run in a loop. What matters is the performance on those kinds of tests. That is why the code checks 1 or 2 parents individually instead of always using a for loop: I found that the overhead of setting up the for loop for 1 or 2 items was greater than testing the items by hand.

What I would suggest is making a few variations of MarkerManager -
1. The current code.
2. Replace the for loops with for each loops.
3. for loops only (don’t specifically test 1 or 2 items)
4. foreach loops only (don’t specifically test 1 or 2 items)

Also try any other variations you can think of, such as removing the assignment.

Ralph
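
For readers following along, the variations listed above could be sketched roughly as below. This is only an illustration, not the actual MarkerManager code: the class name ContainsVariants and the method names are made up for this sketch, and plain Object stands in for Marker so the snippet compiles on its own.

```java
// Hypothetical sketch: three shapes of the contains() check discussed above.
// Object stands in for Marker; none of these names come from log4j-api.
public final class ContainsVariants {

    private ContainsVariants() {
    }

    // Indexed loop with the array length hoisted into a local variable,
    // mirroring the shape of the contains() method quoted in this thread.
    public static boolean containsIndexed(final Object parent, final Object... localParents) {
        for (int i = 0, length = localParents.length; i < length; i++) {
            if (localParents[i] == parent) {
                return true;
            }
        }
        return false;
    }

    // Enhanced for loop over the same array.
    public static boolean containsForEach(final Object parent, final Object... localParents) {
        for (final Object marker : localParents) {
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    // Check 1 or 2 parents by hand and fall back to the indexed loop otherwise.
    public static boolean containsSpecialCased(final Object parent, final Object... localParents) {
        final int length = localParents.length;
        if (length == 1) {
            return localParents[0] == parent;
        }
        if (length == 2) {
            return localParents[0] == parent || localParents[1] == parent;
        }
        return containsIndexed(parent, localParents);
    }

    public static void main(final String[] args) {
        final Object a = new Object();
        final Object b = new Object();
        // All three variants are expected to agree on the same input.
        System.out.println(containsIndexed(a, b, a));
        System.out.println(containsForEach(a, b, a));
        System.out.println(containsSpecialCased(a, b, a));
    }
}
```

All three methods should return the same answers; only the timing can differ, and that would have to be measured on MarkerTest-style workloads as suggested above rather than assumed from a sketch like this.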





Gary Gregory
2014-09-29 17:42:03 UTC
Permalink
It would be nice to make the MarkerManager/Log4jMarker pluggable behind an
interface. Then we could test a no-op MarkerManager, or even a null value.
Perhaps there is a performance gain to be had by not making this feature
available at all. It's a narrow use case, but it would be a way to squeeze
out every last ounce of CPU. Maybe.

Gary
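
A rough idea of what "pluggable behind an interface" might look like is sketched below. Everything here is hypothetical: SimpleMarker, MarkerFactory, CachingMarkerFactory and NoOpMarkerFactory are names invented for this sketch and are not part of log4j-api, and the real Marker type carries more state (parents, for example) than this stand-in.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a pluggable marker lookup; not the log4j-api design.
public final class PluggableMarkers {

    private PluggableMarkers() {
    }

    /** Minimal stand-in for a marker; assumed for this sketch only. */
    public interface SimpleMarker {
        String getName();
    }

    /** The interface a pluggable manager could implement. */
    public interface MarkerFactory {
        SimpleMarker getMarker(String name);
    }

    /** Default behaviour: one cached marker instance per name. */
    public static final class CachingMarkerFactory implements MarkerFactory {
        private final ConcurrentMap<String, SimpleMarker> markers = new ConcurrentHashMap<>();

        @Override
        public SimpleMarker getMarker(final String name) {
            return markers.computeIfAbsent(name, NamedMarker::new);
        }
    }

    /** No-op behaviour: markers are switched off, so every lookup yields null. */
    public static final class NoOpMarkerFactory implements MarkerFactory {
        @Override
        public SimpleMarker getMarker(final String name) {
            return null;
        }
    }

    private static final class NamedMarker implements SimpleMarker {
        private final String name;

        NamedMarker(final String name) {
            this.name = name;
        }

        @Override
        public String getName() {
            return name;
        }
    }
}
```

Whether swapping in the no-op variant buys any measurable CPU is exactly the open question; the sketch only shows that the two behaviours can sit behind one interface.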



Gary Gregory
2014-09-29 17:42:18 UTC
Permalink
After 2.1... that is.

Gary

On Mon, Sep 29, 2014 at 1:42 PM, Gary Gregory <***@gmail.com>
wrote:

> It would be nice to make the MarkerManager/Log4jMarker pluggable and use
> an interface. Then we could test a NoOp Marker Manager or even the value
> null? Perhaps there is a performance gain to be had by not making this
> feature available at all. It's a narrow use case but it would be a way to
> squeeze out every last ounce of CPU. Maybe.
>
> Gary
>
> On Mon, Sep 29, 2014 at 1:34 PM, Ralph Goers <***@dslextreme.com>
> wrote:
>
>> Remko,
>>
>> While testing the performance of for vs foreach is interesting, it would
>> be more meaningful to test a MarkerManager implementation that uses for
>> loops vs for each. As I recall I tested using tests similar to what is in
>> MarkerTest looping on those. What matters is what the performance is on
>> those kinds of tests. That is why the code individually tests 1 or 2
>> parents vs always using a for loop - I found that the overhead of setting
>> up the for loop for 1 or 2 items was greater than doing the test and
>> manually testing them.
>>
>> What I would suggest is making a few variations of MarkerManager -
>> 1. The current code.
>> 2. Replace the for loops with for each loops.
>> 3. for loops only (don’t specifically test 1 or 2 items)
>> 4. forach loops only (don’t specifically test 1 or 2 items)
>>
>> Any other variations you can think of such as removing the assignment.
>>
>> Ralph
>>
>>
>>
>>
>>
>> On Sep 26, 2014, at 6:43 AM, Remko Popma <***@gmail.com> wrote:
>>
>> On Windows it looks like normal for loops are slightly faster than
>> for-each loops, especially for small arrays of primitives. This could be
>> noise, since we are talking about 5 nanoseconds where the baseline (an
>> empty method invocation) is 12 nanos.
>>
>> On Solaris 10 and Red Hat Enterprise Linux the baseline is so large (255
>> nanos and 1910 nanos respectively) that any difference we are seeing is
>> just noise.
>>
>> All benchmarks were run with one fork, one thread, 10 warmup iterations
>> and 10 test iterations.
>>
>> *Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU @1.70Ghz
>> with hyperthreading switched on (4 virtual cores)*
>> Benchmark Mode Samples
>> Score Score error Units
>> o.a.l.l.p.j.LoopsBenchmark.baseline sample 154947
>> 12.432 0.550 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 126597
>> 2759.592 3.431 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 126494
>> 2761.729 3.127 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 154124
>> 292.880 1.065 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 156155
>> 288.751 1.101 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 191980
>> 41.826 0.870 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 193770
>> 36.894 0.782 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 190847
>> 22.393 0.618 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 192552
>> 17.146 0.560 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 173839
>> 31959.057 14.341 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 171137
>> 32461.985 14.353 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 97495
>> 3591.200 4.852 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 101560
>> 3445.998 4.010 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 102796
>> 438.207 1.923 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 102333
>> 439.576 2.139 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 113924
>> 58.957 1.247 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 120416
>> 60.712 1.284 ns/op
>>
>>
>> // For loops for Object arrays are similar but return the total XOR of
>> the element hashcodes.
>>
>> private int forEachLoop(final int[] array) {
>> int result = 0;
>> for (final int element : array) {
>> result ^= element;
>> }
>> return result;
>> }
>>
>> private int forLoop(final int[] array) {
>> int result = 0;
>> for (int i = 0; i < array.length; i++) {
>> result ^= array[i];
>> }
>> return result;
>> }
>>
>>
>>
>> *Solaris 10 (64bit) with JDK1.7.0_06-b24 (Oracle Hotspot), 2 quad-core
>> Xeon X5570 dual CPUs @2.93Ghz with hyperthreading switched on (16 virtual
>> cores)*
>> Benchmark                                            Mode    Samples      Score  Score error  Units
>> o.a.l.l.p.j.LoopsBenchmark.baseline                  sample   110212    255.300        0.201  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop  sample   187808   3938.055        1.207  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop      sample   187897   3937.929        0.748  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop   sample   123989    606.631        0.626  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop       sample   123973    609.565        0.416  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop    sample   126933    294.204        0.280  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop        sample   127070    296.411        0.223  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop     sample   113400    261.519        0.181  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop         sample   111637    260.435        0.115  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop  sample   115872  48154.673       18.846  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop      sample   116777  47793.868       17.615  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop   sample   138432   5256.767        2.451  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop       sample   136644   5325.377        2.388  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop    sample   166653    773.541        0.330  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop        sample   166570    774.513        0.574  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop     sample   178754    317.232        0.134  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop         sample   180165    316.189        0.238  ns/op
>>
>> *64 bit RHEL 6.5 (Linux 2.6.32-431.el6.x86_64) with JDK1.7.0_05-b06
>> (Oracle Hotspot), 4 quad-core Xeon X5570 CPUs @2.93GHz with hyperthreading
>> switched on (16 virtual cores)*
>> Benchmark                                            Mode    Samples      Score  Score error  Units
>> o.a.l.l.p.j.LoopsBenchmark.baseline                  sample   114783   1910.576       29.256  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop  sample   194584   5132.885       25.137  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop      sample   196672   4811.572       52.072  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop   sample   133119   1967.213       28.970  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop       sample   133804   2004.501       31.554  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop    sample   142439   1575.329        6.457  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop        sample   142215   1957.714       27.815  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop     sample   130826   1980.301       30.818  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop         sample   132654   1589.120        8.449  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop  sample   126947  43301.320       50.589  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop      sample   126117  43574.129       55.272  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop   sample   143697   5831.250       19.667  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop       sample   163244   4823.096       13.180  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop    sample   162502   1930.819       24.136  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop        sample   171619   1625.806       10.385  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop     sample   172226   1888.683       22.554  ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop         sample   188838   1581.979        6.322  ns/op
>>
>>
>>
>>
>> On Fri, Sep 26, 2014 at 2:56 AM, Matt Sicker <***@gmail.com> wrote:
>>
>>> The foreach over an array looks like it's supposed to compile to the
>>> same thing:
>>>
>>> https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html
>>>
>>> Same goes for .length which is supposed to be a final field which would
>>> allow for inlining by the JIT I'd imagine (hence why we use final
>>> everywhere):
>>>
>>> http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7
>>>
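
For reference, a hand-written sketch of the expansion that JLS 14.14.2 specifies for an enhanced for statement over an array, as I read it. The compiler's hidden variables have no source-level names, so `copy` and `i` below are stand-ins, and the class name is made up for illustration:

```java
// Illustration only: two methods that should behave identically if the
// enhanced for over an array is expanded the way the JLS describes.
final class EnhancedForExpansion {

    // What you write.
    static boolean containsForEach(final Object target, final Object[] items) {
        for (final Object item : items) {
            if (item == target) {
                return true;
            }
        }
        return false;
    }

    // Roughly what the specification says it means: the array expression is
    // evaluated once into a hidden local, then walked with an int index.
    static boolean containsExpanded(final Object target, final Object[] items) {
        final Object[] copy = items;
        for (int i = 0; i < copy.length; i++) {
            final Object item = copy[i];
            if (item == target) {
                return true;
            }
        }
        return false;
    }
}
```

The only difference from the hand-optimized loop in MarkerManager is that the expansion reads copy.length on each iteration instead of caching it in a local, which the JIT can normally hoist out of the loop anyway.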
>>> On 25 September 2014 11:33, Gary Gregory <***@gmail.com> wrote:
>>>
>>>> On Thu, Sep 25, 2014 at 11:55 AM, Ralph Goers <
>>>> ***@dslextreme.com> wrote:
>>>>
>>>>> You can think that, but in the testing I did at the time the
>>>>> difference was quite noticeable. I would have left it as a foreach if
>>>>> it wasn’t.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Sep 25, 2014, at 8:51 AM, Paul Benedict <***@apache.org>
>>>>> wrote:
>>>>>
>>>>> I would be surprised if foreach over an array makes a speed
>>>>> difference. AFAIK, foreach is syntactic sugar. There is no iterator for an
>>>>> array so it has to be desugared using a for/index loop like you have there.
>>>>> I don't think this code is saving anything.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Paul
>>>>>
>>>>> On Thu, Sep 25, 2014 at 10:47 AM, Remko Popma <***@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hm.. Why did I think it was configuration? I must have gotten mixed
>>>>>> up with another commit email...
>>>>>> The class is MarkerManager in log4j-api.
>>>>>>
>>>>>> On Fri, Sep 26, 2014 at 12:18 AM, Ralph Goers <
>>>>>> ***@dslextreme.com> wrote:
>>>>>>
>>>>>>> Configuration? If I recall correctly this method is called on every
>>>>>>> log event that contains a Marker. But I am just guessing since Gary
>>>>>>> neglected to say what class this is. But I do remember doing extensive
>>>>>>> testing when this code was written. And I also remember someone (probably
>>>>>>> Gary) mentioning then that it should use a for-loop and we had this same
>>>>>>> conversation then. I think that is why the comment was added.
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>>>>>>>
>>>>>>> From what I remember, it had something to do with the incredibly
>>>>>>> large difference in speed between for loops and foreach loops on arrays.
>>>>>>> And by incredibly large, I mean most likely negligible.
>>>>>>>
>>>>>>> :-)
>>>>>>> I do remember reading that someone found a speed difference. But
>>>>>>> I've never verified it. (Note to self: write a quick jmh benchmark for
>>>>>>> this.)
>>>>>>>
>>>>>>>
>>>> I'd be curious to see the results!
>>>>
>>>> Gary
>>>>
>>>>
>>>>>
>>>>>>> On the other hand, this is configuration, so it only happens once
>>>>>>> and is very unlikely to be "hot" code so there is probably not much value
>>>>>>> in optimizing this loop.
>>>>>>>
>>>>>>>
>>>>>>> On 24 September 2014 22:12, Gary Gregory <***@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> What does this "//noinspection ForLoopReplaceableByForEach" comment
>>>>>>>> mean?
>>>>>>>>
>>>>>>>> Why not use an enhanced for-each loop?
>>>>>>>>
>>>>>>>> private static boolean contains(final Marker parent, final
>>>>>>>> Marker... localParents) {
>>>>>>>> //noinspection ForLoopReplaceableByForEach
>>>>>>>> for (int i = 0, localParentsLength =
>>>>>>>> localParents.length; i < localParentsLength; i++) {
>>>>>>>> final Marker marker = localParents[i];
>>>>>>>> if (marker == parent) {
>>>>>>>> return true;
>>>>>>>> }
>>>>>>>> }
>>>>>>>> return false;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gary
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Matt Sicker <***@gmail.com>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Matt Sicker <***@gmail.com>
>>>
>>
>>
>>
>
>
>



--
E-Mail: ***@gmail.com | ***@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
Ralph Goers
2014-09-29 17:54:42 UTC
Permalink
I’m not sure there is any value there. If you don’t add any Marker Filters to your configuration you won’t have the overhead. If you do have Marker Filters and they are no-op’d then your configuration won’t work as expected.

The only real use case would be for these kinds of tests.

Ralph

On Sep 29, 2014, at 10:42 AM, Gary Gregory <***@gmail.com> wrote:

> It would be nice to make the MarkerManager/Log4jMarker pluggable and use an interface. Then we could test a NoOp Marker Manager or even the value null? Perhaps there is a performance gain to be had by not making this feature available at all. It's a narrow use case but it would be a way to squeeze out every last ounce of CPU. Maybe.
>
> Gary
>
> On Mon, Sep 29, 2014 at 1:34 PM, Ralph Goers <***@dslextreme.com> wrote:
> Remko,
>
> While testing the performance of for vs foreach is interesting, it would be more meaningful to test a MarkerManager implementation that uses for loops vs for each. As I recall I tested using tests similar to what is in MarkerTest looping on those. What matters is what the performance is on those kinds of tests. That is why the code individually tests 1 or 2 parents vs always using a for loop - I found that the overhead of setting up the for loop for 1 or 2 items was greater than doing the test and manually testing them.
>
> What I would suggest is making a few variations of MarkerManager -
> 1. The current code.
> 2. Replace the for loops with for each loops.
> 3. for loops only (don’t specifically test 1 or 2 items)
> 4. foreach loops only (don’t specifically test 1 or 2 items)
>
> Any other variations you can think of such as removing the assignment.
>
> Ralph
Remko Popma
2014-09-30 00:19:53 UTC
Permalink
On Tue, Sep 30, 2014 at 2:34 AM, Ralph Goers <***@dslextreme.com>
wrote:

> Remko,
>
> While testing the performance of for vs foreach is interesting, it would
> be more meaningful to test a MarkerManager implementation that uses for
> loops vs for each. As I recall I tested using tests similar to what is in
> MarkerTest looping on those. What matters is what the performance is on
> those kinds of tests. That is why the code individually tests 1 or 2
> parents vs always using a for loop - I found that the overhead of setting
> up the for loop for 1 or 2 items was greater than doing the test and
> manually testing them.
>

> What I would suggest is making a few variations of MarkerManager -
> 1. The current code.
> 2. Replace the for loops with for each loops.
> 3. for loops only (don’t specifically test 1 or 2 items)
> 4. foreach loops only (don’t specifically test 1 or 2 items)
>
> Any other variations you can think of such as removing the assignment.
>

Ralph, I was hoping to settle the dispute but seem to have made things
worse.
I agree that writing a benchmark for MarkerManager instead of focusing too
narrowly on the for loop would have added more long term value.
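
To make the four variations concrete, here is a rough sketch of the parent check in each style. It uses plain Object in place of Marker so it compiles without log4j, and the class and method names are invented for illustration; it is not the actual MarkerManager code:

```java
// Hypothetical stand-ins for the four MarkerManager variations listed above.
final class ParentCheckVariants {

    // 1. Special-case one or two parents, indexed for loop otherwise.
    static boolean specialCasedFor(final Object parent, final Object... localParents) {
        final int length = localParents.length;
        if (length == 1) {
            return localParents[0] == parent;
        }
        if (length == 2) {
            return localParents[0] == parent || localParents[1] == parent;
        }
        for (int i = 0; i < length; i++) {
            if (localParents[i] == parent) {
                return true;
            }
        }
        return false;
    }

    // 2. Same special cases, but an enhanced for loop for the general case.
    static boolean specialCasedForEach(final Object parent, final Object... localParents) {
        if (localParents.length == 1) {
            return localParents[0] == parent;
        }
        if (localParents.length == 2) {
            return localParents[0] == parent || localParents[1] == parent;
        }
        return forEachOnly(parent, localParents);
    }

    // 3. Indexed for loop only, no special cases.
    static boolean forOnly(final Object parent, final Object... localParents) {
        for (int i = 0, length = localParents.length; i < length; i++) {
            if (localParents[i] == parent) {
                return true;
            }
        }
        return false;
    }

    // 4. Enhanced for loop only, no special cases.
    static boolean forEachOnly(final Object parent, final Object... localParents) {
        for (final Object candidate : localParents) {
            if (candidate == parent) {
                return true;
            }
        }
        return false;
    }
}
```

All four should return the same answer for any input, so a benchmark over them would measure only the cost of the loop shape and the special-casing.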

I'll try not to make the same mistake again, and I want to spend my time
next on things that benefit the most users. In terms of performance, I am
thinking of:
1. Documenting current performance (to help users decide which logging API
and implementation to use in their applications)
2. Improving log4j2 performance

For (1) I am thinking of doing a performance comparison similar to what I
did for async loggers: compare the various logging APIs (log4j-1.2, JUL,
SLF4J, log4j2) with both the log4j2 implementation and their native
implementation. JUL finally got rid of the synchronized block in Java 7, so
it may be interesting to do this comparison on Java 6, 7 and 8. In addition,
log4j2 now has a FileAppender, RandomAccessFileAppender and a
MemoryMappedFileAppender, so depending on how many multi-threaded scenarios
we want to do, this will be a lot of work.

For (2), we should just use a profiler to find the biggest bottleneck,
address that, measure again, fix the next biggest bottleneck, etc. Here
also we need to determine which scenarios to profile. In a scenario with
markers, if MarkerManager$Log4jMarker#contains is a bottleneck we'll get to
it then.

I hope this makes sense, -Remko


> Ralph
Gary Gregory
2014-09-30 02:16:11 UTC
Permalink
This is all very interesting. It will be good to see what the folks on the
OpenJDK list say...

Gary

On Mon, Sep 29, 2014 at 8:19 PM, Remko Popma <***@gmail.com> wrote:

>
> On Tue, Sep 30, 2014 at 2:34 AM, Ralph Goers <***@dslextreme.com>
> wrote:
>
>> Remko,
>>
>> While testing the performance of for vs foreach is interesting, it would
>> be more meaningful to test a MarkerManager implementation that uses for
>> loops vs for each. As I recall I tested using tests similar to what is in
>> MarkerTest looping on those. What matters is what the performance is on
>> those kinds of tests. That is why the code individually tests 1 or 2
>> parents vs always using a for loop - I found that the overhead of setting
>> up the for loop for 1 or 2 items was greater than doing the test and
>> manually testing them.
>>
>
>> What I would suggest is making a few variations of MarkerManager -
>> 1. The current code.
>> 2. Replace the for loops with for each loops.
>> 3. for loops only (don’t specifically test 1 or 2 items)
>> 4. foreach loops only (don’t specifically test 1 or 2 items)
>>
>> Any other variations you can think of such as removing the assignment.
>>
>
> Ralph, I was hoping to settle the dispute but seem to have made things
> worse.
> I agree that writing a benchmark for MarkerManager instead of focusing too
> narrowly on the for loop would have added more long term value.
>
>>
> I'll try to not make the same mistake again and want to spend my time next
> on things that benefit the most users. In terms of performance, I am
> thinking to:
> 1. Document current performance (to help users decide which logging API
> and impl to use in their applications)
> 2. Improve log4j2 performance
>
> For (1) I am thinking of doing a performance comparison similar to what I
> did for async loggers: compare the various logging APIs (log4j-1.2, JUL,
> SLF4J, log4j2) with both the log4j2 implementation and their native
> implementation. JUL finally got rid of the synchronized block in Java7, so
> it may be interesting to do this comparison on Java 6, 7 and 8. In addition,
> log4j2 now has a FileAppender, RandomAccessFileAppender and a
> MemoryMappedFileAppender, so depending on how many multi-threaded scenarios
> we want to do, this will be a lot of work.
>
> For (2), we should just use a profiler to find the biggest bottleneck,
> address that, measure again, fix the next biggest bottleneck, etc. Here
> also we need to determine which scenarios to profile. In a scenario with
> markers, if MarkerManager$Log4jMarker#contains is a bottleneck we'll get to
> it then.
>
> I hope this makes sense, -Remko
>
>
>> Ralph
>>
>>
>>
>>
>>
>> On Sep 26, 2014, at 6:43 AM, Remko Popma <***@gmail.com> wrote:
>>
>> On Windows it looks like normal for loops are slightly faster than
>> for-each loops, especially for small arrays of primitives. This could be
>> noise, since we are talking about 5 nanoseconds where the baseline (an
>> empty method invocation) is 12 nanos.
>>
>> On Solaris 10 and Red Hat Enterprise Linux the baseline is so large (255
>> nanos and 1910 nanos respectively) that any difference we are seeing is
>> just noise.
>>
>> All benchmarks were run with one fork, one thread, 10 warmup iterations
>> and 10 test iterations.
>>
>> *Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU @1.70Ghz
>> with hyperthreading switched on (4 virtual cores)*
>> Benchmark Mode Samples
>> Score Score error Units
>> o.a.l.l.p.j.LoopsBenchmark.baseline sample 154947
>> 12.432 0.550 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 126597
>> 2759.592 3.431 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 126494
>> 2761.729 3.127 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 154124
>> 292.880 1.065 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 156155
>> 288.751 1.101 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 191980
>> 41.826 0.870 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 193770
>> 36.894 0.782 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 190847
>> 22.393 0.618 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 192552
>> 17.146 0.560 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 173839
>> 31959.057 14.341 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 171137
>> 32461.985 14.353 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 97495
>> 3591.200 4.852 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 101560
>> 3445.998 4.010 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 102796
>> 438.207 1.923 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 102333
>> 439.576 2.139 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 113924
>> 58.957 1.247 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 120416
>> 60.712 1.284 ns/op
>>
>>
>> // For loops for Object arrays are similar but return the total XOR of
>> the element hashcodes.
>>
>> private int forEachLoop(final int[] array) {
>> int result = 0;
>> for (final int element : array) {
>> result ^= element;
>> }
>> return result;
>> }
>>
>> private int forLoop(final int[] array) {
>> int result = 0;
>> for (int i = 0; i < array.length; i++) {
>> result ^= array[i];
>> }
>> return result;
>> }
>>
>>
>>
>> *Solaris 10 (64bit) with JDK1.7.0_06-b24 (Oracle Hotspot), 2 quad-core
>> Xeon X5570 dual CPUs @2.93Ghz with hyperthreading switched on (16 virtual
>> cores)*
>> Benchmark Mode Samples
>> Score Score error Units
>> o.a.l.l.p.j.LoopsBenchmark.baseline sample 110212
>> 255.300 0.201 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 187808
>> 3938.055 1.207 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 187897
>> 3937.929 0.748 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 123989
>> 606.631 0.626 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 123973
>> 609.565 0.416 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 126933
>> 294.204 0.280 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 127070
>> 296.411 0.223 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 113400
>> 261.519 0.181 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 111637
>> 260.435 0.115 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 115872
>> 48154.673 18.846 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 116777
>> 47793.868 17.615 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 138432
>> 5256.767 2.451 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 136644
>> 5325.377 2.388 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 166653
>> 773.541 0.330 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 166570
>> 774.513 0.574 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 178754
>> 317.232 0.134 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 180165
>> 316.189 0.238 ns/op
>>
>> *64 bit RHEL 6.5 (Linux 2.6.32-431.el6.x86_64) with JDK1.7.0_05-b06
>> (Oracle Hotspot), 4 quad-core Xeon X5570 CPUs @2.93GHz with hyperthreading
>> switched on (16 virtual cores)*
>> Benchmark Mode Samples
>> Score Score error Units
>> o.a.l.l.p.j.LoopsBenchmark.baseline sample 114783
>> 1910.576 29.256 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 194584
>> 5132.885 25.137 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 196672
>> 4811.572 52.072 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 133119
>> 1967.213 28.970 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 133804
>> 2004.501 31.554 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 142439
>> 1575.329 6.457 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 142215
>> 1957.714 27.815 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 130826
>> 1980.301 30.818 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 132654
>> 1589.120 8.449 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 126947
>> 43301.320 50.589 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 126117
>> 43574.129 55.272 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 143697
>> 5831.250 19.667 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 163244
>> 4823.096 13.180 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 162502
>> 1930.819 24.136 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 171619
>> 1625.806 10.385 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 172226
>> 1888.683 22.554 ns/op
>> o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 188838
>> 1581.979 6.322 ns/op
>>
>>
>>
>>
>> On Fri, Sep 26, 2014 at 2:56 AM, Matt Sicker <***@gmail.com> wrote:
>>
>>> The foreach over an array looks like it's supposed to compile to the
>>> same thing:
>>>
>>> https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html
>>>
>>> Same goes for .length which is supposed to be a final field which would
>>> allow for inlining by the JIT I'd imagine (hence why we use final
>>> everywhere):
>>>
>>> http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7
>>>
>>> On 25 September 2014 11:33, Gary Gregory <***@gmail.com> wrote:
>>>
>>>> On Thu, Sep 25, 2014 at 11:55 AM, Ralph Goers <
>>>> ***@dslextreme.com> wrote:
>>>>
>>>>> You can think that, but in the testing I did at the time the
>>>>> difference was quite noticeable. I would have left it as a foreach if
>>>>> it wasn’t.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Sep 25, 2014, at 8:51 AM, Paul Benedict <***@apache.org>
>>>>> wrote:
>>>>>
>>>>> I would be surprised if foreach over an array makes a speed
>>>>> difference. AFAIK, foreach is syntactic sugar. There is no iterator for an
>>>>> array so it has to be desugared using a for/index loop like you have there.
>>>>> I don't think this code is saving anything.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Paul
>>>>>
>>>>> On Thu, Sep 25, 2014 at 10:47 AM, Remko Popma <***@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hm.. Why did I think it was configuration? I must have gotten mixed
>>>>>> up with another commit email...
>>>>>> The class is MarkerManager in log4j-api.
>>>>>>
>>>>>> On Fri, Sep 26, 2014 at 12:18 AM, Ralph Goers <
>>>>>> ***@dslextreme.com> wrote:
>>>>>>
>>>>>>> Configuration? If I recall correctly this method is called on every
>>>>>>> log event that contains a Marker. But I am just guessing since Gary
>>>>>>> neglected to say what class this is. But I do remember doing extensive
>>>>>>> testing when this code was written. And I also remember someone (probably
>>>>>>> Gary) mentioning then that it should use a for-loop and we had this same
>>>>>>> conversation then. I think that is why the comment was added.
>>>>>>>
>>>>>>> Ralph
>>>>>>>
>>>>>>> On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>>>>>>>
>>>>>>> From what I remember, it had something to do with the incredibly
>>>>>>> large difference in speed between for loops and foreach loops on arrays.
>>>>>>> And by incredibly large, I mean most likely negligible.
>>>>>>>
>>>>>>> :-)
>>>>>>> I do remember reading that someone found a speed difference. But
>>>>>>> I've never verified it. (Note to self: write a quick jmh benchmark for
>>>>>>> this.)
>>>>>>>
>>>>>>>
>>>> I'd be curious to see the results!
>>>>
>>>> Gary
>>>>
>>>>
>>>>>
>>>>>>> On the other hand, this is configuration, so it only happens once
>>>>>>> and is very unlikely to be "hot" code so there is probably not much value
>>>>>>> in optimizing this loop.
>>>>>>>
>>>>>>>
>>>>>>> On 24 September 2014 22:12, Gary Gregory <***@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Why does this "//noinspection ForLoopReplaceableByForEach" comment
>>>>>>>> mean?
>>>>>>>>
>>>>>>>> Why not for an enhanced for each loop?
>>>>>>>>
>>>>>>>> private static boolean contains(final Marker parent, final
>>>>>>>> Marker... localParents) {
>>>>>>>> //noinspection ForLoopReplaceableByForEach
>>>>>>>> for (int i = 0, localParentsLength =
>>>>>>>> localParents.length; i < localParentsLength; i++) {
>>>>>>>> final Marker marker = localParents[i];
>>>>>>>> if (marker == parent) {
>>>>>>>> return true;
>>>>>>>> }
>>>>>>>> }
>>>>>>>> return false;
>>>>>>>> }
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gary
>>>>>>>>
>>>>>>>> --
>>>>>>>> E-Mail: ***@gmail.com | ***@apache.org
>>>>>>>> Java Persistence with Hibernate, Second Edition
>>>>>>>> <http://www.manning.com/bauer3/>
>>>>>>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>>>>>>>> Spring Batch in Action <http://www.manning.com/templier/>
>>>>>>>> Blog: http://garygregory.wordpress.com
>>>>>>>> Home: http://garygregory.com/
>>>>>>>> Tweet! http://twitter.com/GaryGregory
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Matt Sicker <***@gmail.com>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> E-Mail: ***@gmail.com | ***@apache.org
>>>> Java Persistence with Hibernate, Second Edition
>>>> <http://www.manning.com/bauer3/>
>>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>>>> Spring Batch in Action <http://www.manning.com/templier/>
>>>> Blog: http://garygregory.wordpress.com
>>>> Home: http://garygregory.com/
>>>> Tweet! http://twitter.com/GaryGregory
>>>>
>>>
>>>
>>>
>>> --
>>> Matt Sicker <***@gmail.com>
>>>
>>
>>
>>
>


--
E-Mail: ***@gmail.com | ***@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
Ralph Goers
2014-09-25 15:53:13 UTC
Permalink
Yes, and notice that it has an internal Log4jMarker class, which is the “real” Marker class in Log4j 2 and which is where that code is. So it is called anytime a Marker filter is present.

Ralph



On Sep 25, 2014, at 8:47 AM, Remko Popma <***@gmail.com> wrote:

> Hm.. Why did I think it was configuration? I must have gotten mixed up with another commit email...
> The class is MarkerManager in log4j-api.
>
> On Fri, Sep 26, 2014 at 12:18 AM, Ralph Goers <***@dslextreme.com> wrote:
> Configuration? If I recall correctly this method is called on every log event that contains a Marker. But I am just guessing since Gary neglected to say what class this is. But I do remember doing extensive testing when this code was written. And I also remember someone (probably Gary) mentioning then that it should use a for-loop and we had this same conversation then. I think that is why the comment was added.
>
> Ralph
>
> On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com> wrote:
>
>>
>> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>>
>>> From what I remember, it had something to do with the incredibly large difference in speed between for loops and foreach loops on arrays. And by incredibly large, I mean most likely negligible.
>> :-)
>> I do remember reading that someone found a speed difference. But I've never verified it. (Note to self: write a quick jmh benchmark for this.)
>>
>> On the other hand, this is configuration, so it only happens once and is very unlikely to be "hot" code so there is probably not much value in optimizing this loop.
>>
>>>
>>> On 24 September 2014 22:12, Gary Gregory <***@gmail.com> wrote:
>>> Why does this "//noinspection ForLoopReplaceableByForEach" comment mean?
>>>
>>> Why not for an enhanced for each loop?
>>>
>>> private static boolean contains(final Marker parent, final Marker... localParents) {
>>> //noinspection ForLoopReplaceableByForEach
>>> for (int i = 0, localParentsLength = localParents.length; i < localParentsLength; i++) {
>>> final Marker marker = localParents[i];
>>> if (marker == parent) {
>>> return true;
>>> }
>>> }
>>> return false;
>>> }
>>>
>>> Thanks,
>>> Gary
>>>
>>> --
>>> E-Mail: ***@gmail.com | ***@apache.org
>>> Java Persistence with Hibernate, Second Edition
>>> JUnit in Action, Second Edition
>>> Spring Batch in Action
>>> Blog: http://garygregory.wordpress.com
>>> Home: http://garygregory.com/
>>> Tweet! http://twitter.com/GaryGregory
>>>
>>>
>>>
>>> --
>>> Matt Sicker <***@gmail.com>
>
>
Gary Gregory
2014-09-25 15:50:50 UTC
Permalink
Sorry, that one is in MarkerManager.

Gary

On Thu, Sep 25, 2014 at 11:18 AM, Ralph Goers <***@dslextreme.com>
wrote:

> Configuration? If I recall correctly this method is called on every log
> event that contains a Marker. But I am just guessing since Gary neglected
> to say what class this is. But I do remember doing extensive testing when
> this code was written. And I also remember someone (probably Gary)
> mentioning then that it should use a for-loop and we had this same
> conversation then. I think that is why the comment was added.
>
> Ralph
>
> On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com> wrote:
>
>
> On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:
>
> From what I remember, it had something to do with the incredibly large
> difference in speed between for loops and foreach loops on arrays. And by
> incredibly large, I mean most likely negligible.
>
> :-)
> I do remember reading that someone found a speed difference. But I've
> never verified it. (Note to self: write a quick jmh benchmark for this.)
>
> On the other hand, this is configuration, so it only happens once and is
> very unlikely to be "hot" code so there is probably not much value in
> optimizing this loop.
>
>
> On 24 September 2014 22:12, Gary Gregory <***@gmail.com> wrote:
>
>> Why does this "//noinspection ForLoopReplaceableByForEach" comment mean?
>>
>> Why not for an enhanced for each loop?
>>
>> private static boolean contains(final Marker parent, final Marker...
>> localParents) {
>> //noinspection ForLoopReplaceableByForEach
>> for (int i = 0, localParentsLength = localParents.length; i <
>> localParentsLength; i++) {
>> final Marker marker = localParents[i];
>> if (marker == parent) {
>> return true;
>> }
>> }
>> return false;
>> }
>>
>> Thanks,
>> Gary
>>
>> --
>> E-Mail: ***@gmail.com | ***@apache.org
>> Java Persistence with Hibernate, Second Edition
>> <http://www.manning.com/bauer3/>
>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
>> Spring Batch in Action <http://www.manning.com/templier/>
>> Blog: http://garygregory.wordpress.com
>> Home: http://garygregory.com/
>> Tweet! http://twitter.com/GaryGregory
>>
>
>
>
> --
> Matt Sicker <***@gmail.com>
>
>
>


--
E-Mail: ***@gmail.com | ***@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory
Gary Gregory
2014-09-26 17:09:19 UTC
Permalink
Which JDK version and vendor did you use?

Gary

-------- Original message --------
From: Mikael Ståldal <***@appearnetworks.com>
Date: 09/26/2014 11:34 (GMT-05:00)
To: Log4J Developers List <log4j-***@logging.apache.org>
Subject: Re: No for each loop comment?

The byte code is actually different. This seems like a weakness of the JDK Javac to me.

With standard loop:

private static boolean contains(org.apache.logging.log4j.Marker, org.apache.logging.log4j.Marker...);
Code:
0: iconst_0
1: istore_2
2: aload_1
3: arraylength
4: istore_3
5: iload_2
6: iload_3
7: if_icmpge 29
10: aload_1
11: iload_2
12: aaload
13: astore 4
15: aload 4
17: aload_0
18: if_acmpne 23
21: iconst_1
22: ireturn
23: iinc 2, 1
26: goto 5
29: iconst_0
30: ireturn


With for-each:

private static boolean contains(org.apache.logging.log4j.Marker, org.apache.logging.log4j.Marker...);
Code:
0: aload_1
1: astore_2
2: aload_2
3: arraylength
4: istore_3
5: iconst_0
6: istore 4
8: iload 4
10: iload_3
11: if_icmpge 34
14: aload_2
15: iload 4
17: aaload
18: astore 5
20: aload 5
22: aload_0
23: if_acmpne 28
26: iconst_1
27: ireturn
28: iinc 4, 1
31: goto 8
34: iconst_0
35: ireturn
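
Reading the two listings side by side, the structural difference appears to be that the for-each version first copies the array reference into a fresh local (aload_1, astore_2) and then works on that copy, which is how the Java Language Specification defines the enhanced for statement over an array. Because the index and element then land in higher-numbered local slots, the listing comes out a few bytes longer. A minimal sketch of the three forms, assuming Object as a stand-in for Marker; the class and method names are hypothetical and not part of Log4j:

```java
// Illustrative only: Object stands in for org.apache.logging.log4j.Marker,
// and the class and method names are hypothetical.
final class LoopDesugaring {

    // Indexed loop with the length hoisted, as in the method under discussion.
    static boolean containsIndexed(final Object parent, final Object... localParents) {
        for (int i = 0, length = localParents.length; i < length; i++) {
            if (localParents[i] == parent) {
                return true;
            }
        }
        return false;
    }

    // Enhanced for loop over the same array.
    static boolean containsForEach(final Object parent, final Object... localParents) {
        for (final Object marker : localParents) {
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }

    // Hand-written equivalent of the for-each listing: copy the array
    // reference to a new local first, then index into the copy.
    static boolean containsDesugared(final Object parent, final Object... localParents) {
        final Object[] copy = localParents;   // aload_1, astore_2
        final int length = copy.length;       // arraylength, istore_3
        for (int i = 0; i < length; i++) {    // iconst_0, istore 4
            final Object marker = copy[i];    // aaload, astore 5
            if (marker == parent) {
                return true;
            }
        }
        return false;
    }
}
```

All three return the same answer for the same arguments; whether the extra local makes any measurable difference after JIT compilation is exactly what the benchmarks in this thread try to establish.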


On Fri, Sep 26, 2014 at 4:27 PM, Remko Popma <***@gmail.com> wrote:
Nope. Maybe I'll get around to that next week. If you have time to do that, please share!

On Fri, Sep 26, 2014 at 11:23 PM, Mikael Ståldal <***@appearnetworks.com> wrote:
Have you compared the generated byte code (using javap -c) for the two cases?

On Fri, Sep 26, 2014 at 3:43 PM, Remko Popma <***@gmail.com> wrote:
On Windows it looks like normal for loops are slightly faster than for-each loops, especially for small arrays of primitives. This could be noise, since we are talking about 5 nanoseconds where the baseline (an empty method invocation) is 12 nanos.

On Solaris 10 and Red Hat Enterprise Linux the baseline is so large (255 nanos and 1910 nanos respectively) that any difference we are seeing is just noise.

All benchmarks were run with one fork, one thread, 10 warmup iterations and 10 test iterations.

Windows 7 (64bit) with Java 1.8.0_05, 2-core Intel i5-3317u CPU @1.70Ghz with hyperthreading switched on (4 virtual cores)
Benchmark Mode Samples Score Score error Units
o.a.l.l.p.j.LoopsBenchmark.baseline sample 154947 12.432 0.550 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 126597 2759.592 3.431 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 126494 2761.729 3.127 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 154124 292.880 1.065 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 156155 288.751 1.101 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 191980 41.826 0.870 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 193770 36.894 0.782 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 190847 22.393 0.618 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 192552 17.146 0.560 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 173839 31959.057 14.341 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 171137 32461.985 14.353 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 97495 3591.200 4.852 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 101560 3445.998 4.010 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 102796 438.207 1.923 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 102333 439.576 2.139 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 113924 58.957 1.247 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 120416 60.712 1.284 ns/op


// For loops for Object arrays are similar but return the total XOR of the element hashcodes.

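private int forEachLoop(final int[] array) {
    int result = 0;
    for (final int element : array) {
        result ^= element;
    }
    return result;
}

private int forLoop(final int[] array) {
    int result = 0;
    for (int i = 0; i < array.length; i++) {
        result ^= array[i];
    }
    return result;
}

private int forLoop(final int[] array) {
    int result = 0;
    for (int i = 0; i < array.length; i++) {
        result ^= array[i];
    }
    return result;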
}
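
The Object-array variants are described above but not shown. A sketch of what they might look like next to the int versions, assuming they simply XOR each element's hashCode(); the class name and the objForLoop/objForEachLoop names are hypothetical, and the JMH annotations and harness are deliberately left out:

```java
// Hypothetical reconstruction for illustration; not the actual LoopsBenchmark source.
final class LoopVariants {

    static int forEachLoop(final int[] array) {
        int result = 0;
        for (final int element : array) {
            result ^= element;
        }
        return result;
    }

    static int forLoop(final int[] array) {
        int result = 0;
        for (int i = 0; i < array.length; i++) {
            result ^= array[i];
        }
        return result;
    }

    // Assumed Object variant: XOR of the element hash codes, for-each form.
    static int objForEachLoop(final Object[] array) {
        int result = 0;
        for (final Object element : array) {
            result ^= element.hashCode();
        }
        return result;
    }

    // Assumed Object variant: XOR of the element hash codes, indexed form.
    static int objForLoop(final Object[] array) {
        int result = 0;
        for (int i = 0; i < array.length; i++) {
            result ^= array[i].hashCode();
        }
        return result;
    }
}
```

Returning the accumulated XOR gives each loop a result that depends on every element, which keeps the JIT from discarding the loop body as dead code.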



Solaris 10 (64bit) with JDK1.7.0_06-b24 (Oracle Hotspot), 2 quad-core Xeon X5570 dual CPUs @2.93Ghz with hyperthreading switched on (16 virtual cores)
Benchmark Mode Samples Score Score error Units
o.a.l.l.p.j.LoopsBenchmark.baseline sample 110212 255.300 0.201 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 187808 3938.055 1.207 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 187897 3937.929 0.748 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 123989 606.631 0.626 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 123973 609.565 0.416 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 126933 294.204 0.280 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 127070 296.411 0.223 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 113400 261.519 0.181 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 111637 260.435 0.115 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 115872 48154.673 18.846 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 116777 47793.868 17.615 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 138432 5256.767 2.451 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 136644 5325.377 2.388 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 166653 773.541 0.330 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 166570 774.513 0.574 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 178754 317.232 0.134 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 180165 316.189 0.238 ns/op

64 bit RHEL 6.5 (Linux 2.6.32-431.el6.x86_64) with JDK1.7.0_05-b06 (Oracle Hotspot), 4 quad-core Xeon X5570 CPUs @2.93GHz with hyperthreading switched on (16 virtual cores)
Benchmark Mode Samples Score Score error Units
o.a.l.l.p.j.LoopsBenchmark.baseline sample 114783 1910.576 29.256 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForEachLoop sample 194584 5132.885 25.137 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10000ForLoop sample 196672 4811.572 52.072 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForEachLoop sample 133119 1967.213 28.970 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray1000ForLoop sample 133804 2004.501 31.554 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForEachLoop sample 142439 1575.329 6.457 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray100ForLoop sample 142215 1957.714 27.815 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForEachLoop sample 130826 1980.301 30.818 ns/op
o.a.l.l.p.j.LoopsBenchmark.intArray10ForLoop sample 132654 1589.120 8.449 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForEachLoop sample 126947 43301.320 50.589 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10000ForLoop sample 126117 43574.129 55.272 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForEachLoop sample 143697 5831.250 19.667 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray1000ForLoop sample 163244 4823.096 13.180 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForEachLoop sample 162502 1930.819 24.136 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray100ForLoop sample 171619 1625.806 10.385 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForEachLoop sample 172226 1888.683 22.554 ns/op
o.a.l.l.p.j.LoopsBenchmark.objArray10ForLoop sample 188838 1581.979 6.322 ns/op




On Fri, Sep 26, 2014 at 2:56 AM, Matt Sicker <***@gmail.com> wrote:
The foreach over an array looks like it's supposed to compile to the same thing:

https://jcp.org/aboutJava/communityprocess/jsr/tiger/enhanced-for.html

Same goes for .length which is supposed to be a final field which would allow for inlining by the JIT I'd imagine (hence why we use final everywhere):

http://docs.oracle.com/javase/specs/jls/se7/html/jls-10.html#jls-10.7

On 25 September 2014 11:33, Gary Gregory <***@gmail.com> wrote:
On Thu, Sep 25, 2014 at 11:55 AM, Ralph Goers <***@dslextreme.com> wrote:
You can think that, but in the testing I did at the time the difference was quite noticeable. I would have left it as a foreach if it wasn’t.

Ralph

On Sep 25, 2014, at 8:51 AM, Paul Benedict <***@apache.org> wrote:

I would be surprised if foreach over an array makes a speed difference. AFAIK, foreach is syntactic sugar. There is no iterator for an array so it has to be desugared using a for/index loop like you have there. I don't think this code is saving anything.


Cheers,
Paul

On Thu, Sep 25, 2014 at 10:47 AM, Remko Popma <***@gmail.com> wrote:
Hm.. Why did I think it was configuration? I must have gotten mixed up with another commit email...
The class is MarkerManager in log4j-api.

On Fri, Sep 26, 2014 at 12:18 AM, Ralph Goers <***@dslextreme.com> wrote:
Configuration? If I recall correctly this method is called on every log event that contains a Marker. But I am just guessing since Gary neglected to say what class this is. But I do remember doing extensive testing when this code was written. And I also remember someone (probably Gary) mentioning then that it should use a for-loop and we had this same conversation then. I think that is why the comment was added.

Ralph

On Sep 24, 2014, at 9:10 PM, Remko Popma <***@gmail.com> wrote:


On 2014/09/25, at 12:46, Matt Sicker <***@gmail.com> wrote:

From what I remember, it had something to do with the incredibly large difference in speed between for loops and foreach loops on arrays. And by incredibly large, I mean most likely negligible.
:-)
I do remember reading that someone found a speed difference. But I've never verified it. (Note to self: write a quick jmh benchmark for this.)

I'd be curious to see the results!

Gary


On the other hand, this is configuration, so it only happens once and is very unlikely to be "hot" code so there is probably not much value in optimizing this loop.


On 24 September 2014 22:12, Gary Gregory <***@gmail.com> wrote:
Why does this "//noinspection ForLoopReplaceableByForEach" comment mean?

Why not for an enhanced for each loop?

private static boolean contains(final Marker parent, final Marker... localParents) {
//noinspection ForLoopReplaceableByForEach
for (int i = 0, localParentsLength = localParents.length; i < localParentsLength; i++) {
final Marker marker = localParents[i];
if (marker == parent) {
return true;
}
}
return false;
}

Thanks,
Gary

--
E-Mail: ***@gmail.com | ***@apache.org
Java Persistence with Hibernate, Second Edition
JUnit in Action, Second Edition
Spring Batch in Action
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory



--
Matt Sicker <***@gmail.com>







--
E-Mail: ***@gmail.com | ***@apache.org
Java Persistence with Hibernate, Second Edition
JUnit in Action, Second Edition
Spring Batch in Action
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory



--
Matt Sicker <***@gmail.com>




--
Mikael Ståldal
Chief Software Architect
Appear
Phone: +46 8 545 91 572
Email: ***@appearnetworks.com



