Inline ASM "x" constraint question

Inline ASM "x" constraint question

Rodrigo Hernandez
Hi,
I am having trouble getting my inline asm SSE code right (everything
seems in order, but the runtime results are skewed), so I took a look at
the GCC manual here:

http://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Machine-Constraints.html#Machine-Constraints

It seems like I should be able to use the "x" constraint to specify an
XMM register and turn this:
asm
(
         "movups %0,%%xmm0\n\t"
         "movups %1,%%xmm1\n\t"
         "addps %%xmm1,%%xmm0\n\t"
         "movups %%xmm0,%2\n\t"
         :
         : "m" (var1[0]), "m" (var2[0]),"m"(result[0])
         : "%xmm0", "%xmm1"
);

into this:

asm
(
         "addps %2,%1\n\t"
         "movups %1,%0\n\t"
         : "=x"(result[0])
         : "x" (var1[0]), "x" (var2[0])
);

but I get an "impossible constraint in `asm'" error. The variables
here are 16-byte aligned float[4] arrays; my guess is that "x" is
expecting a 128-bit type, but I don't know what that type is.

I looked for examples and found none. Perhaps the constraint is too new,
so does anyone actually know how to use it?

Thanks.




_______________________________________________
MinGW-users mailing list
[hidden email]

You may change your MinGW Account Options or unsubscribe at:
https://lists.sourceforge.net/lists/listinfo/mingw-users
Re: Inline ASM "x" constraint question

Ross Ridge
> I am having trouble getting my inline asm SSE code right ...

There are a couple of problems with your inline asm code, but the problem
you've noticed is that you need to use an appropriate vector type
with the "x" constraint.  Your code needs to look like this:


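        typedef float __attribute__((vector_size(16))) V4SF;

        void
        foo(float const *arg1, float const *arg2, float *ret) {
                V4SF v1, v2, r;
                /* The array casts tell GCC that all four floats are
                   accessed, not only the first one. */
                asm("movups %1,%0" : "=x" (v1)
                    : "m" (*(float const (*)[4]) arg1));
                asm("movups %1,%0" : "=x" (v2)
                    : "m" (*(float const (*)[4]) arg2));
                asm("addps %1,%0" : "=x" (r) : "xm" (v1), "0" (v2));
                /* The output is the memory ret points to, not the
                   pointer variable itself. */
                asm("movups %1,%0" : "=m" (*(float (*)[4]) ret)
                    : "x" (r));
        }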


However, since your inline assembly has other problems, you might want
to use Intel C/C++ style intrinsics.  That way you don't have to worry
about getting the constraints right:

        #include <xmmintrin.h>

        void
        foo2(float const *arg1, float const *arg2, float *ret) {
                __m128 r = _mm_add_ps(_mm_loadu_ps(arg1), _mm_loadu_ps(arg2));
                _mm_storeu_ps(ret, r);
        }

Note, because you're using unaligned float arrays you can't just cast
them to either of the vector types (V4SF or __m128).  Using casts will
generate instructions that assume your float arrays are properly aligned.
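
To illustrate the note about casts, here is a minimal sketch, assuming GCC
or Clang on an x86 target with SSE enabled.  The helper names is_aligned16,
load4 and add4 are made up for this example.  It checks the address first
and only uses the aligned load/store forms when the pointer really is on a
16-byte boundary:

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>

// True when p sits on a 16-byte boundary.
bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Load four floats, using the aligned form only when it is safe.
__m128 load4(const float* p)
{
    return is_aligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}

// Hypothetical helper: ret[i] = a[i] + b[i] for i in 0..3.
void add4(const float* a, const float* b, float* ret)
{
    __m128 r = _mm_add_ps(load4(a), load4(b));
    if (is_aligned16(ret))
        _mm_store_ps(ret, r);    // aligned store (movaps)
    else
        _mm_storeu_ps(ret, r);   // unaligned store (movups)
}
```

When alignment is not guaranteed, simply using _mm_loadu_ps and
_mm_storeu_ps everywhere, as foo2 does, is the simpler choice.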

                                                Ross Ridge

--
 l/  //  Ross Ridge -- The Great HTMU
[oo][oo]  [hidden email]
-()-/()/  http://www.csclub.uwaterloo.ca/u/rridge/ 
 db  //  


Re: Re: Inline ASM "x" constraint question

Rodrigo Hernandez
Hi Ross,

Thanks for your response. I had figured out most of that; in addition, you
need to pass -march=pentium3 to the compiler in order for it to recognize
the xmm registers.

In theory I am using 16-byte aligned float vectors ( typedef
__declspec(align(16)) float xmmfloat; ), however in practice I am
getting some (6 or so out of hundreds) vectors 8-byte aligned, which
causes my code to segfault. I am using movups now, but I am curious as
to why those few vectors (as in geometric vectors) are created with a
different alignment.

Also, my cpuid calls were all wrong: I wasn't clobbering all the registers
in the asm call, and it seemed to trash the EBP register, which ended
in a segfault after some cycles.
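
For reference, a minimal sketch of a cpuid wrapper that declares every
register the instruction writes, assuming GCC or Clang on an x86 target.
The names CpuidRegs, cpuid and has_sse are made up for this example.  On
32-bit position-independent code with older GCC versions the "=b" output
may be rejected because EBX is reserved there; GCC's <cpuid.h> provides
__get_cpuid as an alternative.

```cpp
#include <cassert>
#include <cstdint>

struct CpuidRegs { std::uint32_t eax, ebx, ecx, edx; };

// Run CPUID for the given leaf (sub-leaf 0).  All four registers the
// instruction overwrites are listed as outputs, so the compiler never
// assumes any of them survives the call.
CpuidRegs cpuid(std::uint32_t leaf)
{
    CpuidRegs r;
    __asm__ __volatile__("cpuid"
                         : "=a"(r.eax), "=b"(r.ebx), "=c"(r.ecx), "=d"(r.edx)
                         : "0"(leaf), "2"(0u));
    return r;
}

// Leaf 1, EDX bit 25 reports SSE support.
bool has_sse()
{
    return ((cpuid(1).edx >> 25) & 1u) != 0;
}
```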

Anyway, I decided not to use the "x" constraint, and went back to a
modified version of my original code:

     asm
       (
        "lea %0,%%eax\n\t"
        "lea %1,%%ebx\n\t"
        "movups (%%eax),%%xmm0\n\t"
        "movups (%%ebx),%%xmm1\n\t"
        "addps %%xmm1,%%xmm0\n\t"
        "lea %2,%%eax\n\t"
        "movups %%xmm0,(%%eax)\n\t"
        :
        : "m" (v[0]), "m" (vVector.v[0]),"m"(result.v[0])
        : "%eax","%ebx","%xmm0","%xmm1","memory"
        );

This one works :)

I will check what's with the wrongly aligned variables today.

Thanks again.


Re: Inline ASM "x" constraint question

Ross Ridge
In reply to this post by Rodrigo Hernandez
>In theory I am using 16-byte aligned float vectors ( typedef
>__declspec(align(16)) float xmmfloat; ), however in practice I am getting
>some (6 or so out of hundreds) vectors 8-byte aligned which causes my
>code to segfault,

There are a number of possible reasons why this might be.  One is that the
stack may not be properly aligned.  While GCC keeps the stack 128-bit
aligned, and ensures the stack is 128-bit aligned in main, Windows itself
only keeps the stack 32-bit aligned.  If you have any callback functions
in your code (including any window procedures), then your stack will
become misaligned.  Another problem is that the memory allocation
functions (malloc() and operator new()) in the MSVCRT.DLL C runtime
library that MinGW uses only provide 32-bit alignment.  You'll need to
align dynamically allocated memory yourself.  A third problem is that,
depending on the version of GCC and binutils you're using, statically
allocated constants may be put into sections that don't support 128-bit
alignment.
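
Aligning dynamically allocated memory yourself can be sketched as follows.
This is only an illustration using plain malloc; the names alloc_aligned
and free_aligned are made up for this example.  It over-allocates, rounds
the address up to the requested boundary, and stores the original pointer
just below the aligned block so it can be freed later:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Return `size` bytes aligned to `align` (a power of two), or a null
// pointer on failure.  Memory must be released with free_aligned().
void* alloc_aligned(std::size_t size, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (align < sizeof(void*))
        align = sizeof(void*);   // keep the bookkeeping slot aligned

    // Payload + worst-case padding + room for the saved pointer.
    void* raw = std::malloc(size + align - 1 + sizeof(void*));
    if (!raw)
        return 0;

    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);

    // Remember what malloc returned, right below the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Free a block obtained from alloc_aligned(); a null pointer is ignored.
void free_aligned(void* p)
{
    if (p)
        std::free(static_cast<void**>(p)[-1]);
}
```

The sketch does not guard against overflow of the size computation.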

>Anyway, I decided not to use the "x" constraint, and went back to a
>modified version of my original code:

Your inline assembly is both terribly inefficient and contains errors.
It would be many times faster to just add the vectors element by
element, e.g.:

        result.v[0] = v[0] + vVector.v[0];
        result.v[1] = v[1] + vVector.v[1];
        result.v[2] = v[2] + vVector.v[2];
        result.v[3] = v[3] + vVector.v[3];

Even the more efficient examples I gave in my previous message would be
slower than just adding one element at a time.  It's not worth using SSE
instructions unless you're doing a series of operations.  If you really
think you can benefit from using SSE instructions, then don't use inline
assembly; use the intrinsics in <xmmintrin.h>.  Try to keep your vectors
in __m128 variables as much as possible.
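
As a rough illustration of "a series of operations", here is a sketch that
loads each group of four floats once, chains several SSE operations while
the values stay in __m128 variables, and stores once.  It assumes GCC or
Clang on an x86 target with SSE enabled; the function name combine and the
formula it computes are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>

// Hypothetical example: out[i] = (a[i] + b[i]) * scale - b[i].
void combine(const float* a, const float* b, float scale,
             float* out, std::size_t n)
{
    const __m128 s = _mm_set1_ps(scale);   // scale in all four lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        // Three arithmetic steps with no trip through memory in between.
        __m128 r = _mm_sub_ps(_mm_mul_ps(_mm_add_ps(va, vb), s), vb);
        _mm_storeu_ps(out + i, r);
    }
    // Scalar tail for the last n % 4 elements.
    for (; i < n; ++i)
        out[i] = (a[i] + b[i]) * scale - b[i];
}
```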

                                                        Ross Ridge

Re: Re: Inline ASM "x" constraint question

Rodrigo Hernandez
Ross Ridge wrote:

>There are a number of possible reasons why this might be.  One is that the
>stack may not be properly aligned.  While GCC keeps the stack 128-bit
>aligned, and ensures the stack is 128-bit aligned in main, Windows itself
>only keeps the stack 32-bit aligned.  If you have any callback functions
>in your code (including any window procedures), then your stack will
>become misaligned.  Another problem is that the memory allocation
>functions (malloc() and operator new()) in the MSVCRT.DLL C runtime
>library that MinGW uses only provide 32-bit alignment.  You'll need to
>align dynamically allocated memory yourself.  A third problem is that,
>depending on the version of GCC and binutils you're using, statically
>allocated constants may be put into sections that don't support 128-bit
>alignment.

I get the same result on Linux, though. I found out it has to do with
dynamic allocation, as most of the misaligned variables came from new
calls. I wrote a test case, below, if anyone is interested. Currently, on
Linux, every element pushed into the std::vector reports an offset of 8.
Is there a way to tell GCC to align the class to a 16-byte boundary?

Thanks.

#include <stdio.h>
#include <stdint.h>
#include <vector>

const int X=0;
const int Y=1;
const int Z=2;
const int W=3;

class CVector3
{
public:
  CVector3()
  {
    // Print the address of v modulo 16; 0 means 16-byte aligned.
    fprintf(stdout,"Packing: %u\n",(unsigned int)((uintptr_t)v%16));
    v[X]=v[Y]=v[Z]=v[W]=0.0f;
  }
  CVector3(const CVector3& vVector)
  {
    fprintf(stdout,"Packing: %u\n",(unsigned int)((uintptr_t)v%16));
    v[X]=(vVector.v[X]);
    v[Y]=(vVector.v[Y]);
    v[Z]=(vVector.v[Z]);
    v[W]=(vVector.v[W]);
  }
  // Align the array as a whole; newer GCC versions reject an array
  // whose element type is itself declared 16-byte aligned.
  float v[4] __attribute__ ((aligned (16)));
};

int main()
{
  CVector3 v1;
  CVector3 v2;
  CVector3 v3;
  CVector3 v4;
  std::vector<CVector3> v;
  v.push_back(v1);
  v.push_back(v2);
  v.push_back(v3);
  v.push_back(v4);
  for(std::vector<CVector3>::iterator i=v.begin();
      i!=v.end();++i)
    {
      fprintf(stdout,"STD Vector Packing: %u\n",
              (unsigned int)((uintptr_t)i->v%16));
    }
  return 0;
}


RE: Re: Inline ASM "x" constraint question

Danny Smith

Rodrigo Hernandez wrote:

> is there a way to tell GCC to align the class to
> a 16 byte boundary?
>
> Thanks.
>
> #include <stdio.h>
> #include <vector>
> #include <math.h>


// Override operator new with aligned version:
// (This requires gcc with mm_malloc support.  Otherwise use the
// aligned_malloc in mingw's malloc.h)
#include <new>
#include <mm_malloc.h>
// Note: a replacement operator new must not be declared inline.
void* operator new (std::size_t sz)
{
  void *p = _mm_malloc (sz, 16);
  if (!p)
    throw std::bad_alloc ();
  return p;
}


>
> typedef float xmmfloat __attribute__ ((aligned (16)));
> const int X=0;
> const int Y=1;
> const int Z=2;
> const int W=3;


....

Re: Re: Inline ASM "x" constraint question

Rodrigo Hernandez

Thanks Danny, I guess I will have to join the masses building GCC 4.0
for MinGW, since I need the code to be cross-platform, and wasn't able
to find a substitute for gcc 3.4.3 Linux. :)

RE: Re: Inline ASM "x" constraint question

Danny Smith


Rodrigo Hernandez wrote:

> Thanks Danny, I guess I will have to join the masses building GCC 4.0
> for MinGW, since I need the code to be cross-platform, and wasn't able
> to find a substitute for gcc 3.4.3 Linux. :)

mm_malloc.h is included in the mingw distro of gcc-3.4.4 (as a backport
from trunk).

>
> Danny Smith wrote:
>
> >// Override operator new with aligned version:
> >// (This requires gcc  with mm_malloc support.  Otherwise use the
> >aligned_malloc  in mingw's malloc.h)
> >#include <mm_malloc.h>
> >inline void* operator new (std::size_t sz)
> >{
> >  // fixme: throw bad_alloc on error
> >  void *p = (void *) _mm_malloc (sz, 16);
> >  return p;
> >}
> >  

Oops, I forgot to add that you also need to override operator delete
(with a wrapper for _mm_free) as well.
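
Put together, the pair might look like the sketch below.  It assumes GCC
(or Clang) on an x86 target where <mm_malloc.h> is available and a C++11
compiler; the is_aligned16 helper is made up for this example.  The default
array forms (new[] and delete[]) forward to these two functions, so
replacing the scalar forms is enough here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>
#include <mm_malloc.h>   // _mm_malloc / _mm_free

// Replacement global operator new: every allocation is 16-byte aligned.
void* operator new(std::size_t sz)
{
    if (sz == 0)
        sz = 1;                      // new must return a unique pointer
    void* p = _mm_malloc(sz, 16);
    if (!p)
        throw std::bad_alloc();
    return p;
}

// Matching operator delete: memory from _mm_malloc must go to _mm_free.
void operator delete(void* p) noexcept
{
    _mm_free(p);
}

// Hypothetical helper used to check the result.
bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Because std::vector's default allocator obtains its storage through the
global operator new, the elements of a std::vector are then aligned too,
provided the element size is a multiple of 16.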

Danny

Re: Re: Inline ASM "x" constraint question

Rodrigo Hernandez
Danny Smith wrote:

>mm_malloc.h is included in the mingw distro of gcc-3.4.4 (as a backport
>from trunk).
>
>  
>
I just built 4.0.0 as a Canadian cross. It runs under Wine, but I have yet
to test it on the real thing :)

>Oops I forgot to add that you also need to override operator delete
>(with a wrapper for _mm_free) as well
>  
>
Thanks, I was wondering about this too.

