Instruction format naming convention

WillTrojak · January 29, 2025, 10:19am

I’ve been doing some work on a domain-specific compiler for Power, and a minor issue has been that there are multiple subtypes for each instruction format.

For example, in 3.1C there are 4 A-form variants.

I wondered whether people would be open to a more concrete naming scheme for these sub-forms. That would make forms unambiguous and also immutable in future ISAs, with new sub-forms simply appended.

As an example, in the z/Architecture ISA, sub-forms are given a letter suffix to differentiate them (most easily seen in SA22-7871-10).

Will

Brad · January 29, 2025, 3:35pm

Welcome, Will!

If you’d like to give a more specific pointer to the z arch, I’ll take a look, but I’m not sure it’s necessary for the discussion.

I have no big problem with your suggestion, as far as it goes, but I can imagine that the next discussion on the subject will be “the sub-forms are not in suffix order in the document, please fix.” That one I do have a problem with, because there’s a convention for how the sub-forms are illustrated. We could come up with a very complicated suffix structure to solve that problem, but the solution would be pretty baroque. If we go in with the understanding that the answer to that second request is “no,” then maybe we can do this.

Is there a reason the sub-form names need to be part of the architecture itself? Plenty of conventions are standardized outside the architecture for toolchains and other uses. Why can’t this be handled that way?

WillTrojak · January 30, 2025, 10:00am

Here’s an example from z:

For z, the convention seems to be that a new sub-form is named by appending it to the existing list for that format. To me this seems like a reasonable option for backwards compatibility.

As far as I can see, Power doesn’t document an equivalent convention.

What I’m building generates the instruction encodings itself, as this is faster than relying on another tool when you know exactly what you want to generate. However, that needs a function that takes the mnemonic and its operands and produces the 32-bit or 64-bit instruction according to its form. What has been challenging is naming the various forms. Take the X-forms in 3.1C: with the C amendment, forms were added in the middle of the list in the ISA, so a simple numbering scheme isn’t reliable. So far I’ve come up with my own sub-form names, but this isn’t ideal and will probably get very confusing over time.
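A minimal sketch of the kind of encoding function described above, assuming 32-bit (non-prefixed) instructions only; a 64-bit prefixed instruction would be a prefix word plus a suffix word built the same way. Fields are given as (value, first bit, last bit) in the ISA's numbering, where bit 0 is the most significant bit. The helper names `place`, `encode` and `x_form` are hypothetical, not taken from any existing tool.

```python
# Hypothetical helpers; field positions follow the ISA's numbering,
# where bit 0 is the most significant bit of the 32-bit word.


def place(value: int, first: int, last: int) -> int:
    """Shift `value` into bits first..last of a 32-bit instruction word."""
    width = last - first + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in a {width}-bit field")
    return value << (31 - last)


def encode(fields: list[tuple[int, int, int]]) -> int:
    """OR together (value, first_bit, last_bit) fields into one word."""
    word = 0
    for value, first, last in fields:
        word |= place(value, first, last)
    return word


def x_form(po: int, f1: int, f2: int, f3: int, xo: int, rc: int = 0) -> int:
    """Generic X-form: PO 0-5, three 5-bit fields, XO 21-30, Rc (or /) in 31."""
    return encode([(po, 0, 5), (f1, 6, 10), (f2, 11, 15), (f3, 16, 20),
                   (xo, 21, 30), (rc, 31, 31)])


# `mr r3,r4` is the extended mnemonic for `or r3,r4,r4`:
# PO=31, RS=4, RA=3, RB=4, XO=444.
print(hex(x_form(31, 4, 3, 4, 444)))  # 0x7c832378
```

Because the field positions are data rather than code, a sub-form only needs a name for bookkeeping; the encoder itself never looks at it.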

If you have some pointers to how this has been standardised outside of the ISA, that would be great.

segher.ibm · January 30, 2025, 1:44pm

Hi!


WillTrojak
January 29

I’ve been doing some work on a domain-specific compiler for Power, and a minor issue has been that there are multiple subtypes for each instruction format.

For example, in 3.1c there are 4 A-form formats.

Five.

I wondered if people would be open to a more concrete name scheme for these sub-forms? This would mean forms are unambiguous but also immutable in future ISAs, with new sub-forms just appended.

In most cases it is just the labels of the fields that change. They aren’t really different forms. One exception (maybe the only one?) is M-form, where the variant with RB is very different from the one with the immediate SH field.
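To make the M-form exception concrete: in encoding terms, `rlwinm` (immediate SH) and `rlwnm` (register RB) put their third operand in the same bits 16-20, so one bit-placement routine can serve both; what differs is how that field is interpreted. The sketch below assumes the M-form layout PO 0-5, RS 6-10, RA 11-15, RB/SH 16-20, MB 21-25, ME 26-30, Rc 31; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MFormOp:
    mnemonic: str
    po: int             # primary opcode
    third_is_reg: bool  # True: bits 16-20 hold RB; False: they hold the SH immediate


RLWINM = MFormOp("rlwinm", 21, third_is_reg=False)  # rlwinm RA,RS,SH,MB,ME
RLWNM = MFormOp("rlwnm", 23, third_is_reg=True)     # rlwnm  RA,RS,RB,MB,ME


def encode_m(op: MFormOp, ra: int, rs: int, third: int, mb: int, me: int, rc: int = 0) -> int:
    """M-form: PO 0-5, RS 6-10, RA 11-15, RB/SH 16-20, MB 21-25, ME 26-30, Rc 31."""
    for value in (ra, rs, third, mb, me):
        if not 0 <= value < 32:
            raise ValueError(f"field value {value} out of range")
    if rc not in (0, 1):
        raise ValueError("Rc must be 0 or 1")
    return (op.po << 26) | (rs << 21) | (ra << 16) | (third << 11) | (mb << 6) | (me << 1) | rc


def format_asm(op: MFormOp, ra: int, rs: int, third: int, mb: int, me: int) -> str:
    """Bit placement is shared; only the third operand's meaning (and spelling) differs."""
    third_text = f"r{third}" if op.third_is_reg else str(third)
    return f"{op.mnemonic} r{ra},r{rs},{third_text},{mb},{me}"


print(hex(encode_m(RLWINM, 3, 4, 2, 0, 29)), format_asm(RLWINM, 3, 4, 2, 0, 29))
print(hex(encode_m(RLWNM, 3, 4, 2, 0, 29)), format_asm(RLWNM, 3, 4, 2, 0, 29))
```

So for an encoder the two variants can share a layout, while an assembler or validator still has to treat RB as a register and SH as an immediate.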

If your assembler has problems with the names of the instruction forms, just don’t use them at all?

Segher
Unless otherwise stated above:

IBM Nederland B.V.
Registered office: Amsterdam
Amsterdam Trade Register (Handelsregister) no. 33054214

segher.ibm · January 30, 2025, 1:54pm

Oh, btw, maybe it would be good if the description of the forms said what they are about? It would help people who are new to Power, or new to the names.

A-form: operations with four register arguments, like isel and many floating-point instructions (most leave some of those register fields unused).

Something like that? The ISA doc is a reference, not a “learn yourself Power” book, but it could be more helpful in places.

Segher

WillTrojak · January 30, 2025, 2:17pm

Sure, a lot of the forms are the same, with only the operand names changing. For example, many of the X-form variants can be handled as a generic “5,5,5” or “5,5,5,1” layout.
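One way to read the “5,5,5” / “5,5,5,1” idea, sketched under assumptions: key each sub-form by the bit positions of its operand fields, ignore the labels, and keep one encoder per layout instead of one per named form. The instruction table here is a small hand-written illustration (`lwzx`, `stwx`, `lfdx`, `and`), not extracted from the ISA, and the helper names are hypothetical.

```python
from collections import defaultdict

# Illustrative X-form operand fields as (label, first_bit, last_bit).
# Fixed PO (0-5) and XO (21-30) are left out. Bit 31 is Rc for `and`
# but reserved (/) for the loads/stores, so only `and` lists it.
X_FORM_OPERANDS = {
    "lwzx": [("RT", 6, 10), ("RA", 11, 15), ("RB", 16, 20)],
    "stwx": [("RS", 6, 10), ("RA", 11, 15), ("RB", 16, 20)],
    "lfdx": [("FRT", 6, 10), ("RA", 11, 15), ("RB", 16, 20)],
    "and": [("RS", 6, 10), ("RA", 11, 15), ("RB", 16, 20), ("Rc", 31, 31)],
}


def layout_key(fields):
    """Bit positions only; the labels are deliberately ignored."""
    return tuple((first, last) for _label, first, last in fields)


def shape_name(key):
    """Render a layout as its field widths, e.g. "5,5,5"."""
    return ",".join(str(last - first + 1) for first, last in key)


def group_by_layout(table):
    groups = defaultdict(list)
    for mnemonic, fields in table.items():
        groups[layout_key(fields)].append(mnemonic)
    return dict(groups)


def encode_x(po, xo, key, operands):
    """One encoder per layout rather than per named sub-form."""
    if len(operands) != len(key):
        raise ValueError("wrong number of operands for this layout")
    word = (po << 26) | (xo << 1)
    for value, (first, last) in zip(operands, key):
        if not 0 <= value < (1 << (last - first + 1)):
            raise ValueError(f"{value} does not fit in bits {first}-{last}")
        word |= value << (31 - last)
    return word


for key, mnemonics in group_by_layout(X_FORM_OPERANDS).items():
    print(shape_name(key), mnemonics)

# lwzx r3,r4,r5: PO=31, XO=23
print(hex(encode_x(31, 23, layout_key(X_FORM_OPERANDS["lwzx"]), [3, 4, 5])))  # 0x7c64282e
```

With a full table, sub-forms that differ only in labels fall into the same group automatically, so a new label-only variant would not need a new encoder.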

How would you suggest handling the instruction construction instead of having a function for each form?

What I have done so far is pull out the few hundred relevant instructions from the ISA (really just the ones needed to support matmuls using VSX and MMA) and label them with custom sub-form names.
Based on the ISA, I then find a mask of the bits that are unused or hold inputs across all these instructions (i.e. the bits not used by the opcode). I encode a form ID in those bits. At generation time, I extract this ID from the instruction and then use a generic function to build the instruction for that form.
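A rough sketch of the form-ID trick described above, assuming each instruction is stored as a template word holding its fixed opcode bits plus a mask of those bits. Bits that are not opcode bits in any instruction are free, so a small form ID can be scattered into them and recovered at generation time. The two-instruction table, the layouts and all function names are hypothetical; `deposit`/`extract` are plain-Python equivalents of the x86 PDEP/PEXT operations.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical table: mnemonic -> (template with fixed opcode bits,
# mask of those opcode bits, form name). Only PO/XO are fixed here.
INSTRUCTIONS = {
    "or": (0x7C000378, 0xFC0007FE, "X"),      # PO=31 (bits 0-5), XO=444 (bits 21-30)
    "rlwinm": (0x54000000, 0xFC000000, "M"),  # PO=21 (bits 0-5)
}

# Operand layouts per form as (first_bit, last_bit), bit 0 = MSB.
LAYOUTS = {
    "X": [(6, 10), (11, 15), (16, 20)],                       # RS, RA, RB (Rc left 0)
    "M": [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30)],  # RS, RA, SH, MB, ME
}
FORM_NAMES = list(LAYOUTS)


def deposit(value, mask):
    """Spread the low bits of `value` over the set bits of `mask` (like PDEP)."""
    out, used = 0, 0
    for pos in range(32):
        if (mask >> pos) & 1:
            out |= ((value >> used) & 1) << pos
            used += 1
    if value >> used:
        raise ValueError("value has more bits than the mask can hold")
    return out


def extract(word, mask):
    """Collect the bits of `word` selected by `mask` into a compact integer (like PEXT)."""
    out, used = 0, 0
    for pos in range(32):
        if (mask >> pos) & 1:
            out |= ((word >> pos) & 1) << used
            used += 1
    return out


# Bits that are opcode bits in no instruction are free to carry a form ID.
FREE = MASK32
for template, opcode_mask, _form in INSTRUCTIONS.values():
    if template & ~opcode_mask:
        raise ValueError("template sets bits outside its opcode mask")
    FREE &= ~opcode_mask

TAGGED = {
    name: template | deposit(FORM_NAMES.index(form), FREE)
    for name, (template, _mask, form) in INSTRUCTIONS.items()
}


def generate(name, *operands):
    tagged = TAGGED[name]
    form = FORM_NAMES[extract(tagged, FREE)]  # recover the form ID
    word = tagged & ~FREE & MASK32            # strip it, leaving only opcode bits
    layout = LAYOUTS[form]
    if len(operands) != len(layout):
        raise ValueError(f"{name} ({form}-form) takes {len(layout)} operands")
    for value, (first, last) in zip(operands, layout):
        if not 0 <= value < (1 << (last - first + 1)):
            raise ValueError(f"{value} does not fit in bits {first}-{last}")
        word |= value << (31 - last)
    return word


print(hex(generate("or", 4, 3, 4)))              # 0x7c832378 (mr r3,r4)
print(hex(generate("rlwinm", 4, 3, 2, 0, 29)))  # 0x5483103a
```

The ID has to be stripped before operands are ORed in, since operand fields occupy the same free bits. With a few hundred instructions the free mask shrinks, and `deposit` raises if the IDs no longer fit.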

Brad · January 30, 2025, 7:23pm

Thanks for the z examples!

Unfortunately, where you want to go is exactly where I said above we do not want to go, because it would break the existing presentation. The presentation orders sub-forms by leading reserved fields and subfields first, then puts more finely divided fields ahead of less finely divided ones, and finally goes alphabetically among cases that use exactly the same bits. As a result, new sub-forms are typically inserted in the middle of the tables.

I hope you and Segher can find a different approach that you’ll be happy with.

Brad · January 30, 2025, 10:03pm

Forgot to answer the last question.

No, I don’t know of THIS problem having been solved outside the ISA. I just meant that we have a lot of things, like linkage conventions, that are managed outside of the ISA. A list of sub-forms could be maintained outside of the ISA, reordered into chronological order so that the suffixes would have a natural, easy correspondence. It would be very easy to maintain; it’s just a question of the scope of interest and who would maintain it.
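A sketch of the kind of out-of-ISA list described here, assuming sub-forms are identified by their field labels and named with a z-style letter suffix (the `A-a` naming and the `SubFormRegistry` class are hypothetical). Suffixes come from registration order, which stands in for chronological order; the order used below is arbitrary, not the actual history of the ISA.

```python
from string import ascii_lowercase


class SubFormRegistry:
    """Append-only list of sub-forms per form, kept in the order they were added.

    A sub-form's suffix is its position in that list, so it never changes when
    later sub-forms are added, wherever those appear in the ISA's tables.
    """

    def __init__(self) -> None:
        self._by_form: dict[str, list[tuple[str, ...]]] = {}

    def register(self, form: str, fields: tuple[str, ...]) -> str:
        entries = self._by_form.setdefault(form, [])
        if fields not in entries:
            if len(entries) == len(ascii_lowercase):
                raise ValueError(f"{form}-form has run out of letter suffixes")
            entries.append(fields)
        return self.name(form, fields)

    def name(self, form: str, fields: tuple[str, ...]) -> str:
        index = self._by_form[form].index(fields)
        return f"{form}-{ascii_lowercase[index]}"


reg = SubFormRegistry()
a = reg.register("A", ("FRT", "FRA", "FRB", "FRC"))
b = reg.register("A", ("RT", "RA", "RB", "BC"))
# A later sub-form gets the next letter; earlier names are untouched.
c = reg.register("A", ("FRT", "FRA", "FRB"))
print(a, b, c)                                      # A-a A-b A-c
print(reg.register("A", ("RT", "RA", "RB", "BC")))  # A-b (already registered)
```

Since the ISA's presentation order never feeds into the suffix, the document can keep inserting sub-forms mid-table while the external names stay fixed.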