Appendix on GPT-3 and Commonsense Reasoning

This article is an appendix to GPT-3 and Commonsense Reasoning, hereafter the main article. Here, instead of making inferences across 8 commonsense reasoning dimensions, we focus only on the two hardest dimensions: causal and counterfactual inference.

Note that, for compactness of presentation, this blog article uses many multi-tab displays and may not be mobile-friendly. Readers on mobile phones are advised to enable the desktop version in their mobile browser menu.

= High-Quality Prompts on Causal and Counterfactual Inferences =

To encourage higher-quality reasoning in these two important dimensions, we provide a higher-quality inference prompt and use 3 shots instead of 2. However, due to the 2048-token inference limit, we have to sacrifice the other 6 reasoning dimensions and dedicate the entire prompt to these 2 dimensions.

In this high-quality prompt, we design the examples to emphasize "structured reasoning". For causal inference, we hypothesize two cases, called Case 1) and Case 2), whose initial conditions, Case 1.1) and Case 2.1), must not overlap. This way we can see the model reason in two genuinely separate directions.

Moreover, within each case, we encourage temporal connection by using a numbering system like 1.1) 1.2) 1.3), etc. This helps the model understand that each event has to follow the one before it. It also partly prevents premature endings, since all reasoning-chain examples contain at least 4 steps in the chain, i.e. every chain ends at event 1.4) or 2.4) or later.

Counterfactual analysis follows a similar notation, using A1) A2) … and B1) B2) … to indicate two different hypothesized counterfactual arguments, which must begin with non-overlapping events.
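As an illustration only, the numbering scheme above can be mechanized with a small helper (a hypothetical sketch; the actual prompt examples were written by hand):

```python
def format_chain(events, prefix, dotted=True):
    """Join events into the chain notation used in the prompts.

    dotted=True  -> '1.1) ... 1.2) ...' style (causal cases)
    dotted=False -> 'A1) ... A2) ...' style (counterfactual alternatives)
    """
    sep = "." if dotted else ""
    return " ".join(f"{prefix}{sep}{i}) {e}" for i, e in enumerate(events, 1))

# The prompt then pairs two chains whose first events must not overlap.
case1 = "Case " + format_chain(
    ["Alice was seriously hurt",
     "a teacher had to bring Alice to a hospital"], prefix="1")
alt_a = "Alternative " + format_chain(
    ["Elsa might have a little winning edge",
     "Elsa won the race"], prefix="A", dotted=False)
```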

Using this prompt design, the reasoning given by GPT-3 is much better elaborated, but it still has many flaws, as discussed below. The 3-shot prompt given in this new setting is shown as follows:

 Alice and Elsa were running toward the finish line. However, as Alice somehow fell down to the ground, Elsa turned back and went to help Alice. A teacher and friends also went to see what happened. - Probable events before and after the story

Before the story, it may have been time for PE class for Elsa and Alice, so they would have changed into uniforms for the class. Then, in the class, a teacher may have randomly asked a pair of students to run a race against each other.

After the story, Case 1.1) Alice was seriously hurt 1.2) a teacher had to bring Alice to a hospital 1.3) a doctor asked Alice where she felt hurt 1.4) the doctor examined Alice's condition at the mentioned place 1.5) the doctor gave her some medicine and necessary treatment 1.6) Alice went back home to tell her parents what happened

Case 2.1) Alice was not seriously hurt 2.2) Alice might just take a rest while the teacher brought her to the first aid room 2.3) she had a basic treatment 2.4) she would be able to go back to the PE class 2.5) she could cheer on the other competitors 2.6) she could attend other classes until the school finished

- Analyze the interesting event in the story, if any, and hypothesize that the interesting event would not occur if

The interesting part was when Alice fell down, since normally people can continue to run if there are no abnormal conditions. Therefore, she might have tripped over a stone or been injured somewhere.

- Hypothesize Counterfactual Story : Alice was perfectly healthy, slept well and there were no stones on the race track

Alternative A1) Elsa might have a little winning edge A2) Elsa won the race A3) Elsa got more points than Alice A4) Alice would promise to try harder next time A5) the next competition of other students began

Alternative B1) Alice was trying to use the superman-dive to win B2) she succeeded without injury this time B3) she just flew past Elsa and crossed the goal first B4) everyone applauded her great performance B5) Elsa and Alice shook hands and promised to rematch again.

 A man called his son and daughter the day before Christmas and said he and their mom were going to divorce. The son and daughter hurried to go back home to stop their parents. The old man turned to his wife and said "they're coming for Christmas now"

- Probable events before and after the story

Before the story, dad and mom would talk about the possibility that the children would not come home, since the children may be too busy at their jobs. But the parents thought that Christmas was more important than work. So they came up with a fake divorce plan.

After the story, at home, Case 1.1) the children felt very angry knowing that they were fooled 1.2) they promised the parents they would never come back at Christmas again 1.3) the parents said sorry and explained they really missed the kids 1.4) the parents made the best dinner

Case 2.1) the children did not resent their parents 2.2) they understood the value of family reunion at this special time 2.3) the family helped make the best party 2.4) they spent a great time together.

- Analyze the interesting event in the story

The interesting part of the story was when dad happily spoke the truth that he had tricked his children. This part is interesting because normally parents will not lie to their children unless something is really important.

- Hypothesize Counterfactual Story : either dad really was not happy or the divorce dad announced was true

Alternative A1) Dad felt guilty about lying to their children A2) dad called them back to tell the truth A3) the children got annoyed at first A4) eventually they understood each other A5) the children still came back on Christmas

Alternative B1) Dad confirmed the truth of divorce B2) Children came back begging their parents to change their minds B3) the parents would not change their minds B4) the parents told them that even though the divorce would happen, they still loved the children anyway B5) this was not quite a happy Christmas for the family



It was very exciting to arrive at the legendary island that inspired "Origin of Species". However, as Giulia was not well prepared, she did not even know where she should sleep tonight! At least, she had $1000, which hopefully was enough.

- Probable events before and after the story

The story suggests that she was alone. Since Giulia was not well prepared, it is possible that she went to other places, e.g. Santa Elena, near the island first. Then, she might have had a sudden thought that this place was not too far from the Galapagos, so it was worth a try. She contacted a local tour agent for a ticket, but forgot about the hotel.

After the story Case 1.1) She somehow found a cheap hotel 1.2) she had enough money left so she hired a local guide 1.3) the guide brought her to many famous islands e.g. Floreana and Bartolome 1.4) she likely also met great animals like Galapagos Tortoises and Lava Lizards

Case 2.1) She could find only an expensive hotel left 2.2) she spent most of her money on the hotel 2.3) since she did not have much budget left, she decided to explore on foot 2.4) she asked a lot of locals about great places nearby 2.5) she would find exotic animals if she were really lucky

- Analyze the interesting event in the story

The most interesting part is when she realized that she had no place to sleep tonight, since every person has to find a safe and comfortable place to rest, especially at night. And since she had never been on the island before, it was exciting to see how she would find a hotel.

- Hypothesize Counterfactual Story : she decided to sleep elsewhere

Alternative A1) She decided to sleep at the port A2) she bought a sleeping bag A3) she was able to sleep there and travel for a few days A4) a port officer found out and came to tell her that she could not sleep there

Alternative B1) She decided to search for a homestay B2) she walked through every nearby village to find a comfortable place B3) with some luck, she should be able to find a good local homestay B4) she would ask the host how to have a great trip here B5) she learned local tips and was able to make her own great adventure

We add the Galapagos travel story as the 3rd-shot example to show the model that it can use deep factoid knowledge like the name Galapagos and its sub-islands. Note that in the main article, by investigating token logits, we found that these deep factoids were never among the top-10 logits, so it is unlikely that, without this extra prompt, the model would perform deep factoid reasoning by itself.
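The top-10 observation can be checked with a simple helper over the per-step log-probabilities (e.g. as exposed by the API's logprobs option); the token values below are made up for illustration:

```python
def in_top_k(logprobs, token, k=10):
    """True if `token` is among the k most probable candidates at this step."""
    top = sorted(logprobs, key=logprobs.get, reverse=True)[:k]
    return token in top

# Hypothetical candidate log-probs at the step where "Galapagos" could appear.
step_logprobs = {"island": -0.5, "place": -1.2, "beach": -2.0, "Galapagos": -6.3}
in_top_k(step_logprobs, "Galapagos", k=2)  # the factoid misses the top-2 cut
```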

In this appendix we focus on the Shakespeare historical fiction used in the main article. To ensure that we capture GPT-3's best capability, we vary the temperature setting over 0, 0.1, 0.2, …, 0.7, 0.8 and generate reasoning 2 times at each temperature. You can see the results below:
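The evaluation loop itself is straightforward: two completions at each of the nine temperature settings. A minimal sketch, where `generate` is a stand-in for the actual GPT-3 completion call (an assumption for illustration, not the real API):

```python
def generate(prompt, temperature, sample_id):
    """Stand-in for a GPT-3 completion request; a real run would call the API."""
    return f"completion at temp={temperature}, sample {sample_id}"

def temperature_sweep(prompt, samples_per_temp=2):
    """Collect `samples_per_temp` completions at temperatures 0.0, 0.1, ..., 0.8."""
    temps = [round(0.1 * i, 1) for i in range(9)]
    return {t: [generate(prompt, t, s) for s in range(samples_per_temp)]
            for t in temps}

results = temperature_sweep("Story: Being William Shakespeare's apprentice ...")
```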

Story : Being William Shakespeare’s apprentice would be great if he weren’t always stealing your ideas and claiming them as his own. So, James writes a brilliant satiric play exposing him. He loves it and takes it to the stage.

Below, Generated Text 2 at temperature 0.2 and Generated Text 1 at temperature 0.3 show good-quality reasoning overall.

 Generated Text 2 with temperature 0.0 shows better reasoning than what we have in the main article. After-story Case 1) appears to follow the main story well, except for a contradiction in 1.2). Case 2) may be improbable but not impossible, so we think it is an acceptable story. Alternative A) is a nice counterfactual story. Only Alternative B) does not look sensible from B3) onward: with unlucky token sampling, the model wrongly states that Shakespeare was an actor. Overall, it is much better than the reasoning given in the main article.

At temperature 0, GPT-3 should generate deterministically, but perhaps a technical detail in GPT-3's implementation keeps it from being truly deterministic. As a result, the two generated texts differ, diverging in the middle of the texts, where Generated Text 1 repeats many sentences and becomes rubbish.
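For background, temperature divides the logits before the softmax, and at temperature 0 generation degenerates to a deterministic argmax, so identical outputs would be expected. A toy sketch of that mechanism (not GPT-3's actual implementation):

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample an index from a temperature-scaled softmax over `logits`.

    temperature == 0 means greedy decoding: always pick the argmax,
    so repeated runs must produce identical tokens.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.5]
greedy = [sample_token(logits, 0, random.Random(0)) for _ in range(5)]  # all 0
```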


[Tabs: Generated Text 1 | Generated Text 2]



Neither of the two generated texts here is of high quality. We can see that the after-story reasoning often does not follow the given main story. Sometimes, the after-story reasoning given by GPT-3 is actually counterfactual (not causal inference from the main story). We can also see some contradictory sentences in the counterfactual stories.


[Tabs: Generated Text 1 | Generated Text 2]



Here, Generated Text 2 is quite good overall. After-story Case 1) is OK (it implies that the satire was a success), while Case 2) mistakenly acts as counterfactual reasoning. Both hypothesized Alternatives A) and B) are acceptable. Story B) is a bit strange, but plausible.

Generated Text 1, in contrast, adds an extra piece to the main story about a greedy king. Here, GPT-3 probably tried to imitate our 3rd story prompt, where we hypothesized more details in the before-story. After-story Case 1) is quite strange, while Case 2) does not follow the main story about the satiric play. Both counterfactual stories are acceptable, though.
[Tabs: Generated Text 1 | Generated Text 2]



Here, in Generated Text 1, both after-stories look OK. To interpret both Case 1) and Case 2), we have to assume that the satiric play scheme was a success. This kind of extra-assumption requirement is not perfect, but it may be acceptable and does not make the reasoning totally flawed. From 2.2) onward, to make sense, 'he' must mean William Shakespeare and not James; otherwise, Case 2) is just another counterfactual story.

Both counterfactual stories A) and B) are good. In fact, Story B) could also be a perfect after-story to the main story.

In contrast, Generated Text 2 is not of good quality. We can see many generated sentences that contradict the main story as well as each other.


[Tabs: Generated Text 1 | Generated Text 2]

= Summary =

To summarize, we get much better and more detailed deductive reasoning compared to the main article. Nevertheless, we can still see inconsistent or contradictory sentences here and there. As mentioned in the main article, we suspect that one cause is the random-token sampling paradigm employed in text generation. Another is that, under the current pretraining paradigm, GPT-3 learned sentence correlation rather than sensible sentence deduction. A sensibly deduced sentence is one that is caused or enabled by the events described in the previous sentences.

We may be able to reduce this kind of inconsistency given more shots and even higher-quality examples. Therefore, it would be very interesting to see the power of GPT-3 if we could break the 2048-token limitation.