Appendix on GPT3 and Commonsense Reasoning

From Toward AGI

This article is an appendix to GPT3 and Commonsense Reasoning, the main article. Here, instead of making inferences across all 8 commonsense reasoning dimensions, we focus only on the two hardest dimensions: causal and counterfactual inference.

Note that, for compactness of presentation, this blog article uses many multi-tab displays and may not be mobile-friendly. Mobile readers are advised to enable the desktop version from the mobile browser menu.

High-Quality Prompts on Causal and Counterfactual Inferences

To encourage higher-quality reasoning in these two important dimensions, we provide a higher-quality inference prompt and use 3 shots instead of 2. However, due to the 2048-token inference limit, we have to sacrifice the other 6 reasoning dimensions and dedicate the entire prompt to these 2 dimensions.
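To make the budget trade-off concrete, here is a minimal sketch of a prompt-size check. This is our own illustrative helper, not from the original notebook, and the 4-characters-per-token ratio is only a crude rule of thumb for English text, not the real GPT-3 tokenizer:

```python
# Rough check that a prompt leaves room for generated reasoning within the
# 2048-token limit.  NOTE: 4 characters per token is only a heuristic;
# the actual GPT-3 BPE tokenizer would give exact counts.

def fits_budget(prompt: str, max_tokens: int = 2048,
                reserve_for_output: int = 700) -> bool:
    est_prompt_tokens = len(prompt) / 4  # heuristic estimate
    return est_prompt_tokens + reserve_for_output <= max_tokens
```

In practice this is why the 3-shot prompt has to spend all of its length on just the causal and counterfactual dimensions.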

In this high-quality prompt, we design for "structured reasoning". For causal inference, we hypothesize two cases, Case 1) and Case 2), whose initial conditions, Case 1.1) and Case 2.1), must not overlap. This way we can see the model reason in two genuinely separate directions.

Moreover, within each case we encourage a temporal connection by using a numbering system like 1.1) 1.2) 1.3), etc. This helps the model understand that each event has to follow the one before it. It also partly prevents premature endings, since all reasoning-chain examples contain more than 4 steps, i.e. all chains end after event 1.4) or 2.4).

Counterfactual analysis follows similar notation, using A1) A2) … and B1) B2) … to indicate two different hypothesized counterfactual arguments, which must begin with non-overlapping events.
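The numbering scheme above can be sketched as a pair of small formatting helpers (the function names are ours for illustration; the actual prompt text was written by hand):

```python
def format_case(case_no: int, events: list) -> str:
    """Render a causal chain as 'Case 1.1) e1 1.2) e2 ...', so each event
    is explicitly ordered after the one before it."""
    numbered = [f"{case_no}.{i}) {e}" for i, e in enumerate(events, 1)]
    return "Case " + " ".join(numbered)

def format_alternative(label: str, events: list) -> str:
    """Render a counterfactual chain as 'Alternative A1) e1 A2) e2 ...'."""
    numbered = [f"{label}{i}) {e}" for i, e in enumerate(events, 1)]
    return "Alternative " + " ".join(numbered)
```

For example, `format_case(1, ["Alice was hurt", "a teacher helped her"])` yields the same "Case 1.1) … 1.2) …" shape used in the shot examples below.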

With this prompt design, the reasoning given by GPT-3 is much better elaborated, but it still has many flaws, as discussed below. The 3-shot prompt for this new setting is shown as follows:

   Alice and Elsa were running toward the finish line. However, as Alice somehow fell down to the ground, Elsa turned her back and went to help Alice. A teacher and friends also went to see what happened.

- Probable events before and after the story

Before the story, it may have been time for PE class, so Elsa and Alice would have changed into their uniforms. Then, in the class, a teacher may have randomly picked a pair of students for a running competition.

After the story, Case 1.1) Alice was seriously hurt 1.2) a teacher had to bring Alice to a hospital 1.3) a doctor asked Alice where she felt hurt 1.4) the doctor examined Alice's condition at the mentioned place 1.5) the doctor gave her some medicine and necessary treatment 1.6) Alice went back home to tell her parents what happened

Case 2.1) Alice was not seriously hurt 2.2) Alice might just take a rest while the teacher brought her to the first-aid room 2.3) she had a basic treatment 2.4) she would be able to go back to the PE class 2.5) she could cheer on the other competitors 2.6) she could attend other classes until school finished

- Analyze the interesting event in the story, if any, and hypothesize that the interesting event would not occur if

The interesting part was when Alice fell down, since normally people can continue to run if there are no abnormal conditions. Therefore, she might have tripped over a stone or been injured somewhere.

- Hypothesize Counterfactual Story : Alice was perfectly healthy, slept well and there were no stones on the track

Alternative A1) Elsa might have a little winning edge A2) Elsa won the race A3) Elsa got more points than Alice A4) Alice would promise to try harder next time A5) the next competition of other students began

Alternative B1) Alice was trying to use the superman-dive to win B2) she succeeded without injury this time B3) she just flew past Elsa and reached the goal first B4) everyone applauded her great performance B5) Elsa and Alice shook hands and promised a rematch.

   A man called his son and daughter the day before Christmas and said he and their mom were going to divorce. The son and daughter hurried back home to stop their parents. The old man turned to his wife and said "they're coming for Christmas now"

- Probable events before and after the story

Before the story, dad and mom would have talked about the possibility that the children would not come home, since the children might be too busy with their work. But the parents thought that Christmas was more important than work. So they came up with a fake divorce plan.

After the story, at home, Case 1.1) the children felt very angry knowing that they were fooled 1.2) they promised the parents they would never come back at Christmas again 1.3) the parents said sorry and explained that they really missed the kids 1.4) the parents made the best dinner

Case 2.1) the children did not resent their parents 2.2) they understood the value of family reunion at this special time 2.3) the family helped make the best party 2.4) they spent a great time together.

- Analyze the interesting event in the story

The interesting part of the story was when dad happily revealed the truth that he had tricked his children. This part is interesting because normally parents will not lie to their children unless something is really important.

- Hypothesize Counterfactual Story : either dad really was not happy about the lie or the divorce was really true

Alternative A1) Dad felt guilty about lying to his children A2) dad called them back to tell the truth A3) the children got annoyed at first A4) eventually they understood each other A5) the children still came back on Christmas

Alternative B1) Dad confirmed the truth of divorce B2) Children came back begging their parents to change their minds B3) the parents would not change their minds B4) the parents told them that even though the divorce would happen, they still loved the children anyway B5) this was not quite a happy Christmas for the family

   It was very exciting to arrive at the legendary island that inspired "Origin of Species". However, as Giulia was not well-prepared, she did not even know where she should sleep tonight! At least, she had $1000 which hopefully was enough.

- Probable events before and after the story

The story suggests that she was alone. Since Giulia was not well-prepared, it is possible that she went to other places first, e.g. Santa Elena, near the island. Then, she might have had a sudden thought that this place was not too far from Galapagos, so it was worth a try. She contacted a local tour agent for a ticket, but forgot about the hotel.

After the story, Case 1.1) She somehow found a cheap hotel 1.2) she had enough money left, so she hired a local guide 1.3) the guide brought her to many famous islands, e.g. Floreana and Bartolome 1.4) she likely also met great animals like Galapagos Tortoises and Lava Lizards

Case 2.1) She could find only an expensive hotel left 2.2) she spent most of her money on the hotel 2.3) since she did not have much budget left, she decided to explore on foot 2.4) she asked a lot of locals for great places nearby 2.5) she would find exotic animals if she were really lucky

- Analyze the interesting event in the story

The most interesting part is when she realized that she had no place to sleep tonight, since every person has to find a safe and comfortable place to rest, especially at night. And since she had never been on the island before, it was exciting to see how she would find a hotel.

- Hypothesize Counterfactual Story : she decided to sleep elsewhere

Alternative A1) She decided to sleep at the port A2) she bought a sleeping bag A3) she was able to sleep there and travel for a few days A4) a port officer found out and came to tell her that she could not sleep there

Alternative B1) She decided to search for a homestay B2) she walked through every nearby village to find a comfortable place B3) with some luck, she should be able to find a good local homestay B4) she would ask the host how to have a great trip here B5) she learned local tips and was able to have a great adventure

We add the Galapagos Travel story as the 3rd-shot example to show the model that it can use deep factoid knowledge such as the name Galapagos and its sub-islands. Note that, as found in the main article by investigating token logits, these deep factoids are never among the top-10 logits, so it's unlikely that without this extra prompt the model would perform deep factoid reasoning by itself.
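The logit investigation mentioned above can be sketched as follows. This assumes the legacy Completions API shape, where enabling `logprobs` returns one {token: logprob} dict per generated token; the helper name is ours:

```python
def factoid_in_top_logprobs(top_logprobs, target, k=10):
    """Return True if `target` appears (as a substring) among the top-k
    candidate tokens at any generation step.  `top_logprobs` is a list of
    {token: logprob} dicts, one dict per generated token."""
    for step in top_logprobs:
        top_k = sorted(step, key=step.get, reverse=True)[:k]
        if any(target in tok for tok in top_k):
            return True
    return False
```

For example, one could scan the per-step candidates for a fragment like "Gal" (for Galapagos) to see whether the factoid ever surfaces among the top-10 tokens.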

See also this notebook, which generates the Appendix's 3-shot prompt ready for inference.

Test on Temperature Parameters

First, we focus on the Shakespeare historical fiction used in the main article. To ensure that we capture GPT-3's best capability, here we vary the Temperature setting over 0, 0.1, 0.2, …, 0.7, 0.8 and generate reasoning 2 times for each temperature. You can see the results below:
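The sweep can be sketched as follows, where `generate` stands in for the actual GPT-3 completion call (e.g. the legacy `openai.Completion.create` endpoint); the details here are an illustration, not the exact script we used:

```python
def temperature_sweep(generate, prompt, n_per_temp=2):
    """Run n_per_temp generations at each temperature 0.0, 0.1, ..., 0.8."""
    results = {}
    for t10 in range(9):          # 9 temperature settings
        temp = t10 / 10
        results[temp] = [generate(prompt, temperature=temp)
                         for _ in range(n_per_temp)]
    return results
```

With 9 temperatures and 2 generations each, this yields the 18 generated texts analyzed below.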

   Story : Being William Shakespeare’s apprentice would be great if he weren’t always stealing your ideas and claiming them as his own. So, James writes a brilliant satiric play exposing him. He loves it and takes it to the stage.

Below, the Temp 0.2 Generated Text 2 and Temp 0.3 Generated Text 1 show good-quality reasoning overall.

Generated Text 2 with Temperature 0.0 shows better reasoning than what we have in the main article. After-story Case 1) appears to follow the main story well, with one contradiction in 1.2). Case 2) may be improbable but is not impossible, so we think it's an acceptable story. Alternative A) is a nice counterfactual story. Only Alternative B) does not look sensible from B3) onward: with bad luck in token sampling, the model wrongly states that Shakespeare was an actor. Overall, it's much better than the reasoning given in the main article.

At Temperature 0, GPT-3 should generate deterministically, but perhaps a technical detail of GPT-3's implementation makes it not truly deterministic. As a result, the two generated texts are different, starting to diverge in the middle, where Generated Text 1 repeats many sentences and becomes gibberish.


The two generated texts here are both of low quality. We can see that the after-story reasoning often does not follow the given main story. Sometimes, the after-story reasoning given by GPT-3 is actually counterfactual (not causal inference from the main story). We can also see some contradictory sentences in the counterfactual stories.


Here, Generated Text 2 is quite good overall. After-story Case 1) is OK (it implies that the satire was a success), while Case 2) mistakenly acts as counterfactual reasoning. Both hypothesized Alternatives A) and B) are acceptable. Story B) is a bit strange, but plausible.

Generated Text 1, in contrast, adds an extra piece about a greedy king to the main story. Here, GPT-3 probably tried to imitate our Story Prompt 3, where we hypothesize more details in the before-story. After-story Case 1) is quite strange, while Case 2) does not follow the main story about the satiric play. Both counterfactual stories are acceptable, though.


Here, in Generated Text 1, both after-stories look OK. To interpret both Case 1) and Case 2), we have to assume that the satiric-play scheme was a success. This kind of extra-assumption requirement is not perfect, but may be acceptable and does not make the reasoning totally flawed. From 2.2) onward, to make sense, 'he' must mean William Shakespeare and not James; otherwise, Case 2) is just another counterfactual story.

Both counterfactual stories A) and B) are good. In fact, Story B) could also be a perfect after-story to the main story.

In contrast, Generated Text 2 is not of good quality. We can see many generated sentences that contradict the main story as well as each other.


Reasoning Results on Other Stories

Following the previous section, we found that Temperature 0.3 gave the most reasonable reasoning. Therefore, we apply this temperature to the remaining 6 stories (of the original 8 stories, one, Galapagos Travel, was added as a prompt, and another, Shakespeare, was analyzed in the previous section). In this appendix, we generate two reasonings for each story, and do not manually analyze which one is the best.

Overall, the reasoning quality in the two focus dimensions is considerably better than with the shorter prompt analyzed in the main article. For example, in the comedy (Punch Set) story, the model seems to understand complex coreferences almost perfectly in both generated texts, whereas previously, in the main article, the model got confused about coreferences in all generated texts. This may be evidence that GPT-3 could do even better commonsense reasoning if we could provide an even higher-quality prompt.

For this Biography genre, following is the essence of the story which we would expect the model to know and reason about:

  • There's a hint in the name Alain Bombard that this person is real, and the model should retrieve all the related events
  • The after-story is interesting: what will he do about his success?
   Contrary to his colleagues' beliefs, Alain Bombard thought that people could stay alive at sea by drinking sea water and eating small fish and plants from the sea. He set out in a small boat to cross the Atlantic Ocean. He was able to stay alive for 65 days before finishing the journey.
Alain Bombard

Following is the essence of this story which we would expect the model to know and reason about:

  • The most interesting part is in the last sentence where we expect them to do something to gain more resources.
  • Before the story, somehow these Aliens had to abandon their planet.
  • After the story, what should the Aliens do to get more resources?
  • We can safely assume that the Aliens possessed high technology, e.g. a spaceship and a cloaking machine.
  • From the narrative, the Aliens, at least initially, had been peaceful and had no bad intentions, so they did not want to contact humans.
  • There is a small hint that the Alien island is in the South Pacific, so it should be an uninhabited island similar to Henderson Island.
   An alien race seeking refuge landed on Earth on a small island in the South Pacific. For a hundred years they've managed to keep the island cloaked and secret from our human population. But now they've exhausted its resources.

Following is the essence of this story which we would expect the model to know and reason about:

  • The most interesting part should be in the last sentence, since it's not usual that people can participate in an ATP Masters
  • It is highly likely that Ling is a professional tennis player capable of playing at the ATP Masters level
  • From the names of the two characters, it is possible (though not necessary) that this story happens in China, in which case the competition has to be the Beijing ATP Masters
  • It is clear that Ling felt somewhat hopeful about getting the new racket, given that the big-box store sells everything
  • However, in the story Ling felt frustrated / annoyed / angry that he could not get what he wanted
  • The after-story is also interesting: what should Ling do to get the racket he needs?
   Ling went to a big-box store selling everything on the planet to buy his favorite tennis racket. But a staff member named Xin said that the store would not sell the racket since it was defective. Ling complained that he had an ATP Masters to participate in tomorrow and he needed the racket now.
Shopping at Big-box Store

Following is the essence of this story which we would expect the model to know and reason about:

  • The most interesting part of the story is of course when one of the lilies starts talking.
  • It's truly exciting to know what the "dark secret" is, how the lilies are able to talk, etc.
  • There are not many other hints in the main story, so the ToU questions may not be too difficult
   As a new job for a prominent wealthy family, one of Chandra's first tasks is to water all of the house plants. While Chandra is watering the lilies, one of the plants starts talking to warn him of a dark family secret.
Mysterious House

This pandemic story is unknown to GPT-3, whose training data is limited to 2019. Following is the essence of this story which we would expect the model to know and reason about:

  • The two most interesting parts of the story are "Coronavirus spreading everywhere and killing millions" [global information] and "work extremely hard on the mRNA vaccine" [local information], so this narrative is quite difficult: the model has to reason on these two scales together.

  • On the global scale, it would be great if the model predicted the following.
    • before the story, all people around the world lived normally
    • after this story, before the vaccine gets invented, more people will die, and there is a high possibility of economic crisis and other catastrophic consequences.
  • On the local scale, we expect the following.
    • On the factoid part, it would be best if the model knew the name of Uğur Şahin, who is the CEO of BioNTech and responsible for the Pfizer-BioNTech vaccine in the real world.
    • In the best case, it would then be able to infer the location of the company, and interesting facts about the vaccine or mRNA technology.
    • Since millions of people are dying everywhere, it is obvious that the major emotions of people everywhere include fear, desperation and sadness.
    • The after-story should relate to whether the vaccine succeeds or not.
   In 2020, Coronavirus surprises everybody by spreading everywhere, killing millions of people and shutting down most world travel. Uğur Şahin told all the staff in his company to work extremely hard on their mRNA vaccine research before the situation got worse.
Coronavirus Pandemic - unknown to GPT-3, whose latest training data is from 2019

This is a very difficult narrative. Following is the essence of this story which we would expect the model to know and reason about:

  • The most interesting sentence is what Praew told Eriko in the last sentence.
  • The punch set was given to Eriko by Praew years ago; then Eriko forgot that Praew had given it to her, so she gave it back to Praew.
    • I.e. Praew --> Eriko --> Praew is the possession flow of this punch set.
    • Grammatically, this is ambiguous due to the coreference "she gave her" in the last sentence.
  • It is socially impolite to give a gift back to the original giver
  • Eriko must immediately come up with some excuse for Praew after the given story.
  • Both must feel somewhat awkward at the last sentence of the story.
  • Since the gift was given to Praew in the story, the event might take place at either Praew's wedding party or Praew's house
  • Also, Praew's role as a newlywed bride should be emphasized, and Eriko must be her guest or even her best friend.
   Eriko never used a crystal punch set she got as a wedding gift. When Praew got married, Eriko wrapped the set as her gift. When Praew opened the gift, she looked at it curiously and told Eriko it was the same punch set she gave her years ago.
A Crystal Punch Set


To summarize, we get much better, more detailed deductive reasoning compared to the main article. Nevertheless, we can still find inconsistent or contradictory sentences here and there, especially in the counterfactual analysis. As mentioned in the main article, we suspect that one cause is the random-token-sampling paradigm employed in text generation. Another is that, under the current pretraining paradigm, GPT-3 learned sentence correlation rather than sensible sentence deduction. A sensibly deduced sentence is one that is caused or enabled by the events described in the previous sentences.

We may be able to reduce this kind of inconsistency given more shots and even higher-quality examples. Therefore, it would be very interesting to see the power of GPT-3 if we could break the 2048-token limitation.