How to Build Your Own Blockchain Part 3 — Writing Nodes that Mine and Talk

Hello all and welcome to Part 3 of building the JackBlockChain — JBC. Quick past intro, in Part 1 I coded and went over the top level math and requirements for a single node to mine its own blockchain; I create new blocks that have the valid information, save them to a folder, and then start mining a new block. Part 2 covered having multiple nodes and them having the ability to sync. If node 1 was doing the mining on its own and node 2 wanted to grab node 1’s blockchain, it can now do so.

For Part 3, read the TL;DR right below to see what we got going for us. And then read the rest of the post to get a (hopefully) great sense of how this happened.

Other Posts in This Series

TL;DR

Nodes will compete to see who gets credit for mining a block. It’s a race! To do this, we’re adjusting mine.py to check if we have a valid block by only checking a section of nonce values rather than all the nonces until a match. Then APScheduler will handle running the mining jobs with the different nonce ranges. We shift the mining to the background if we want node.py to mine as well as being a Flask web service. By the end, we can have different nodes that are competing for first mining and broadcasting their mined blocks!

Before we start, here’s the code on Github if you want to checkout the whole thing. There are code segments on here to illustrate about what I did, but if you want to see the entire code, look there. The code works for me, but I’m also working on cleaning everything up, and writing a usable README so people can clone and run it themselves. Twitter, and contact if you want to get in contact.

Mining with APScheduler and Mining Again

The first step here is to adjust mining to have the ability to stop if a different node has found the block with the index that it’s working on. From Part 1, the mining is a while loop which will only break whenz it finds a valid nonce. We need the ability to stop the mining if we’re notified of a different node’s success.

I’m not going to lie here, it took awhile for me to figure out the best way to do this. Read the list below for all my different thoughts and why they didn’t work out.

  1. Run on APScheduler, and when our Flask app gets hit with a block from a different node, stop the mining job for that index, and start mining for index + 1. This seemed like the thing to do, but I found out that the APScheduler remove_job() function doesn’t work if the job has already been taken off the queue and is running. So I can’t stop mining.
  2. Celery? Celery would definitely work and I looked into it, but frankly, the amount of code and config required to get started with a basic use case is pretty large. Check out the intro page and you can see what I’m talking about. Celery would definitely be an option, I’m not trying to say it’s not worth it, but in this case, it has too much overhead.
  3. Gevent? I’ve used Gevent in the past and am definitely a fan of using it for scraping where I grab a bunch of pages at once, and don’t have to wait for the request to return. Makes gathering pages much quicker. It can work in this case, but is mostly used for queues rather than single jobs. I’m not mining for a bunch of blocks at once, so don’t need to create multiple that Gevent would run. I only have one.
  4. rq? Nope, same issue for stopping jobs that are running.
  5. After doing some research of other implementations, like looking at the Python Ethereum libraries on Github, I saw that the way they mine is using this term rounds and starting_nonce. Instead of using an entire while loop that goes through nonces until finding a correct one, we only will check nonces in the values of [starting_nonce: starting_nonce+rounds]. If it’s successful, we move to the next block. If not, we run the job again with a different starting_nonce value.
  6.  When a mining job is called, it only checks a certain number of nonces at one time. If successful, it returns the valid nonce and hash. If not, it returns None. Combining that thought with APScheduler, it seems we have a deal.
  7. Then I realized that since we have these functions, I’d be able to use any of the libraries listed above to do the mining. If it lets you queue jobs, it’ll work when you split the mining into different parts. Actually, a good post would be to implement the mining using all the libraries that work. Let me know if you’d want that.

Below is the code that was rewritten in mine.py. The main attractions are as follows.

  • Since when you run mine.py you’re only wanting to do the mining, the APSchedule we’re going to use is the BlockingScheduler. Later you’ll see we use the BackgroundScheduler.
  • There are multiple functions below for mining with different starting levels. You can mine for a the next block on a chain, mine for the block after a block, or mine and attempt to find the valid nonce for a block you just generated.
  • Besides the scheduled job for mining the block, we also have a listener which checks the return values to see if we keep mining for the current block with the nonces in the range described in section 4 above, or go to the next block.
#mine.py

import apscheduler
from apscheduler.schedulers.blocking import BlockingScheduler

#if we're running mine.py, we don't want it in the background
#because the script would return after starting. So we want the
#BlockingScheduler to run the code.
sched = BlockingScheduler(standalone=True)

import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

STANDARD_ROUNDS = 100000

def mine_for_block(chain=None, rounds=STANDARD_ROUNDS, start_nonce=0):
  if not chain:
    chain = sync.sync_local() #gather last node

  prev_block = chain.most_recent_block()
  return mine_from_prev_block(prev_block, rounds=rounds, start_nonce=start_nonce)

def mine_from_prev_block(prev_block, rounds=STANDARD_ROUNDS, start_nonce=0):
  #create new block with correct
  new_block = utils.create_new_block_from_prev(prev_block=prev_block)
  return mine_block(new_block, rounds=rounds, start_nonce=start_nonce)

def mine_block(new_block, rounds=STANDARD_ROUNDS, start_nonce=0):
  #Attempting to find a valid nonce to match the required difficulty
  #of leading zeros. We're only going to try 1000
  nonce_range = [i+start_nonce for i in range(rounds)]
  for nonce in nonce_range:
    new_block.nonce = nonce
    new_block.update_self_hash()
    if str(new_block.hash[0:NUM_ZEROS]) == '0' * NUM_ZEROS:
      print "block %s mined. Nonce: %s" % (new_block.index, new_block.nonce)
      assert new_block.is_valid()
      return new_block, rounds, start_nonce

  #couldn't find a hash to work with, return rounds and start_nonce
  #as well so we can know what we tried
  return None, rounds, start_nonce

def mine_for_block_listener(event):
  new_block, rounds, start_nonce = event.retval
  #if didn't mine, new_block is None
  #we'd use rounds and start_nonce to know what the next
  #mining task should use
  if new_block:
    print "Mined a new block"
    new_block.self_save()
    sched.add_job(mine_from_prev_block, args=[new_block], kwargs={'rounds':STANDARD_ROUNDS, 'start_nonce':0}, id='mine_for_block') #add the block again
  else:
    print "No dice mining a new block. Restarting with different nonce range"
    sched.add_job(mine_for_block, kwargs={'rounds':rounds, 'start_nonce':start_nonce+rounds}, id='mine_for_block') #add the block again
sched.print_jobs()

if __name__ == '__main__':

  sched.add_job(mine_for_block, kwargs={'rounds':STANDARD_ROUNDS, 'start_nonce':0}, id='mine_for_block') #add the block again
  sched.add_listener(mine_for_block_listener, apscheduler.events.EVENT_JOB_EXECUTED)#, args=sched)
  sched.start()

When we run this, the node will mine successfully but in different jobs rather than only one. Great starting point.

Node Mining

The next part I want to add is the ability for the node.py Flask node to run the mining as well. Like I said above, running mine.py will only do the mining, but we’re going to need the mining to run in the background below the Flask node.  For this, we load the BackgroundScheduler, tell the imported mine that we’re using out scheduler instead of the one in that file, and then add the job and listener as before.

When we run this, we’ll see the output being the same, where we have a logging of the jobs being run, and also have the ability to go to /blockchain.json, reloading it, and see the new nodes as they come.

#node.py
.....
import mine
.....
from apscheduler.schedulers.background import BackgroundScheduler
sched = BackgroundScheduler(standalone=True)
.....

if __name__ == '__main__':

.....

  mine.sched = sched #to override the BlockingScheduler in this case, sched is the BackgroundSchedule
  sched.add_job(mine.mine_for_block, kwargs={'rounds':STANDARD_ROUNDS, 'start_nonce':0}, id='mine_for_block') #add the block again
  sched.add_listener(mine.mine_for_block_listener, apscheduler.events.EVENT_JOB_EXECUTED)
  sched.start()

  node.run(host='127.0.0.1', port=port)

Argparse

Intermission time! Previously, in order to see what port we want the node to run on, I simply checked if I passed an argument, and if I did, that’s the port.

if __name__ == '__main__':
  if len(sys.argv) >= 2:
    port = sys.argv[1]
  else:
    port = 5000

Simple, but too simple. Now, since I have the node being able to mine as well, I want to be able to specify whether or not the node should mine or not mine.

Enter argparse. It’s very nice, and only takes a few line for me to define what args I’m looking for. We want to be able to say what port to run on defaulting to 5000, and also have the ability to say if we want to mine.

if __name__ == '__main__':
  #args!
  parser = argparse.ArgumentParser(description='JBC Node')
  parser.add_argument('--port', '-p', default='5000',
                 help='what port we will run the node on')
  parser.add_argument('--mine', '-m', dest='mine', action='store_true')
  args = parser.parse_args()

  #only mine if we want to
  if args.mine:
    mine.sched = sched #to override the BlockingScheduler in the
    #in this case, sched is the background sched
    sched.add_job(mine.mine_for_block, kwargs={'rounds':STANDARD_ROUNDS, 'start_nonce':0}, id='mining') #add the block again
    sched.add_listener(mine.mine_for_block_listener, apscheduler.events.EVENT_JOB_EXECUTED)#, args=sched)
  
  sched.start() #want this to start so we can validate on the schedule and not rely on Flask

  #now we know what port to use
  node.run(host='127.0.0.1', port=args.port)

For examples,python node.py -m will run the node on port 5000 and mine as well. python node.py -p 5001 will run the node on port 5001 and not mine. python --port=5002 -m will be on port 5002 and mine as well. You get the idea.

Alright, back to the mining show.

Listening Nodes

In order for a node to be broadcasted, we need to have a Flask endpoint that accepts block dicts. We don’t want the node to run the job of validation — we want this to be thrown into the schedule.

Initially here, the validation will check to see if it’s a valid block, and if it is, save it.

#node.py
@node.route('/mined', methods=['POST'])
def mined():
  possible_block_dict = request.get_json()
  sched.add_job(mine.validate_possible_block, args=[possible_block_dict], id='validate_possible_block') #add the block again
  return jsonify(received=True)

#mine.py

def validate_possible_block(possible_block_dict):
  possible_block = Block(possible_block_dict)
  if possible_block.is_valid():
    possible_block.self_save()
    return True
return False

To test, we run a node on one port to do the mining, and another node that is sitting there waiting for broadcasts. We watch the chaindir of the non-mining node and we’ll see the nodes mined by its peer showing up.

Side, the way we’re communicating using Flask and http is pretty simple compared to the different blockchains. Checkout how Ethereum lets nodes talk to each other. Wanted to mention this so people reading don’t assume all blockchains spit simple json information back and forth over http. There are better ways.

Competing Mining Nodes

Time for the finale! The whole point of this post is to have multiple nodes that are both mining, the first to get a valid block broadcasts to the other nodes, where they all receive the block, and all start mining for the next.

Remember right above when I said we wanted validate_possible_block to be a job? This is because we want to check validation when there isn’t a mining job running. If the block is valid, we want to remove the mining job in the schedule queue. Since the listener inserted the next mining job with increased nonce range in the queue, we remove that one, and then insert a mining job for block after the new valid block and go from there.

def validate_possible_block(possible_block_dict):
  possible_block = Block(possible_block_dict)
  if possible_block.is_valid():
    #this means someone else won
    possible_block.self_save()
    #we want to kill and restart the mining block so it knows it lost
    try:
      sched.remove_job('mining') 
      print "removed running mine job in validating possible block"
    except apscheduler.jobstores.base.JobLookupError:
      print "mining job didn't exist when validating possible block"
    print "readding mine for block validating_possible_block"
    sched.add_job(mine_for_block, kwargs={'rounds':STANDARD_ROUNDS, 'start_nonce':0}, id='mining') #add the block again

    return True
  return False

When running this and looking at the nodes, it’s not simple to see which node won the mining of that block. This post isn’t about the data in a block, but we need a simple way to tell the world which node won.

In node.py we’re going to write to a data.txt file that has the data for a block mined by a node on this port. Then in utils.py, where we create an unmined, invalid block, we read the data file and input that into the header.

#node.py

if __name__ == '__main__':
  .....
  filename = '%sdata.txt' % (CHAINDATA_DIR)
  with open(filename, 'w') as data_file:
    data_file.write("Mined by node on port %s" % args.port)
  .....

#utils.py

def create_new_block_from_prev(prev_block=None):
  if not prev_block:
  #index zero and arbitrary previous hash
    index = 0
    prev_hash = ''
  else:
    index = int(prev_block.index) + 1
    prev_hash = prev_block.hash

  filename = '%sdata.txt' % (CHAINDATA_DIR)
  with open(filename, 'r') as data_file:
    data = data_file.read()
  nonce = 0
  timestamp = datetime.datetime.utcnow().strftime('%Y%m%d%H%M%S%f')
  block_info_dict = dict_from_block_attributes(index=index, timestamp=timestamp, data=data, prev_hash=prev_hash, nonce=nonce)
  print block_info_dict
  new_block = block.Block(block_info_dict)
  return new_block

Side note, having a file with the data information seems like a good thing to have in the future when we’re trying to save actual data on the blockchain. Hmm….

When you start up the two nodes, $python node.py -m and in the 5001 hardlinked directory, $python node.py -p 5001 -m we can watch all the logging, wait for nodes to be written to each chaindata directory and see that new nodes are showing up at the same times and different nodes are winning.

When running this with four nodes competing, it’s awesome to sit there watching the chaindir and seeing which node mined the next block that shows up. Watching code you wrote that works is one of the best feelings in programming. Remember that.

Here are a couple screenshots when I ran the mining with nodes on ports 5000 and 5001. Port 5000 won 9 blocks, port 5001 won 7.

Port 5000

Port 5001

Conclusion

Look at this, we have the ability for a distributed set of nodes to be competing, trying to mine the blocks, getting credit for the mining, and being able to sync with other blocks when they want to run and mine the blockchain themselves. That’s pretty far down the line. But there are still more to go.

The only data in each block is which node won the mining. The idea of a blockchain is distributed data storage (the term ‘ledger’ if you’ve heard that). We want more data. The blockchain only continues for with the first node that is broadcasted. What if two nodes find a valid node at the same time? How do we determine which block to go with? Along those lines, how do we make sure that the beginning blocks of the chain are never changed?

I think the next big part of this series is dealing with the block header. Right now it’s slamming strings together and calculating the hash from that. If you look at Bitcoin and Ethereum blockchains, they have way more sophisticated headers so miners know exactly exactly which blocks are valid no matter where they’re calculated. Another issue is storing the difficulty required for the hash. Right now it’s in the config

That’ll be next.

If you’re still reading all the way down here, feel more than free to get in touch and ask questions or say hi. Like hearing from people. Twitter, contact, Github. Catch y’all next time.

1 thought on “How to Build Your Own Blockchain Part 3 — Writing Nodes that Mine and Talk

  1. This one and others post on the series of blockchains are awesome!! Keep going!
    Thanks you for your amazing job.

    Best regards, Nelson

    Like

Leave a comment