Hive Application Programming Interface (API)

1. What is Hive

Hive™ is a distributed, network-based file storage system that uses HTTP to store and retrieve user files. Files are uploaded using PUT requests, downloaded using GET requests, and deleted using DELETE requests. A client may also get a file listing, rename files, publish files for public access, and so on, all using standard HTTP requests.

2. Audience

This document describes the application programming interface (API) of the Hive system. It is intended for software developers who wish to write clients that store, retrieve, delete, or alter files on Hive.

3. Authorizing Requests

Every request must be authorized by Hive. The header used is a standard HTTP header ("Authorization"), but the format of its value is proprietary. For security reasons, we felt it was necessary to introduce a non-standard authentication scheme. Because the mechanism still uses a standard HTTP header, it should not present a challenge in any development environment.

The Authorization header must always be populated with the following information:

Authorization: hive;<userid>;<server nonce>;<client nonce>;<Base64(SHA1(server nonce || password || client nonce))>

The "server nonce" field is a Base64-encoded nonce that is provided by the Nonce Service. The "client nonce" is a nonce generated locally by the client and is intended to help thwart a man-in-the-middle attack wherein the attacker tries to guess the client's password by providing a fixed, known string as a server nonce. The server nonce, user's password, and client nonce are concatenated and hashed with SHA-1, then Base64 encoded. The Hive server will verify that, indeed, it provided the server nonce value, as failure to do so would allow for a replay attack.
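As an illustration, the header value can be assembled as in the following Python sketch. The function names are hypothetical. The sketch assumes that the three fields are concatenated as UTF-8 text before hashing, that the server nonce is used exactly as received (still Base64-encoded), and that a Base64-encoded random value is an acceptable client nonce; none of these details is stated above.

```python
import base64
import hashlib
import secrets


def make_client_nonce(num_bytes: int = 16) -> str:
    """Return a random client nonce (the format is an assumption of this sketch)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def build_authorization(userid: str, password: str,
                        server_nonce: str, client_nonce: str) -> str:
    """Build the value of the Authorization header described above."""
    # Assumption: fields are joined as text and hashed as UTF-8 bytes,
    # with the server nonce used exactly as the Nonce Service returned it.
    material = (server_nonce + password + client_nonce).encode("utf-8")
    digest = hashlib.sha1(material).digest()
    token = base64.b64encode(digest).decode("ascii")
    return ";".join(["hive", userid, server_nonce, client_nonce, token])
```

A client would fetch a fresh server nonce, call make_client_nonce(), and send the returned string as the value of the Authorization header.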

If a request that requires authorization is received without valid credentials, the service will return an error similar to the following:

HTTP/1.1 401 Unauthorized
WWW-Authenticate: hive
Content-Length: 0
Content-Type: application/xml

4. Requesting a Nonce

To get a nonce, a client must issue the following request to the Nonce Service (https://hive.packetizer.com/nonce):

GET /nonce HTTP/1.1
Accept: application/xml

The server will return a response with a nonce as follows:

HTTP/1.1 200 OK
Content-Length: 95
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<Nonce>MToxMTE4OToyOTg2ODEyMTQ4OjEyMTg0MTQwODA=</Nonce>

The nonce is Base64-encoded, but should just be treated as an opaque string of characters by clients. The format of the nonce is subject to change.
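The following Python sketch shows one way to pull the nonce out of the response body while treating it as an opaque string. The function name is hypothetical, and the body is assumed to have been read from the HTTP response already.

```python
import xml.etree.ElementTree as ET


def parse_nonce(body: bytes) -> str:
    """Extract the nonce text from a Nonce Service response body."""
    root = ET.fromstring(body)
    if root.tag != "Nonce" or not root.text:
        raise ValueError("unexpected Nonce Service response")
    # The nonce is returned untouched; it is never Base64-decoded here.
    return root.text.strip()
```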

5. Locating a Hive Server

When a client uploads a file, it should determine which Hive server can accept the file using the Hive Locator Service. The reason for taking this step instead of simply using 307 to redirect the client to the actual Hive is to avoid transmitting data that will simply be discarded, which would be the case if we transmit a PUT to a server that will not actually store the file.

A client may also use this method to determine the primary server location of a currently active (stored) file, but it is not required. However, following these procedures should result in slightly more efficient queries, so we do recommend using the Hive Locator Service for GET requests.

This service should not be used for other requests, including DELETE, file listings, renaming files, etc.

To determine the Hive server for a given file, the client issues a request to the Hive Locator Service (https://hive.packetizer.com/hive?file={file}&upload={upload}). The request should include an "upload" parameter set to "true" if the request is to upload a file, which is particularly important when a file with the same name already exists. Otherwise, the service would return the location of the current file, not the Hive server that could potentially handle a new upload request.

The request to determine the target Hive server looks like this:

GET /hive?file={file}&upload=true HTTP/1.1
Accept: application/xml
Authorization: <authorization string>

The corresponding response will look similar to this:

HTTP/1.1 200 OK
Content-Length: 116
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<Hive>
    <HiveURL>https://hive23.hive.packetizer.com/storage?file={file}</HiveURL>
</Hive>

The HiveURL element may contain the URL template for any number of servers and the client must not assume that any fixed URL will be returned.

A request to the Hive Locator Service can fail for a number of reasons, including authorization failure (401), file not found (404), unacceptable filename (406), unacceptable method (405), and internal errors (500).

Note the form of the URL returned from the server. This is a URL template that the client must fill in with the required information. A complete list of Service URLs and their template formats is given in the Summary of Service URLs section.
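A minimal Python sketch of this exchange follows: it fills the Hive Locator template, then reads the HiveURL template from the response and fills in {file}. The function names are hypothetical, the filename encoding follows the Acceptable filenames section, and whether Base64 characters such as + and = need further URL escaping is not stated in this document, so the sketch leaves them as they are.

```python
import base64
import xml.etree.ElementTree as ET

LOCATOR_URL = "https://hive.packetizer.com/hive?file={file}&upload={upload}"


def encode_for_url(filename: str) -> str:
    """UTF-8 encode, Base64 encode, then swap '/' for '@'."""
    encoded = base64.b64encode(filename.encode("utf-8")).decode("ascii")
    return encoded.replace("/", "@")


def locator_request_url(filename: str, upload: bool) -> str:
    """Fill the Hive Locator Service template for one file."""
    return (LOCATOR_URL
            .replace("{file}", encode_for_url(filename))
            .replace("{upload}", "true" if upload else "false"))


def storage_url_from_response(body: bytes, filename: str) -> str:
    """Read the HiveURL template from a locator response and fill in {file}."""
    template = ET.fromstring(body).findtext("HiveURL")
    if not template:
        raise ValueError("locator response has no HiveURL element")
    # The host is taken from the response; no fixed server name is assumed.
    return template.strip().replace("{file}", encode_for_url(filename))
```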

6. Acceptable filenames

Filenames may be any valid UTF-8 string. While operating systems like Windows will not accept certain characters (specifically, \/:*?"<>|), Hive does not care what a user names a file, as long as the name is a properly formatted UTF-8 character string.

Filenames are UTF-8 encoded, then Base64-encoded when passed to the Hive service routines as parameters to URLs. Since Base64 uses the / character and that is also the path separator used on HTTP servers, / characters must be converted to @ symbols.

When the file is received, the name will be prepended with a / character if it does not already begin with one, to help establish a hierarchy for the user.

When filenames are transmitted within various XML messages, they are Base64-encoded, but the / character is not replaced with an @ symbol.
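The two encodings can be sketched in Python as follows. The function names are hypothetical. The sketch does not prepend a leading / itself, since the text above says the name is adjusted when the file is received.

```python
import base64


def filename_for_url(filename: str) -> str:
    """Encode a filename for use as the {file} URL parameter."""
    b64 = base64.b64encode(filename.encode("utf-8")).decode("ascii")
    return b64.replace("/", "@")


def filename_for_xml(filename: str) -> str:
    """Encode a filename for an XML body: plain Base64, '/' left alone."""
    return base64.b64encode(filename.encode("utf-8")).decode("ascii")


def filename_from_xml(encoded: str) -> str:
    """Decode a Base64 filename taken from an XML body."""
    return base64.b64decode(encoded).decode("utf-8")
```

For example, the name "???" encodes to "Pz8/" in an XML body and to "Pz8@" in a URL, while "/goober" encodes to "L2dvb2Jlcg==" in both.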

7. Uploading Files

Files are uploaded to Hive using a simple PUT request that looks similar to the following:

PUT /storage?file={file} HTTP/1.1
Accept: application/xml
Authorization: <authorization string>
Content-Range: bytes 0-2/3
Content-Type: application/octet-stream
Content-Length: 3

abc

Upon successful receipt of the file, the Hive server will return a 201. A number of errors can arise, including authorization errors (401), invalid method (405), filename issues (406), extremely slow file upload (410), file size that is too big (413), content range errors (416), and internal server errors (500).

The 410 error is of particular interest. Presently the Hive service will wait 24 hours to receive a file, but this time may be adjusted in the future. In practice, a single file chunk should not require more than a couple of hours to transmit. A large file should be transferred in smaller chunks so that the Hive system remains aware that the upload is progressing. Due to the distributed nature of the system, the database containing the user's file information may not be aware of a separate, long-running file transfer process. If a transfer runs too long, it may discover on completion that the allocated file identifier has been purged from the system, which results in the 410 error.

Since it is not always possible to upload a complete file in one request and is not desirable for large files that take a long time to transfer, Hive allows files to be uploaded in chunks as indicated by the Content-Range values. In the above example, a complete file is uploaded and the Content-Range header is actually optional in this case. However, if only part of the file is uploaded, then the Content-Range values are used. When the next part of the file is received, but the whole file has not been received, a 202 response is returned. Only when the last file chunk is received is a 201 response returned.

If files are uploaded in chunks, they must be uploaded serially and failure to do so will result in terminating the file upload request. Any temporary files are removed from the system upon any error.

When uploading or downloading in chunks, the minimum file chunk size is 8192 octets. However, it is strongly recommended to make the chunk size much larger. A chunk size of 20MB or larger is recommended.
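The chunking rules above can be sketched in Python as a planner that yields the Content-Range value for each serial PUT, together with the status code the text says to expect (202 for intermediate chunks, 201 for the last). The function name is hypothetical, and the sketch assumes that the final chunk may be shorter than the 8192-octet minimum, which the text does not state explicitly.

```python
from typing import Iterator, Tuple

MIN_CHUNK = 8192  # minimum chunk size in octets, from the text above


def plan_upload(total_size: int, chunk_size: int) -> Iterator[Tuple[str, int]]:
    """Yield (Content-Range value, expected status) for each serial chunk."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if chunk_size < MIN_CHUNK:
        raise ValueError("chunk_size is below the 8192-octet minimum")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        # Assumption: the final chunk may be shorter than the minimum.
        is_last = end == total_size - 1
        yield f"bytes {start}-{end}/{total_size}", 201 if is_last else 202
        start = end + 1
```

For the three-octet example file shown earlier, the planner yields the single value "bytes 0-2/3" with an expected 201.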

If a file with a given name already exists in the system and is being updated, the file will replace the old file only after it has been completely uploaded. In this way, if a file upload fails, the old file remains intact.

8. Downloading Files

Downloading a file works similarly to uploading files. A typical request looks like this:

GET /storage?file={file} HTTP/1.1
Accept: application/octet-stream
Authorization: <authorization string>
Range: bytes=1-2097152
Content-Type: application/octet-stream

Assuming the file was found and can be served by the responding Hive server, the server will return the specified file byte range to the client. In this way, the client may request only certain chunks of the file until it has received the entire file.

If the requested byte range goes beyond the length of the file, the Hive server will not treat this as an error as long as the lowest octet is within range of the file. The reason is that when a client initially makes a request, it is impossible to know the size of the file at the outset, unless some additional exchange is performed to obtain this information. No benefit is seen in adding this additional complexity, though a client developer could certainly request specific file information before sending the request to download the actual file. The server will respond with the file and the headers will contain the total number of octets in the file. Subsequent GET requests should utilize that information to avoid requesting a byte that is outside the range of the file.

If the file is successfully returned in its entirety or in part, Hive will return a 200 response code. If it is determined that the file is located on a different server, then the system will return a 307 response. If the requested file was not found, a 404 is returned. If an unacceptable method is received, a 405 is returned. If the requested byte range is entirely outside the file range, a 416 error code is returned. If there is some unusual system error, a 500 error is returned.
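A client might map these status codes to actions as in the following Python sketch. The action names and the decision to retry on a 500 are choices made for this illustration and are not part of the Hive API.

```python
def download_action(status: int) -> str:
    """Map a Storage Service GET status code to a client action."""
    if status == 200:
        return "save"      # whole file or requested byte range returned
    if status == 307:
        return "redirect"  # file is on another server; repeat the GET there
    if status == 416:
        return "stop"      # requested range lies entirely outside the file
    if status == 500:
        return "retry"     # retrying is this sketch's policy, not Hive's
    # 401, 404, 405 and anything unexpected are treated as failures.
    return "fail"
```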

9. Deleting Files

A client deletes a file from the server by sending a DELETE request to Hive with the specified filename. The operation is very similar to a GET request:

DELETE /storage?file={file} HTTP/1.1
Accept: application/xml
Authorization: <authorization string>
Content-Type: application/octet-stream
Content-Length: 0

If the DELETE request is successful, a 200 will be returned to the client. If the file could not be found, a 404 will be returned. Other error codes, such as 400, 401, etc. are consistently used in all API responses and may be returned in this or any response.

A DELETE request may be issued to the Hive server identified by the Hive Locator service or may be issued directly to https://hive.packetizer.com/storage.

10. Requesting a File Listing

A client may request a list of files by sending a GET request to the File List Service (https://hive.packetizer.com/filelist). The request must contain the user's credentials, just as with any other server request. The request may contain the name of a single file to return. If the filename parameter is absent, then a complete file listing will be returned. Requesting a list for a single file looks like this:

GET /filelist?file={file} HTTP/1.1
Authorization: <authorization string>
Accept: application/xml

One reason for requesting information for a single file is to verify that the SHA-1 signature on the server matches the SHA-1 signature of the local file that was just uploaded or downloaded. A mismatch would indicate some kind of I/O error.
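Such a check can be sketched in Python as follows. The function names are hypothetical. The comparison ignores letter case, on the assumption that the SHA1Hash value in the listing is a hexadecimal digest.

```python
import hashlib
from typing import BinaryIO


def sha1_hex(stream: BinaryIO, block_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-1 digest of a binary stream."""
    digest = hashlib.sha1()
    # Read in blocks so that large files need not fit in memory.
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


def matches_server(stream: BinaryIO, server_hash: str) -> bool:
    """Compare a local file's SHA-1 with the value reported by the server."""
    return sha1_hex(stream) == server_hash.strip().lower()
```

A client would open the local file in binary mode and pass the file object together with the SHA1Hash value taken from the file listing.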

The File List Service will respond to the client with a list of zero or more files that looks like this:

HTTP/1.1 200 OK
Content-Length: nnn
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<FileList>
    <File>
        <Filename>L2dvb2Jlcg==</Filename>
        <PublicName>/goober</PublicName>
        <FileSize>10240</FileSize>
        <SHA1Hash>9eafd2bec90dc04a64ac77f92b3c29153c18fb60</SHA1Hash>
        <FileDate>1217054424</FileDate>
    </File>
    ...
    <File>
        ...
    </File>
</FileList>

Note that there may be multiple File elements within the FileList element. The server will not perform any filtering or sorting on the list and the client should assume there is no order.
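The listing can be parsed as in the following Python sketch. The function name and dictionary keys are hypothetical. FileDate is kept as a plain integer because its format is not defined in this document, and a missing or empty PublicName is assumed to mean the file is not published.

```python
import base64
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_file_list(body: bytes) -> List[Dict[str, object]]:
    """Turn a FileList response body into a list of plain dictionaries."""
    files = []
    for node in ET.fromstring(body).findall("File"):
        encoded = node.findtext("Filename", default="")
        files.append({
            # Filenames in XML bodies are plain Base64 (no '@' substitution).
            "filename": base64.b64decode(encoded).decode("utf-8"),
            # Assumption: PublicName is absent or empty for unpublished files.
            "public_name": node.findtext("PublicName") or None,
            "size": int(node.findtext("FileSize", default="0")),
            "sha1": node.findtext("SHA1Hash", default=""),
            # Kept as an integer; the FileDate format is not defined here.
            "date": int(node.findtext("FileDate", default="0")),
        })
    # No sorting is applied, matching the server's unordered list.
    return files
```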

11. Making Files Publicly Accessible

To make files publicly accessible, a client sends a request to the File Attribute Service. The request takes the following form:

PUT /fileattr?file={file} HTTP/1.1
Authorization: <authorization string>
Accept: application/xml
Content-Length: nnn

<?xml version="1.0" encoding="UTF-8"?>
<FileAttr>
    <PublicName>willy.txt</PublicName>
</FileAttr>

The filename that we wish to alter is provided as a parameter in the URL and is Base64 encoded, with '/' characters replaced with '@' characters. The body of the request contains an XML message that contains an element called PublicName. The public name must contain only a subset of ASCII characters from the set [/()+,-.0-9@A-Z_a-z~] (brackets not included and the first - character does not indicate a range, but simply the - character itself). Like the private name, the public name may use the path separator character (/) to provide structure. Upon successfully updating the file attributes, a response will be returned similar to the following:

HTTP/1.1 200 OK
Content-Length: nnn
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<FileAttr>
    <Filename>L2dvb2Jlcg==</Filename>
    <PublicName>/goober</PublicName>
</FileAttr>

If the requested file was not found, then a 404 error will be returned. Other appropriate error codes may also be returned.

If a client wishes to remove the public filename for a file, the same message should be submitted, but with an empty PublicName element. Once a public filename is provided, anybody in the world can access the file via the Download Service (https://hive.packetizer.com/users/{user}/{file}).
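The public name rules can be checked on the client before sending the request, as in this Python sketch. The function names are hypothetical. An empty name is accepted because an empty PublicName element is how a public name is removed.

```python
import re

# Character set quoted in the text: / ( ) + , - . 0-9 @ A-Z _ a-z ~
_PUBLIC_NAME = re.compile(r"[/()+,\-.0-9@A-Z_a-z~]*")


def is_valid_public_name(name: str) -> bool:
    """True if every character is in the permitted set (empty is allowed)."""
    return _PUBLIC_NAME.fullmatch(name) is not None


def public_name_body(name: str) -> bytes:
    """Build the FileAttr request body; pass "" to remove the public name."""
    if not is_valid_public_name(name):
        raise ValueError(f"invalid public name: {name!r}")
    # No XML escaping is needed: the permitted set excludes <, > and &.
    xml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           "<FileAttr>\n"
           f"    <PublicName>{name}</PublicName>\n"
           "</FileAttr>\n")
    return xml.encode("utf-8")
```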

12. Renaming Files

To rename files, a client sends a request to the File Attribute Service. The request takes the following form:

PUT /fileattr?file={file} HTTP/1.1
Authorization: <authorization string>
Accept: application/xml
Content-Length: nnn

<?xml version="1.0" encoding="UTF-8"?>
<FileAttr>
    <Filename>L2dvb2Jlcg==</Filename>
</FileAttr>

The filename that we wish to alter is provided as a parameter in the URL and is Base64-encoded, with '/' characters replaced with '@' characters. The body of the request contains an XML message that contains the new filename. The filename in the XML message must also be Base64-encoded, but as with all other XML bodies, '/' characters are not replaced with '@'. Upon successfully updating the file attributes, a response will be returned similar to the following:

HTTP/1.1 200 OK
Content-Length: nnn
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<FileAttr>
    <Filename>L2dvb2Jlcg==</Filename>
    <PublicName>/goober</PublicName>
</FileAttr>

If the requested file was not found, then a 404 error will be returned. Other appropriate error codes may also be returned.
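The difference between the two encodings in a rename request can be seen in the following Python sketch, which builds the request target and the XML body. The function name is hypothetical, and the Authorization and other headers are omitted.

```python
import base64
from typing import Tuple


def rename_request(old_name: str, new_name: str) -> Tuple[str, bytes]:
    """Return (request target, body) for renaming old_name to new_name."""
    old_b64 = base64.b64encode(old_name.encode("utf-8")).decode("ascii")
    new_b64 = base64.b64encode(new_name.encode("utf-8")).decode("ascii")
    # URL parameter: '/' becomes '@'. XML body: Base64 is left untouched.
    target = "/fileattr?file=" + old_b64.replace("/", "@")
    body = ('<?xml version="1.0" encoding="UTF-8"?>\n'
            "<FileAttr>\n"
            f"    <Filename>{new_b64}</Filename>\n"
            "</FileAttr>\n").encode("utf-8")
    return target, body
```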

13. Summary of Service URLs

Service                  URL
Hive Locator             https://hive.packetizer.com/hive?file={file}&upload={upload}
                         (where "{upload}" is set to either "true" or "false")
Storage Service          https://hive.packetizer.com/storage?file={file}
File Attribute Service   https://hive.packetizer.com/fileattr?file={file}
Nonce Service            https://hive.packetizer.com/nonce
File List Service        https://hive.packetizer.com/filelist?file={file}
                         (where "{file}" is left blank to get a list of all files)
Download Service         https://hive.packetizer.com/users/{user}/{file}

While the Service URLs are intended to be long-lived, it may become necessary to relocate one service or another. For this reason, clients must be able to receive and properly handle 3xx responses to requests sent to any Hive service URL.
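One way to handle such responses is sketched below in Python. The function name, the send callable (which performs one request and returns the status code and response headers), and the limit of five hops are all assumptions of this illustration. The sketch also assumes that the Location header carries an absolute URL.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]


def follow_redirects(url: str, send: Callable[[str], Response],
                     max_hops: int = 5) -> Tuple[str, Response]:
    """Call send(url), following 3xx Location headers up to max_hops times."""
    for _ in range(max_hops + 1):
        status, headers = send(url)
        if not 300 <= status < 400:
            return url, (status, headers)
        location = headers.get("Location")
        if not location:
            raise RuntimeError(f"{status} response without a Location header")
        url = location  # assumption: Location holds an absolute URL
    raise RuntimeError("too many redirects")
```

Because each Hive request must carry its own Authorization header, the send callable is the natural place to build that header for every hop.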

14. Legal Statements

This document is Copyright © 2009 by Packetizer, Inc. No portion of this document may be replicated for any reason without the express written permission of Packetizer, Inc.

The Hive API is offered as-is and without any warranty. While Packetizer will strive to ensure that the Hive service is accessible as documented herein, we cannot and do not guarantee that this document is error-free.

Packetizer reserves the right to enhance or change the API at any time. We will make every effort to ensure that any changes to the API are wholly backward compatible with any previously published API. Even so, there may be reasons that necessitate a change that results in "breaking" the currently defined API.

The user of the API and the Hive service agrees to indemnify and hold harmless Packetizer and its partners, successors, or assigns from any and all liability arising from the use of the API and associated service.